Probability does not exist; therefore, nothing has a probability, so nothing can be caused by probability, though the uncertainty of statements can be had conditional on assumptions, and this probability changes when the assumptions change. Probability is not a matter of matter, but of mind alone.
Though there is much to flesh out (as I have done), that is the entire philosophy of probability. Anything that violates these tenets, or those that can be deduced from them, isn’t probability, but something else.
I believe William Dembski’s “design inference” theory largely falls in this latter category. His technique is partly a reinvention of p-values, partly an unrecognized use of causal knowledge, and partly the sneaking in of unquantified probability by calling it something else. There is, incidentally, no problem with unquantified, and unquantifiable, probabilities. Indeed, they are the most common kind, which we use constantly to express uncertainty in vague situations. (I’m not going to prove this here.)
Jargon and notation are barriers to clear thought; but for a very good reason, I’m going to start with these devices. I will develop a notion of probability using no example, and I ask you not to try to think of examples, either. This command is especially for those who have learned probability in any formal way.
Suppose all you know about some event (a statement) E is H. H is a proposition, i.e. our premises or assumptions. I cannot emphasize too strongly that “all.” It means what it says: all. Then we can calculate the probability of E with respect to H as Pr(E|H) = p. Let’s say p is small. Put any number you like as “small”, as long as it’s non-zero. If it were 0, then E is impossible with respect to H. That italicized word is to be taken as strictly as the first.
It turns out E happens: we measure it; it occurred. That is, Pr(E|Observation) = 1. E is true with respect to our measurement.
The probability of E with respect to H is tiny. Yet E happened. Because p is wee, yet E happened, does it mean H is false, or likely false? Remember, H is all we know about E. We believe H to be true. We can write, if you like, Pr(H|Introspection) = 1. Therefore all we can say is that something with a small chance of happening, happened. I.e. Pr(E|Observation & H & Introspection) = 1.
We can also do this trick: Pr(H|E) = 1. E happened, or is assumed true in this equation. But H says E can happen. And H is all we know about E. There is no observation that can “weaken” H, for, as I might have mentioned, we believe H is true. It is all we know about E.
Technically, we should write Pr(H|E & Introspection) = 1. Great troubles often happen when all we know or assume is not written in the equations. This is because equations too often take on life; they are reified. This often happens wherever equations are used. They become realer than Reality, then comes the Deadly Sin of Reification. It is the sin all users of p-values commit.
Nothing above changes, though a mistake enters, if we assume H is “chance” in all these equations. The mistake is to believe “chance” causes, or fails to cause, E. Chance is a synonym of probability: since probability does not exist, neither does chance. Something that does not exist cannot cause anything to happen, nor can it fail to cause something to happen.
Nothing happens “by” chance, which is perfectly equivalent to saying nothing is caused by chance. Nothing is “due” to chance, another synonym for cause. Once again, the italicized word is as strict as can be. No thing.
When things happen, unless we’ve been trained in the Way of the Wee P, we seek for causes, not probabilities. Probabilities are only really useful before things happen or are revealed. Probabilities are used to express uncertainty in events (i.e. statements or propositions). Once we see by observation that an event is true (it happened), we ask “How?” And this is the right question. If we knew how, then if the event is not unique, we can refine predictions of new similar events, or control future ones; and if the event is unique, we can ascribe credit (or blame). Seeking for cause is the goal, not just of science, but in all aspects of life.
We are finally ready to tackle Dembski’s “design inference” technique. Quotes are drawn from his book of the same name (paperback edition, 2005) with the telling subtitle “Eliminating Chance Through Small Probabilities”. Here is his “Explanatory Filter” argument (p. 48):
Premise 1: E has occurred.
Premise 2: E is specified.
Premise 3: If E is due to chance, then E has small probability.
Premise 4: Specified events of small probability do not occur by chance.
Premise 5: E is not due to a regularity.
Premise 6: E is due to either a regularity, chance, or design.
Conclusion: E is due to design.
He refines this in later chapters, adding all sorts of jargon, notation and technicalities, but these refinements do not alter in any substantial way the basic argument or my criticisms, as we’ll see.
Let’s walk through the premises. The first is obvious. By “specified” he means he has assumed some evidence, but without necessarily writing that evidence down, that other things beside E, but somehow similar, could have happened. The example he has in mind is the opening of a combination lock safe at a particular combination; “off” combinations failing to open it.
Premise 3 gives “chance” causal powers, so it is false. But it might be rescued if we give these powers to some Agent. The lock needs to be turned by somebody (or thing). One possible, but not sole, premise we might entertain is that H = “This lock has n combinations, only one of which opens the safe.” The probability of E given H and the one right combination picked by the Agent ignorant of the right combination is 1/n. Technically, Pr(E|H & Ignorant Agent picks right combination) = 1/n. If n is large then the probability “due” to H and the Agent is small. Is this H all we know about E?
Premise 4 is false. Just plain false. It is not true, which I take to be obvious. If E happens, and all we know or consider is (writing A for “ignorant Agent”) HA, then E still happens and HA is still all we know. No matter how small 1/n is, if HA is all we know, then HA is true (given our introspection, which sharp readers will have realized I am omitting from the equations). Rare things will and can happen if HA is true. Premise 4 is false.
Premise 5 is where Dembski sneaks in other possible causes, his first admission (without admitting it) that HA is not all we know. By regularity he means a cause unknown to the Agent, but known by who-knows-what: he is vague about this. In this example, a “regularity” would be a malfunctioning safe, which the Agent believes has n combinations, but because of some error has a smaller number.
Regardless, Pr(E|HA and Introspection) = 1/n. But Pr(E | Regularity & A & Mysterious knowledge) = 1/m > 1/n; i.e. m < n.
Premise 5, in other words, rejects this altered hypothesis. It is not possible by assumption. The lock is as we assume in H.
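To put numbers on the two lock hypotheses, here is a minimal sketch. The values n = 1000 and m = 10 are made up purely for illustration; nothing in Dembski’s example fixes them.

```python
from fractions import Fraction

# Hypothetical values, chosen only for illustration.
n = 1000  # combinations the Agent believes the lock has (hypothesis H)
m = 10    # smaller number of combinations if the safe malfunctions (the "regularity")

pr_e_given_ha = Fraction(1, n)          # Pr(E | HA & Introspection)
pr_e_given_regularity = Fraction(1, m)  # Pr(E | Regularity & A & Mysterious knowledge)

# m < n is the same statement as 1/m > 1/n.
assert m < n and pr_e_given_regularity > pr_e_given_ha
print(pr_e_given_ha, pr_e_given_regularity)
```

Premise 5 amounts to saying the second line of arithmetic is not on the table: by assumption the lock is as H describes it.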
Premise 6 says something caused the lock to open: a regularity, which is ruled impossible by Premise 5, “chance”, which is impossible, but fixable using our Agent, or “design.” The first two being ruled out, we pick “design”. We’ll come to that next.
Since Premise 4 is false, the argument fails. Unless we can rescue it, as we did Premise 3. And that is exactly what Dembski attempts, in effect, like this:
Premise 4′: Specified events of small probability do occur by “chance”, but if we can consider a non-“chance” causal explanation operating instead of “chance”, and we believe the non-“chance” cause is more likely than “chance”, then the non-“chance” causal mechanism is preferred.
Premise 4′ is not true, either. So the conclusion again fails. Now the modified premise is not probability, per se, but a decision. It is an act of will (as when “rejecting null hypotheses” using p-values). Based on further introspection, which provides this non-“chance” cause, when the likelihood of non-“chance” cause is greater to some unknown extent than the likelihood of the “chance” cause, we decide to state “The non-‘chance’ cause did it.”
This is not a bad rule, all things equal. If you knew based on X that either Y or Y’ is true, and that Y is more likely than Y’, and you could only pick from Y or Y’, then, ignoring the costs and benefits of our choice, a not-insignificant assumption, picking Y is the way to go. But it is a decision rule and not probability.
What Dembski is doing is this. He has in mind two causal possibilities, H (the ignorant Agent), and, say, H’, the non-ignorant (and possibly lying about being ignorant) Agent. He may have only started with H, but after seeing E (the safe opens), he considers H’ because his introspection directs him to. There may be many clues he gleans from the Agent that lead him to H’. This is not unwise, but it is not in any way a formal process. Two people can disagree whether to consider H’.
Now if we never considered H’, then because we began believing H, we must necessarily end believing H. That is, writing IA for the ignorant Agent, we start with Pr(H|IA & Introspection) = 1 and must necessarily end with Pr(H|E Observed & IA & Introspection) = 1.
In particular, it is impossible that “H not”, or not H, the contrary of H, or Hc, is true, or even possible, because it must be that Pr(H|IA & Introspection) + Pr(Hc|IA & Introspection) = 1, and Pr(H|E Observed & IA & Introspection) + Pr(Hc|E Observed & IA & Introspection) = 1, and the first elements of these equations are already equal to 1. This is key.
Thus we must switch from believing Pr(H|IA & Introspection) = 1 to Pr(H|E Observed & IA & Introspection’) < 1, where Introspection’ means further introspection added to the original introspection. This is important. It means not only did E occur, which is new information, but we also add new information in the form of new introspection because of the circumstances in which E happened.
He makes this move because Pr(E|H & IA & Introspection) is small. That Pr(E|H & IA & Introspection) is small is not enough to make this move, as Dembski himself acknowledges. He says (p. 162) “extreme improbability by itself is not enough to preclude an event from having occurred by chance. Something else is needed.” That “something else”, it turns out, is an assumed cause for which E is more likely, and which based on further introspection is more likely than “chance”.
Here is what further introspection says: “Very often when a rare event occurred in cases like these, a cause other than that stated (such as the ignorant agent) was responsible. Here is that alternate cause, and here is why I find that alternate plausible.” Given that, and given the other cause H’, then Dembski decides the other cause operated.
In notation, Pr(E|H & Introspection) is small, and Pr(E|H’ & Introspection’) is large, or even equal to 1. Also, Pr(H’|E & Introspection’) >> Pr(H|E & Introspection’) (where I have subsumed the ignorant Agent into H).
Now Dembski rejects subjective Bayes (p. 67), as do I (and as do p-value-loving frequentists). But nobody rejects Bayes’s formula. We can use it here. Dembski never makes this step because he never quite realizes that what he is doing is in the end probability after all. We can write (I won’t derive this, but all students of probability can):
(1) Pr(H|E & I’) = Pr(E|H & I’)Pr(H|I’) / [ Pr(E|H & I’)Pr(H|I’) + Pr(E|H’ & I’)Pr(H’|I’) ],
recalling that not just E was observed, but E led to I’, and thus H’. If we used Bayes with just H, Hc and I, we start with Pr(H|I) = 1 and end with Pr(H|E&I) = 1, though I’ll leave the math to the reader.
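The math left to the reader can be checked in a few lines. This is only a sketch: the function name and the trial values for Pr(E|H & I) and Pr(E|Hc & I) are my own choices for illustration, and any non-zero Pr(E|H & I) gives the same answer.

```python
def posterior_h(pr_e_given_h, prior_h, pr_e_given_hc):
    """Bayes with only H and its contrary Hc: returns Pr(H | E & I)."""
    prior_hc = 1.0 - prior_h  # Pr(H|I) + Pr(Hc|I) = 1
    numerator = pr_e_given_h * prior_h
    denominator = numerator + pr_e_given_hc * prior_hc
    return numerator / denominator

# Start with Pr(H|I) = 1. However small Pr(E|H & I) is (illustrative values),
# and however well Hc would "explain" E, the posterior stays at 1.
for pr_e_given_h in (0.5, 1e-3, 1e-11):
    print(pr_e_given_h, posterior_h(pr_e_given_h, prior_h=1.0, pr_e_given_hc=1.0))
```

With Pr(Hc|I) = 0 the second term in the denominator vanishes, so the numerator and denominator are identical. No observation can move us off H unless the further introspection I’ is added.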
Let’s step through equation (1). The left hand side is what we want to know: how likely is the “chance” explanation, i.e. the “null” hypothesis after seeing E and adding our further introspection. Pr(E|H & I’) is small because it still assumes H is true, even after the further introspection. But Pr(H|I’) is now closer to 0, if not 0, by decision.
The first part of the denominator is equal to the numerator. Pr(E|H’ & I’) is now certain, or almost. And Pr(H’|I’) is large, again by decision (and not forgetting Pr(H|I’) + Pr(H’|I’) = 1).
So we have, in cartoon form,
(2) Pr(H|E & I’) = small x small / [ small x small + 1 x large ],
where both the “large” and “small” are somewhere in the closed interval [0,1], so the RHS is 1 / (1 + large/small^2), or, since large/small^2 here is very large, we have (1 / very large), which is very small, which is to say tiny, or even 0.
So, as desired, the probability for H after seeing E, and adding in the further introspection, is tiny, or even 0.
Since now Pr(H|E & I’) + Pr(H’|E & I’) = 1, we have that Pr(H’|E & I’) is nearly certain, or certain.
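For those who like to see numbers, here is a sketch of equation (1) with made-up values standing in for “small” and “large”. The specific numbers are assumptions of mine, picked only to show the shape of the result; the second check confirms the 1 / (1 + ratio) form of the cartoon.

```python
import math

def posterior_h(pr_e_h, pr_h, pr_e_hprime, pr_hprime):
    """Equation (1): Pr(H | E & I') for two exhaustive hypotheses H and H'."""
    numerator = pr_e_h * pr_h
    return numerator / (numerator + pr_e_hprime * pr_hprime)

# Illustrative values only (not from Dembski or any data).
pr_e_h = 1e-6              # Pr(E | H & I'): small, since it still assumes H
pr_h = 1e-3                # Pr(H | I'): small, by decision
pr_e_hprime = 1.0          # Pr(E | H' & I'): certain, or almost
pr_hprime = 1.0 - pr_h     # Pr(H' | I'): large, since the two priors sum to 1

post_h = posterior_h(pr_e_h, pr_h, pr_e_hprime, pr_hprime)
post_hprime = 1.0 - post_h  # Pr(H' | E & I')

# Same thing in the cartoon form 1 / (1 + large / (small x small)).
ratio = (pr_e_hprime * pr_hprime) / (pr_e_h * pr_h)
assert math.isclose(post_h, 1.0 / (1.0 + ratio))

print(post_h, post_hprime)
```

Change the made-up inputs and the posterior moves with them, which is the point: the answer is driven by what the further introspection says about H and H’.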
This was all possible only because of that further introspection which involved our knowledge of cause. I do not say this is bad reasoning: I say it is good. Even ideal. Because it is cause we always seek, and not probability.
What Dembski has done, then, is to reinvent probability and call it something else. Since he didn’t write down all his information as probability, he lost track of it. This is the exact same mistake users of p-values make. Though Dembski is better because he, it appears from his writing, would never invoke the other cause unless it was much surer in his mind (that further introspection). Blind use of p-values is just silly by comparison.
Dembski lists the steps of his further introspection (cf. pp. 184–185). These involve “patterns” that are like E, or can be E; “probabilistic resources”, which are suppositions about how E could happen more than once in situations like, but not, ours; “saturated probabilities”, which are the probabilities of “probabilistic resources”; “side information” and “subinformation”, which just are the further introspection; along with information theoretic measures on the alternate causes.
This is all in an attempt to make formal the process by which we suggest H’ to ourselves. Yet this can never be formal, since except in highly constrained circumstances, disagreements are inevitable. I won’t bother critiquing each step to show how, because I believe it is fairly obvious at this point.
Here, finally, is the running example Dembski uses. Let’s see how we would do using probability alone.
There was this clerk in New Jersey, one Nicholas Caputo, a Democrat, whose duty back in the 1980s and before was to draw names of candidates (in private) to decide whose name was printed first on the ballot, whose second, and so on. Whoever’s name is printed first has, experience shows, a substantial advantage (voters being lazy). Caputo’s nickname was the “man with the golden arm”, because he picked Democrats 40 out of 41 times.
The natural H here is that Caputo cannot see or know the names he is drawing, and that the bag (or whatever) had all the candidates’ names in it. The event E is given to us: 40 out of 41 lucky Democrats. Given our initial introspection, it’s easy to calculate Pr(E|H&I) ~ 2 x 10^-11, or about 1 in 54 billion.
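Here is one way to reproduce that figure. The sketch assumes something the H above does not spell out: that each of the 41 drawings is an independent toss-up between a Democrat and one other candidate, so the Democrat comes first with probability 1/2. Under that assumption, exactly 40 Democrats in 41 draws gives about 1 in 54 billion, and 40 or more gives about 1 in 52 billion.

```python
from fractions import Fraction
from math import comb

draws = 41
p_dem = Fraction(1, 2)  # ASSUMPTION: two contenders per draw, equally likely

def pr_exactly(k):
    """Binomial probability of exactly k Democrats listed first in `draws` draws."""
    return comb(draws, k) * p_dem**k * (1 - p_dem)**(draws - k)

exactly_40 = pr_exactly(40)                    # 41 / 2^41
at_least_40 = pr_exactly(40) + pr_exactly(41)  # 42 / 2^41

print(float(exactly_40), float(1 / exactly_40))
print(float(at_least_40), float(1 / at_least_40))
```

Exact fractions are used so nothing is lost to rounding until the final conversion to floats for printing.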
Hold up. Wait a minute. Somethin’ ain’t right. New Jersey? 1975-1985? Democrat? In private? Forty out of forty one? Caputo? Uh huh.
That’s our further introspection right there, and it’s strong. The more you know of Jersey politics, the more overwhelming it is. Obviously, Pr(H|I’) is going to be small, and Pr(H’|I’) large. What about Pr(E|H & I’)Pr(H|I’)? Pr(E|H’ & I’)Pr(H’|I’)? Fugeddaboutit.
Obviously, quite quite obviously, it was our further introspection on cause, not “chance”, that led to us giving H little credence. If it turns out we are wrong, and that Caputo was, unknown to us, a saint, never missed mass, voted for Reagan, and was immaculately honest in his drawings, then we are still right to make the judgement we did. Because all probability is conditional on the assumptions we make, and our assumptions here on the opposite of saintliness are solid given what we know about Jersey politics.
These further introspections are causal assumptions made based on all kinds of historical evidence. Which, as we admit, may be in error. But that only means Pr(H’|I’) < 1. To us. The causal knowledge is strong here, i.e. highly probable. That is why we reject “chance”.
In the end, we don’t need anything more than just plain probability to solve these kinds of problems, and, of course, knowledge of cause. That’s the real key, this outside causal knowledge. Failing to consider this is also why p-values fail—and they fail worse because these are much more model dependent and H is rejected for much weaker reasons. However, I’ve told that story many times, so I won’t belabor it now.
It turns out that I agree, then, with Dembski’s conclusions, at least where he and I share the outside causal knowledge. And where we agree on H—which isn’t always a given. There is no such thing as “chance”, and so the H we come to can vary, again based on our causal knowledge.
In particular, given that Dembski’s book is associated with “creationism” or “intelligent design” (even though it is only a small part of his book), and his methods have been used to “prove” there is design in the universe, I don’t want anybody going away and thinking all these proofs have therefore been invalidated. They haven’t, because here he and I agree on the outside causal knowledge, even if we disagree about “chance” in this case (it can’t be defined uniquely or without controversy here). Indeed, I regard intelligent design (as I define it) as trivially true.