The OFloinn put up a most readable and recommended essay, When is Weather Really “Climate”?, and in one of the comments a reader named Gyan said, in part:
Many economists and radical empiricists claim to reduce the whole of rationality to the Bayes’ Theorem. But John Derbyshire in his popular book on Riemann Hypothesis provides a curious counterexample.
Suppose you have a proposition that no man is more than nine feet tall. Then you find a man just a quarter inch short of nine feet.
Should your confidence in the proposition increase or not?
By Bayes’ it seems it should but common sense tells me that it should decrease.
I admit to flying somewhat blind here, because I don’t have Derbyshire’s book and can’t read his example; nevertheless, nothing ventured etc.
The economists and radical empiricists are partly right: it’s Bayes all the way, but only in the logical sense, i.e. the sense in which Bayes describes the probabilistic relationship between propositions, just as traditional logic only describes the logical relationship between propositions. About the origin of the propositions, and of fundamental truth, about which propositions are worthy of entertaining and which not, Bayes and logic are silent. In other words, radical empiricism is false, as is just-plain empiricism, and most of what economists say is best left unsaid. But of these things, another day. On to the example!
For ease of writing, let Q = “No man is more than nine feet tall,” and let D (for data) = “You find a man just a quarter inch short of nine feet.” These are two propositions and we can use Bayes, i.e. extended logic, to say something about their relationship. For instance, we can ask
(1) Pr( D | Q )
or we can ask
(2) Pr( Q | D ).
These probabilities are not the same, and are rarely the same for any two propositions; and unless you are clear about which you mean, you can easily mix them up.
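To see how easily the two directions of conditioning come apart, here is a minimal sketch in Python. The propositions A and B and the joint probabilities are entirely made up for illustration (they are not heights, and not Q and D); the helper `pr_given` is likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical joint probabilities for two propositions A and B.
# Keys are (A is true, B is true); the numbers are invented for illustration.
joint = {
    (True, True): Fraction(2, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(4, 10),
    (False, False): Fraction(3, 10),
}


def pr_given(joint, target, given):
    """Pr(target true | given true); target and given are positions in the key."""
    both = sum(p for key, p in joint.items() if key[target] and key[given])
    cond = sum(p for key, p in joint.items() if key[given])
    return both / cond


pr_a_given_b = pr_given(joint, 0, 1)  # Pr(A | B)
pr_b_given_a = pr_given(joint, 1, 0)  # Pr(B | A)

print("Pr(A | B) =", pr_a_given_b)
print("Pr(B | A) =", pr_b_given_a)
```

With these invented numbers the two conditionals differ by a factor of two, which is the only point of the exercise: swapping the sides of the bar changes the question.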
Equation (1) is easily solvable. It says given that we know, or accept as true, that no man can be taller than nine feet, what is the probability of seeing a man less than nine feet, specifically a man a quarter inch shorter than nine feet. The answer is, in this interpretation, 1, or 100%. Of course it is! We have just said that it is a fact that no man can be taller; and here is a man who is indeed not taller.
This interpretation is not the same as F = “Any man a quarter inch shy of nine feet”. That would be
(3) Pr( F | Q )
and to answer it fully would require we know more about the distribution of heights (F is about any old man; D is about a man). What we do know about heights is this: we know, via deduction, they are greater than zero feet, and, by assumption, they are less than nine feet. Therefore, the best we could say about (3) is that its probability is between 0 and 1. Now you might be tempted to say it is closer to 0 than to 1, but that is because you are implicitly adding information to Q, to the right-hand-side. That is, you might add information to Q about your experience with real heights of real men, experience which suggests a decreasing probability for very high heights. If you say (3) is closer to 0 than to 1, you are actually answering
(4) Pr( F | Q & My experience about actual heights)
which you can see is not (3) and is therefore not an answer to (3).
Now turn the question around and answer (2): this is the chance that no man is taller than nine feet given we have seen one just shy of that number. The answer feels like it will be close to 0, but again that is because we are not strictly answering (2)—the strict answer to (2) is unknown, or perhaps just between 0 and 1 if we assume the contingent nature of these events. But what we really think we are answering is
(5) Pr( Q | D & My experience about actual heights),
and that seems to make (5) close to 0. Let’s call E = “My experience about actual heights.”
What about Bayes’s theorem? Well, it’s easy to work out that (5) is equal to (via Bayes’s theorem):
(5′) Pr( Q | D & E) = Pr( D | Q & E )Pr( Q | E )/Pr( D | E ).
This “updates” our belief in Q from Pr(Q | E) to Pr( Q | D & E) based on observing our “data” D. About the exact value of Pr(Q | E), I don’t know (here’s another point where we depart from economists and empiricists: Bayes does not claim all probabilities are quantifiable). As long as E doesn’t contain information contradictory to Q, such that Q is false given E, then we’re okay. In my mind, using my E, Pr(Q | E) is high, close to 1 (my E says I don’t know of any man taller than nine feet).
That leaves us Pr( D | Q & E ) and Pr(D | E) to figure out. We can attack Pr(D|E) directly, or we can expand it as Pr(D|E) = Pr(D|Q&E)Pr(Q|E) + Pr(D|not-Q&E)Pr(not-Q|E). The first term is just a repeat of the numerator, and “not-Q” means “it is false that no man is more than nine feet tall.” Let’s be lazy and answer Pr(D|E) directly: this is the probability of seeing a man 8′ 11.75″ given E. Pr(D|E) might be close to 0. But then, since Pr(Q|E) is high, Pr( D | Q & E ) will be close to 0 as well.
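The expansion of Pr(D|E) is a weighted average of Pr(D|Q&E) and Pr(D|not-Q&E), with weights Pr(Q|E) and Pr(not-Q|E), so it always lands between the two. A small Python sketch of that arithmetic follows. Every number in it is an assumption pulled out of the air purely to show the mechanics; nothing in E actually hands us these values.

```python
from fractions import Fraction


def pr_d_given_e(pr_q, pr_d_given_q, pr_d_given_not_q):
    """Pr(D|E) = Pr(D|Q&E)Pr(Q|E) + Pr(D|not-Q&E)Pr(not-Q|E)."""
    return pr_d_given_q * pr_q + pr_d_given_not_q * (1 - pr_q)


# Invented values, standing in for one person's fuzzy E.
pr_q = Fraction(9, 10)               # Pr(Q | E): "high, close to 1"
pr_d_given_q = Fraction(1, 1000)     # Pr(D | Q & E): small
pr_d_given_not_q = Fraction(1, 100)  # Pr(D | not-Q & E): small, but larger

pr_d = pr_d_given_e(pr_q, pr_d_given_q, pr_d_given_not_q)
print("Pr(D | E) =", pr_d)

# The mixture sits between its two components.
low = min(pr_d_given_q, pr_d_given_not_q)
high = max(pr_d_given_q, pr_d_given_not_q)
print("between the two conditionals:", low <= pr_d <= high)
```

Exact fractions are used instead of floats only so the arithmetic can be checked by hand.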
We already assumed Pr(Q|E) was “large”, so that if Pr( D | Q & E ) < Pr(D|E) then Pr(Q|D&E) < Pr(Q|E), i.e. our belief in Q shrinks after seeing D. But if Pr( D | Q & E ) > Pr(D|E) then Pr(Q|D&E) > Pr(Q|E) and our belief in Q increases after seeing D. Whether “Pr( D | Q & E ) < Pr(D|E)” or “Pr( D | Q & E ) > Pr(D|E)” is true depends entirely on E, which since it is so fuzzy makes this problem difficult and (sometimes) seemingly against intuition.
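Because Pr(D|E) is a mixture of Pr(D|Q&E) and Pr(D|not-Q&E), asking whether Pr(D|Q&E) is bigger or smaller than Pr(D|E) amounts to asking whether D is more expected when Q is true or when Q is false (provided Pr(Q|E) is strictly between 0 and 1). The following Python sketch runs (5′) under two different, wholly hypothetical versions of E. The numbers are invented, and the function name `posterior` is mine; the only thing to take from it is the direction of the change.

```python
from fractions import Fraction


def posterior(prior, like_q, like_not_q):
    """Pr(Q|D&E) via Bayes, with Pr(D|E) expanded over Q and not-Q.

    prior      -- Pr(Q | E)
    like_q     -- Pr(D | Q & E)
    like_not_q -- Pr(D | not-Q & E)
    """
    evidence = like_q * prior + like_not_q * (1 - prior)  # Pr(D | E)
    return like_q * prior / evidence


prior = Fraction(9, 10)  # invented stand-in for a "large" Pr(Q | E)

# Hypothetical E no. 1: a near-nine-footer is more expected if Q is false.
post_shrinks = posterior(prior, Fraction(1, 1000), Fraction(1, 100))

# Hypothetical E no. 2: a near-nine-footer is more expected if Q is true.
post_grows = posterior(prior, Fraction(1, 100), Fraction(1, 1000))

print("prior      :", prior)
print("E no. 1    :", post_shrinks, "(belief in Q shrinks)")
print("E no. 2    :", post_grows, "(belief in Q grows)")
```

Same theorem, same D, same prior; the two answers point in opposite directions solely because the two versions of E disagree about how surprising D is under Q versus not-Q.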