Suppose it is true that we have E = “A six-sided object, just one side of which is labeled 6, and when tossed only one side will show.” We want the probability of R1 = “A 6 shows.” That is, we want
(2) Pr( R1 | E ) = 1/6.
This result only follows from the evidence in E to R1. It has nothing to do with any die or dice you might have. We are in French-speaking-cat land (see Part I and links within) and only interested in what follows from given information. In particular, we don’t need in E information about “fairness” or “randomness” or “weightedness” or anything else.
(For those seeking depth, (2) is true given our knowledge of logic.)
Now suppose I want to know Rn = “A 6 shows n times.” This is just
(3) Pr( Rn | E & n) = (1/6)^n,
where if we want a number we have to change the equation/expression/proposition. Anyway, let n be large, as large as you like: (3) grows smaller as n grows larger. Except in the limit, (3) remains a number greater than 0, but if n is bugambo big, (3) is tiny, small, wee. Indeed, let n = 126 and (3) becomes 8.973 x 10^-99, a number which is pretty near 0, but still of course larger than 0. Right, Mr McKibben?
Since we assume in (3) that E is true (and n is given), the result is of very little interest. All we have are changing and decreasing numbers as n increases. End of story.
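To see how quickly (3) shrinks while never reaching 0, here is a minimal sketch in Python. The function name prob_all_sixes is made up for illustration; the only thing it encodes is the deduction from E that each toss has probability 1/6 of showing a 6. Exact fractions are used so that nothing underflows for large n.

```python
from fractions import Fraction


def prob_all_sixes(n: int) -> Fraction:
    """Pr(Rn | E & n): a 6 shows on every one of n tosses, deduced from E alone."""
    return Fraction(1, 6) ** n


if __name__ == "__main__":
    for n in (1, 2, 10, 126):
        p = prob_all_sixes(n)
        # The exact value is always strictly positive, however large n is.
        print(f"n = {n:3d}: (1/6)^n is about {float(p):.3e}, positive: {p > 0}")
```

The value for n = 126 should agree with the 8.973 x 10^-99 quoted above.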
Now suppose we turn equation (3) around and ask
(4) Pr( E | Rn & n) = ?
In words, given we have seen a 6 show in n consecutive tosses what is the chance E is true? That is, given we have seen, or we assume we have seen a 6 show in n consecutive tosses, what is the chance that we have a six-sided object, just one side of which is labeled 6, and when tossed only one side will show? On this reading, “a 6 shows in n consecutive tosses” implies at least part of E, although not completely. There isn’t information that the object is 6-sided, or that just one side can show, but there is evidence that a 6 is there.
In other words, “Rn & n” gives us very little information except that E is possible; therefore
(4′) 0 < Pr( E | Rn & n) < 1
and that is the best we can do. And this is so no matter how large n is, no matter how small (3) is. What follows from this is that E is not proved true or false no matter what (3) is; indeed, in (3) we assume E is true. Again: E cannot be proved true or false whatever (3) is or whatever n is. And this is it. The end.
Yet somehow E “feels false” if n is large. It is not false, as we have just proved, but it might feel that way. And that’s because we reason something like this: given our B = “experience at other times and places with data and situations that vaguely resemble the data and situation I saw this time, simple models like E turned out to be false and other, more complicated models, turned out to be true.” That is,
(5) Pr( E | B ) = small.
This is well enough, though vague. It has nothing to do with (4) however: (4) is entirely different. Plus, (5) says absolutely nothing about any rival “theories” to E. And then there are these interpretations:
(6) Pr( E | The only theory under consideration is E ) = 1,
(7) Pr( E | E can’t be true [because of (3)]) = 0.
Equation (7) is true, but circular; equation (6) is also true because unless we have specified a rival theory for how the data arose, we have no choice but to believe E (this is key). Equation (7), which contains a fallacy, is what many have in mind when judging E.
Now let M = “A 6 shows on roll 1, a 6 shows on roll 2, …, a 6 shows on roll n.” Then
(8) Pr( M | Rn & n) = 1.
Our “model” M is obviously true, though (8) is also circular. But we can “uncircle” it by changing M to M’ = “A 6 always shows.” Then
(8′) Pr( M’ | Rn & n) = 1.
Of course, if at the (n+1)-th roll a 6 does not show, we have falsified M’ (but not M; we’d have to write Pr(M’|Rn & n & not-six on n+1) = 0).
M and M’ are very good models because they explain the data perfectly. But equations (8) or (8′) have nothing to do with (4) or (4′). To judge between E and M, we’d have to start with a statement which assigns a prior belief (before seeing the data R) about which of these models is true; then after seeing the data we can update the probability either model is true. But we cannot, using just R, say “E is false” or “M is true.”
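The updating described here can be sketched with Bayes' rule. This is an illustration under stated assumptions, not the only way to set it up: we add the premise that exactly one of E or M’ is true, we supply a prior for E by hand, and the function name posterior_E is invented for the example. Under E the data have probability (1/6)^n, as in (3); under M’ (“a 6 always shows”) they have probability 1.

```python
from fractions import Fraction


def posterior_E(prior_E: Fraction, n: int) -> Fraction:
    """Pr(E | Rn & n & prior & 'exactly one of E, M' is true').

    ASSUMPTION: only E and M' are in play, so the prior for M' is 1 - prior_E.
    """
    prior_M = 1 - prior_E
    like_E = Fraction(1, 6) ** n  # Pr(Rn | E & n), equation (3)
    like_M = Fraction(1)          # Pr(Rn | M' & n): M' says a 6 always shows
    return prior_E * like_E / (prior_E * like_E + prior_M * like_M)


if __name__ == "__main__":
    for prior in (Fraction(1, 2), Fraction(99, 100), Fraction(1)):
        post = posterior_E(prior, 126)
        print(f"prior Pr(E) = {float(prior):.2f} -> posterior Pr(E) = {float(post):.3e}")
```

The posterior depends entirely on the prior and on which rival was admitted. With any prior strictly between 0 and 1 the posterior for E is tiny but still greater than 0, and with a prior of 1 (no rival admitted) it stays at 1, which is the situation of equation (6).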
The main point, if it is not already obvious, is that no observation will prove a model or belief false, unless it’s a very special and rare situation like M’ (and observing something M’ said was impossible). What we really or often mean when we say “Just look at R; E is false!” is that we have some rival model N in mind, a model which we are sure is true. McKibben is convinced that N = “death-from-the-skies global climate disruption” is true but uses an equation like (3) to prove N. This is a fallacy. Eq. (3) has nothing in the world to do with N; if E is true then no observation can prove E false, or even show it is unlikely because, of course, E is true.
Now for real temperatures, the model N could be true; but so could many other rival models, and so could a model like E, suitably modified. The climate Ec can be “the temperature can fall into one and only one of three buckets, labeled low, normal, high.” Thus Pr(high | Ec) = 1/3 and so on. Or Pr(high n times in a row | Ec & n) = (1/3)^n as before. This means
(9) 0 < Pr( Ec | high n times in a row & n) < 1
just as before. Since there is no news about N in (9), N is irrelevant.
And just as before, we can start with an a priori judgment about the likelihood of N or Ec being true; and after seeing the data we can update these judgments. It will be the case, unless the a priori judgment is very skewed towards Ec, that N will be more likely than Ec given the data.
But this does not mean that N is more likely true when judged against other models of the climate. We can, as we have just seen, compare N only against the straw man Ec, but this gives no evidence whatsoever about N and (say) W = “a climate model which does not assume the world will soon end unless new taxes are raised and given to politicians” (or any other climate model we might imagine).
It is therefore cheating, like all straw man arguments are cheating, not to use the best available competitor to N.
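A toy calculation makes the straw man point concrete. Everything numeric below is hypothetical and chosen only for illustration: the text gives no probabilities for N or W, so we simply assume each says “high” has probability 9/10 in any period, we assume equal priors, and we take n = 10. Only the 1/3 for Ec comes from the text. The helper name posteriors is also invented.

```python
from fractions import Fraction


def posteriors(p_high: dict, n: int) -> dict:
    """Posterior for each model after 'high n times in a row'.

    ASSUMPTIONS: equal priors, and exactly one of the listed models is true.
    """
    joint = {name: Fraction(1, len(p_high)) * p ** n for name, p in p_high.items()}
    total = sum(joint.values())
    return {name: j / total for name, j in joint.items()}


# Pr(high | Ec) = 1/3 is from the text; the 9/10 values are HYPOTHETICAL.
straw_man_only = {"Ec": Fraction(1, 3), "N": Fraction(9, 10)}
with_rival = {"Ec": Fraction(1, 3), "N": Fraction(9, 10), "W": Fraction(9, 10)}

if __name__ == "__main__":
    for label, models in (("N vs Ec", straw_man_only), ("N vs Ec vs W", with_rival)):
        post = posteriors(models, n=10)
        print(label, {name: round(float(p), 5) for name, p in post.items()})
```

Against Ec alone, N looks all but certain. Once a rival W that explains the same data equally well is admitted, N can do no better than split the probability with it. The data that sank Ec say nothing about which of N or W is right.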