
# There Is No “Problem” Of Old Evidence In Bayesian Theory

Update: I often do a poor job setting the scene. Today we have the solution to an age-old problem (get it? get it?), a “problem” thought to be a reason not to adopt (certain aspects of) Bayesian theory or logical probability. I sometimes think solutions are easier to accept if they are at least as difficult as the supposed problems.

I was asked to comment by Bill Raynor on Deborah Mayo’s article “The Conversion of Subjective Bayesian, Colin Howson, & the problem of old evidence”.

Howson is Howson of Howson & Urbach, an influential book that showed the errors of frequentism, but then introduced a few new ones due to subjectivity. We’ve talked time and again about the impossibility that probability is subjective (where probability depends on how many scoops of ice cream the scientist had before taking measurements), but we’ve never yet tackled the so-called problem of old evidence. There isn’t one.

Though there is no problem of evidence, old or new, there are plenty of problems with misleading notation. All of this is in Uncertainty.

The biggest error, found everywhere in probability, is to write down only part of the evidence one has for a proposition, and then to let that information “float”, so that one falls prey to equivocation.

Mayo:

Consider Jay Kadane, a well-known subjective Bayesian statistician. According to Kadane, the probability statement: Pr(d(X) >= 1.96) = .025

“is a statement about d(X) before it is observed. After it is observed, the event {d(X) >= 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).

Knowing d0= 1.96, (the specific value of the test statistic d(X)), Kadane is saying, there’s no more uncertainty about it.* But would he really give it probability 1? If the probability of the data x is 1, Glymour argues, then Pr(x|H) also is 1, but then Pr(H|x) = Pr(H)Pr(x|H)/Pr(x) = Pr(H), so there is no boost in probability for a hypothesis or model arrived at after x. So does that mean known data doesn’t supply evidence for H? (Known data are sometimes said to violate temporal novelty: data are temporally novel only if the hypothesis or claim of interest came first.) If it’s got probability 1, this seems to be blocked. That’s the old evidence problem. Subjective Bayesianism is faced with the old evidence problem if known evidence has probability 1, or so the argument goes.

Regular readers (or those who have understood Uncertainty) will see the problem. For those who have not yet read that fine, award-eligible book, here is the explanation.

To write “Pr(d(X) > 1.96)” is to make a mistake. The proposition “d(X) > 1.96” has no probability. Nothing has a probability. Just as all logical arguments require premises, so do all probabilities. Here the premises are missing; later they are supplied in different ways, and equivocation occurs. In this case, deadly equivocation.

We need a right hand side. We might write

(1) Pr(d(X) > 1.96 | H),

where H is some compound, complex proposition that supplies information about the observable d(X), including what the (here anyway) ad hoc probability model for d(X) is. If this model allows quantification, we can calculate a value for (1). Unless that model insists “d(X) > 1.96” is impossible or certain, the probability will be non-extreme (i.e. not 0 or 1).

Suppose we actually observe some d(X_o) (o-for-observed). We can calculate

(2) Pr(d(X) > d(X_o) | H)

and unless d(X_o) is impossible or certain, then again we’ll calculate some non-extreme number. (2) is almost identical to (1), but with d(X_o), a possibly different number, in place of 1.96. The following equation is not the same:

(3) Pr( 1.96 >= 1.96 | H),

which indeed has a probability of 1.

Of course! “I observed what I observed” is a tautology where knowledge of H is irrelevant. The problem comes in deciding where to put the actual observation: on the right hand side or the left.

Take the standard evidence of a coin flip, C = “Two-sided object which when flipped must show one of h or t”; then Pr(h | C) = 1/2. One would not say that, because one just observed a tail on an actual flip, suddenly Pr(h | C) = 0. Pr(h | C) = 1/2 because that 1/2 is deduced from C about h. (h is the proposition “An h will be observed”.)

Pr(I saw an h | I saw an h & C) = 1, and Pr(A new h | I saw an h & C) = 1/2. It is not different from 1/2 because C says nothing about how to add evidence of new flips.
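The coin deduction can be sketched in a few lines of code (a toy illustration of the argument above, with my own variable names):

```python
from fractions import Fraction

# C: "Two-sided object which when flipped must show one of h or t."
# The 1/2 is deduced from C: two possibilities, nothing in C favoring either.
outcomes = ["h", "t"]
pr_h_given_C = Fraction(1, len(outcomes))

# C says nothing about how to modify this deduction with evidence of
# past flips, so the probability of a new h on the same evidence is unchanged.
pr_new_h_given_old_tail_and_C = pr_h_given_C

print(pr_h_given_C)                   # 1/2
print(pr_new_h_given_old_tail_and_C)  # 1/2
```

The observation changes which propositions we can form (e.g. “I saw an h”), not the deduction from C itself.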

Suppose for ease d() is “multiply by 1” and H says X follows a standard normal (ad hoc is ad hoc, so why not?). Then

(4) Pr(X > 1.96 | H) = 0.025.

If an X of (say) 0.37 is observed, then what does (4) equal? The same. But this is not (4):

(5) Pr(0.37 > 1.96 | H) = 0,

and its value of 0 follows not from the model but because H includes, as it always does, tacit and implicit knowledge of math and grammar.

Or we might try this:

(6) Pr(X > 1.96 | I saw an old X = 0.37 & H) = 0.025.

The answer is again the same, because H, like C, says nothing about how to take the old X and modify the model of X.
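Equations (4) through (6) can be checked directly. A minimal sketch, assuming only that H specifies a standard normal for X (the function name is mine):

```python
from math import erf, sqrt

def pr_greater(c):
    """Pr(X > c | H), with H saying X follows a standard normal."""
    return 0.5 * (1.0 - erf(c / sqrt(2.0)))

# (4) and (6): H fixes this value; observing an old X = 0.37
# changes nothing in H, so the number is the same before and after.
print(round(pr_greater(1.96), 3))  # 0.025

# (5) is a different proposition: the observation has replaced X,
# and "0.37 > 1.96" is false by grammar and arithmetic alone.
print(0.37 > 1.96)  # False
```

The point is that the same function call gives the same answer; nothing in H tells us to recompute it after seeing data.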

Now there are problems in this equation, too:

(7) Pr(H|x) = Pr(H)Pr(x|H)/Pr(x) = Pr(H).

There is no such thing as “Pr(x)”, nor does “Pr(H)” exist, and we have already seen it is false that “Pr(x|H) = 1”.

Remember: Nothing has a probability! Probability does not exist. Probability, like logic, is a measure of a proposition of interest with respect to premises. If there are no premises, there is no logic and no probability.

Better notation is:

(8) Pr(H|xME) = Pr(x|HME)Pr(H|ME)/Pr(x|ME),

where M is a proposition specifying information about the ad hoc parameterized probability model, H is usually a proposition saying something about one or more of the parameters of M, but it could also be a statement about the observable itself, and x is a proposition about some observable number. And E is a compound proposition that includes assumptions about all the obvious things.

There is no sense in which Pr(x|HME) or Pr(x|ME) equals 1 (unless we can deduce that via H or ME), before or after any observation. To say so is to swap in an incorrect probability formulation, as in (5) above.
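Equation (8) can be worked with toy numbers. Everything below is an assumption for illustration (M, H, x, and all values are mine, not from the post): M is a binomial model with parameter theta in {0.3, 0.7}, H is “theta = 0.7”, and x is “7 heads in 10 flips”:

```python
from math import comb

def binom_pmf(k, n, theta):
    """Pr of k successes in n trials given success probability theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

pr_H_given_ME = 0.5                      # Pr(H | ME), deduced from ME (toy)
pr_x_given_HME = binom_pmf(7, 10, 0.7)   # Pr(x | HME)
pr_x_given_notHME = binom_pmf(7, 10, 0.3)  # Pr(x | ~H ME)

# Total probability: Pr(x | ME); note it is conditioned on ME, and is not 1.
pr_x_given_ME = (pr_x_given_HME * pr_H_given_ME
                 + pr_x_given_notHME * (1 - pr_H_given_ME))

# Equation (8): every term carries its conditions on the right hand side.
pr_H_given_xME = pr_x_given_HME * pr_H_given_ME / pr_x_given_ME
print(round(pr_H_given_xME, 3))  # 0.967
```

No term here is 1, before or after observing x; the “boost” for H comes from the explicit conditions, not from pretending x has an unconditional probability.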

There is therefore no old evidence problem. There are many self-created problems, though, due to incorrect bookkeeping and faulty notation.

Categories: Philosophy, Statistics

### 6 replies »

1. Jim Fedako says:

I remember (though foggy) an incident where a local organization had a raffle with a grand prize (say \$1000) and a number of smaller prizes, listing the odds of winning each prize.

Even while they were still selling tickets, they would draw both a daily winner and a prize. On one of those draws, they picked a winner and the grand prize.

Nevertheless, they kept selling tickets noting the grand prize. When the local news station found out, the organization justified noting the odds of the grand prize by stating that the odds remained, even though the grand prize was awarded.

2. Briggs says:

Jim,

Excellent example of two sides putting the evidence on different sides of the equation.

Things do not “have” probabilities!

3. Jonathan Livengood says:

Not all logical arguments require premisses. For example, we can show for any simple sentence Q that (Q -> Q) or that (~Q v Q) without any premisses at all. It seems, then, that we should be able to say that the unconditional probability of each of those sentences is one, no?

4. Briggs says:

Jonathan,

Go ahead and show with no premises that Q -> Q.

Best of luck.

Don’t write or say anything that is an implicit premise.

5. Jonathan Livengood says:

That sentences such as (P -> P) may be proved with no premisses is a trivial consequence of the deduction theorem (https://en.wikipedia.org/wiki/Deduction_theorem). How to actually run the proof depends on the proof system you’re using. In Mates’ (1972) Elementary Logic, for example, you would simply write down (P -> P), since Mates’ system allows you to write a tautology on any line of a proof. In the system I typically use, we’d show that (P -> P) follows from no premisses by using the rule of conditional proof, like this:

1 (1) P A (for CP)
(2) (P -> P) 1, CP

The far left column keeps track of the premisses used at each stage of the proof. The rule of conditional proof allows us to discharge assumptions, so that in line (2), we don’t have any left. Hence, (P -> P) is proved using no premisses. We would then write either {} |- (P -> P) or simply |- (P -> P).

Now, I imagine you’ll say that the system itself or the rules of inference or some such counts as an implicit premiss. But that seems to me just to confuse rules of inference with premisses. If you do that, then won’t you find yourself in an infinite regress with literally every argument? I take that to be the lesson of Carroll’s humorous dialogue between Achilles and the Tortoise.

6. Briggs says:

Jonathan,

Well, as long as you count all this

That sentences such as (P -> P) may be proved with no premisses is a trivial consequence of the deduction theorem (https://en.wikipedia.org/wiki/Deduction_theorem). How to actually run the proof depends on the proof system you’re using. In Mates’ (1972) Elementary Logic, for example, you would simply write down (P -> P), since Mates’ system allows you to write a tautology on any line of a proof. In the system I typically use, we’d show that (P -> P) follows from no premisses by using the rule of conditional proof, like this:

1 (1) P A (for CP)
(2) (P -> P) 1, CP

as no premises, then proving (Q -> Q) without premises works. But I’m going to count all that, including the implicit premises which define the terms and their meanings, as premises. And so you cannot prove anything without premises.

Maybe you can prove this to yourself by imagining you’re the first logician ever to visit a remote island. You draw ‘(Q->Q)’ and say ‘True!’ Would the indigenous inhabitants have to believe it? Even if you said ‘True!’ in their language?