Actually, all frequentists and Bayesians are logical probabilists, but if I put that in the title, few would believe it.
A man might call himself an Anti-Gravitational Theorist, a practitioner of a science founded on the belief that gravity is subject to human will. But the very moment that man takes a long walk off a short dock, he’s going to get wet.
One can say one believes anything, but in the end we are all Reality Theorists. We’re stuck with what actually is. This notion deserves full development, but today one small corner: confidence intervals. Never a stranger invention will you meet. Though these creatures have an official frequentist mathematical definition, they are always—as in always—interpreted in a Bayesian or logical probability sense.
The official definition
For some uncertain proposition, a parameterized probability model is proposed. Classical procedure, both frequentist and Bayesian, expends vast efforts in estimating this parameter or these parameters (for ease, suppose there is just one parameter). This estimate is a guess of the parameter’s “true” value given the observed data.
Nobody ever believes, nor should they believe, the guess. To compensate for the overt precision of the guess, the confidence interval was created. This is an interval, usually contiguous but not required to be, around the guess. For example, if the guess is 1.7, the confidence interval might be [1.2, 2.1]. The interval need not be symmetric. The width of the interval is determined by a pre-set number, most typically 95%, as in a “95% confidence interval.”
If you were to repeat the “experiment” which gave rise to the data used in constructing the guess and confidence interval, a new guess and confidence interval could be calculated. The repetition is required to be identical to the old “experiment”, except that the new “experiment” is supposed to be “randomly” different in those aspects related to the parameter. Nobody knows what this means, because, of course, it is impossible to rigorously define. But let that pass.
Repeat the “experiment” a third time. Again, a new guess and confidence interval can be calculated. Repeat a fourth time, a fifth, and so on ad infinitum. At the end, we will have an infinite collection of confidence intervals. The punchline: 95% of these confidence intervals will “cover” the true value of the parameter.
This means that any individual confidence interval in the collection, e.g. [-72.2, -64.8], might not contain the true value of the parameter, but that, in the limit, 95% of those intervals will. It may be helpful to understand that the interval itself is what classicists call a “random variable”.
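To make the repeated-“experiment” definition concrete, here is a minimal Python sketch. It assumes the simplest textbook case, which the post does not specify: data drawn from a normal distribution with known standard deviation, and the usual z-interval for the mean. The function names and numbers are illustrative only. The true parameter is known here only because we simulated it. Each call to z_interval returns a different interval because its inputs are random, which is the sense in which the interval is a “random variable”.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.95):
    """Classical interval for a normal mean with known sigma: xbar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    xbar = fmean(sample)
    half = z * sigma / n ** 0.5
    return (xbar - half, xbar + half)

def coverage(true_mu=1.7, sigma=1.0, n=25, repeats=10_000, seed=1):
    """Repeat the simulated 'experiment' and return the fraction of intervals covering true_mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= true_mu <= hi  # True counts as 1, False as 0
    return hits / repeats

print(coverage())  # near 0.95, but not exactly 0.95
```

With a finite number of repetitions the proportion only approximates 95%. The official definition speaks only of the infinite limit, and says nothing about any single interval the loop produced.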
The natural question is, “What about my confidence interval, the one I calculated for the data I have; what might I say about it?”
Only this: that your interval either contains the true value or it doesn’t. According to frequentist theory, this is it. You may not say more. Doing so is strictly verboten.
Yet everybody, and I mean everybody, does say more. Indeed, everybody acts like a Bayesian. And that’s because, like anti-gravitational theory, frequentism breaks down here. Frequentism is self-consistent, just as the anti-gravitational theory is; but like that theory, it fails upon meeting the real world.
Everybody will say that the actual interval has a 95% chance, or thereabouts, of containing the true value of the parameter. The “thereabouts” is used by the frequentist to comfort himself that he is not a Bayesian, but it’s a dodge. Frequentist theory insists that the actual interval must not be associated with any probability proposition. To state, or feel, or assume, or think that the actual interval has a chance, even an unquantified chance, of containing the true value of the parameter is to act like a Bayesian.
Bayesians, acknowledging this, call their creations “credible intervals”, which in many textbook explanations overlap (a pun!) frequentist confidence intervals exactly or nearly so.
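The overlap is easy to check in the simplest case. This Python sketch assumes normal data with known standard deviation and, on the Bayesian side, a flat (improper) prior on the mean. Neither assumption comes from the post, and with other priors or models the two intervals generally differ. Under these assumptions the two functions return the same numbers, up to floating-point rounding, even though their official interpretations are different.

```python
from statistics import NormalDist

def confidence_interval(xbar, sigma, n, level=0.95):
    """Frequentist z-interval for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sigma / n ** 0.5
    return (xbar - z * se, xbar + z * se)

def credible_interval(xbar, sigma, n, level=0.95):
    """Equal-tailed Bayesian interval, ASSUMING a flat prior on the mean.

    Under that prior the posterior for the mean is Normal(xbar, sigma / sqrt(n)).
    """
    posterior = NormalDist(xbar, sigma / n ** 0.5)
    tail = (1 - level) / 2
    return (posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail))

print(confidence_interval(1.7, 1.0, 25))
print(credible_interval(1.7, 1.0, 25))
```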
Thinner is better
Frequentists prefer thinner, which is to say, narrower intervals over wide, assuming that, ceteris paribus, narrow intervals are more precise. For example, larger samples result in narrower intervals than small samples. But since all you can say is your interval either contains the true value or it doesn’t, its width does not matter. The temptation to interpret the width of an interval in the Bayesian fashion is so overwhelming that I have never seen it passed up.
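The effect of sample size on width can be seen in the same illustrative setting. This sketch again assumes normal data with known sigma, which is not the post's choice. The interval's width is 2·z·σ/√n, so quadrupling the sample size halves the width. The calculation describes the procedure. It does not, by itself, say anything about the one interval in hand.

```python
from statistics import NormalDist

def z_interval_width(sigma, n, level=0.95):
    """Width of the known-sigma z-interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5

for n in (10, 40, 160, 640):
    # Each fourfold increase in n halves the width.
    print(n, round(z_interval_width(sigma=1.0, n=n), 3))
```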
Some complex probability models aren’t amenable to analytically calculated confidence intervals, and in these cases computer simulations are run to demonstrate that the proposed models are well behaved. These show the simulated confidence intervals “cover” the parameter (which in these cases is known exactly) at the specified percent (say 95%).
These simulated intervals, necessarily finite in number, are usually close to these percents, and their authors ask us to take this evidence to support the proposition that confidence intervals computed for real problems (with unsimulated data) will behave well. But this is to assign a (Bayesian) probability, albeit non-numerical, to future, unseen confidence intervals.
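A simulation study of this kind can be sketched in Python. For illustration, the sketch assumes that the “complex” procedure is a percentile-bootstrap interval for a median, a common choice when no analytic interval is at hand. It also assumes exponential data, so the true median (ln 2) is known exactly. The replicate and resample counts are arbitrary and deliberately small. That is the finiteness the paragraph notes: the study returns an estimated coverage and a Monte Carlo standard error, not the limiting 95%.

```python
import math
import random
from statistics import median

def bootstrap_median_interval(sample, rng, resamples=200, level=0.95):
    """Percentile-bootstrap interval for the median of `sample` (illustrative choice)."""
    n = len(sample)
    # Median of each resample drawn with replacement, sorted ascending.
    meds = sorted(median(rng.choices(sample, k=n)) for _ in range(resamples))
    k = round((1 - level) / 2 * resamples)  # resampled medians cut from each tail
    return meds[k], meds[resamples - 1 - k]

def simulate_coverage(n=30, reps=200, seed=7):
    """Finite simulation study with a known truth: exponential(rate=1) data, median ln 2."""
    rng = random.Random(seed)
    true_median = math.log(2)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        lo, hi = bootstrap_median_interval(sample, rng)
        hits += lo <= true_median <= hi
    coverage = hits / reps
    # Monte Carlo standard error of the estimated coverage (binomial approximation).
    mc_se = math.sqrt(coverage * (1 - coverage) / reps)
    return coverage, mc_se

print(simulate_coverage())
```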
Many “null” hypotheses suppose the parameter equals 0 (the exact value matters not for this criticism) and if the confidence interval does not contain 0, the “null” is rejected. To keep within frequentist theory, this is a pure act of will. It is saying, the 0 is not in the interval because I do not want it to be. This must be so because all we can know, according to the strict view of frequentist theory, is that the 0 is in the interval or it isn’t. We cannot attach any kind of measure of uncertainty to this judgement.
It is a Bayesian interpretation to claim, conditional on the observational evidence of the interval, that 0 is likely in or out of it.
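The rejection rule itself is mechanical, as a short sketch shows. This sketch again assumes the known-sigma normal model as an illustrative choice. Under that model, “reject when 0 is outside the 95% interval” coincides with the two-sided z-test's “reject when p < 0.05”. The function names are hypothetical. The code returns a decision, True or False. It reports nothing about how probable it is that 0 lies in the interval.

```python
from statistics import NormalDist

def rejects_null(xbar, sigma, n, null_value=0.0, level=0.95):
    """Reject the 'null' exactly when null_value falls outside the z-interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / n ** 0.5
    return not (xbar - half <= null_value <= xbar + half)

def two_sided_p_value(xbar, sigma, n, null_value=0.0):
    """Two-sided z-test p-value in the same setting."""
    stat = abs(xbar - null_value) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(stat))

# The interval rule and the p < 0.05 rule give the same decisions here.
for xbar in (0.1, 0.3, 0.5):
    print(xbar, rejects_null(xbar, 1.0, 25), round(two_sided_p_value(xbar, 1.0, 25), 4))
```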
As said above, everybody takes the confidence interval to express a chance that the true value of the parameter lies within the interval, even though this is forbidden on theoretical grounds. Jerzy Neyman, who invented confidence intervals, understood these facts, but nearly all those who follow him do not. Jim Franklin in his influential paper “Resurrecting logical probability” (Erkenntnis, 2001, pp. 277-305), in the section “Frequentists are secret Logical Probabilists” (and from where I stole today’s title), said (ellipses original):
Neyman was to some degree aware of the problem, and it is thus entertaining to watch him explain what he is doing, given that it is not reasoning [Neyman wanted to remove reasoning from objective statistical procedures]. As is common among philosophers who have rejected some familiar form of justificatory reasoning, he seeks to permit the conclusion on some “pragmatic” grounds:
The statistician…may be recommended…to state that the value of the parameter θ is within…(Neyman 1937, p. 288)
Later, Neyman admitted the trick of rejecting “nulls”: “To decide to ‘assert’ does not mean ‘know’ or even ‘believe’. It is an act of will” (Neyman 1938, p. 352). To paraphrase Franklin’s quote of Neyman (to remove the unnecessary mathematical notation; Neyman 1941, p. 379, emphasis original to Neyman):
We may decide to behave as if we actually knew that the true value [of the parameter] were [in the confidence interval]. This is done as a result of our decision and has nothing to do with ‘reasoning’ and ‘conclusion’.
As Franklin said, “It is difficult to argue with pure acts of will, for the same reason that it is difficult to argue with tank columns”.
Howson and Urbach (Scientific Reasoning: The Bayesian Approach, second edition, 1993, pp. 237-241) said:
Neyman was careful to observe, however, that when asserting that the parameter is contained in the interval, the statistician is not entitled to conclude or believe that this is really true. Indeed, Neyman held the idea of inductive reasoning to a conclusion or belief to be contradictory, on the grounds that reasoning denotes “the mental process leading to knowledge” and this “can only be deductive.”
Neyman makes the category mistake, common in probabilistic reasoning, of confusing a decision with a probability. It is the same kind of mistake, incidentally, that leads to the “subjective” interpretation of probability, but that is a subject for another day.
As Howson and Urbach point out, the confusion about confidence intervals runs deep. Suppose there are two intervals calculated, a 90% and a 95%. The frequentist must not say that either is more likely to contain the “true” value of the parameter. Indeed, he has no reason to prefer one over the other, since to do so would be to interpret the intervals in a Bayesian sense. That each interval, individually, either contains the parameter or does not is all the frequentist can say.
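The two-interval case can be computed directly. This sketch uses the same assumed known-sigma normal model, with made-up summary numbers (mean 1.7, sigma 1, n 25) and a hypothetical helper name. Both intervals come from the same data. The 95% one is wider and nests the 90% one. On the strict frequentist reading quoted above, neither interval carries a probability of containing the parameter, so the nesting alone gives no grounds for preferring one.

```python
from statistics import NormalDist

def interval_from_summary(xbar, sigma, n, level):
    """Known-sigma z-interval computed from summary statistics (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / n ** 0.5
    return (xbar - half, xbar + half)

# Same data, two levels: the 95% interval is wider and contains the 90% one.
ci90 = interval_from_summary(1.7, 1.0, 25, 0.90)
ci95 = interval_from_summary(1.7, 1.0, 25, 0.95)
print(ci90, ci95)
```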
Some statisticians, noticing the discrepancy, will admit that they do not know whether the confidence interval before them contains the true (or “null”) value of the parameter, but will say that over time, over many implementations of frequentist procedures, i.e. in the “long run”, 95% of those intervals will cover the true value.
But this is a mistake (some Bayesians make it, too, when they argue for using procedures which are Bayes but which have “good frequentist properties”). Since one wants to know about this interval, it is pointless to argue from other intervals not yet created. And if one did argue that way, it would be a Bayesian (logical probability) interpretation.
The argument would run: given that all these other intervals cover the true values of the parameter 95% of the time, mine is likely to cover the true value of the parameter. Some would swap in “95%” for “likely.” Either way, this is logical and not frequentist probability.
Since no frequentist can interpret a confidence interval in any but a logical probability or Bayesian way, it would be best to admit it and abandon frequentism, as this author urges.