It is well to collect cogent proofs of frequentism’s failings so that supporters of that theory can look upon them and find joy.
Alan Hájek has done yeoman service in this regard with two papers listing 30 arguments against the relative frequency theory of probability [1, 2]. These do not exhaust the criticisms, nor are all (as he admits) strong, but they are a good start. Today we’ll draw from his “Fifteen Arguments Against Hypothetical Frequentism” and include some of my own. Hájek defines hypothetical relative frequentism as:
The probability of an attribute A in a reference class B is p [if and only if] the limit of the relative frequency of A’s among the B’s would be p if there were an infinite sequence of B’s.
Below is my numbering, not Hájek’s. I skip some of his more technical criticisms which are not of interest to a general audience (such as those referring to Carnap’s “c-dagger” or to facts about uncountable sets) or are not quite the ticket (about different limits for a named sequence, as I think these mix up causality and evidence of the same). I also do not hold with his alternative to frequentism, but that is another matter. This list is also not complete, and essays could be written for each point, but this is enough to get us started.
Before we begin, the natural question is why does it seem that frequentism sometimes works? The answer: why does any approximation work? When frequentist methods hew close to the real definition of probability, they behave well, but the farther away they venture, the worse they get. It is not as if frequentists are bad people trying to pull the p-values over people’s eyes. It’s that they are relying on a theory which has no bearing on reality, a harsh claim justified below.
Most “frequentists” implicitly know this, and tacitly and unthinkingly reject the idea of infinite sequences in practice without realizing that they have kicked out their theoretical support, i.e. that they are not using frequentism. Plus, most users of statistics haven’t the training to know details of the theory which guides them. They have memorized “Wee p-values are good”, which is all that is needed for success.
When rebutting, be sure not to invoke the So’s-Your-Old-Man fallacy, which in this case would have the form, “Oh yeah? Frequentism may stink, but what about improper priors, fella!” You have not proven frequentism is swell because some other version of probability has failings; indeed, you have admitted frequentism is dead. Help us by using the numbering, too.
1 In order to know the probability of any proposition, we have to observe an infinite sequence. There are no observed or observable infinite sequences of anything. We can imagine such sequences—we can imagine many things!—but we can never see one. Therefore, we can never know the probability of any proposition.
Hájek: “any finite sequence—which is, after all, all we ever see—puts no constraint whatsoever on the limiting relative frequency of some attribute.” The observed relative frequency in a finite sequence may equal 0.9, but the limit may evince 0.2. Who knows? As Keynes famously said of the long run, the preferred euphemism for infinity, “In the long run we are all dead.”
In order to imagine an infinite sequence, we also, as Hájek emphasizes, must imagine a universe “utterly bizarre” and totally alien to ours. “We are supposed to imagine infinitely many radium atoms: that is, a world in which there is an infinite amount of matter (and not just the 10^80 or so atoms that populate the actual universe, according to a recent census).” Universes with infinite matter are impossible (not unlikely: impossible) on any physics that I have heard of, but they are required if frequentism is true.
If you do not see this first criticism as damning, you have not understood frequentism. You have said to yourself that “Very large sequences are close enough to infinity.” No, they are not. Not if frequentism is to retain its mathematical and philosophical justification.
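To make the point about finite sequences concrete, here is a minimal sketch. It assumes a made-up sequence (the function names and the cut-off `prefix_len` are my inventions, not Hájek's) whose first thousand terms show a relative frequency of exactly 0.9 while every later stretch runs at 0.2. Nothing in the first thousand terms hints at where the sequence is headed.

```python
from fractions import Fraction
from itertools import islice


def sequence(prefix_len=1000):
    """Yield 0/1 outcomes of a hypothetical sequence.

    The first prefix_len terms contain nine 1s in every ten;
    every term after that follows a pattern of one 1 in every five.
    """
    i = 0
    while True:
        if i < prefix_len:
            yield 1 if i % 10 != 0 else 0
        else:
            yield 1 if (i - prefix_len) % 5 == 0 else 0
        i += 1


def relative_frequency(n, prefix_len=1000):
    """Exact relative frequency of 1s among the first n terms."""
    outcomes = islice(sequence(prefix_len), n)
    return Fraction(sum(outcomes), n)


if __name__ == "__main__":
    print(relative_frequency(1_000))    # 9/10
    print(relative_frequency(100_000))  # 207/1000, drifting toward 1/5
```

Make `prefix_len` as large as you like: the observed frequency stays at 0.9 for as long as anyone cares to watch, and the limit is still 0.2.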
As you’ll see, the main critique of frequentism is that it confuses ontology and epistemology, i.e. existence with knowledge of the same.
2 If our premises are E = ‘This is an n-output machine with just one output labeled * which when activated must show an output, and this is an output before us’, the logical probability of Q = ‘An * shows’ is 1/n. A frequentist may assert that probability for use in textbook calculations (which he often does, say, in demonstrating the binomial for multiple throws of hypothetical dice), but in strict accordance with his theory he has made a grievous error. He has to wait for an infinite sequence of activations before he knows any probability.
The only way to get started in frequentism is to materialize probability out of thin air, on the basis of no evidence except imagination. Probabilities may be guessed correctly, but never known.
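The textbook calculation mentioned in this criticism can be sketched in a few lines. This is only an illustration under the stated premises: the 1/n comes straight from E, with no sequence of activations observed, and the function names are mine.

```python
from fractions import Fraction
from math import comb


def logical_probability(n_outputs):
    """Probability of the one labeled output, deduced from the premises E alone."""
    return Fraction(1, n_outputs)


def binomial_pmf(k, m, p):
    """Probability of exactly k labeled outputs in m activations, given p."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


if __name__ == "__main__":
    p = logical_probability(6)    # a six-sided die as the n-output machine
    print(binomial_pmf(2, 3, p))  # 5/72: two sixes in three throws
```

Note what goes in: the premises and nothing else. No throw, let alone an infinity of them, is needed to start the calculation.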
3 In the absence of an infinite sequence, a finite sequence is often used as a guess of the probability. But notice that this is to accept the logical definition, which in this case is, given only E = ‘The observed finite relative frequency of A’ the probability of Q = ‘This new event is A’ is approximately equal to the observed relative frequency. (Notice that both Bayes and logical probability have no difficulty taking finite relative frequencies as evidence.)
For a frequentist to agree to that, he first has to wait for an infinite sequence of observed-relative-frequencies-as-approximations before he can know the probability that P = ‘Pr(Q | E) is approximately equal to the observed finite relative frequency’ is high or 1. Nothing short of infinity will do before he can know any approximation is reasonable. He might instead take only a finite sequence of approximations and use that as evidence for the probability that all finite sequences are good approximations, but then he is stuck in an infinite regress of justifications.
4 Hájek: “we know for any actual sequence of outcomes that they are not initial segments of collectives, since we know that they are not initial segments of infinite sequences—period.” This follows from above: even if we accept that infinite collectives exist, how do we know the initial segments of those collectives are well behaved? “It is not as if facts about the collective impose some constraint on the behavior of the actual sequence.”
If hypothetical frequentism is right, to say any sub-sequence (Von Mises’s more technical definition relies on infinite sub-sequences embedded in infinite sequences, which is a common method in analysis; here I mean finite sub-sequence) is “like” the infinite collective is to claim that the infinite collective, which is not yet generated, “reaches back” and causes the probabilities to behave. And this is impossible. In other words, something else here and now is causing that sequence to take the values it does, and probability should be a measure of our knowledge of that here-and-now causality.
5 Hájek: “For each infinite sequence that gives rise to a non-trivial limiting relative frequency, there is an infinite subsequence converging in relative frequency to any value you like (indeed, infinitely many such subsequences). And for each subsequence that gives rise to a non-trivial limiting relative frequency, there is a sub-subsequence converging in relative frequency to any value you like (indeed, infinitely many subsubsequences). And so on.”
And how, in our finite existence, do we know which infinite subsequence we are in? Answer: we cannot. The problem with infinities is that anything possible can and will happen.
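Hájek's subsequence point can be sketched for one simple case. The assumptions are mine: the parent sequence is the alternating 0, 1, 0, 1, ... (limiting relative frequency 1/2), and the selection rule is a greedy one I chose for brevity. Hájek's claim covers any sequence with a non-trivial limit; this shows only how the trick works for the easiest one.

```python
from fractions import Fraction


def parent(i):
    """Parent sequence 0, 1, 0, 1, ...: limiting relative frequency 1/2."""
    return i % 2


def subsequence_indices(target, length):
    """Pick strictly increasing indices of the parent sequence so that the
    selected outcomes have relative frequency within 1/length of target.

    target must lie between 0 and 1 inclusive.
    """
    indices, hits, i = [], 0, 0
    while len(indices) < length:
        # Take a 1 next if the running count of 1s has fallen behind target.
        want = 1 if hits < target * (len(indices) + 1) else 0
        while parent(i) != want:
            i += 1
        indices.append(i)
        hits += want
        i += 1
    return indices


def subsequence_frequency(target, length):
    """Relative frequency of 1s in the selected subsequence."""
    idx = subsequence_indices(target, length)
    return Fraction(sum(parent(i) for i in idx), length)


if __name__ == "__main__":
    print(subsequence_frequency(Fraction(9, 10), 1000))  # 9/10
    print(subsequence_frequency(Fraction(1, 5), 1000))   # 1/5
```

The same parent sequence yields 0.9 or 0.2 or anything else, depending only on which terms are selected.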
6 Our evidence is E = ‘One unique never-before-seen Venusian mlorbid will be built. It has n possible ways of self-destructing once it is activated. It must be activated and must self-destruct. X is one unique way it might self-destruct.’ The probability of Q = ‘X is the way this one-of-a-kind mlorbid will self-destruct’ is unknown, unclassifiable, and unquantifiable in frequency theory. In logical probability it is 1/n. Even if we can imagine an infinite collective of mlorbids, there is no way to test the frequency because Venusians build no machines. No sequence can ever be observed.
“Von Mises famously regarded single case probabilities as ‘nonsense’ (e.g. 1957, p. 17).” Yet, of course, all probabilities are for unique or finite sequences of events.
David Stove listed this as a key criticism against frequentism. The sequence into which a proposition must be embedded is not unique. Take Q = ‘Hillary Clinton wins the next presidency.’ Into which sequence does this unambiguously belong? All female leaders? All female elected leaders? All male or female leaders elected in Western democracies? All presidential elections of any kind? All leadership elections of any kind? All people named Hillary with the title of president? And on and on and on. Plus none of these can possibly belong to an infinite collective. Of course, if probability is logical, each premise naturally leads to a different probability.
7 Hájek: “Consider a man repeatedly throwing darts at a dartboard, who can either hit or miss the bull’s eye. As he practices, he gets better; his probability of a hit increases…the joint probability distribution over the outcomes of his throws is poorly modeled by relative frequencies—and the model doesn’t get any better if we imagine his sequence of throws continuing infinitely.”
We have to be careful about causality here, but the idea is sound. The proposition is Q = ‘The man hits the bull’s eye.’ What changes each throw is our (really unquantifiable) evidence. The premises for the n-th throw are not the same as for the n+1-th throw. Hájek misses that in his notation, and lapses into the classical language of “independence”, which is a distraction. The point is that each throw is necessarily a unique event conditioned on the premise that practice brings improvements. The man can never go back (on these premises) so there is no way to embed any given throw into a unique infinite collective.
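A toy version of the dart thrower may help. Everything quantitative here is an assumption of mine, not Hájek's: I suppose the evidence for the n-th throw warrants a probability of n/(n+1) for a hit, a learning curve invented purely for illustration, and I compute the expected relative frequency of hits over the first N throws.

```python
from fractions import Fraction


def hit_probability(n):
    """Hypothetical learning curve: probability of a hit on throw n (n = 1, 2, ...)."""
    return Fraction(n, n + 1)


def expected_relative_frequency(total_throws):
    """Expected proportion of hits over the first total_throws throws."""
    total = sum(hit_probability(n) for n in range(1, total_throws + 1))
    return total / total_throws


if __name__ == "__main__":
    print(expected_relative_frequency(2))                  # 7/12
    print(float(expected_relative_frequency(100)))         # lags behind throw 100
    print(float(hit_probability(100)))
```

On these premises the running frequency always lags the current throw's probability, and as the throws go on it creeps toward 1, a value that is not the probability of a hit on any throw at all.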
8 Our Q = “If the winter of 1941 was mild, Hitler would have won the war.” There are many ways of imagining evidence to support Q to varying degrees (books have been written!). But there is no relative frequency, not infinite and not even finite. No counterfactual Q has any kind of relative frequency, but counterfactuals are surely intelligible and common. A bank manager will say, “If I had made the loan to him, he would have defaulted”, a proposition which might be embedded in a finite sequence, but the judgement will have no observations because no loans will have been made. The logical or Bayesian view of probability handles counterfactuals effortlessly.
9 If the evidence is E = ‘Quite a few Martians wear hats and George is a Martian’ the probability of Q = ‘George wears a hat’ is not uniquely quantifiable and has no frequency infinite or finite. But it has a logical probability (which isn’t a single number). Most evidence comes to us vaguely and it is stated ambiguously such that unique probabilities are impossible to assign.
Update: a note to the mathematically minded, especially in regard to criticisms 1-3. If we assume we know a probability, we can compute how good a finite approximation of that probability is, which is essentially what frequentist practice boils down to. But since, if frequentism is true, we can never know any probabilities, we can never know how good any approximation in practice is. Frequentism means flying blind while saying, “Ain’t that a pretty view?”
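The calculation described in this note can be sketched as follows, assuming a binomial model with a known p (the function name and the tolerance `eps` are my own labels). It returns the exact probability that the observed relative frequency in n trials falls within eps of p.

```python
from fractions import Fraction
from math import comb


def prob_within(p, n, eps):
    """Exact P(|k/n - p| <= eps) when k is binomial with n trials and known p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) <= eps
    )


if __name__ == "__main__":
    p = Fraction(1, 2)  # assumed known; this is the whole difficulty
    print(prob_within(p, 10, Fraction(1, 10)))          # 21/32
    print(float(prob_within(p, 100, Fraction(1, 10))))  # larger n, closer to 1
```

The first argument is the catch: the function cannot be called without supplying p, which is the very quantity frequentism says we can never know.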
[1] Hájek, A. (1997). ‘Mises Redux’—Redux: Fifteen arguments against finite frequentism, Erkenntnis 45, 209–227.
[2] Hájek, A. (2009). Fifteen Arguments Against Hypothetical Frequentism, Erkenntnis 70, 211–235.