Jaynes’s book (first part): https://bayes.wustl.edu/etj/prob/book.pdf

Permanent class page: https://www.wmbriggs.com/class/

Uncertainty & Probability Theory: The Logic of Science



**HOMEWORK:** **Read!**

# Lecture

*This is an excerpt from Chapter 5 of* Uncertainty. *All the references have been removed.*

## Probability Is Not Limiting Relative Frequency

Probability can be relative frequency, but it makes little sense to speak of limiting relative frequency. If Q = “5 of 10 Martians wear hats and George is a Martian” the probability of P = “George wears a hat” given Q is 1/2, because of the relative frequency of hat-wearing Martians. But Q does not, of course, imply real instances of real Martians, so “relative frequencies” do not have to be ontologically real. The probability of P would also be 1/2 if Q = “5 of 10 Frenchmen in this room wear hats and George is a Frenchman now in this room.” Here, Q expresses a relative frequency of real things. Either way, probability works.

There are other popular ideas of relative frequencies. Von Mises introduced the mathematical idea of a collective, which is an infinite sequence of “attributes” in some set C—and we should stop there. There are no and can be no infinite sequences of any physical thing nor of time. If there were any instances of an infinite number of anything, that thing would be all you would ever see (if indeed you could see anything!) since our finite universe would be filled with that thing. Now whatever mathematical sense this definition makes, and it does make some, it is therefore of no use in measuring the uncertainty of any physical thing. And since most people take an interest in probability and statistics because they want to quantify uncertainty in some real thing, infinite relative frequencies are therefore of no use. Except, possibly, as approximations. We meet these later.

Nor can we use the mathematical apparatus of such a theory on the idea that, even if infinite relative frequencies cannot exist “in real life”, we can “imagine” they can. No: we cannot imagine any such thing; we can only say we can imagine it, which is very different. I can imagine a unicorn—mentally picture it, I mean. I can imagine flying through the air or scores of other things. But I can’t imagine what an infinite set looks like, or an infinite collective, or an infinite length of time, or Omniscience or Omnipotence. I can speak, think, or imagine analogically about the infinite, but I can never know it.

Limiting relative frequencies are sometimes said to *be* probability, to define it. We take some measurable attribute in an infinite collective, count the number of times the attribute is found (we count in a proper, sophisticated way, of course, doing all this at the limit), and then divide by the total. That limiting relative frequency *becomes* the probability. Thus we have to wait for the long run to know any probability. As Keynes quipped, in the long run we shall all be dead. Limiting relative frequency as a justification of or for a definition of probability suffers from the same flaw as betting as a definition does: both are backward. Study this objection closely. No probability can be known unless the infinite collective be surveyed. Since this has never yet happened, and never will in any of our lifetimes, no probability can ever be known. Probabilities can be made up, of course, in the subjective sense, and this is exactly what frequentists must do whenever they want to make a calculation: make up numbers.

Alan Hajek has done yeoman service in showing the problems with limiting relative frequency, with two papers listing 30 arguments against the theory. These do not exhaust all possible criticisms, nor are all (as he admits) strong, but they are all good and, taken together, are conclusively devastating. Let’s examine some of these.

Hajek defines hypothetical relative frequentism as: “The probability of an attribute A in a reference class B is $p$ [if and only if] the limit of the relative frequency of A’s among the B’s would be $p$ if there were an infinite sequence of B’s.” Below is my numbering, not Hajek’s. I skip some of his more technical criticisms, such as those referring to Carnap’s “c-dagger” or to facts about uncountable sets or about different limits for a named sequence, as I think these mix up causality and evidence of the same. I also do not hold with his alternative to frequentism, but that is another matter.

Before we begin, the natural question is why does it seem that frequentism sometimes works? The answer: why does any approximation work? When frequentist methods hew close to the real definition of probability, they behave well, but the farther away they venture, the worse they get. Most “frequentists” implicitly recognize the difficulties of the theory, and tacitly and unthinkingly reject the idea of infinite sequences in practice without realizing that they have kicked over their theoretical support, i.e. that they are not really using frequentism. Here are the biggest objections.

(1) In order to know the probability of any proposition, we have to observe an infinite sequence. There are no observed or observable infinite sequences of anything. We can imagine such sequences—we can imagine many things!—but we can never see one. Therefore, we can never know the probability of any proposition. Hajek: “any finite sequence—which is, after all, all we ever see—puts no constraint whatsoever on the limiting relative frequency of some attribute.” A finite observed sequence may equal 0.9, but the limit may evince 0.2, or any other number besides 0.9. Who knows?
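Hajek's point about finite sequences can be made concrete with a small sketch in Python. It is an illustration only, not anything taken from Hajek's papers: the helper names `pattern`, `prefix_then_tail`, `stays`, and `drifts` are hypothetical, and the sequences are deterministic stand-ins for "observations". Two sequences agree on their first 1,000 outcomes, where the relative frequency is 0.9, yet one continues toward 0.9 and the other toward 0.2.

```python
from fractions import Fraction
from itertools import count, islice

def pattern(num, den):
    """Endless deterministic 0/1 stream whose relative frequency of 1s tends to num/den."""
    for i in count():
        # Emit a 1 each time the integer part of i * num / den steps up.
        yield ((i + 1) * num) // den - (i * num) // den

def prefix_then_tail(prefix_len, prefix, tail):
    """First prefix_len outcomes follow one pattern; everything after follows another."""
    yield from islice(pattern(*prefix), prefix_len)
    yield from pattern(*tail)

def relative_frequency(seq, n):
    """Exact relative frequency of 1s among the first n outcomes of seq."""
    return Fraction(sum(islice(seq, n)), n)

# Two hypothetical sequences that agree on everything "observed" (1,000 outcomes).
def stays():
    return prefix_then_tail(1000, (9, 10), (9, 10))

def drifts():
    return prefix_then_tail(1000, (9, 10), (1, 5))

same_observations = list(islice(stays(), 1000)) == list(islice(drifts(), 1000))
print("identical first 1000 outcomes:", same_observations)
for n in (1000, 10_000, 100_000):
    print(n, float(relative_frequency(stays(), n)), float(relative_frequency(drifts(), n)))
```

After 100,000 outcomes the first sequence still sits at 0.9 while the second has fallen to 0.207 and keeps heading toward 0.2. Nothing in the shared first 1,000 outcomes distinguishes them.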

In order to picture an infinite sequence, we also, as Hajek emphasizes, must conjure a universe “utterly bizarre” and totally alien to ours. “We are supposed to imagine infinitely many radium atoms: that is, a world in which there is an infinite amount of matter (and not just the $10^{80}$ or so atoms that populate the actual universe, according to a recent census).” Universes with infinite matter are required if frequentism is to be true; or rather, if any probability is to be had. It’s unclear whether Hajek uses *universe* in the philosophical sense of all there is, in which case this criticism has far less force, or in the physical sense of the stuff local to us (or local universe in some set of universes, to speak loosely), in which case the criticism is accurate.

If you do not see this criticism as damning, you have not understood frequentism. You have said to yourself that “Very large sequences are close enough to infinity.” No, they are not. Not if frequentism is to retain its mathematical and philosophical justification. Why? *Every* finite sequence is infinitely far away from infinity. As you’ll see, the main critique of frequentism is that it confuses ontology and epistemology, i.e. existence with knowledge of the same.

(2) If our premises are E = “This is an n-output machine with just one output labeled * which when activated must show an output, and this is an output before us”, the probability of Q = “An * shows” is 1/n as we have been defining it. A frequentist may assert that probability for use in textbook calculations (which he often does in, say, demonstrating the binomial for multiple throws of hypothetical dice), but in strict accordance with his theory he has made a grievous error. He has to wait for an infinite sequence of activations before he *knows* any probability. The only way to get started in frequentism is to materialize probability out of thin air, on the basis of no evidence except imagination. Probabilities may be guessed correctly, but never known. Frequentists, whenever they give examples, are thus acting as secret subjectivists.

(3) In the absence of an infinite sequence, a finite sequence is often used as a guess of a probability. But notice that this is to accept the argument definition of probability, which in this case is, given only E = “The observed finite relative frequency of A is $p$” the probability of Q = “This new event is A” is approximately equal to the observed relative frequency $p$. Notice that logical probability has no difficulty taking finite relative frequencies as evidence.

For a frequentist to agree, he first has to wait for an infinite sequence of observed-relative-frequencies-as-approximations before he can know that the probability of P = “$\Pr(\mbox{Q} | \mbox{E})$ is approximately equal to the observed finite relative frequency” is high or 1. Nothing short of infinity will do before he can know any approximation is reasonable. He might instead take only a finite sequence of approximations and use that as evidence for the probability that all finite sequences are good approximations, but then he is stuck in an infinite regress of justifications.

(4) Hajek: “we know for any actual sequence of outcomes that they are not initial segments of collectives, since we know that they are not initial segments of infinite sequences—period.” This follows from above: even if we accept that infinite collectives exist, how do we know the initial segments of those collectives are well behaved? “It is not as if facts about the collective impose some constraint on the behavior of the actual sequence.”

If hypothetical frequentism is right, to say any sub-sequence (Von Mises’s more technical definition relies on infinite sub-sequences embedded in infinite sequences, which is a common method in analysis; here I mean finite sub-sequence) is “like” the infinite collective, is to claim that the infinite collective, which is not yet generated, “reaches back” and causes the probabilities to behave. And this is impossible. In other words, something else here-and-now is causing that sequence to take the values it does, and probability should be a measure of our knowledge of that here-and-now causality.

(5) Hajek: “For each infinite sequence that gives rise to a non-trivial limiting relative frequency, there is an infinite subsequence converging in relative frequency to any value you like (indeed, infinitely many such subsequences). And for each subsequence that gives rise to a non-trivial limiting relative frequency, there is a sub-subsequence converging in relative frequency to any value you like (indeed, infinitely many subsubsequences). And so on.”

And how, in our finite existence, do we know which infinite subsequence we are in? Answer: we cannot. We do not. The problem with infinities is anything possible can and will happen. There is no justification whatsoever, if frequentism is true, for treating with any finite sequence.
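The subsequence claim in (5) can be checked on the simplest possible case. The sketch below is a hedged illustration, not Hajek's construction: it assumes a parent sequence 0, 1, 0, 1, ... (limiting relative frequency 1/2), and the names `parent` and `subsequence_indices` are hypothetical. It greedily picks strictly increasing positions from the parent so that the chosen outcomes have whatever relative frequency is asked for.

```python
from fractions import Fraction

def parent(i):
    """Outcome at position i of the parent sequence 0, 1, 0, 1, ... (limit 1/2)."""
    return i % 2

def subsequence_indices(target, length):
    """Greedily choose strictly increasing positions of the parent whose
    outcomes have relative frequency approaching target (a Fraction in [0, 1])."""
    indices, ones, pos = [], 0, 0
    for picked in range(length):
        # Ask for a 1 whenever the count of 1s would otherwise fall behind the target.
        want = 1 if ones < target * (picked + 1) else 0
        while parent(pos) != want:
            pos += 1  # skip parent outcomes we do not want
        indices.append(pos)
        ones += want
        pos += 1
    return indices

for target in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    idx = subsequence_indices(target, 1000)
    freq = Fraction(sum(parent(i) for i in idx), len(idx))
    print(f"target {target}: frequency in subsequence {freq}, last position used {idx[-1]}")
```

The same parent sequence yields a subsequence at 1/10, at 1/2, or at 9/10, depending only on which positions are kept. The parent alone does not say which of these we are "in".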

(6) Our evidence is E = “One unique never-before-seen Venusian mlorbid will be built. It has n possible ways of self-destructing once it is activated. It must be activated and must self-destruct. X is one unique way it might self-destruct.” The probability of Q = “X is the way this one-of-a-kind mlorbid will self-destruct” is unknown, unclassifiable, and unquantifiable in frequency theory. In logical probability it is 1/n. Even if we can imagine an infinite collective of mlorbids, there is no way to test the frequency because Venusians build no machines. No sequence can ever be observed.

Hajek: “Von Mises famously regarded single case probabilities as ‘nonsense’…” Yet, of course, all probabilities are for unique or finite sequences of events. David Stove listed this as a key criticism against frequentism. The sequence into which a proposition must be embedded is not unique. Take Q = “Jane Smith wins the next presidency.” Into which sequence does this unambiguously belong? All female leaders? All female elected leaders? All male or female leaders elected in Western democracies? All presidential elections of any kind? All leadership elections of any kind? All people named Jane with the title of president? And on and on and on. Plus none of these can possibly belong to an infinite collective. Of course, if probability is logical, each premise naturally leads to a different, not necessarily quantifiable, probability.

(7) Hajek: “Consider a man repeatedly throwing darts at a dartboard, who can either hit or miss the bull’s eye. As he practices, he gets better; his probability of a hit increases…the joint probability distribution over the outcomes of his throws is poorly modeled by relative frequencies—and the model doesn’t get any better if we imagine his sequence of throws continuing infinitely.”

We have to be careful about causality here, but the idea is sound. The proposition is P = “The man hits the bull’s eye.” What changes each throw is our (really unquantifiable) evidence. The premises for the $n$-th throw are not the same as for the $(n+1)$-th throw. Hajek misses that in his notation, and lapses into the classical language of “independence”, which is a distraction. The point is that each throw is necessarily a unique event conditioned on the premise that practice brings improvements. The man can never go back (on these premises) so there is no way to embed any given throw into a unique infinite collective.
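A toy calculation shows why pooling the throws says little about any one of them. It rests on a loud assumption: the improvement curve `hit_probability` below is invented purely for illustration (the text itself says the evidence is really unquantifiable), and both function names are hypothetical. Given that made-up premise, the sketch compares the probability assigned to throw $n$ with the expected relative frequency pooled over throws 1 through $n$.

```python
from fractions import Fraction

def hit_probability(n, k=20):
    """Hypothetical premise: probability of a hit on throw n (n = 1, 2, ...).
    The curve n / (n + k) is an assumption chosen only because it rises with practice."""
    return Fraction(n, n + k)

def expected_frequency(total):
    """Expected relative frequency of hits pooled over throws 1..total,
    given the hypothetical curve above."""
    return sum(hit_probability(n) for n in range(1, total + 1)) / total

for total in (10, 100, 1000):
    print(
        total,
        round(float(hit_probability(total)), 4),
        round(float(expected_frequency(total)), 4),
    )
```

On these premises the pooled frequency always lags the probability for the latest throw, because it averages over earlier throws made on different evidence. Each throw carries its own premises, which is the point of criticism (7).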

(8) Our Q = “If the winter of 1941 was mild” and our P = “Hitler would have won the war.” Together they form a counterfactual. There are many ways of imagining evidence to support P to varying degrees (books have been written!), but there is no relative frequency, not infinite and not even finite. No counterfactual Q-P has any kind of relative frequency, but counterfactuals are surely intelligible and common. A bank manager will say, “If I had made the loan to him, he would have defaulted”, a proposition which might be embedded in a finite sequence, but the judgement will have no observations because no loans will have been made. The logical view of probability handles counterfactuals effortlessly.

Addendum for the mathematically minded, especially in regard to criticisms 1–3. If we assume we know a probability, we can compute how good a finite approximation of that probability is, which is essentially what frequentist practice boils down to. But since, if frequentism is true, we can never know any probabilities, we can never know how good any approximation in practice is.
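The conditional nature of that computation is easy to see in a short sketch. It assumes a binomial model with a probability `p` supplied as a premise; the function name `prob_within` is hypothetical. Given `p`, it returns the exact probability that the observed relative frequency in `n` trials lands within `eps` of `p`.

```python
from fractions import Fraction
from math import comb

def prob_within(n, p, eps):
    """Exact Pr(|k/n - p| <= eps) for k successes in n trials,
    ASSUMING the probability p is already known (p and eps as Fractions)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) <= eps
    )

p = Fraction(1, 2)      # assumed, not observed
eps = Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(prob_within(n, p, eps)))
```

With `p = 1/2` assumed, 10 trials land within 0.1 of `p` with probability 21/32, and the figure climbs toward 1 as `n` grows. Every number printed is conditional on the assumed `p`; change the premise and the answers change with it.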


1 – umm.. you cannot have an infinity of anything real in a finite universe, but you can have an imaginary infinity of anything in an imaginary infinite universe, regardless of how clearly you can imagine either one.

7 – I think Hajek’s imaginary darts series converges, so this application of the argument fails – frequentism makes sense, however, if you simply understand all variations of the 1/n argument to amount to the fact that the information usually modifies our estimate of P(E) – i.e. P(E| frequency observations) = P(E| some info) with neither sequencing nor causal attributions to the info part.

ICYMI – https://arstechnica.com/science/2024/09/meet-the-winners-of-the-2024-ig-nobel-prizes/

“Probability

Citation: František Bartoš, Eric-Jan Wagenmakers, Alexandra Sarafoglou, Henrik Godmann, and many colleagues, for showing, both in theory and by 350,757 experiments, that when you flip a coin, it tends to land on the same side that it started.

Flipping a coin is a time-honored practice that many consider to be the epitome of a chance event—hence our reliance on a coin flip to fairly decide certain outcomes, such as which of the Wright brothers got to attempt the first flight in 1903 or who got first pick in the 1979 NBA draft, resulting in Magic Johnson playing for the Los Angeles Lakers rather than the Chicago Bulls. A physicist will tell you that a coin toss isn’t random but purely deterministic under classical Newtonian mechanics, with the perceived randomness arising from small fluctuations in initial conditions like starting position, upward force, and angular momentum, for example.

The standard model of coin flipping predicts a 50/50 chance of a coin landing either heads or tails, i.e., there is no heads-tails bias. But in 2007, a Stanford statistician named Persi Diaconis proposed that the act of flipping a coin introduces a small wobble—a change in the direction of the axis of rotation throughout the coin’s trajectory that causes a coin to spend more time in the air with the initial side facing up. So there should be a slight same-side bias, such that there should be a 51 percent chance that a coin lands on the same side as it started.

Bartoš et al. wanted to test the Diaconis model. There have been many prior coin-tossing experiments, from Count de Buffon in the 18th century to the 40,000 coin flips collected in a 2009 experiment specifically designed to test Diaconis’s hypothesis. (The results were ambiguous.) Bartoš et al. surpassed them all, collecting a total of 350,757 coin flips by 48 people (all but three of the authors), all recorded on video for posterity.

That data confirmed Diaconis’s prediction of a slight same-side bias. Nor did they find any trace of a heads/tail bias. The group proposed future research to determine whether “wobble tossers” have a more pronounced same-side bias than stable tossers but acknowledged that “the effort required to test this… appears to be excessive, as it would involve detailed analysis of high-speed camera recordings for individual flips.” There’s only so much tedium one can endure for science.”