Class 59: The Hardest Hypothesis In The World

A lady says she has the ability to identify whether tea or milk is poured in a cup first. What does “have the ability” mean, and why is it so hard to figure probabilities for it?

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture.

Lecture

This is an excerpt from Chapter 9 of Uncertainty.

Lady Tasting Tea

Finally, here is an extended analysis showing how hideously difficult it can be to discover a cause in the simplest of situations, using an example known to many historians of statistics. A certain English lady claimed to be able to tell whether her tea or milk was poured first into her cup. The statistician and geneticist Ronald Fisher put her to the test by presenting her four cups with the tea poured first and four cups with the milk poured first. The lady did not know the order of the cups.

Can we use this experiment to discover whether the lady has the ability she claims? We only have the evidence of this one test. The situation seems straightforward enough, but there are hidden depths. The difficulty lies in defining “has the ability.” We cannot afford to be sloppy here.

Which of these best describes “has the ability”:

She always guesses correctly (she is never wrong);
In any experiment with $N$ cups, she always gets at least $N/2$ right;
In any experiment with $N$ cups, she might get at least $N/2$ right;
She always guesses correctly when the tea is poured first, but will sometimes guess wrongly when the milk is poured first;
She always guesses correctly when the milk is poured first, but will sometimes guess wrongly when the tea is poured first;
She guesses all cups correctly until the $M$th cup ($M < N$), after which her palate becomes fatigued. $M$ may depend upon a host of factors, such as the time of day, the food she ate earlier that morning, her mental attitude, and so forth;
She guesses at least $M/2$ cups correctly until the $M$th cup ($M < N$), after which her palate becomes numb, etc.?

I could have expanded this list easily. For example, “She always guesses at least W/2 cups correctly when the tea is poured first, but will sometimes guess wrongly when the milk is poured first, where she is presented with 2W = N total cups.” Some of these lead to tricky counting: if, say, she always guesses the tea-first cups correctly, these come first in the sequence, and she knows these guesses are correct, then after she sees N/2 cups she knows all the rest will be milk-first and she will guess accordingly.

None of these definitions is in any way strange: each could really be what we mean when we say this lady knows her elevenses. Where is the classical “She guesses better than chance”? Are you sure it’s not already there? The phrase “guesses better than chance” must be an idiom, because as we have learned, chance is not causative; that is, chance cannot be presented with cups of tea and asked to guess. So what is it idiomatic for?

Imagine an experiment where you are presented with N cups, but you do not touch, sniff, taste, or see inside these cups. You do not even see or know who places them in front of you; indeed, the cups can be left in a distant room, miles away from you. However, you must still make a guess whether the tea or milk was poured first into these occult cups. You could guess none right, or just 1, or just 2, and so on up to all N. What is the probability that you guess none right? Because our evidence (our premises) does not specify any known causal path for you to guess correctly, and because there is a natural ordering of guesses, we deduce via the statistical syllogism that the probability you guess any individual cup correctly equals 1/2. As long as you are not told whether your prior guesses were correct, this probability remains fixed. This lack of feedback becomes extremely important in, say, ESP experiments. If the subject knows how many successes and failures he has had, and the total number of guesses, he could use this information, as in card counting, to modify his future guesses. An example of how this plays out is given below.
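To make the card-counting point concrete, here is a minimal sketch of my own, under assumptions the paragraph above does not make: the guesser knows there are equal numbers of tea-first and milk-first cups, the order is equally likely to be any arrangement, and he is told after each guess whether he was right. The function name `expected_correct_with_feedback` is hypothetical. The strategy is simply to name whichever type has more cups remaining.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_correct_with_feedback(tea_left, milk_left):
    """Expected number of correct guesses when the guesser always names
    the type with more cups remaining (ties go to tea-first).

    ASSUMPTION: the remaining cups are in uniformly random order, and
    feedback after each guess reveals what the cup actually was.
    """
    total = tea_left + milk_left
    if total == 0:
        return Fraction(0)
    p_tea = Fraction(tea_left, total)   # chance the next cup is tea-first
    p_milk = 1 - p_tea
    guess_tea = tea_left >= milk_left   # count the cups, like counting cards
    expected = Fraction(0)
    if tea_left > 0:
        hit = 1 if guess_tea else 0
        expected += p_tea * (hit + expected_correct_with_feedback(tea_left - 1, milk_left))
    if milk_left > 0:
        hit = 0 if guess_tea else 1
        expected += p_milk * (hit + expected_correct_with_feedback(tea_left, milk_left - 1))
    return expected

# Four tea-first and four milk-first cups, as in Fisher's test.
with_feedback = expected_correct_with_feedback(4, 4)
print("expected correct with feedback:", with_feedback, float(with_feedback))
print("expected correct without feedback:", Fraction(8, 2))
```

With feedback the expected number correct exceeds N/2 even though there is still no causal path from the tea to the guesser, which is why feedback must be controlled in such experiments.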

Here, you are not asked to guess the sequence, but whether tea or milk was poured first; i.e. we want to know the number of your correct guesses and are not interested in the order of these guesses. Also notice that there is no information in these premises that supposes there will be an equal number of tea-first and milk-first cups. But even if there were, even if we knew there were equal numbers of each and thus that there were $\binom{N}{N/2}$ possible sequences of cups (rather than $2^N$), we are still not interested in the probability of your particular guessing sequence, but only in the total correct.

The uncertainty in the number you guess correctly, given no causal path, thus follows a binomial (if we don’t know how many of each cup; if we do, it’s something else). Importantly, you could guess, and we could figure the probability of your guessing, none right, or just 1, or 2, or even all. So, “guessing by chance” must mean the ability to guess any number correctly. Since you can and will guess some number (even all) correctly, you cannot “guess better than chance.” There is circularity. No matter if you get 0 right, 1 right, up to N right, all are consistent with guessing by chance. But we have at least learned that “by chance” means “by no (known) causal path.”
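As a rough sketch of the binomial claim, the following Python lists the probability of each possible number of correct guesses when every guess is right with probability 1/2. The function name `guess_count_pmf` and the choice of N = 8 are illustrative assumptions of mine.

```python
from fractions import Fraction
from math import comb

def guess_count_pmf(n_cups, p=Fraction(1, 2)):
    """Binomial probabilities of k correct guesses, for k = 0..n_cups."""
    return [comb(n_cups, k) * p**k * (1 - p)**(n_cups - k)
            for k in range(n_cups + 1)]

pmf = guess_count_pmf(8)
for k, prob in enumerate(pmf):
    # Exact fraction first, then a decimal for readability.
    print(k, prob, round(float(prob), 4))
```

Every count from 0 to N receives a positive probability, so no outcome is inconsistent with “by chance.”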

Now suppose it’s you against the lady; same lack of causal path for you, and her using all her powers. Who will win? If she always guesses correctly, then at best you could only match her. The probability of matching is (1/2)^N, which makes the probability of her beating you 1 – (1/2)^N. We deduce this assuming she never fails. Similarly, if we assume that “has the ability” means “in any experiment with N cups, she always gets at least N/2 right,” then, although the math is slightly more complicated, we could also calculate the probability of you tying, losing to her, or even winning.
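As a worked instance of the first case, take N = 8 (the number of cups in Fisher's test) and assume she never fails:

```latex
\Pr(\text{you tie}) = \left(\tfrac{1}{2}\right)^{8} = \tfrac{1}{256} \approx 0.0039,
\qquad
\Pr(\text{she wins}) = 1 - \tfrac{1}{256} = \tfrac{255}{256} \approx 0.9961 .
```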

We could go through each of our definitions of “has the ability” (and more like them) and calculate probabilities of you winning, losing, or tying. But none of these exercises tells us which of these definitions is true, or which is more likely true than another. For that, we must turn our thinking around.

We want to know whether for this sweet old lady “has the ability” is true or false, or if not true or false, then with what probability it might be true. To judge this probability we have the evidence of our experimental setup, and whatever facts may be deduced from these premises. We also have the evidence of the experiment itself: how many cups she got right and wrong. Can we agree that we should only use this information and no other? I mean, we should only use the evidence of what happened. What didn’t happen and what we cannot deduce from our experimental setup is information which is entirely irrelevant. So for example if we gave the lady N = 8 cups, it is irrelevant that we could have given her N = 50 cups, or whatever. We gave her 8 and we have to deal with just that information. We do not want to fool or distract ourselves. These are of course trivial requirements, but I put them there to focus the mind on the question.

Now, if we accept that “has the ability” means “She always guesses correctly”, then the probability that the lady correctly identifies any cup placed before her must be 1. This phrase is also our model. I mean, “She always guesses correctly” is our model, our theory, our hypothesis.

Why did we assume this particular model? The choice was up to us. It is one interpretation of—it naturally follows from—“has the ability.” Given this model/hypothesis, and before putting her to the test, what is the probability distribution for our uncertainty in her guessing correctly none right, just 1 right, just 2 right, etc., up to all N right? It is 0 for all numbers except for N, where it is 1. But suppose we run our experiment and she correctly identifies only 3 < N cups. Given just our model, what is the probability that she guesses 3 correct? Again, 0. This proves the principle that any (logical) argument can only be judged by the premises given, and by no other information.

However, suppose we conjoin our model with our observation, “She always guesses correctly and she guessed 3 < N correctly,” and, conditioning on this joint statement, re-ask what is the probability that she guesses 3 correct? It is unanswerable because we are conditioning on a contradiction, a statement which is necessarily false. Actually, given this necessary falsity, we could derive any numerical value for guessing 3 correct, but this is obviously absurd.

We have two probabilities, the first of which is:

Pr(Guesses 3 < N correctly | Always guesses correctly) = 0.

But we can turn the question around and ask

Pr(Always guesses correctly | Guesses 3 < N correctly),

which is obviously 0 (with the understanding that additional evidence about the experimental setup is present in these probabilities but suppressed in the notation). This is a rare instance where we have falsified a model, a situation possible only when a model says “Y cannot be” yet Y obtains or occurs. That “cannot” is dogmatic: it means just what it says. Y is impossible: not unlikely, but impossible.
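One way to see why the turned-around probability is 0 is through Bayes's theorem. In the sketch below the symbols are my own shorthand, not the book's: $A$ is “always guesses correctly,” $G$ is “guesses 3 < N correctly,” and $E$ is the evidence of the experimental setup.

```latex
\Pr(A \mid G, E)
  = \frac{\Pr(G \mid A, E)\,\Pr(A \mid E)}{\Pr(G \mid E)}
  = \frac{0 \cdot \Pr(A \mid E)}{\Pr(G \mid E)}
  = 0,
\qquad \text{provided } \Pr(G \mid E) > 0 .
```

Whatever value $\Pr(A \mid E)$ had before the test, multiplying it by a likelihood of 0 leaves nothing behind.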

Now, the question is this:

Pr(Has the ability | Guessed M out of N, Experiment premises),

where “has the ability” is for us to define (such as “always guesses correctly”), M and N are observations of the experiment, where we also take care to consider the Experimental set up (from this we know what N is, etc.).

Asking this last question, the probability that a model is true, is natural in Bayesian probability, but not in frequentism, where any statement or question must be embedded in an infinite sequence of “similar, but randomly different” statements or questions. It is difficult, perhaps impossible, to discover in what unique infinite sequence this (or any) model-statement lies. I hope you understand how limiting this is. Of course, it is possible to develop non-theory-dependent rules-of-thumb for deciding a model’s truth or falsity, but any true theory of probability must be able to answer any question put to it in a non-ad hoc manner.

What about the rest of our models/interpretations of “has the ability”? We outlined several possibilities above, each of them consonant with the phrase “has the ability.” Which of these is the correct model and which are incorrect? That is up to us. It is an extra-logical, extra-probability question, at least with respect to the premises we have allowed ourselves in this experiment.

Now, we could go through a similar procedure as above and calculate the probability each interpretation is true. That is, if we do not have a fixed idea in advance which interpretation (model) is true, we could use the evidence from the experiment to tell us which is more likely than any of the others. However, we must start from somewhere: some external evidence must tell us how likely each of these models is before we begin the experiment. It doesn’t matter what this external evidence is; it merely must exist. The most common evidence allows us to derive that each is equally likely (before the experiment commences), but that’s rather arbitrary.
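A minimal sketch of that procedure follows, with several assumptions of my own flagged in the comments. It compares three hypothetical candidate models: “always guesses correctly,” “always gets at least N/2 right” (which I arbitrarily take to be uniform over the allowed counts, since the definition alone does not fix a distribution), and the no-causal-path binomial. Equal prior probabilities are used by default. All names are illustrative.

```python
from fractions import Fraction
from math import comb

def likelihood_always_correct(m, n):
    # This model allows only m == n.
    return Fraction(1) if m == n else Fraction(0)

def likelihood_at_least_half(m, n):
    # ASSUMPTION: every allowed count from ceil(n/2) to n is equally likely.
    lowest = (n + 1) // 2
    allowed_counts = n - lowest + 1
    return Fraction(1, allowed_counts) if m >= lowest else Fraction(0)

def likelihood_no_causal_path(m, n):
    # Binomial with probability 1/2 for each cup.
    return Fraction(comb(n, m), 2**n)

MODELS = {
    "always_correct": likelihood_always_correct,
    "at_least_half": likelihood_at_least_half,
    "no_causal_path": likelihood_no_causal_path,
}

def posterior(m, n, priors=None):
    """Pr(model | m correct out of n), over the candidate models only."""
    if priors is None:
        # ASSUMPTION: each model equally likely before the experiment.
        priors = {name: Fraction(1, len(MODELS)) for name in MODELS}
    joint = {name: priors[name] * lik(m, n) for name, lik in MODELS.items()}
    evidence = sum(joint.values())
    return {name: value / evidence for name, value in joint.items()}

for m in (3, 8):
    print(m, {name: float(p) for name, p in posterior(m, 8).items()})
```

The answers are conditional on this particular list of models and on the priors supplied; change either and the probabilities change, which is the point about external evidence.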

There is much more to this, but I fear I have already long exhausted your patience. See the video.


4 Comments

Gespenst

July 31, 2025, 2:54 pm

Perhaps one can do no better than to say, “Miss Grundworthy was presented with n cups of tea with the milk poured first and m cups with the tea poured first. She identified correctly x/n of the cups with the milk poured first and y/m of the cups with the tea poured first. What to conclude from it or do about it is left to the interested reader.”

Or is that shirking the duty of a professional statistician?

NLR

July 31, 2025, 4:38 pm

Presumably if the tea and milk are mixed up sufficiently she couldn’t tell from the taste or texture. So I’m guessing it was poured but not mixed.

gareth

August 3, 2025, 3:24 pm

@ Gespenst

Still doesn’t account for any feedback given during the tea presentations, overt or accidental or subliminal.

Anyway Briggs, yes I followed this. What is the probability that I still will once it becomes mathematical ;-)

gareth

August 3, 2025, 3:30 pm

@ NLR

Your presumption and guessing are assumptions. Shirley it was more to do with Shaken or Stirred, as with martinis ?
