The Lady Tasting Tea: Bayes Versus Frequentism; Part I (updated)

Since the subject arose yesterday, and for other reasons which I’ll explain later, I thought we should revisit this series, which ran a year ago. I have edited and expanded the text (just this Part so far) taking into account user comments. Plus I realize that I never provided a promised 4th installment, so I’ll do that too.

A certain English lady claimed to be able to tell whether her tea or milk was poured first into her cup. The statistician and geneticist Ronald Fisher put her to the test by presenting her four cups with the tea poured first and four cups with the milk poured first. The lady did not know the order of the cups.

Can we use this experiment to discover whether the lady has the ability she claims? We only have the evidence of this one test. The situation seems straightforward enough, but it isn’t. The difficulty lies in defining “has the ability” (see the “Sorites Paradox Isn’t” post for what can happen when definitions aren’t paid sufficient attention). We cannot afford to be sloppy here.

Which of these best describes “has the ability”:

  • She always guesses correctly (she is never wrong),
  • In any experiment with N cups, she always gets at least N/2 right,
  • In any experiment with N cups, she might get at least N/2 right,
  • She always guesses correctly when the tea is poured first, but will sometimes guess wrongly when the milk is poured first,
  • She always guesses correctly when the milk is poured first, but will sometimes guess wrongly when the tea is poured first,
  • She guesses all cups correctly until the Mth cup (M < N), after which her palate becomes fatigued. M may depend upon a host of factors, such as the time of day, the food she ate earlier that morning, her mental attitude, and so forth,
  • She guesses at least M/2 cups correctly until the Mth cup (M < N), after which her palate becomes numb, etc.?

We could have expanded this list easily. For example, “She always guesses at least W/2 cups correctly when the tea is poured first, but will sometimes guess wrongly when the milk is poured first, where she is presented with 2W = N total cups.” Some of these lead to tricky counting. Suppose, say, that she always guesses the tea-first cups correctly, that these come first in the sequence, and that she knows her guesses on them are correct. Then after she sees N/2 cups she knows all the rest will be milk-first, and she will guess accordingly.

None of these definitions is in any way strange: each could really be what we mean when we say this lady knows her elevenses. “Hey!”, you might ask, “Where is the classical ‘She guesses better than chance’?” Are you sure it’s not already there? Let’s see.

The phrase “guesses better than chance” must be an idiom, because chance is not causative; that is, chance cannot be presented with cups of tea and asked to guess. So what is it idiomatic for?

Imagine an experiment where you are presented with N cups, but you do not touch, sniff, taste, or see inside these cups. You do not even see or know who places them in front of you; indeed, the cups can be left in a distant room, miles away from you. However, you must still make a guess whether the tea or milk was poured first into these occult cups. You could guess none right, or just 1, or just 2, and so on up to all N. What is the probability that you guess none right? Because our evidence (or premises) do not specify any known causal path for you to guess correctly, and because there is a natural ordering of guesses, we deduce the probability you guess any individual cup correctly equals 1/2. As long as you are not told whether your prior guesses were correct, this probability remains fixed.[1]

In particular, you are not asked to guess the sequence, but whether tea or milk was poured first; i.e. we want to know the number of your correct guesses and are not interested in the order of these guesses. Also notice that there is no information in these premises that supposes there will be an equal number of tea-first and milk-first cups. But even if there were, even if we knew there were equal numbers of each and thus that there were 2^N possible sequences of cups, we are still not interested in the probability of your guessing sequence. For those who know, the number-sequence distinction is what allows us to pick between Carnap’s c* and c+ measures (a stumbling block for some, which at one time caused skepticism over logical probability).

The number you guess correctly—given no causal path—thus follows a binomial (if we don’t know how many of each cup; if we do, see Part IV). Importantly, you could guess, and we could figure the probability of your guessing, none right, or just 1, or 2, or even all. So, “guessing by chance” must mean the ability to guess any number correctly. Since you can and will guess some number (even all) correctly, you cannot “guess better than chance.” There is circularity. No matter if you get 0 right, 1 right, up to N right, all are consistent with guessing by chance. But we have at least learned that “by chance” means “by no (known) causal path.”
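The binomial claim above can be sketched in a few lines. This is only an illustration under stated assumptions: N = 8 cups (borrowed from Fisher's test), a fixed probability of 1/2 per cup, and no feedback between guesses. The function name `prob_correct` is made up for this example.

```python
from math import comb


def prob_correct(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k correct guesses out of n cups (binomial)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N = 8  # assumed number of cups, as in Fisher's test

# Probability of each possible count of correct guesses, 0 through N.
distribution = [prob_correct(k, N) for k in range(N + 1)]

for k, pr in enumerate(distribution):
    print(f"{k} correct: {pr:.4f}")
```

Every count from 0 to N receives a positive probability, which is the point of the paragraph: no outcome is ruled out by "guessing by chance."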

Now suppose it’s you against the lady: same lack of causal path for you, and her using all her powers. Who will win? If she always guesses correctly, then at best you could only match her. The probability of matching is (1/2)^N, which makes the probability of her beating you 1 - (1/2)^N. We deduce this assuming she never fails. Similarly, if we assume that “has the ability” means that “in any experiment with N cups, she always gets at least N/2 right”, then, although the math is slightly more complicated, we could also calculate the probability of you tying, losing to her, or even winning.
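For the simplest definition (she is never wrong), the tie probability is one half raised to the Nth power. The sketch below checks that figure by brute force. It assumes N = 8, treats every sequence of guesses as equally probable (which follows from a fixed 1/2 per cup with no feedback), and uses hypothetical function names and a made-up cup ordering.

```python
from fractions import Fraction
from itertools import product


def prob_tie_with_perfect_lady(n: int) -> Fraction:
    """Chance that a guesser with no causal path gets all n cups right."""
    return Fraction(1, 2) ** n


def prob_tie_by_enumeration(truth: str) -> Fraction:
    """Same quantity, found by listing every possible sequence of guesses."""
    guesses = list(product("TM", repeat=len(truth)))  # T = tea first, M = milk first
    hits = sum(1 for g in guesses if g == tuple(truth))
    return Fraction(hits, len(guesses))


N = 8
truth = "TMTMTMTM"  # an arbitrary ordering, assumed only for illustration

tie = prob_tie_with_perfect_lady(N)
print("tie:", tie)             # probability you match her
print("she wins:", 1 - tie)    # probability she beats you
print("by enumeration:", prob_tie_by_enumeration(truth))
```

The other definitions of "has the ability" would need their own, messier counting; this covers only the never-wrong case.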

We could go through each of our definitions of “has the ability” (and more like them) and calculate probabilities of you winning, losing, or tying. But none of these exercises tells us which of these definitions is true, or which is more likely true than another. For that, we must turn our thinking around.

Read Part II.


[1] This becomes extremely important in, say, ESP experiments. See Persi Diaconis (who first taught me of this) and Ron Graham’s “The analysis of sequential experiments with feedback to subjects” in the Annals of Statistics, 1981, 9, 3-23.

Thanks to long-time reader Mike B for suggesting this example. See also the original posts defining predictive inference: here, and here.


  1. Manoel Galdino

    Waiting for part II…

    And I missed something like “she has a x percent chance (probability) of guessing correct”, but not in a Bayesian sense, in a frequentist sense. Maybe she can perceive the difference, but may confuse things a bit because this is not exact science!

  2. Briggs

    Manoel Galdino,

    Are you sure your example isn’t in the list?

    Anytime you must use probability, you are dealing with an inexact science.

  3. Lost you in the paragraph saying that the number of correct guesses is binomial. At least in the original Fisher example, the lady knew that there were 4 cups of each type, so the number of correct guesses would be hypergeometric.

    So the expected number of type A cups the lady guesses correctly is 2, if she just selects 4 cups at random and claims that they are type A and the others are type B. Better than chance would then mean that the number of correct type A guesses is greater than 2.

  4. Hmmm… I blame my difficulty in grasping these posts on a lack of familiarity with the terminology and on not understanding why these questions need be asked in the first place (which I blame fully on myself). I am curious though….

    You talk about doing ‘better than chance’ and ask ‘what does that mean’ (I think). I always figured it meant… well let’s use this example.

    Say we took N amount of coins (all perfectly weighted, balanced, etc etc). We designated Heads to equal milk and tails to equal tea. Then we had someone walk down the line of cups, flipping a coin for each one and placing the result by it.

    This strikes me as a “pure chance” guessing. And thus we could then determine whether the lady’s results matched the cups any better than the random coin flips.

    (but then, I’m quite sure I’m missing something)

  5. Briggs

    Joe Levy,

    The difficulty starts with using the phrase “the expected number”, which is a frequentist concept which supposes a certain, fixed model is true (which?). I have not reached that point. I mean to say, you are considering only the one point of view, and I have not yet reached the difference between Bayes and classical.

    Also, the binomial is for you guessing (given no known causal path, and no feedback), not the lady.


    I’ll answer by asking you: how does your coin flip differ from the no-known-causal-path scenario? We begin to see the difference in philosophy: I am asking questions about physics (if you like).

  6. Wayne

    Nate: I’m just learning myself, but I think the thing you may be missing is that you are generalizing beyond your analogy. You’re not talking about a single set of coin flips and your single set of guesses. I think you’re talking about a limit if you repeated the coin flips and your guesses many, many times.

    (You also have to define “matched the cups better than”. You’re not talking about you getting 6 right (out of N) versus the coins “getting” 5 right — which might be the case in a single trial — but rather you getting n or more right versus the coins getting (n-1) or fewer right, for all n <= N, but I'm not sure if that makes any difference.)

  7. As a frequentist I demand that the lady be required to drink an infinite number of cups of tea. Is this impossible? Heck no, take my wife for instance.

  8. SteveBrooklineMA

    My understanding is that the Lady knows there are 4 cups of each type, so 8 total. If she is to label four of one type and four of the other, then it is not possible for her to get an odd number right. So I’m not sure where the binomial comes in.

    Also, I think that “by chance” here refers not to the process of the Lady choosing, but to Fisher’s placement of the cups. Say there are 8 saucers on the table, and the lady marks 4 of them as “milk first.” With that marking fixed, we can imagine cycling through all 70 ways of choosing 4 of the 8 saucers for the “milk first” cups, and note how many the Lady would have right in each of these 70 cases. We’d find that on average she would get 2 “milk first” cups right, thus 4 total cups right on average.

  9. Briggs


    If you are “not sure where the binomial comes in” then you need to re-read the article, plus my comment to Joe Levy. The idea of “chance” had nothing to do with Fisher’s placement, since he (or his agent) knew which cups were which. “Chance” isn’t an active thing, it isn’t causative.

  10. Oh… ok, I think I’m getting it.

    “I’ll answer by asking you: how does your coin flip differ from the no-known-causal-path scenario?”

    That would be your scenario of a person picking cups without any knowledge, correct?

    “We begin to see the difference in philosophy: I am asking questions about physics (if you like).”

    Judging from your past record, I was assuming I agreed with you, once I figured out what you said. 😉

    I guess there’s no real difference, except that I would always be suspicious of a person not gaining some knowledge to influence their picks. The coin example was all I could think of to ensure a truly random guess for purposes of measuring chance.

  11. Briggs


    A coin flip cannot generate a “truly random guess.” See Jaynes’s book, or any of multiple articles by Persi Diaconis. Coin flips are entirely predictable if one knows the initial conditions of the flip.

    What is random to you in the no-known-causal-model scenario, is that you do not have evidence which cup is which. Random simply means unknown.

  12. JH

    My 2 cents.

    Given that
    (1) JH has no such ability; (No, she doesn’t.)
    (2) she doesn’t know that there is an equal number of milk-first and tea-first cups;
    (3) the probability of a correct guess for each trial remains the same, ½;
    (4) she guesses independently;
    and if we observe the number (X) of correct answers in the N guesses, then X follows a binomial distribution.

    Now if (5) she knows in advance that there is an equal number of milk-first and tea-first cups, her next guess might depend on the past guesses. That is, assumption (3) doesn’t hold. Then the binomial distribution is not appropriate for X.

    Why hypergeometric distribution? The assumption (5) makes the difference.

  13. JH

    ~continuation of my previous comment~

    Given (1), (5), and that JH is to guess on all N cups, basically, she is then assumed to have randomly selected N/2 cups and assigns them as “milk-first”. Think as if there are N balls in a box, of which N/2 are red (milk-first). Assuming they are equally likely, now randomly select N/2 balls (cups) as she does, what is the probability of selecting an x number of red balls? Hypergeometric distribution.

    How we collect data and what kind of data we observe are important! For example, do we reveal the answer immediately after she guesses on each cup? Do we stop the experiment once she believes that she has already identified N/2 milk-first cups?

    Assumptions and data structure matter, Bayesian method or not!

    This is a perfect example of why identification of an appropriate model/likelihood function with the data structure is one of the most fascinating components of statistical modeling.

  14. pouncer

    First, the balls in the box shouldn’t include red ones. Black for tea (in first) and white (for milk in first) only, please

    To simulate the expected value (no particular ability to distinguish tea from milk) draw from the box, without replacement. The first ball has an equal chance of matching the first cup. Assume the experimenter knows (a la Monty Hall) whether the trial produces a match. If the first trial matches, then the seven remaining balls represent a situation congruent to the seven remaining cups. But if the first trial fails, then the likelihood of the second cup matching the next ball drawn is VERY reduced. There remain 4 of the “other” cups but only 3 of the “possibly matching” balls.

    Two failed trials make the situation even worse. Or — wait. I’m not sure. If the second ball is the other color from the first ball, even if both trials failed, then the distribution of balls for the third trial is once again congruent to the distribution of cups. Wait.

    Ah… this is why there’s math. The mental picture isn’t working. But it seems that the best the third trial can do is another 50/50 chance and the chances of it being much worse are, uhm, half of one-third of a chance?

    Ah well. Will doodle on this more later. Thanks Dr Briggs.

  15. Doug M

    I am still frustrated. I know what the frequentist would do.

    The frequentist would assume that the lady has no skill. There are 70 ways to configure 8 cups of tea with exactly 4 poured milk first. (256 for an unknown number poured milk first). Assuming no skill, the probability that she is perfect is then 1/70 (1/256).

    The frequentist then says that this possibility is sufficiently remote, and therefore the lady has some skill in tea tasting. The frequentist does not attempt to ascertain the nature of her skill as broken down by our dear professor.

  16. pouncer

    But we only have this one test. Not seventy.

    Okay, I need TWO boxes containing 4 white and 4 black balls, each. I conduct 7 trials (to produce 8 pairs, the eighth being totally determined by the seven earlier pairings.) A trial is me, blindly drawing a ball from each box, without replacement.

    I don’t care about the math. I don’t care about the sequence.

    I determine how many pairs I have with the same colored balls. (If we repeated the trials every Tuesday, I might expect to see other numbers of pairs on the other weeks. But we do NOT repeat the trials. One test only. )

    That number of pairs I select from my boxes becomes my base case.

    To demonstrate her claim to me, the lady must correctly “pair” her identification with the pour-er’s; more times than I paired colored balls from two boxes. “More” is > (not >=) my number. If she does so, I will testify upon my honor as a gentleman that I have seen her “do it”. If not, I will regretfully report the lady, like many of her class and gender, has an exaggerated sense of her own good taste.

  17. SteveBrooklineMA

    It still seems like the relevant distribution here is hypergeometric, not binomial. I choose 4 of 8 cups at random. The chance my choice matches Fisher’s choice of 4 exactly is 1 in 70, simply because there are 70 ways to choose 4 out of 8.

  18. Looked up Persi Diaconis, watched a fascinating lecture he gave.

    Thanks much for the reference, Briggs. You’ve expanded my mind today.

  19. Briggs


    Perhaps it will help if you were to try to write how, in whatever order the cups come, you not being able to see, touch, etc. allows your hypergeometric.
