Read Part I. Again, the text (up to this part) has been corrected and expanded.
Recall our overarching—our only—goal. We want to know whether the statement “the sweet old lady has the ability” is true or false, or, if we cannot decide its truth or falsity outright, with what probability it might be true. Never lose your grip on this. Repeat it to yourself after each paragraph.
To judge this probability we have the evidence of our experimental setup, and whatever facts may be deduced from these premises. We also have the evidence of the experiment itself: how many cups she got right and wrong. Can we agree that we should only use this information and no other? I mean, we should only use the evidence of what happened. What didn’t happen and what we cannot deduce from our experimental setup is information which is entirely irrelevant. So for example if we gave the lady N = 8 cups, it is irrelevant that we could have given her N = 50 cups, or whatever. We gave her 8 and we have to deal with just that information. We do not want to fool or distract ourselves.
These are of course trivial requirements, but I put them there to focus the mind on the question.
Now, if we accept that “has the ability” means “She always guesses correctly”, then the probability that the lady correctly identifies any cup placed before her is 1, or 100%. This phrase is also our model. I mean, “She always guesses correctly” is our model, our theory, our hypothesis.
Why did we assume this particular model? Well, the choice was up to us. It is one interpretation of—it naturally follows from—“has the ability.”
Given this model/hypothesis, and before putting her to the test, what is the probability distribution for guessing correctly none right, just 1 right, just 2 right, etc., up to all N right? It is 0 (or 0%) for all numbers except for N, where it is 1, or 100%. Think about this.
But suppose we run our experiment and she correctly identifies only 3 < N cups. Given just our model, what is the probability that she guesses 3 correct? Again, 0. This illustrates the principle that any (logical) argument can only be judged by the premises given, and by no other information. However, suppose we conjoin our model with our observation, “She always guesses correctly” & “She guessed 3 < N correctly”, and, conditioning on this joint statement, re-ask: what is the probability that she guesses 3 correct? The question is unanswerable, because we are conditioning on a contradiction, a statement which is necessarily false. Indeed, from this necessary falsity we could derive any numerical value whatsoever for the probability of guessing 3 correct, which is obviously absurd. We have two probabilities, the first of which is: (1) Pr(“She guesses 3 < N correctly” | “She always guesses correctly”) = 0. But we can turn the question around and ask:
(2) Pr(“She always guesses correctly” | “She guessed 3 < N correctly”),
which is obviously 0. This is a rare instance where we have falsified a model—a situation only possible when a model says “X cannot be” yet X obtains or occurs. That cannot is dogmatic, a logical word: it means just what it says, X is impossible—not unlikely, but impossible.
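The arithmetic of this falsification can be made concrete. Here is a minimal sketch, assuming N = 8 cups purely for illustration, of the degenerate distribution implied by the “always guesses correctly” model, and of the two probabilities (1) and (2):

```python
# Sketch: the "always guesses correctly" model with an illustrative N = 8.
# Under this model the distribution over the number correct is degenerate:
# all probability sits on N, and every other count gets probability 0.
N = 8
always_correct = {k: (1.0 if k == N else 0.0) for k in range(N + 1)}

# (1) Pr("She guesses 3 correctly" | "She always guesses correctly") = 0.
pr_3_given_model = always_correct[3]

# (2) Observing 3 < N correct falsifies the model outright:
# Pr("She always guesses correctly" | "She guessed 3 correctly") = 0.
observed = 3
pr_model_given_obs = 1.0 if observed == N else 0.0  # model survives only if all N are right

print(pr_3_given_model, pr_model_given_obs)  # 0.0 0.0
```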
Now, the question is this:
(3) Pr(“She has the ability” | “She guessed M out of N correctly” & Experimental set up),
where “has the ability” is for us to define (such as “always guesses correctly”), M and N are observations of the experiment, where we also take care to consider the Experimental set up (from this we know what N is, etc.).
Asking (3), the probability that a model is true, is a natural question in Bayesian probability, but not in frequentism, where any statement/question must be embedded in an infinite sequence of “similar, but randomly different” statements/questions. It is difficult, perhaps impossible, to discover in which unique infinite sequence this (or any) model-statement lies. I hope you understand how limiting this is. Of course, it is possible to develop non-theory-dependent rules-of-thumb for deciding a model’s truth or falsity, but any true theory of probability must be able to answer any question put to it in a non-ad hoc manner.
For example, Bayesian probability can handle the following situation, whereas frequentist probability cannot. Given the premise, “Only 1 out of all M green men from Mars is Y”, the probability that this green man from Mars is Y is 1 / M. Bayesian probability can also answer all counterfactual questions (“If Hillary had not cried at that press conference, she would have been the Democratic nominee for president”), whereas frequentist probability can answer none. In both instances, frequentism fails because the statements cannot be embedded in a unique infinite sequence. There cannot be sequences of little green men, nor can there, by definition, be any counterfactual situations, let alone sequences of them.
What about the rest of our models/interpretations of “has the ability”? We last time outlined several possibilities, each of them consonant with the phrase “has the ability.” Which of these is the correct model and which are incorrect? That is up to us. It is an extra-logical, extra-probability question—at least with respect to the premises we have allowed ourselves in this experiment.
Now, we could go through a procedure similar to the one above and calculate the probability that each interpretation is true. That is, if we do not have a fixed idea in advance of which interpretation (model) is true, we could use the evidence from the experiment to tell us which is more likely than any of the others.
However, we must start from somewhere: some external evidence must tell us how likely each of these models is before we begin the experiment. It doesn’t matter what this external evidence is; it merely must exist. The most common evidence allows us to derive that each model is equally likely (before the experiment commences). After taking observations, we could recalculate the probability that each model is true given this new evidence. Once more, this scheme is natural in Bayesian probability, but not in frequentism.
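This recalculation can be sketched numerically. Suppose, purely for illustration, three hypothetical interpretations of “has the ability,” each summarized by a fixed per-cup hit rate (the names, rates, and the binomial likelihood are my assumptions, not claims from the text), with equal prior probability before the experiment:

```python
from math import comb

# Hypothetical candidate models, each a fixed per-cup hit rate (illustrative only):
models = {"always correct": 1.0, "better than chance": 0.8, "merely guessing": 0.5}
prior = {name: 1 / len(models) for name in models}  # equally likely before the experiment

def binom_pmf(m, n, p):
    # Probability of exactly m correct out of n, given per-cup hit rate p.
    return comb(n, m) * p**m * (1 - p)**(n - m)

def posterior(m, n):
    # Bayes: Pr(model | M of N correct) is proportional to
    # Pr(M of N correct | model) * Pr(model).
    joint = {name: binom_pmf(m, n, p) * prior[name] for name, p in models.items()}
    total = sum(joint.values())
    return {name: w / total for name, w in joint.items()}

post = posterior(6, 8)  # e.g. she got M = 6 of N = 8 right
print(post)  # "always correct" receives probability 0: it cannot produce a miss
```

Note how the “always correct” model is falsified by a single miss, exactly as in (2) above, while the remaining models merely gain or lose probability.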
Let us now assume a definite model structure and see where it gets us. We suppose the lady guesses each cup correctly or not, that she knows she will see an equal number of tea-first and milk-first cups, and that she is provided no feedback about the correctness of her guesses; we assume her palate never fatigues and that her “hit rate” is the same for either cup type. We will not assume perfection, but we allow its possibility. Indeed, it might even be that she always gets every cup backwards; i.e., she is always wrong, but in a very useful way. This is as bland a set of premises as possible. In advance of the experiment, we will assume merely that she can get any number of cups right, from 0 to N.
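These bland premises can be written down directly. A minimal sketch, again assuming N = 8 for illustration, of the pre-experimental state of knowledge in which every count from 0 to N is treated alike:

```python
# The "bland" starting point: before the experiment we assume only that the
# lady can get any number of cups right, with no count favored over another.
N = 8
prior_counts = {k: 1 / (N + 1) for k in range(N + 1)}  # uniform over 0..N

# Both extremes remain open: perfection (N right) and total,
# usefully informative, reversal (0 right) each get the same weight.
print(prior_counts[0], prior_counts[N])
```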