(I’m assuming you have been reading previous posts. If not, do so.)
We still want this:
(1) Pr (Distance > 1 meter | normal with m and s specified) = something
Actually, we don’t; not really. We want somebody to tell us (1) or something like it. The customer doesn’t really care that it was a normal distribution that was used. What we really want are the exact list of premises which all us to say
(2) Pr (Distance > 1 meter | oracular premises) = 0 or 1
or, that is, we want the oracular premises which tell us the precise distance the boule will be from the cochonette. We want this:
(2′) Pr (Distance = x meters | oracular premises) = 1
where the x is filled in. But oracular premises don’t exist for most of life. We have to suffice ourselves with something less. This is why we can live with the premise that our uncertainty in the distance is quantified by a normal (or some other) distribution.
We can of course say, “It isn’t really a normal distribution” but this is a conclusion from probability argument, and as we recall all probability propositions are conditional on premises. What are the premises which tell us “It isn’t really a normal distribution” is true? Well, these are easy: we have them (look in the book; Chapter 4). Call this list NN (for “not normal”). That is, Given NN, it is true that “It isn’t really a normal distribution.”
But we do not list NN in (1), (2), or (2′). If we did, we could not compute any numbers. The premises would be self-negating. Just as we do not add the premise “There are no Martians” to the argument “All Martians wear hats and George is a Martian.” Well, we could add it of course. It is up to us, as adding any premise to a list in an argument is always up to us. But the point is this: Given just the original “All Martians…” the conclusion “George wears a hat” is deduced (and is probability 1). And given just the “We use a normal with a specified m and s” the probability the “Distance > 1 meter” is deduced (and is some number).
Incidentally, both the “All Martians…” and the “We use a normal…” are therefore models. So we can see that the word “model” is just another way to say “list of premises.”
When last we left our customer, he had just met a frequentist and a classical Bayesian to which he had put (1). Both the frequentist and the Bayesian declined to answer (1). Instead, the pair starting going on about the value of m (and maybe s, too) by discussing “confidence” and “credible” intervals. None of which are the least interest to the customer, who still wants to know (1). Or questions like (1), questions that have to do with actual distances of actual balls.
The frequentist declines to help, but if pressed might utter something about a “null” hypothesis that “m isn’t 0.” We’ll figure that out later. The classical Bayesian, if he can be jarred awake, can help. What he can do is to say, “Given the data and that I used a normal distribution, and given the assumptions which provides me the same numerical answers as the frequentist, I can say that I don’t know the precise value of the pair—the pair, I say—of (m,s), I can take my uncertainty of them into account to answer (1).”
What this now-modern Bayesian does is to say (m,s) = (m-value 1, s-value 1) with some probability, that (m,s) = (m-value 2, m-value 2) with some probability, and so on for each possible value that (m,s) can take. He knows these from the credible intervals he just calculated. Now for each of these values, he plugs in the guess of (m,s) and calculates (1). Then he takes all the possible values of (1) and weights them by the probability (m,s) take each of these values. In the end he produces
(3) Pr (Distance > 1 meter | normal and past data) = the answer.
There is no more talk of m and s, which are of no interest to anybody, most specifically the customer. There is only the answer to the question the customer wanted. Notice that this answer is still conditional on the “model”, the normal distribution. It is also conditional on the past data, which is no surprise.
But this means that if originally assumed the premise, “Our uncertainty in the distance is quantified by a gamma distribution” the answer to (3) will be different. Just as it would be different if we began with a Weibull (say) or any other mathematical probability distribution.
Which probability distribution is the “right” one? Well, that is a conclusion to a probability argument. Which premises will we supply to ascertain the probability that that normal, or gamma, or whatever, is the “right” one? That again is up to us. We’ll talk more about this in detail at another time. But for now first suppose we have the evidence/premises, “I have three probability models, normal, gamma, and Weibull. Just one of these is the right one to quantify uncertainty in distance.” Given just this information, the probability that any is right is 1/3.
We could then take this information and compute a (3) for each model, then weight the three answers (the three numerical answers to (3)) to produce this
(4) Pr (Distance > 1 meter | assumptions about distributions and past data) = better answer.
Notice that there is no talk about which distributions make up (4). They disappeared just as the m and s disappeared when we went from (1) to (3).
The point: every statistical problem the modern Bayesian does is just like this. He attempts to answer the actual questions real customers ask him.
Check for typos.
Wine tour today.
Also, have your spreadsheets ready for tomorrow.