Let’s make sure we grasped yesterday’s lesson. Emails and comments suggest we have not. These concepts are hardest for those who have only had classical training.
We want to know something like this: what is the probability the boule will land at least 1 meter from the cochonnet? Notice that this is an observable, measurable, tangible question. A natural question, immediately understandable, not requiring a degree in statistics to comprehend. Of course, it needn’t be “1 meter”, it could be “2 meters” or 3 or any number which is of interest to us.
Now, as the rules of logic admit, I could just assume-for-the-sake-of-argument premises which specify a probability distribution for the distance the boule will be from the cochonnet. Or I could assume the uncertainty in this distance is quantified by a normal distribution. Why not? Everybody uses these creatures, right or wrong. We may as well, too.
A normal distribution requires two parameters, m and s. They are NOT, I emphasize again, the “mean” and “standard deviation.” They are just two parameters which, when given, fully specify the normal and let us make calculations. The mean and standard deviation are instead functions of data. Everybody knows what the mean function looks like (add all the numbers, divide by the number of numbers). It isn’t of the slightest interest to us what the standard deviation function is. If you want to know, search for it.
Since I wanted to use a normal (and this is just a premise I assumed; I repeat, and you should memorize, that this is just a premise I assumed), since, I say, I want to use a normal, I must specify m and s. There is nothing in the world wrong with also assuming values for these parameters. After all (you just memorized this), I just assumed the normal and I am getting good at assuming.
With m and s in hand, I can calculate this:
(1) Pr (Distance > 1 meter | normal with m and s specified) = something
The “something” will depend on the m and s I choose. If I choose different m and s then the “something” will change. Obviously.
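For those who like to see the “something” as a number, here is a minimal sketch of (1) in Python. The function name and the three pairs of m and s are my own assumptions, picked out of the air for illustration; nothing here comes from data.

```python
from statistics import NormalDist


def prob_distance_exceeds(threshold, m, s):
    """Pr(Distance > threshold | normal with parameters m and s)."""
    return 1.0 - NormalDist(mu=m, sigma=s).cdf(threshold)


# Hypothetical parameter values, assumed purely for illustration.
for m, s in [(1.0, 0.5), (0.8, 0.5), (0.8, 0.2)]:
    p = prob_distance_exceeds(1.0, m, s)
    print(f"m={m}, s={s}: Pr(Distance > 1 meter) = {p:.3f}")
```

Change m or s and the printed probability changes with it, which is the whole point.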
The question now becomes: what do statisticians do? They keep the arbitrary premise “The normal quantifies my uncertainty in the distance” but then add to it these premises, “I observed in game 1 the distance D1. In game 2 I observed the distance D2 and so on.”
These “observational” premises are uninteresting by themselves. They are not useful, unless we add to them the premise, the quite arbitrary premise, “I use these observations to estimate m and s via the mean and standard deviation.” This is all we need to answer (1). That is, we needed a normal distribution with the m and s specified and any way we guess m and s gives us values for m and s (right?). It matters naught to (1) how m and s are specified. But without the m and s specified, (1) CANNOT be calculated. Notice the capitals.
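A rough sketch of that extra premise at work, again in Python. The distances are invented for illustration (they are not anybody's real games), and the function name is hypothetical. The mean and standard deviation functions supply the guesses for m and s, and those guesses are plugged straight into (1).

```python
from statistics import NormalDist, mean, stdev


def plug_in_prob(data, threshold):
    """Guess m and s with the mean and standard deviation, then compute (1)."""
    m_guess = mean(data)   # the mean function applied to the data
    s_guess = stdev(data)  # the (sample) standard deviation function
    return 1.0 - NormalDist(mu=m_guess, sigma=s_guess).cdf(threshold)


# Made-up distances in meters, one per game: D1, D2, and so on.
distances = [0.4, 1.3, 0.9, 0.7, 1.6, 0.5, 1.1, 0.8]

prob = plug_in_prob(distances, 1.0)
print(f"Pr(Distance > 1 meter | normal, m and s guessed from data) = {prob:.3f}")
```

Any other way of guessing m and s would slot into the same function; (1) does not care where the values came from.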
Here is what the frequentist will do. She will calculate the mean (and standard deviation; but ignore this) and then report the “95% confidence interval” for this guess. We saw yesterday the interpretation of this strange object. But never mind that today. The point is the frequentist statistician ignores equation (1) and instead answers a question that was not asked. She contents herself with saying “The mean of the distances was this number; the confidence interval is this and such.”
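To see what the customer is handed, here is a sketch of that report. Two loud assumptions: the distances are made up, and I use the large-sample normal approximation for the interval to keep the code to the standard library, where a textbook would use a Student's t multiplier for a sample this small. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_interval(data, level=0.95):
    """Normal-approximation interval for the guess of m (not Student's t)."""
    centre = mean(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    half_width = z * stdev(data) / sqrt(len(data))
    return centre - half_width, centre + half_width


# Made-up distances in meters, assumed purely for illustration.
distances = [0.4, 1.3, 0.9, 0.7, 1.6, 0.5, 1.1, 0.8]

low, high = mean_interval(distances)
print(f"mean = {mean(distances):.3f}, interval = ({low:.3f}, {high:.3f})")
```

Notice what is missing from the output: any statement about the probability that the distance exceeds 1 meter.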
And this quirky behavior is accepted by the customer. He forgets he wanted to know (1) or assumes the statement he just received is a sort of approximate answer to (1). Very well.
Here is what the classical Bayesian will do. The same thing as the frequentist. In this case, at least. The calculations the Bayesian does and the calculation the frequentist does, though they begin at different starting points, end at the same place.
The classical Bayesian will also compute the mean and he will also say “The mean is my best guess for m.” And he will also compute the exact same confidence interval but he will instead call it a credible interval. And this in fact represents a modest improvement, even though the numbers of the interval are identical. It is an improvement because the classical Bayesian can then say things like this, “There is a 95% chance the true value of m lies inside the credible interval” whereas the frequentist can only repeat the curious tongue twister we noted yesterday.
The classical Bayesian, proud of this improvement and pleased the numbers match his frequentist sister’s, also forgets (1). Ah well, we can’t have everything.
There is one more small thing. The classical Bayesian also recognizes that his numbers will not always match his frequentist sister’s. If for instance the frequentist and classical Bayesian attack a “binomial” problem, the numbers won’t match. But when normal distributions are used, as they were here and as they are in ordinary linear regression, statisticians are one big happy family. And isn’t that all that matters?
You should have been collecting your data by now. If not, start. We’ll only be doing ordinary linear regression according to the modern slogan: Regression Right Or Wrong!