We also learned in Part III that math, while pleasant to look at, can often lead to absurdities, such as the deduction that the probability of actual observable events is 0. Let’s continue in that line with normal distributions, still thinking about Susy’s GPA. (See this comment about the finiteness of Susy’s GPA.)
Our M(eta information) = “The normal distribution quantifies our uncertainty in Susy’s GPA”. Given M, our normal model is true. But the normal, like most distributions, comes equipped with baggage in the form of parameters. We can’t just say, “Given the normal distribution, the probability of Susy’s GPA equaling ei is 0, for any i from 1 to N.” We have to instead say, “Given the normal distribution with parameters equal to Q and W, the probability of Susy’s GPA equaling ei is 0, for any i from 1 to N.” The first thing to note is that the probability of seeing any real observable is 0, just as before. This is an inescapable part of modern statistical theory: real things can’t happen. It is the price we pay for having beautiful theorems.
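To see this concretely, here is a minimal sketch, with made-up values Q = 3.0 and W = 0.5 standing in for the normal’s mean and standard deviation (nothing hinges on these choices). The probability the model puts on any shrinking window around an exact GPA goes to 0:

```python
import math

def normal_cdf(g, mu, sigma):
    """P(GPA < g) under a normal with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((g - mu) / (sigma * math.sqrt(2))))

# Made-up parameter values standing in for Q and W.
Q, W = 3.0, 0.5

# Probability of the GPA landing in an ever-narrower window around 3.2:
for eps in [0.1, 0.01, 0.001, 1e-6]:
    p = normal_cdf(3.2 + eps, Q, W) - normal_cdf(3.2 - eps, Q, W)
    print(f"window half-width {eps}: probability {p:.2e}")
```

The window probability shrinks in proportion to the window; in the limit, the model says P(GPA = 3.2) is exactly 0.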
But never mind that. Focus on the parameters, which for the normal are two. Both values must be set for the model to be specified fully. Here, I chose to set them at Q and W, which are just two numbers. It doesn’t matter to us what they are, except incidentally; knowing Q and W tells us nothing much about what we want to know. Which, lest we forget, is the probability that Susy’s GPA takes a certain value. Once Q and W are specified, we are in business—kind of.
We can’t answer our actual question (the answer is always 0), but we can answer questions like this: “Given our model, what is the probability that Susy’s GPA will be less than G?” where we can fill in the blank for G. Mathematics can at least give non-zero answers to these kinds of questions. The problem with the normal is that the answer will be > 0 for G < 0 or G > 4¹. That is, the normal model will tell us there is definite probability for GPAs we know are false—given our other information, which we are ignoring when we use the normal.
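Here is a sketch of those blank-filling questions, again with the made-up parameter values Q = 3.0 and W = 0.5 (the particular numbers don’t matter, only the normal’s shape does):

```python
import math

def normal_cdf(g, mu, sigma):
    """P(GPA < g) under a normal with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((g - mu) / (sigma * math.sqrt(2))))

# Made-up parameter values standing in for Q and W.
Q, W = 3.0, 0.5

print(f"P(GPA < 3.5) = {normal_cdf(3.5, Q, W):.4f}")      # a sensible-looking answer
print(f"P(GPA > 4)   = {1 - normal_cdf(4.0, Q, W):.4f}")  # positive, yet impossible
print(f"P(GPA < 0)   = {normal_cdf(0.0, Q, W):.2e}")      # also positive, also impossible
```

With these values, P(GPA > 4) comes out a bit over 2%: the model insists there is real probability of a GPA no registrar will ever print.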
Let’s not pass too quickly over this. The probabilities the normal model gives are true given we assume the normal model is true. These probabilities are just as true as the probability George the Martian wears a hat. Can we, however, falsify the normal model? It says there is positive probability for events we know (given our other information) we will never see. But in the case of GPAs less than 0 or greater than 4, the model does not say that certain events we know will occur cannot happen, i.e. that they have probability 0. So it can’t be falsified here.
To nauseate us by the repetition, the normal model does say that events we know we will see (given other information) have probability 0. Thus, as soon as we see a printed GPA, we have falsified the normal model. Like Nelson, however, we turn a blind eye to this argument and press on.
But we can’t pretend we don’t see the “probability leakage”, i.e. the probability for GPAs less than 0 or greater than 4. We just hope that this known error isn’t large. But because both frequentist and classical Bayesian analysis spend their time with distractions, like scrutinizing with great intensity the incidental values Q and W, the amount of probability leakage is usually unknown. That is, standard analysis is satisfied to tell us lots of information about Q (and maybe give us a word or two about W), but it forgets the original question, which again is what is the probability Susy’s GPA takes a certain value? Even if you don’t care about the probability-0-for-real-events problem and are satisfied with saying only that there is a probability that Susy’s GPA is less than some value, you can’t ignore the probability leakage (my experience shows it’s usually large and not-ignorable).
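To give a feel for how large the leakage can be, here is a sketch computing the total probability a normal assigns below 0 and above 4; the parameter pairs are hypothetical, chosen only to show how quickly leakage balloons as the mean nears a boundary or the spread widens:

```python
import math

def normal_cdf(g, mu, sigma):
    """P(GPA < g) under a normal with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((g - mu) / (sigma * math.sqrt(2))))

def leakage(mu, sigma):
    """Probability the normal assigns outside the possible GPA range [0, 4]."""
    return normal_cdf(0.0, mu, sigma) + (1 - normal_cdf(4.0, mu, sigma))

# Hypothetical parameter settings:
for mu, sigma in [(2.0, 0.5), (3.5, 0.5), (3.5, 1.0)]:
    print(f"mu = {mu}, sigma = {sigma}: leakage = {leakage(mu, sigma):.4f}")
```

With the mean at 3.5 and a standard deviation of 1, about 31% of the model’s probability sits on impossible GPAs—hardly ignorable.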
Now, as in Part III, we can assume we have a probative sample which gives information about Susy’s GPA. Both frequentist and classical Bayesian techniques show how to incorporate this information, but the problem of those parameters creeps in. It turns out that allowing uncountably many values for the GPA causes all kinds of difficulties in how best to incorporate the probative sample into the model’s parameters. But never mind. Both camps give identical answers for the normal model (using an “improper flat prior” in Bayes). Even though this truce has been reached, both camps still forget the original question and fall to discussing Q (and rarely W).
The reason this is a problem is two-fold: (1) we can’t get an answer to our original question, or even its modification (when we ignore the probability-0-for-real-events problem); (2) we can’t fully know how well the model performed in terms of observables.
There is a way to progress beyond (ultimately unanswerable) questions about Q and W and to answer the original question, or rather its modification (Susy’s GPA less than some level). In Bayes, this is called giving the posterior predictive distribution. We actually did this in Parts II and III, though we didn’t name it. We’ll suppose that indeed we are working with probabilities that answer our modified question. We need these to demonstrate model performance, which is our last step.
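As a sketch of what answering the modified question looks like: under the normal model with the standard improper prior, the posterior predictive for a new observation is a shifted, scaled Student-t, which we can approximate here by Monte Carlo rather than closed form. The sample values below are invented for illustration only:

```python
import math
import random

random.seed(1)

# Hypothetical probative sample of GPAs (invented for illustration).
sample = [3.1, 2.8, 3.6, 3.3, 2.9, 3.4]
n = len(sample)
xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)

def draw_predictive():
    """One draw from the posterior predictive under the normal model with
    the standard improper prior: (n-1)s^2/sigma^2 ~ chi^2_{n-1},
    mu | sigma^2 ~ N(xbar, sigma^2/n), then a new y ~ N(mu, sigma^2)."""
    chi2 = random.gammavariate((n - 1) / 2, 2)  # chi-square with n-1 d.o.f.
    sigma2 = (n - 1) * s2 / chi2
    mu = random.gauss(xbar, math.sqrt(sigma2 / n))
    return random.gauss(mu, math.sqrt(sigma2))

draws = [draw_predictive() for _ in range(100_000)]
G = 3.5
p_less = sum(d < G for d in draws) / len(draws)
leak = sum(d < 0 or d > 4 for d in draws) / len(draws)
print(f"P(new GPA < {G}) is about {p_less:.3f}")
print(f"probability leakage is about {leak:.3f}")
```

Note that the predictive probabilities answer the modified question directly, in terms of an observable GPA, with Q and W integrated out entirely; and note too that the leakage is still there, now measurable.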
Who said this would be easy!
¹Yes, other distributions besides the normal can be used, at least to fix the less-than-0 problem. But that isn’t the point; the point is what happens when a normal is used, as it often is, in situations like this.
Update: How this all relates to climate models is coming!