We have before us X1 to X156. We started by assuming that something, called T, caused these data to take the values it did. We agreed that for many physical (contingent) phenomena we can’t know T, but that we can approximate it via modeling. Very well.
For global temperatures, various physical models exist. One group thinks M1 is tops, another group is happier with M2, while a third prefers M3, and so on. It may be that M1 is better than the others for predicting precipitation while M2 is better for predicting temperature, and so forth.
We won’t know which is best until after we observe X. That is, before we see (or before we acknowledge having seen) X, we can calculate
(21) Pr( X1, X2, …, X156 | Mj ),
which are the model predictions for models j = 1, 2, 3, … After we observe X we can compute
(22) G( (21)j , X ),
where G() is a goodness function measuring how close the predictions (21) are to the actual observations X, and where j ranges over the models under consideration. There are many G(), and it may be that a model is best with one G() but not best with another. The G() you pick should reflect the decisions you make on the forecasts (21). That is, a model said X would take certain values with given probabilities, you acted on this information, and in so acting you suffered or gained. That suffering or gaining is quantified by G().
There are many off-the-shelf G() from which to choose if you are unable to think of how your predictions will be used. But, just as an aside, if you can’t imagine how your predictions will be used, you probably shouldn’t be making them. Anyway, assume we all agree on some G(). We can now order the models, from worst to best, according to G(). If we do not pick a G(), we cannot speak of goodness, badness, or even indifference. This is a necessary step.
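The machinery of (21) and (22) can be sketched with a toy computation. Everything below is invented: the “observations” and the two simple models stand in for real data and real climate models, and the log predictive score is just one of the many possible G().

```python
import math

# Stand-in "observations": 156 invented values, used only to illustrate
# the machinery of (21) and (22); nothing here is real temperature data.
X = [0.2 + 0.1 * math.sin(i) for i in range(156)]

def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_score(obs, mu, sigma):
    """One common choice of G(): the log predictive score (higher is better).
    It sums the log probability the model assigned to what actually happened."""
    return sum(normal_logpdf(x, mu, sigma) for x in obs)

# Two toy models issuing predictions (21): M1 says X ~ N(0, 1), M2 says X ~ N(0.25, 1).
G_M1 = log_score(X, 0.0, 1.0)
G_M2 = log_score(X, 0.25, 1.0)
print(f"G(M1) = {G_M1:.2f}, G(M2) = {G_M2:.2f}")
```

A different G() — say, one penalizing only errors in the extremes — could easily reverse the ordering, which is the point: the ranking of models is relative to the chosen G().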
Included in the list of models under consideration are the probability models we discussed last time. Isn’t that strange? If it isn’t, it ought to be. After all, we’re mixing physical with probability models. Actually, it’s physical scientists who mix up, not mix, these models. To make things easy, suppose we are considering only two models, Mphys and Mprob. That is, suppose a “consensus” develops among all physical scientists that Mphys is the one and only physical model that anybody should use, and that Mprob is some probability model, like the regression model used last time. It needn’t be a regression model: imagine instead that Mprob is the grandest probability model you can think of.
With Mphys and Mprob in hand, we can compute (21) for each. We then wait until the X come in and then calculate (22). Either Mphys will be better or Mprob will be: there is a small chance that (22) will be equal for both. Statisticians have an automatic edge because they often do not reveal their Mprob until after the X have revealed themselves (this gives them the chance to “massage” things a bit: a perquisite of office).
Suppose that, given G() and X, Mprob is better. What does this mean? Well, that Mprob was better at describing the uncertainty in X than was Mphys. Does this mean that Mprob was therefore true and Mphys therefore false? No. Does it mean, as most scientists oddly believe, that Mphys is still probably true but that Mprob is merely some kind of “helper” in understanding uncertainty? Not only no, but, well, just no.
We can calculate, if we want,
(23) Pr( Mprob | X ) = 1 – Pr( Mphys | X ).
Usually if G(Mprob) is better than G(Mphys), Pr( Mprob | X ) > Pr( Mphys | X ) (G() might be “strange” such that the relationship is inverted; these are degenerate situations). If this is so, if the probability of Mprob being true given the data is higher than the probability of Mphys being true given the data, would anybody believe it?
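A minimal sketch of how (23) can be computed: with only two models in play and, for the moment, equal prior weight on each, Bayes’ theorem turns the models’ likelihoods Pr(X | M) into posterior probabilities Pr(M | X). The likelihood values below are invented for illustration.

```python
import math

# Hypothetical log-likelihoods log Pr(X | M) for the two models.
logL_phys = -230.0
logL_prob = -225.0  # Mprob fit the observed X somewhat better

# Work on the log scale to avoid underflow, then normalize.
m = max(logL_phys, logL_prob)
w_phys = math.exp(logL_phys - m)
w_prob = math.exp(logL_prob - m)
p_prob = w_prob / (w_phys + w_prob)   # Pr(Mprob | X)
p_phys = 1.0 - p_prob                 # Pr(Mphys | X), as in (23)

print(f"Pr(Mprob | X) = {p_prob:.4f}, Pr(Mphys | X) = {p_phys:.4f}")
```

With a five-nat gap in fit, the posterior piles almost all its weight on Mprob — which is exactly the result nobody would believe for temperature, for the reason given next.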
No, and neither would I; at least, not for temperature. It might be, and for some physical/contingent phenomena it even is, true that probability models really are better than any (known) physical model at describing the uncertainty in the observable. But for temperatures, who would believe that a statistical model is better than, say, a sophisticated global climate model? As I said, not I. But this is because (23) is the wrong equation. (23) does not account for any prior understanding we have of the two models under consideration. We really want
(23′) Pr( Mprob | X & E) = 1 – Pr( Mphys | X & E),
where E is background information pertinent to the X, including our prior probabilities that Mphys and Mprob are true.
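A minimal Bayes computation for (23′), with all numbers invented: suppose E makes us 95% sure a priori that the physical model is right, while Mprob fit the observed X somewhat better. The prior can then overturn the fit advantage.

```python
import math

# Hypothetical log-likelihoods log Pr(X | M) and priors Pr(M | E).
logL_phys, logL_prob = -230.0, -228.0      # Mprob fit X a bit better
prior_phys, prior_prob = 0.95, 0.05        # background evidence E favors Mphys

# Weight each model's likelihood by its prior, then normalize (log scale
# first to avoid underflow with realistically tiny likelihoods).
m = max(logL_phys, logL_prob)
w_phys = prior_phys * math.exp(logL_phys - m)
w_prob = prior_prob * math.exp(logL_prob - m)
p_prob = w_prob / (w_phys + w_prob)        # Pr(Mprob | X & E)
p_phys = 1.0 - p_prob                      # Pr(Mphys | X & E), as in (23′)

print(f"Pr(Mprob | X & E) = {p_prob:.3f}, Pr(Mphys | X & E) = {p_phys:.3f}")
```

Here the posterior still favors Mphys despite Mprob’s better fit — which is the situation discussed next: believing Mphys more likely true even after seeing X.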
But now if we believe that Mphys is more likely true, even after we’ve seen X, and even if Mprob beats Mphys with respect to G(), then just what are we saying? Recall it is still true that (13) says that temperatures decreased (because they did). It may be that either or both of Mphys and Mprob said it was improbable that X would decrease, but decrease it still did. You cannot, in order to refute the observation that X really did decrease (over times 1 – 156), claim, as we saw last time, that Mphys or Mprob only really speak of X “over longer periods.” You’re stuck with the observations no matter what.
Why are we using Mprob anyway? Don’t we believe that Mphys is much more likely to be true? Well, maybe. We believe that some physical model is better than the statistical, but how do we know that the physical model before us is it? Before answering that, consider how strange it is to abandon the physical model we currently hold to entertain statistical evidence of temperature change. Because even if the probability model did in fact show a temperature increase (recalling (13) still holds), this does not mean that the physical model did. That is, the statistical model saying one thing or another is not, in any way, proof that the physical model is true.
I’ll repeat that, because it’s important. No matter what the probability model says, it is not proof for or against the physical model. Even if G(Mprob) is wonderful, this does not imply that G(Mphys) is any good. And if you claim, because G(Mprob) is good, that some physical model (if not our particular Mphys, then one with the same basic theory) must therefore be true, you are asserting what is unwarranted.
In short, Mphys must be judged on its own. If you consider Mprob as a replacement for Mphys, then it is all very well to talk of G(Mprob) besting G(Mphys) or of (23) being “large.” It is no salvation for Mphys that G(Mprob) is “good.” If G(Mphys) is “bad,” then it is “bad,” period. (The inverse is also true.)
Next time: we’re finally ready to handle X measured with error, i.e. “predictive” statistics.