Carl Bialik, the Wall Street Journal’s Numbers Guy, has two columns about BEST’s temperature reconstruction model: “Global Temperatures: All Over the Map” and “A New Trove of Global Temperature Numbers”. Read them.
On the subject of those statistically modeled temperature reconstructions, Bialik wisely sought a quote from a Peer¹:
The Berkeley scientists “have a very complicated model,” says William Briggs, a member of the probability and statistics committee of the American Meteorological Society. “They reported on the setting of one of those dials” in the model. “That is not the actual temperature.”
Briggs is right. Sort of. The BEST model, like all statistical models, has a large number of dials which need to be set before it will produce results. How to set them all to their proper places is an interesting question, but it is not our main one. What concerns us is why anyone would report on the settings rather than on the actual thing modeled, i.e., the yearly global average temperature.
Conflating the settings of the dials with the actual thing modeled is, as regular readers will know, an enormous and persistent error that many users of statistical methods make. It is an error because our certainty in the dial settings does not translate into equal certainty in the thing modeled.
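To make the point concrete, here is a minimal Python sketch under deliberately simple assumptions: the “model” is a normal distribution with a single dial, its mean, and the temperatures are simulated, not real. The standard error of the mean dial shrinks like 1/√n as data pile up, while the spread we should expect for a new year’s temperature never drops below the spread of the data itself. The function name `dial_vs_observable` and every number in it are hypothetical.

```python
import math
import random
import statistics

def dial_vs_observable(n, sigma=0.5, true_mean=14.0, seed=1):
    """Simulate n hypothetical yearly temperatures around a fixed mean.

    Returns (standard error of the fitted mean dial,
             predictive scale for one new year's temperature).
    """
    rng = random.Random(seed)
    temps = [rng.gauss(true_mean, sigma) for _ in range(n)]
    s = statistics.stdev(temps)          # estimated spread of the data
    se_dial = s / math.sqrt(n)           # uncertainty in the "mean" dial
    sd_new = s * math.sqrt(1 + 1 / n)    # uncertainty in a new observation
    return se_dial, sd_new

for n in (10, 100, 1000):
    se, sd = dial_vs_observable(n)
    print(f"n={n:5d}  dial SE={se:.3f}  new-year spread={sd:.3f}")
```

As n grows the dial becomes nearly certain, but the new-year spread settles near the noise level and stays there.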
Let me explain. We can write down some very complicated statistical model of yearly global average temperature (GAT) as a function of all sorts of things, like location, yearly change, time, effects other locations have on this location, and on and on. Each of these things will have associated with it a dial, or parameter, which must be set to just the right position. The BEST model, for example, has a very large number of these dials.
Various formulas will let you set each of the dials, and different formulas produce different configurations. Which is the right formula? Shhh. Don’t ask. Just pick a formula and set the dials. The formula will tell you the value on every dial, but you can report on just the dial associated with, say, yearly change. This is where the problem begins.
Nobody believes, and it isn’t true, that the setting of the dials is perfect, without error, exact. There is some uncertainty. If we assume that the model of GAT and the formula used to set the dials are themselves perfect, we can produce a guess of the uncertainty in the dial settings. If you are feeling especially vigorous, you can report that uncertainty along with the dial settings. This signals your honesty to your audience and tells them that you don’t know exactly what the dial settings are.
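Here is a minimal Python sketch of “pick a formula and set the dials,” assuming a far simpler model than BEST’s: yearly temperature as a straight line in time plus normal noise, so there are just two dials (intercept and yearly change) plus the size of the noise. The formula is ordinary least squares, chosen here only for illustration; the standard error it reports is the uncertainty in the slope dial, and it is valid only if the straight-line model is exactly right. The data are simulated, and `fit_trend` is a hypothetical helper name.

```python
import math
import random

def fit_trend(years, temps):
    """Ordinary least squares for temp = a + b * year + noise.

    Returns the two dial settings (a, b), the residual standard deviation s,
    and the standard error of b, all computed as if the model were exactly right.
    """
    n = len(years)
    ybar = sum(years) / n
    tbar = sum(temps) / n
    sxx = sum((x - ybar) ** 2 for x in years)
    sxy = sum((x - ybar) * (t - tbar) for x, t in zip(years, temps))
    b = sxy / sxx                    # the "yearly change" dial
    a = tbar - b * ybar              # the intercept dial
    resid = [t - (a + b * x) for x, t in zip(years, temps)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    se_b = s / math.sqrt(sxx)        # uncertainty in the slope dial, nothing more
    return a, b, s, se_b

# Hypothetical data: a made-up series, not any real temperature record.
rng = random.Random(42)
years = list(range(1950, 2011))
temps = [14.0 + 0.01 * (y - 1950) + rng.gauss(0, 0.2) for y in years]

a, b, s, se_b = fit_trend(years, temps)
print(f"yearly-change dial: {b:.4f} +/- {se_b:.4f} degrees per year")
```

Note that the printed “+/-” describes the dial only; it says nothing directly about how well we know any year’s temperature.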
But in the end all you are left with is information about a bunch of dials and their values. Who cares about those? Well, statisticians, maybe, but not the civilians who remember that the original goal was to produce a guess of the yearly GAT through time. How do we get that?
What we can do is set each dial at its best guess and then use the model to give us a guess of the yearly GAT. This works. But since nobody believes that the settings of the dials are perfect, the guess of the yearly GAT will be too certain.
To fix that, we run through all possible settings of the dials, and for each setting we produce a new guess of the yearly GAT. Not all settings of the dials are equally likely, so we give more weight to the most likely settings and less weight to the unlikely ones. We do this for all dials simultaneously, not just one or two.
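A minimal sketch of this procedure, again in Python on made-up data, assuming the simplest possible model: a straight-line trend with normal noise (three dials: level, slope, and noise size). The weights are the posterior probabilities of the dial settings under a standard flat prior, and the weighted run through all settings is done by Monte Carlo: draw many dial settings in proportion to their weight, and for each one draw a temperature for a target year. This is not BEST’s model or method; the target year 2020 and every number are assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical data: 61 years with a small linear trend plus noise.
years = list(range(1950, 2011))
temps = [14.0 + 0.01 * (y - 1950) + rng.gauss(0, 0.2) for y in years]

n = len(years)
ybar = sum(years) / n
xc = [y - ybar for y in years]          # centred years make level and slope independent
sxx = sum(x * x for x in xc)
alpha_hat = sum(temps) / n              # level dial (value at the mean year)
b_hat = sum(x * t for x, t in zip(xc, temps)) / sxx    # slope dial
s2 = sum((t - alpha_hat - b_hat * x) ** 2 for x, t in zip(xc, temps)) / (n - 2)

x0 = 2020 - ybar                        # hypothetical target year, centred

# 1. Plug-in: every dial at its best guess, dial uncertainty ignored.
plug_in_mean = alpha_hat + b_hat * x0
plug_in_sd = math.sqrt(s2)

# Uncertainty of the dial-implied line at 2020, from the dials alone.
line_sd = math.sqrt(s2 * (1 / n + x0 ** 2 / sxx))

# 2. Integrate over the dials: draw settings in proportion to their posterior
#    weight, then draw a new temperature from the model at each setting.
draws = []
for _ in range(20000):
    chi2 = rng.gammavariate((n - 2) / 2, 2)          # chi-square, n-2 d.o.f.
    sigma2 = (n - 2) * s2 / chi2                     # noise-size dial
    alpha = rng.gauss(alpha_hat, math.sqrt(sigma2 / n))
    b = rng.gauss(b_hat, math.sqrt(sigma2 / sxx))
    draws.append(alpha + b * x0 + rng.gauss(0, math.sqrt(sigma2)))

pred_sd = statistics.stdev(draws)
print(f"dial-only line uncertainty at 2020: +/- {line_sd:.3f}")
print(f"plug-in guess: {plug_in_mean:.2f} +/- {plug_in_sd:.3f}")
print(f"integrated guess: {statistics.mean(draws):.2f} +/- {pred_sd:.3f}")
```

The integrated spread comes out wider than the plug-in spread, and much wider than the uncertainty carried by the dials alone, which is the comparison the next paragraph describes.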
The end result will be a guess of yearly GAT that accounts for the uncertainty we have in the dial settings. We do not report on dial settings at all. Necessarily, this guess of GAT will be more uncertain, relatively speaking, than our guess of any particular dial setting. This is what BEST should have done but did not.
Careful readers will remember that we still assumed the model and formulas are without error. How can we know that? There’s only one sure method. Bialik again quoted an expert:
“The only way to prove these models are any good is to demonstrate that they skillfully predict new data,” said William Briggs… “None have done this. There is reason to suspect all models.”
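One way to begin checking skill, sketched in Python on hypothetical data: set the dials using only the earlier part of a series, predict the later part, and compare the errors with those of a simple reference forecast (here the training-period average, a choice made for illustration). A skill score above zero means the model beat the reference on years it was not fitted to. Holding out the end of a series is only a rough stand-in for genuinely new observations; the split year 1990 and the helpers `fit_trend` and `mse` are hypothetical.

```python
import random

rng = random.Random(7)

# Hypothetical yearly GAT series; in real life the true process is unknown.
years = list(range(1950, 2011))
temps = [14.0 + 0.01 * (y - 1950) + rng.gauss(0, 0.2) for y in years]

# Set the dials with data up to 1990 only; 1991-2010 plays the "new data".
train = [(y, t) for y, t in zip(years, temps) if y <= 1990]
held_out = [(y, t) for y, t in zip(years, temps) if y > 1990]

def fit_trend(pairs):
    """Least-squares intercept and slope for t = a + b * y."""
    n = len(pairs)
    ybar = sum(y for y, _ in pairs) / n
    tbar = sum(t for _, t in pairs) / n
    sxx = sum((y - ybar) ** 2 for y, _ in pairs)
    b = sum((y - ybar) * (t - tbar) for y, t in pairs) / sxx
    return tbar - b * ybar, b

def mse(preds, pairs):
    """Mean squared error of predictions against observed temperatures."""
    return sum((p - t) ** 2 for p, (_, t) in zip(preds, pairs)) / len(pairs)

a, b = fit_trend(train)
model_preds = [a + b * y for y, _ in held_out]

# Reference forecast: the training-period average, ignoring any trend.
clim = sum(t for _, t in train) / len(train)
ref_preds = [clim for _ in held_out]

skill = 1 - mse(model_preds, held_out) / mse(ref_preds, held_out)
print(f"skill vs. training-period average on held-out years: {skill:.2f}")
```

Because these fake data were generated from a straight line, the trend model has an unfair advantage here; with real temperatures, whether a model shows skill is exactly the open question.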
Briggs is right again.
¹ As in one who reviews.