This is an edited and expanded re-post from last September; it makes a natural and needed companion to last week’s series on how to statistically handle temperature time series, particularly Part V. This applies to the criticisms I made of BEST, the Bad Astronomer’s “deniers” column, and the practices of the often hyperbolic Michael Mann.
The example below is for predicting temperature via proxy, but it is just as valid for any statistical modeling example (marketing, medical, more). If you are familiar with the lingo and want to follow along for your own situation, the proxies are the X (the “independent” variables) and the temperatures are the Y (the “dependent” variables). We want to predict Y as a function of X.
Suppose we are interested in temperature (in centigrade) for times at which no direct measurements are available. Instead, at those times, we are able to measure a proxy. Perhaps this is a certain ratio of isotopes of some element, or it is the width of a tree ring, or whatever. We will use this proxy (X) to predict the (missing) temperature (Y).
First step: We must be able to measure both the temperature and the proxy simultaneously at some point in history. This is usually possible by finding a location where concurrent measurements of both exist. Here is a plot of what that might look like (this is a representative simulation).
The dots are the simultaneous measurements and the dashed line is the result of a statistical model of proxy predicting temperature. This is a linear regression, but the exact model does not matter. It could have been a sine wave or included squared terms or whatever. The point is that we have some Y = f(θ, X): Y is a function of X indexed by some unobservable parameters θ.
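For readers who want to follow along, here is a minimal sketch of this first step in Python. It is not the R code used for the post's figures: the slope, intercept, noise level, and proxy range are made-up values chosen only so the numbers land in a plausible neighborhood, and `simulate_calibration` and `fit_line` are hypothetical helper names.

```python
import random

def simulate_calibration(n=100, seed=1):
    """Hypothetical calibration data: proxy (X) and temperature (Y) measured together."""
    rng = random.Random(seed)
    xs = [rng.uniform(100.0, 140.0) for _ in range(n)]
    # Assumed 'true' relationship and noise level, for illustration only.
    ys = [0.15 * x + 0.5 + rng.gauss(0.0, 4.0) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

xs, ys = simulate_calibration()
a, b = fit_line(xs, ys)

# Plug a new proxy value into the fitted model to get a point estimate.
point_estimate = a + b * 122.89
print(f"intercept={a:.3f}  slope={b:.4f}  predicted temperature={point_estimate:.2f}")
```

Here f(θ, X) is just a + b·X, and θ is the pair (a, b). Any other functional form would slot into the same place.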
Second step: we go to the location where no temperature measurements exist but where there are proxy measurements. Suppose one of these proxy values is, say, 122.89. The predicted temperature, via this model, is 19 degrees centigrade. Get it? We need merely plug the values of the proxy into the model and out pop estimates of the temperatures. We can then use those estimated temperatures to make decisions of all kinds. Simple!
Except, of course, nobody believes that because the proxy was 122.89 the temperature was exactly, precisely 19°C. There is some uncertainty. The real temperature might have been, say, 19.2°C or 18.8°C, or some other value.
The classical way to express this uncertainty is to compute the parametric prediction interval. The 95% parametric prediction interval for this model happens to be 17.6°C to 20.4°C. The classical interpretation of this interval is screwy and tongue twisting. But we can use the Bayesian interpretation and say something like “There is a 95% chance the mean temperature was between 17.6°C and 20.4°C.” Since this interval is comfortably narrow, we go away secure in our estimate of temperature.
But we shouldn’t be confident, because that interval is far, far too narrow. The uncertainty of the actual temperature is vastly larger. Why? Take another look at what the interpretation of that interval is. What is a “mean temperature” and how is that different from plain old temperature?
It turns out that this interval is only about the unobservable parameter; it does not account for all the uncertainty that exists. Usually this “extra” uncertainty is just plain ignored[1]. But using (Bayesian) predictive methods, it is possible to account for and explain it easily (technically, we produce the posterior predictive distribution given the new data: see Part V).
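To make the distinction concrete, here are the two intervals written out for the simplest case. This is a sketch under stated assumptions: a straight-line model with normal errors and the usual noninformative ("flat") prior, for which both posterior distributions are Student-t. The symbols (the fitted coefficients, s, S_xx, the new proxy value x_0, and the t quantile) are notation introduced here, not taken from the post.

```latex
\begin{align*}
\hat{y}_0 &= \hat{a} + \hat{b}\,x_0, \qquad
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i - \hat{a} - \hat{b}\,x_i\bigr)^2, \qquad
S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 \\[4pt]
\text{mean (parameter) interval:}\quad
  &\hat{y}_0 \;\pm\; t_{0.975,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[4pt]
\text{predictive interval:}\quad
  &\hat{y}_0 \;\pm\; t_{0.975,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```

The only difference is the extra 1 under the square root. That term is the scatter of individual temperatures around the line, and unlike the other two terms it does not shrink as more calibration data are collected.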
If we use the (Bayesian) predictive method, the actual uncertainty of the temperature (and not of the parameters) runs from 9.6°C to 28.4°C. That’s nearly seven times wider than the parameter-based interval.
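The comparison can be reproduced in spirit with a short posterior simulation. This is a sketch, not the post's R code: it assumes a straight-line model with normal errors and the standard noninformative prior, and the simulated data (slope, noise, sample size) are invented, so the printed numbers will not match 17.6 to 20.4 or 9.6 to 28.4. The function names are hypothetical.

```python
import math
import random

def simulate_calibration(n=50, seed=1):
    """Hypothetical calibration data; all constants are illustrative assumptions."""
    rng = random.Random(seed)
    xs = [rng.uniform(100.0, 140.0) for _ in range(n)]
    ys = [0.15 * x + 0.5 + rng.gauss(0.0, 4.0) for x in xs]
    return xs, ys

def central95(values):
    """Central 95% interval from a list of simulated draws."""
    v = sorted(values)
    return v[int(0.025 * len(v))], v[int(0.975 * len(v)) - 1]

def posterior_intervals(xs, ys, x0, draws=20000, seed=7):
    """Return (interval for the mean temperature, interval for the actual temperature)."""
    rng = random.Random(seed)
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    sse = sum((y - ybar - b_hat * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    mean_draws, temp_draws = [], []
    for _ in range(draws):
        # sigma^2 | data ~ SSE / chi-square(n-2); chi-square(k) is Gamma(shape k/2, scale 2).
        sigma = math.sqrt(sse / rng.gammavariate((n - 2) / 2.0, 2.0))
        # With x centered at xbar, the intercept and slope are independent given sigma.
        alpha = rng.gauss(ybar, sigma / math.sqrt(n))
        beta = rng.gauss(b_hat, sigma / math.sqrt(sxx))
        mu0 = alpha + beta * (x0 - xbar)          # uncertainty in the parameters only
        mean_draws.append(mu0)
        temp_draws.append(rng.gauss(mu0, sigma))  # adds the scatter of a real temperature
    return central95(mean_draws), central95(temp_draws)

xs, ys = simulate_calibration()
(mean_lo, mean_hi), (temp_lo, temp_hi) = posterior_intervals(xs, ys, 122.89)
ratio = (temp_hi - temp_lo) / (mean_hi - mean_lo)
print(f"mean temperature:   {mean_lo:.1f} to {mean_hi:.1f}")
print(f"actual temperature: {temp_lo:.1f} to {temp_hi:.1f}  ({ratio:.1f} times wider)")
```

The single extra line that draws an actual temperature around the mean is the whole difference between the two intervals.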
Did you see that? I’ll repeat it: the actual uncertainty is nearly seven times larger. This is because we are making a prediction of what the temperature was given the value of some proxy. This is why we want the uncertainty of the prediction. This picture shows the consequences:
The blue dots are the measurements of the proxy for which we had no concurrent temperatures. The black circles are a repeat of the original data (that the blue dots don’t span the range of the old data is just the result of the simulation). The narrow, dark-tan band in the center is the classical “parameter” interval for these new proxy measurements. The wide, light-tan band is the Bayesian posterior predictive distribution and represents the uncertainty of the actual temperature.
Notice that most of the old data points lie within the Bayesian interval, as we would hope they would, but very few of them lie within the classical parameter interval. The classical interval is shockingly narrow, and relying on it guarantees, if not cockiness, then at least over-confidence.
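That check is easy to run on simulated data: count how many calibration points fall inside each band. The sketch below makes its own assumptions: invented data, a straight-line fit, and a normal quantile standing in for the Student-t quantile (a small approximation at this sample size).

```python
import math
import random
from statistics import NormalDist

rng = random.Random(3)
n = 100
# Hypothetical calibration data; the constants are illustrative assumptions.
xs = [rng.uniform(100.0, 140.0) for _ in range(n)]
ys = [0.15 * x + 0.5 + rng.gauss(0.0, 4.0) for x in xs]

# Least-squares fit, written with x centered at its mean.
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
sse = sum((y - ybar - b * (x - xbar)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))

# Assumption: normal quantile used in place of the t quantile with n-2 degrees of freedom.
z = NormalDist().inv_cdf(0.975)

inside_param = 0  # points inside the narrow "parameter" band
inside_pred = 0   # points inside the wide predictive band
for x, y in zip(xs, ys):
    fitted = ybar + b * (x - xbar)
    h = 1.0 / n + (x - xbar) ** 2 / sxx
    if abs(y - fitted) <= z * s * math.sqrt(h):
        inside_param += 1
    if abs(y - fitted) <= z * s * math.sqrt(1.0 + h):
        inside_pred += 1

print(f"inside parameter band:  {inside_param} of {n}")
print(f"inside predictive band: {inside_pred} of {n}")
```

A band that is supposed to describe temperatures with 95% probability ought to contain roughly 95% of them; only the predictive band comes close.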
These results were simulations, using a standard linear regression, but the lesson is the same for real data and regardless of what kind of statistical model is used (R code is here). The Bayesian predictive interval will always be wider.
But not wide enough! There are still some sources of uncertainty not accounted for. I have said above that the “actual” uncertainty was this-and-such. But that assumes the model I used was true. Is it? Who knows. There is thus more uncertainty in our model choice. Because of experience, we judge it likely that the model is not perfect; therefore the prediction intervals should be wider. How much wider is unknown: but they will be wider.
We assumed that the proxy and the temperature are measured without error of any kind. But if there is any measurement error, then the prediction intervals should be wider yet again. And there are a number of other peculiarities that apply just to temperature/proxy models, all of which, were they fully accounted for, would push the prediction intervals wider.
Third step: Okay, we’ve sorted out the proper width of our uncertainty, taking into account all sources. Everybody’s happy. We now want to use our reconstructed temperatures. Perhaps as input to climate models, perhaps as input to models showing a change in something temperature related, like polar bear population or the extent of grasslands, or whatever. Or even as a raw plot, as BEST and Mann have done.
The raw plots from those organizations (we can now see) had “error bars”, i.e. the measurements of uncertainty, which were too narrow. Their plots (in part) were predictions of temperature given proxies (which included mixed sources of temperature measurements: again, see the series linked above). They therefore should have had the prediction intervals.
If the reconstructed temperatures are used as inputs to other models, what most people do is just plug in the model guess with no uncertainty. That would be like plugging in 19°C plus-or-minus nothing, zero. There is no acknowledgement that the temperature that goes into these models is measured with error. What people should do is plug the range of temperatures in, and not just a point estimate. This isn’t easy to do: it’s not as simple as “plugging in”, but that it is difficult is no excuse not to do it.
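One common way to "plug the range in" is by simulation: draw many plausible temperatures from the predictive distribution, run each through the secondary model, and look at the spread of the outputs. The sketch below is only an illustration. It approximates the predictive distribution as a normal centered at 19°C whose 95% limits are the 9.6°C and 28.4°C quoted above, and `downstream_model` is an invented stand-in for whatever secondary model is actually in use.

```python
import math
import random
from statistics import NormalDist

# Assumption: normal approximation to the predictive distribution,
# matched to the 95% limits of 9.6 and 28.4 degrees C.
z = NormalDist().inv_cdf(0.975)
pred_mean = 19.0
pred_sd = (28.4 - 9.6) / (2.0 * z)

def downstream_model(temp_c):
    """Hypothetical secondary model, e.g. fraction of land that is grassland."""
    return 1.0 / (1.0 + math.exp(-(temp_c - 18.0)))

# The usual practice: plug in the point estimate, plus-or-minus nothing.
plug_in = downstream_model(pred_mean)

# Carrying the uncertainty through: one model run per plausible temperature.
rng = random.Random(5)
outputs = sorted(downstream_model(rng.gauss(pred_mean, pred_sd)) for _ in range(20000))
lo = outputs[int(0.025 * len(outputs))]
hi = outputs[int(0.975 * len(outputs)) - 1]

print(f"plug-in answer:      {plug_in:.2f}")
print(f"propagated 95% band: {lo:.2f} to {hi:.2f}")
```

The plug-in run returns one confident-looking number; the propagated runs show how much of that confidence came from pretending the temperature was known.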
If the range of uncertainty of temperature is not input, the resulting model will itself be too certain. People will go off spouting that this or that change is “nearly certain” if temperature does this-or-that.
Here’s a contest: identify secondary studies which use reconstructed temperatures as input to models. The first one to find a study which uses the full predictive uncertainty of the reconstructed temperature wins.
Prediction: we will wait a long time before announcing a winner.
[1] Click the “Start Here” at the top of the page and search out the teaching journal posts for a complete explanation of why this is so.