A reader writes:
I am a fairly new reader of your blog, coming from WattsUpWithThat and reading with delight and frustration your thoughts on statistics and climate. I have a question in this regard, as I am trying to figure out a thing or two about my local climate and weather.
Recently we were told that Norway (my home) was 2-something degrees warmer in 2014 than the “normal”. I asked our Met office how they calculated this, and the reply I got was that they take all stations available and smear them across a 1×1 km grid through some kind of interpolation in order to get full coverage of mainland Norway. And thus they can do an average.
Now, I realise that, whatever we do, there is no such physical thing as a mean temperature for Norway. But say that I would like to calculate an average of the available data: would it not be more appropriate to just average all the stations, without the interpolation?
I would really appreciate it if you would give an opinion on this.
Interpolation is a source of great over-certainty. Here’s why.
You can operationally define a temperature “mean” as the numerical average of a collection of fixed, unchanging stations. The utility of this is somewhat ambiguous, particularly for large areas, but adding a bunch of numbers together poses no theoretical difficulty.
The problem comes when the stations change—say an instrument is swapped—or when old stations are dropped and new ones added. That necessarily implies the operational definition has changed. Which is also fine, as long as it is remembered that you cannot directly compare the old and new definition. Nobody remembers, though. Apples are assigned orange designations. It's all fruit, right? So what the heck.
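To see how a changing network changes the operational definition, here is a toy sketch with made-up numbers. No individual station warms, yet the "mean" jumps, purely because a cold site was swapped for a warm one:

```python
# Illustrative only: invented station readings (deg C). No station's
# temperature changes; only the membership of the network changes.
old_stations = {"A": 1.0, "B": -3.0, "C": 0.5}   # original network
new_stations = {"A": 1.0, "B": -3.0, "D": 4.0}   # C dropped, warmer site D added

old_mean = sum(old_stations.values()) / len(old_stations)
new_mean = sum(new_stations.values()) / len(new_stations)

print(old_mean)  # -0.5
print(new_mean)  # about 0.67
```

Comparing `new_mean` to `old_mean` and announcing "warming" would be comparing two different operational definitions, which is the apples-to-oranges move in question.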
It gets worse with interpolation. This is when a group of stations, perhaps changing, is fed as input to a probability model, and that model is used to predict what the temperature was at locations other than the stations. Now, if it worked like that, I mean if actual predictions were made, then interpolation would be fine. But it doesn't, not usually.
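We don't know which interpolation scheme the Norwegian Met office uses; but to make the idea concrete, here is a minimal sketch of one common choice, inverse-distance weighting, with hypothetical station coordinates and readings. Notice what it hands back:

```python
import math

# Hypothetical stations for illustration: (x_km, y_km, temperature in deg C)
stations = [(0.0, 0.0, -2.0), (10.0, 0.0, 1.0), (0.0, 10.0, 0.5)]

def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted guess at grid point (x, y).

    Note: it returns a single bare number, with NO uncertainty
    attached -- which is exactly the complaint here."""
    num = den = 0.0
    for sx, sy, t in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return t            # at a station, return its own reading
        w = d ** -power
        num += w * t
        den += w
    return num / den

print(round(idw(5.0, 5.0, stations), 3))  # a "temperature" where nothing was measured
```

Every grid cell filled this way is a prediction, and a prediction reported without its uncertainty is just a guess wearing a lab coat.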
There are levels of uncertainty with these probability models, which we can broadly classify into two kinds. The first is that internal to the model itself, which is called the parametric uncertainty. Parameters tie observations to the model. If you can remember the "betas" of regression, these are they. Nearly all statistical methods are obsessively focused on these parameters, which don't exist and can't be seen. Nobody except statisticians cares about parameters. When the model reports uncertainty, it's usually the uncertainty of these parameters.
The second and more important level of uncertainty is that of the prediction itself. What you want to know is the uncertainty of the actual guess. This uncertainty is always, necessarily always, larger than the parametric uncertainty. It's hard to say without knowing the details of the models, but my experience is that, for interpolation models, prediction uncertainty is 2 to 8 times as large as the parametric uncertainty. This is an enormous difference.
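The two kinds of uncertainty are easiest to see in plain regression, the simplest probability model there is. Below is a sketch on synthetic data (all numbers invented): the interval around the fitted mean is the parametric uncertainty; the interval around an actual new observation is the prediction uncertainty. The ratio between them lands comfortably inside that 2-to-8 range:

```python
import numpy as np
from scipy import stats

# Synthetic data, illustration only: y = 2 + 0.5*x + noise
rng = np.random.default_rng(0)
n = 30
x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

# Ordinary least squares, done by hand
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))     # residual standard error

x0 = 5.0                                       # the point we "interpolate" at
leverage = 1.0 / n + (x0 - xbar) ** 2 / Sxx
se_param = s * np.sqrt(leverage)               # uncertainty of the fitted MEAN
se_pred = s * np.sqrt(1.0 + leverage)          # uncertainty of a real OBSERVATION

t = stats.t.ppf(0.95, n - 2)                   # for a 90% two-sided interval
print("parametric half-width:", round(t * se_param, 2))
print("prediction half-width:", round(t * se_pred, 2))
print("ratio:", round(se_pred / se_param, 1))  # about 5.6 here
```

Report only `se_param` and the guess looks marvelously precise; report `se_pred` and the same guess looks, correctly, much shakier.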
If the interpolation is used to make a prediction, it must be accompanied by a measure of uncertainty. If not, toss it out. Anybody can make a guess of what the temperature was. To be of practical use, the prediction must state its uncertainty. And that means prediction and not parametric uncertainty. Almost always, however, it’s the latter you see.
You have to be careful because parametric uncertainty will be spoken of as if it is prediction uncertainty. Why? Because of sloppiness. Prediction uncertainty is so rare that most practitioners don’t know the difference. In order to discover which kind of uncertainty you’re dealing with, you have to look into the guts of the calculations, which are frequently unavailable. Caution is warranted.
The uncertainty is needed to judge how likely that claimed “2 degrees warmer” is. If the actual prediction with prediction uncertainty is “90% chance of 2 +/- 6” (the uncertainty needn’t be symmetric, of course; and there’s no reason in the world to fixate on 95%), then there is little confidence any warming took place.
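To put a number on "little confidence," take the illustrative figures above and make the simplifying assumption that the predictive distribution is normal (the text is explicit that it needn't be symmetric; this is only a sketch):

```python
from scipy import stats

# Numbers from the text, illustration only: a 90% prediction interval of
# 2 +/- 6 degrees, treated (simplifying assumption) as a normal distribution.
center, half_width, level = 2.0, 6.0, 0.90
z = stats.norm.ppf(0.5 + level / 2.0)      # about 1.645
sigma = half_width / z                     # implied predictive std. dev.

# Probability the anomaly actually exceeds zero, i.e. that ANY warming occurred
p_warming = 1.0 - stats.norm.cdf(0.0, loc=center, scale=sigma)
print(round(p_warming, 2))                 # about 0.71
```

A roughly 71% chance of any warming at all, and a headline of "2 degrees warmer." That gap between the bare number and its uncertainty is the whole point.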
But watch out for parametric uncertainty masquerading as the real thing. It happens everywhere and frequently.