I’m just getting into the CRU code: it’s a lot of material and everything I say here about it is preliminary. Some of you will know what I’m going to say about proxies, but stick with me, it’s important. I apologize for the length: it’s necessary. Please help by linking this around to other sites which discuss the proxy data or code. I’ll make corrections as we go.
How do we know the temperature?
We have no direct observations of temperature—neither at the Earth’s surface nor in the atmosphere—for the vast majority of history. So how can we know what the temperature used to be in the absence of actual measurements?
We can’t. We can only guess.
That’s right, all those squiggly-line pictures of temperature you see from before roughly 1900 are reconstructions; the lines are what is spit out of statistical models. They are therefore merely guesses. Even stronger, we have no way to know exactly how good these reconstructions are. If we did, then, obviously, we would know the actual temperatures, because the only way to know the actual model error is to compare the model’s predictions against the real data (which we don’t know).
To emphasize: the actual—as opposed to theoretical model—error is unknown. But we must try and estimate this error—it is of utmost importance—otherwise we cannot make decisions about the reconstructions.
How do we create a reconstruction? By using proxies, which are not temperatures but are observations of physical entities thought to be related to temperature. Tree ring widths are one well known proxy; bore-hole, ice core, and coral reef measurements are others.
Focus on tree rings, because CRU does. Through various methods, we can generally guess how old a tree is; that is, the years in which the various rings were grown (sometimes this, too, is a guess, though not a bad one). When it’s warmer, trees grow better—on average—and have wider rings; when it’s colder, they don’t grow as well—on average—and have narrower rings. The idea is sound: correlate (I use this word in its plain English sense) tree ring widths with temperature, and where we only have tree rings, we can use them and the correlation to guess temperatures.
(Incidentally, I find this correlation amusing. Can you guess why?)
Proxy reconstruction mechanics
Here’s how proxies work. We have some actual temperature measurements, call them yt, which overlap proxy measures, call them xt, where the subscript represents time. The next step is to build a model
yt = m(β, xt) + error
which says that yt is modeled as a function m() of the proxies xt and (multi-dimensional) parameter β, plus some error.
The model m() is not given to us from On High. Its form and shapes are a guess; and different people can have different guesses, and different models will give different reconstructions.
Once a model is stated, statistical procedure (frequentist or Bayesian) then makes a guess about β and the error. The error guess allows us to say how good the guess of β is, given that m() is true. The parameter guess and the values of xt for which we do not know yt are then plugged back into the model, which spits out guesses of yt.
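To make the mechanics concrete, here is a minimal sketch of the whole pipeline. Everything in it is an assumption for illustration: the data are simulated, m() is taken to be linear, and the parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration period: 50 years where thermometer temps (yt)
# overlap proxy measurements (xt), e.g. standardized ring widths.
true_beta = (9.0, 0.8)                      # hypothetical intercept and slope
xt_overlap = rng.normal(1.0, 0.3, size=50)
yt_overlap = true_beta[0] + true_beta[1] * xt_overlap + rng.normal(0, 0.2, 50)

# Guess that m() is linear and estimate beta by least squares.
beta_hat = np.polyfit(xt_overlap, yt_overlap, deg=1)  # [slope, intercept]

# Proxy-only period: plug xt into the fitted model to "reconstruct" yt.
xt_past = rng.normal(1.0, 0.3, size=10)
yt_reconstructed = np.polyval(beta_hat, xt_past)
print(yt_reconstructed)  # guesses, not measurements
```

The last line is the whole point: those ten numbers are model output, not observations, and everything about them is conditional on the linear form of m() being true.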
Pay attention: we all know that these guesses of yts are not 100% accurate, so uncertainty about their values should be given. All those squiggly-line plots should (ethically) also contain an indication of the error of the lines. Some kind of plus/minus should always be there.
The huge problem is that the plus/minus lines are usually drawn around the guess of β, which we don’t care about. We want to know the value of the temperature, not of some parameter. Technically, the uncertainty due to estimating β should be accounted for in making guesses of the temperature. If this is done, the range of the plus/minus bands should be multiplied by a factor of about 2 to 10! (Yes, that much.) And remember, all this is contingent on m() being true.
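This difference can be computed directly. The sketch below (simulated data, linear m() assumed) contrasts the variance that reflects only uncertainty in β with the variance that also includes the model’s own error term—the latter is what an honest band around a temperature guess needs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(0.0, 1.0, n)
y = 10.0 + 0.5 * x + rng.normal(0.0, 1.0, n)  # toy data, error sd = 1

# Ordinary least squares by hand.
X = np.column_stack([np.ones(n), x])
beta_hat, res, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = res[0] / (n - 2)          # estimated error variance
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.0])          # make a guess at x = 0
var_beta_only = sigma2 * (x0 @ XtX_inv @ x0)        # uncertainty in beta alone
var_prediction = sigma2 * (1.0 + x0 @ XtX_inv @ x0) # plus the model's own error

ratio = np.sqrt(var_prediction / var_beta_only)
print(ratio)  # how much wider the honest band is
```

In this toy case the honest band comes out several times wider than the parameter-only band, consistent with the 2-to-10 multiplier claimed above; the exact factor depends on the sample size and the data.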
But if there is no plus/minus, how can we tell how confident we should be about any reconstructed trends? Answer: we cannot be confident at all. Since we typically do not see indications of uncertainty accompanying reconstructions, we have to hunt for the sources of uncertainty in the CRU code, which we can then use to figure out our own plus/minus bands.
One example from something called a “SOAP-D-15-berlin-d15-jj” document. A non-native English speaker shows a plot of various proxy reconstructions from which he wanted to “reconstruct millennial [Northern Hemisphere] temperatures.” He said, “These attempts did not show, however, converge towards a unique millennial history, as shown in Fig. 1. Note that the proxy series have already undergone a linear transformation towards a best estimate to the CRU data (which makes them look more similar, cf. Briffa and Osborn, 2002).”
In other words, direct effort was made to finagle the various reconstructions so that they agreed with preconceptions. Those efforts failed. It’s like being hit in the head with a hockey stick.
Sources of reconstruction uncertainty
Here is a list of all the sources of error, variability, and uncertainty; whether those sources are—as far as I can see, which means I might be wrong but am willing to be corrected—properly accounted for by the CRU crew; and their likely effects on the certainty we have in proxy reconstructions:
- Source: The proxy relationship with temperature is assumed constant through time. Accounted: No. Effects: entirely unknown, but should boost uncertainty.
- Source: The proxy relationship with temperature is assumed constant through space. Accounted: No. Effects: A tree ring from California might not have the same temperature relationship as one from Greece. Boosts uncertainty.
- Source: The proxies are measured with error (the “on average” correlation mentioned above). Accounted: No. Effects: certainly boosts uncertainty.
- Source: Groups of proxies are sometimes smoothed before input to models. Accounted: No. Effect: a potentially huge source of error; smoothing always increases “signal”, even when those signals aren’t truly there. Boosts uncertainty by a lot.
- Source: The choice of the model m(). Accounted: No. Effect: results are always stated as if the model is true; potentially huge source of error. Boosts uncertainty by a lot.
- Source: The choice of the model m() error term. Accounted: Yes. Effect: the one area where we can be confident of the statistics.
- Source: The results are stated as estimates of β. Accounted: No. Effects: most classical (frequentist and Bayesian) procedures state uncertainty results about parameters, not about actual, physical observables. Boosts uncertainty by anywhere from two to ten times.
- Source: The computer code is complex, multi-part, and multi-authored. Accounted: No. Effects: many areas for error to creep in; the code is unaudited. Obviously boosts uncertainty.
- Source: Humans with a point of view release the results. Accounted: No. Effects: judging by the tone of the CRU emails, and what is at stake, certainly boosts uncertainty.
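The smoothing point deserves a demonstration, because it surprises people. The sketch below is my own toy simulation, nothing from CRU’s code: two completely independent noise series are given a running-mean smooth, and the average apparent correlation between them jumps, even though there is no relationship at all.

```python
import numpy as np

rng = np.random.default_rng(2)

def running_mean(z, w=10):
    # Simple moving average, a common pre-model smoothing choice.
    return np.convolve(z, np.ones(w) / w, mode="valid")

n, trials = 200, 500
raw, smooth = [], []
for _ in range(trials):
    a = rng.normal(size=n)  # fake "proxy": pure noise
    b = rng.normal(size=n)  # fake "temperature": pure noise, unrelated to a
    raw.append(abs(np.corrcoef(a, b)[0, 1]))
    smooth.append(abs(np.corrcoef(running_mean(a), running_mean(b))[0, 1]))

# Smoothed series look far more "related" than they really are.
print(np.mean(raw), np.mean(smooth))
```

The mechanism is no mystery: smoothing induces autocorrelation, which shrinks the effective sample size, so spuriously large correlations become routine. Feed smoothed proxies into a model and the apparent “signal” is partly an artifact of the smoothing itself.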
There you have it: all the potential sources of uncertainty (I’ve no doubt forgotten something), only one of which is accounted for in interpreting results. Like I’ve been saying all along: too many people are too certain of too many things.
Update See this story on evidence.