We still have work to do. This is not simple stuff, and if we try and opt for the easy way out, then we are guaranteed to make a mistake. Stick with me.
Scenario 3: different spots, fixed flora and fauna
We started with a fixed spot, which we’ll keep as an idealization. Let’s call our spot A: it sits at a precise latitude and longitude and never changes.
Suppose we have temperature measurements at B, a nearby location, but these stop at some time in the past. Those at A began about the time those at B stopped: a little before, or the exact time B stopped, or a little after. We’ll deal with all three of these situations, and with the word nearby.
But first a point of logic, oft forgotten: B is not A. That is, by definition, B is at a different location than A. The temperatures at B might mimic closely those at A; but still, B is not A. Usually, of course, temperatures at two different spots are different. The closer B is to A, usually, the more correlated those temperatures are: and by that, I mean, the more they move in tandem.
Very well. Suppose that we are interested in composing a record for A since the beginning of the series at B. Is it necessary to do this?
I’m sorry to be obvious once more, but we do not have a complete record at A, nor at B. This is tough luck. We can—and should—just examine the series at B and the series of A and make whatever decisions we need based on those. After all, we know the values of those series (assuming the data is measured without error: more on that later). We can tell if they went up or down or whatever.
But what if we insist on guessing the missing values of A (or B)? Why insist? Well, the old desire of quantifying a trend for an arbitrary length of time: arbitrary because we have to, ad hoc pick a starting date. Additional uncertainty is attached to this decision: and we all know how easy it is to cook numbers by picking a favorable starting point.
However, it can be done, but there are three facts which must be remembered: (1) the uncertainty with picking an arbitrary starting point; (2) any method will result in attaching uncertainty bounds to the missing values, these must remain attached to the values; and (3) the resulting trend estimate, itself the output from a model which takes as input those missing values, will have uncertainty bounds—these will necessarily be larger than if there were no missing data at A. Both uncertainty bounds must be of the predictive and not parameteric kind, as we discussed before.
Again, near as I can tell, carrying the uncertainty forward was not done in any of the major series. What that means is described in our old refrain: everybody is too certain of themselves.
How to guess A’s missing values? The easiest thing is to substitute B’s values for A, a tempting procedure if B is close to A. Because B is not A, we cannot do this without carrying forward the uncertainty that accompanies these substitutions. That means invoking a probability (statistical) model.
If B and A overlap for a period, we can model A’s values as a function of B’s. We can then use the values of B to guess the missing values of A. You’re tired of me saying this, but if this is done, we must carrying forward the predictive uncertainty of the guesses into the different model that will be used to assess if there is a trend in A.
“But hold on a minute, Briggs! Aren’t you always telling us that we don’t need to smooth time series, and isn’t fitting a trend model to A just another form of smoothing? What are you trying to pull!?”
Amen, brother skeptic. A model to assess trend—all those straight-line regressions you see on temperature plots—is smoothing a time series, a procedure that we have learned is forbidden.
“Not always forbidden. You said that if we wanted to use the trend model to forecast, we could do that.”
And so we can: about which, more in a second.
There is no point in asking if the temperature at A has increased (since some arbitrary date). We can just look at the data and tell with certainty whether or not now is hotter than then (again, barring measurement error and assuming all the values of A are actual and not guesses).
“Hold on. What if I want to know what the size of the trend was? How many degrees per century, or whatever.”
It’s the same. Look at the temperature now, subtract the temperature then, and divide by the number of years between to get the year-by-year average increase.
“What about the uncertainty of that increase?”
There is no uncertainty, unless you have used made-up numbers for A.
Look. The point is that we have a temperature series in front of us. Something caused those values. There might have existed some forcing with added a constant amount of heat per year, plus or minus a little. Or there might have existed an infinite number of other forcing mechanisms, some of which were not always present, or were only present in varying degrees of strength. We just don’t know.
The straight-line estimate implies that the constant forcing is true, the only and only certain explanation of what caused the temperature to take the values it did. We can—even with guessed values of A, as long as those guessed values have their attached uncertainties—quantify the uncertainty in the linear trend assuming it is true.
“But how do we know the linear trend is true?”
We don’t. The only way we can gather evidence for that view is to skillfully forecast new values of A; values that were in no way used to assess the model.
“In other words, even if everybody played by the book and carried with them the predictive uncertainty bounds as you suggested, they are still assuming the linear trend model is true. And there is more uncertainty in guessing that it is true. Is that right?”
You bet. And since we don’t know the linear model is true, it means—once more!—that too many people are too certain of too many things.
Still to come
Wrap up Scenario 3, Teleconnections, Scenario 4 on different instruments and measurement error, and yet more on why people are too sure of themselves.