The Ugly Unreality Of The Homogenization Of Time Series: Why Climate Scientists Are Too Sure Of Themselves


Day two of the week of classical posts on global warming, now “climate change”. Your author has many bona fides and much experience in this field: see this.

Announcement. I am on vacation this week preparing for the Cultural Event of the Year. This exposes yet another way scientists fool themselves into over-certainty.

This post originally ran in five parts beginning 9 December 2009. The originals: Part I, Part II, Part III, Part IV, Part V


Time to get technical.

First, surf over to Willis Eschenbach’s gorgeous piece of statistical detective work of how GHCN “homogenized” temperature data for Darwin, Australia. Pay particular attention to his Figures 7 & 8. Take your time reading his piece: it is essential.

There is vast confusion on data homogenization procedures. This article attempts to make these subjects clearer. I pay particular attention to the goals of homogenization, its pitfalls, and most especially, the resulting uncertainties. The uncertainty we have in our eventual estimates of temperature is grossly underestimated. I will come to the, by now, non-shocking conclusion that too many people are too certain about too many things.

My experience has been that anything over 800 words doesn’t get read. There’s a lot of meat here, and it can’t all be squeezed into one 800-word sausage skin. So I have linked the sausage into a multi-day post with the hope that more people will get through it.

Homogenization goals

After reading Eschenbach, you now understand that, at a surrounding location—and usually not a point—there exists, through time, temperature data from different sources. At a loosely determined geographical spot over time, the data instrumentation might have changed, the locations of instruments could be different, there could be more than one source of data, or there could be other changes. The main point is that there are lots of pieces of data that some desire to stitch together to make one whole.


Why? I mean that seriously. Why stitch the data together when it is perfectly useful if it is kept separate? By stitching, you introduce error, and if you aren’t careful to carry that error forward, the end result will be that you are far too certain of yourself. And that condition, unwarranted certainty, is where we find ourselves today.

Let’s first fix an exact location on Earth. Suppose this to be the precise center of Darwin, Australia: we’d note the specific latitude and longitude to be sure we are at just one spot. Also suppose we want to know the daily average temperature for that spot (calculated by averaging the 24 hourly values), which we use to calculate the average yearly temperature (the mean of those 365.25 daily values), which we want to track through time. All set?
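As a concrete reading of the averaging just described, here is a minimal Python sketch. The function names and the sample readings are hypothetical, chosen only for illustration.

```python
from statistics import fmean

def daily_mean(hourly_temps):
    """Daily average: the mean of the 24 hourly readings for one day."""
    if len(hourly_temps) != 24:
        raise ValueError("expected 24 hourly readings")
    return fmean(hourly_temps)

def yearly_mean(daily_means):
    """Yearly average: the mean of that year's daily averages."""
    return fmean(daily_means)

# Hypothetical day: 12 hours at 10.0 C, then 12 hours at 20.0 C.
example_day = [10.0] * 12 + [20.0] * 12
example_daily = daily_mean(example_day)  # 15.0

# Hypothetical (very short) "year" of three daily averages.
example_yearly = yearly_mean([15.0, 16.0, 17.0])  # 16.0
```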

Scenario 1: fixed spot, urban growth

The most difficult scenario first: our thermometer is located at our precise spot and never moves, nor does it change characteristics (always the same, say, mercury bulb), and it always works (its measurement error is trivial and ignorable). But the spot itself changes because of urban growth. Whereas once the thermometer was in an open field, later a pub opens adjacent to it, and then comes a parking lot, and then a whole city around the pub.

In this case, we would have an unbroken series of temperature measurements that would probably—probably!—show an increase starting at the time the pub construction began. Should we “correct” or “homogenize” that series to account for the possible urban heat island effect?


No. At least, not if our goal was to determine the real average temperature at our spot. Our thermometer works fine, so the temperatures it measures are the temperatures that are experienced. Our series is the actual, genuine, God-love-you temperature at that spot. There is, therefore, nothing to correct. When you walk outside the pub to relieve yourself, you might be bathed in warmer air because you are in a city than if you were in an open field, but you aren’t in an open field; you are where you are, and you must experience the actual temperature of where you live. Do I make myself clear? Good. Memorize this.

Scenario 2: fixed spot, longing for the fields

But what if our goal was to estimate what the temperature would have been if no city existed; that is, if we want to guess the temperature as if our thermometer was still in an open field? Strange goal, but one shared by many. They want to know the influence of humans on the temperature of the long-lost field—while simultaneously ignoring the influence of humans based on the new city. That is, they want to know how humans living anywhere but the spot’s city might have influenced the temperature of the long-lost field.

It’s not that this new goal is not quantifiable (it is; we can always compute probabilities for counterfactuals like this), but its meaning is more nuanced and difficult to grasp than our old goal. It would not do for us to forget these nuances.

One way to guess would be to go to the nearest field to our spot and measure the temperature there, while also measuring it at our spot. We could use our nearby field as a direct substitute for our spot. That is, we just relabel the nearby field as our spot. Is this cheating? Yes, unless you attach the uncertainty of this switcheroo to the newly labeled temperature. Because the nearby field is not our spot, there will be some error in using it as a replacement: that error should always accompany the resulting temperature data.

Or we could use the nearby field’s data as input to a statistical model. That model also takes as input our spot’s readings. To be clear: the nearby field and the spot’s readings are fed into a correction model that spits out an unverifiable, counterfactual guess of what the temperature would be if there were no city in our spot.

Aside: counterfactuals

A counterfactual is a statement saying what would be the case if its conditional were true. Like, “Germany would have won WWII if Hitler had not invaded Russia.” Or, “The temperature at our spot would be X if no city existed.” Counterfactuals do not make statements about what really is, but only about what might have been had something that wasn’t true been true.

They are sometimes practical. Credit card firms face counterfactuals each time they deny an application and say, “This person will default if we issue him a card.” Since the decision to issue a card is based on some model or other decision process, the company can never directly verify whether its model is skillful, because it will never issue the card to find out whether or not its holder defaults. In short, counterfactuals can be interesting, but they cannot change what physically happened.

However, probability can handle counterfactuals, so it is not a mistake to seek their quantification. That is, we can easily assign a probability to the Hitler, credit card, or temperature question (given additional information about models, etc.).

Asking what the temperature would be at our spot had there not been a city is certainly a counterfactual. Another is to ask what the temperature of the field would have been given there was a city. This also is a strange question to ask.

Why would we want to know what the temperature of a non-existent city would have been? Usually, to ask how much more humans who don’t live in the city at this moment might have influenced the temperature in the city now. Confusing? The idea is if we had a long series in one spot, surrounded by a city that was constant in size and make up, we could tell if there were a trend in that series, a trend that was caused by factors not directly associated with our city (but was related to, say, the rest of the Earth’s population).

But since the city around our spot has changed, if we want to estimate this external influence, we have to guess what the temperature would have been if either the city was always there or always wasn’t. Either way, we are guessing a counterfactual.

The thing to take away is that the guess is complicated and surrounded by many uncertainties. It is certainly not as clear cut as we normally hear. Importantly, just as with the credit card example, we can never verify whether our temperature guess is accurate or not.

Intermission: uncertainty bounds and global average temperature

This guess would—should!—have a plus and minus attached to it, some guidance of how certain we are of the guess. Technically, we want the predictive uncertainty of the guess, and not the parametric uncertainty. The predictive uncertainty tells us the plus and minus bounds in the units of actual temperature. Parametric uncertainty states those bounds in terms of the parameters of the statistical model. Near as I can tell (which means I might be wrong), GHCN and, inter alia, Mann use parametric uncertainty to state their results: the gist being that they are, in the end, too confident of themselves.

(See this post for a distinction between the two; the predictive uncertainty is always larger than the parametric, usually by two to ten times as much. Also see this marvelous collection of class notes.)
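For readers who want the difference in symbols, here is a sketch for the simplest case: a straight-line regression with normal errors, using the classical textbook formulas. It takes “parametric” to mean the interval for the fitted mean and “predictive” to mean the interval for a new observation; that mapping is an assumption made for illustration. The symbols n, s, x-bar, S_xx and t* are introduced here and appear nowhere else in this series.

```latex
% Assumed model: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
% with \varepsilon_i \sim N(0, \sigma^2) and n observations.
\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

% Parametric bounds: uncertainty in the fitted line at x_0
\hat{y}_0 \pm t^{*}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Predictive bounds: uncertainty in a new observation at x_0
\hat{y}_0 \pm t^{*}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The only difference is the extra 1 under the square root, which is the scatter of the observations themselves. At x_0 equal to x-bar the predictive half-width is sqrt(n + 1) times the parametric one: about 5.6 times when n = 30.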

OK. We have our guess of what the temperature might have been had the city not been there (or if the city was always there), and we have said that that guess should come attached with plus/minus bounds of its uncertainty. These bounds should be super-glued to the guess, and coated with kryptonite so that even Superman couldn’t detach them.

Alas, they are usually tied loosely with cheap string from a dollar store. The bounds fall off at the lightest touch. This is bad news.

It is bad because our guess of the temperature is then given to others who use it to compute, among other things, the global average temperature (GAT). The GAT is itself a conglomeration of measurements from sites all over (a very small—and changing—portion) of the globe. Sometimes the GAT is a straight average, sometimes not, but the resulting GAT is itself uncertain.

Even if we ignored the plus/minus bounds from our guessed temperatures, and also ignored those from all the other spots that go into the GAT, the act of calculating the GAT ensures that it must carry its own plus/minus bounds, which should always be stated (and stated with respect to the predictive, not the parametric, uncertainty).

But if the bounds from our guessed temperature aren’t attached, then the eventual bounds of the GAT will be far, far, too narrow. The gist: we will be way too certain of ourselves.
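A minimal sketch of what “attaching the bounds” means for an average. It assumes a straight average and independent station errors, and the station values and standard deviations are hypothetical. It covers only the uncertainty inherited from the station guesses, not the additional uncertainty of the GAT itself.

```python
import math

def average_with_bounds(values, sds):
    """Straight average of station values and the standard deviation of that
    average, assuming the station errors are independent (an assumption)."""
    n = len(values)
    avg = sum(values) / n
    sd = math.sqrt(sum(s * s for s in sds)) / n
    return avg, sd

# Hypothetical station guesses (deg C) and their predictive standard deviations.
station_values = [10.0, 20.0, 30.0]
station_sds = [0.3, 0.4, 0.0]

gat, gat_sd = average_with_bounds(station_values, station_sds)

# Dropping the station bounds is the same as pretending every sd is zero.
_, gat_sd_dropped = average_with_bounds(station_values, [0.0, 0.0, 0.0])

print(f"GAT = {gat:.2f}, sd carried = {gat_sd:.3f}, sd dropped = {gat_sd_dropped:.3f}")
```

If the station errors were positively correlated, which is plausible when the same correction model is applied everywhere, the carried standard deviation would be larger than this independent-errors figure.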

We haven’t even started on why the GAT is such a poor estimate for the global average temperature. We’ll come to these objections another day, but for now remember two admonitions. No thing experiences GAT; physical objects can only experience the temperature of where they are. And since the GAT contains a large (but not large enough) number of stations, any individual station is, at best, only weakly correlated with the GAT, as Dick Lindzen is always reminding us.

But enough of this, save we should remember that these admonitions hold whatever homogenization scenario we are in.

Scenario 3: different spots, fixed flora and fauna

We started with a fixed spot, which we’ll keep as an idealization. Let’s call our spot A: it sits at a precise latitude and longitude and never changes.

Suppose we have temperature measurements at B, a nearby location, but these stop at some time in the past. Those at A began about the time those at B stopped: a little before, or the exact time B stopped, or a little after. We’ll deal with all three of these situations, and with the word nearby.

But first a point of logic, oft forgotten: B is not A. That is, by definition, B is at a different location than A. The temperatures at B might mimic closely those at A; but still, B is not A. Usually, of course, temperatures at two different spots are different. The closer B is to A, usually, the more correlated those temperatures are: and by that, I mean, the more they move in tandem.

Very well. Suppose that we are interested in composing a record for A since the beginning of the series at B. Is it necessary to do this?


No. I’m sorry to be obvious once more, but we do not have a complete record at A, nor at B. This is tough luck. We can, and should, just examine the series at B and the series at A and make whatever decisions we need based on those. After all, we know the values of those series (assuming the data is measured without error: more on that later). We can tell if they went up or down or whatever.

But what if we insist on guessing the missing values of A (or B)? Why insist? Well, the old desire of quantifying a trend for an arbitrary length of time: arbitrary because we have to pick, ad hoc, a starting date. Additional uncertainty is attached to this decision: and we all know how easy it is to cook numbers by picking a favorable starting point.
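To see how much the starting date matters, here is a small sketch using a simple endpoint difference as the “trend”. The five yearly temperatures are hypothetical, and the endpoint difference is only one of many ways a trend might be computed.

```python
def per_year_change(temps, start):
    """Average change per year from index `start` to the last value."""
    span = len(temps) - 1 - start
    if span <= 0:
        raise ValueError("start must be before the final year")
    return (temps[-1] - temps[start]) / span

# Hypothetical yearly temperatures (deg C) for five consecutive years.
temps = [15.0, 14.0, 15.5, 14.5, 15.0]

changes = {start: per_year_change(temps, start) for start in range(len(temps) - 1)}
# start 0 -> 0.0, start 1 -> about +0.33, start 2 -> -0.25, start 3 -> +0.5
for start, change in changes.items():
    print(f"start at year {start}: {change:+.2f} deg C per year")
```

Same data, same ending year, and the answer is flat, warming, or cooling depending only on where we choose to begin.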

However, it can be done, but there are three facts which must be remembered: (1) the uncertainty with picking an arbitrary starting point; (2) any method will result in attaching uncertainty bounds to the missing values, and these must remain attached to the values; and (3) the resulting trend estimate, itself the output from a model which takes as input those missing values, will have uncertainty bounds, and these will necessarily be larger than if there were no missing data at A. Both uncertainty bounds must be of the predictive and not parametric kind, as we discussed before.

Again, near as I can tell, carrying the uncertainty forward was not done in any of the major series. What that means is described in our old refrain: everybody is too certain of themselves.

How to guess A’s missing values? The easiest thing is to substitute B’s values for A, a tempting procedure if B is close to A. Because B is not A, we cannot do this without carrying forward the uncertainty that accompanies these substitutions. That means invoking a probability (statistical) model.

If B and A overlap for a period, we can model A’s values as a function of B’s. We can then use the values of B to guess the missing values of A. You’re tired of me saying this, but if this is done, we must carry forward the predictive uncertainty of the guesses into the different model that will be used to assess if there is a trend in A.
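A minimal sketch of that procedure, assuming a straight-line relationship between A and B and classical prediction bounds. The readings are hypothetical, and the function names and the `z=1.96` normal approximation are choices made for this illustration, not anything used by GHCN.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.
    Returns the pieces needed later for prediction bounds."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard deviation
    return {"intercept": intercept, "slope": slope, "s": s,
            "n": n, "xbar": xbar, "sxx": sxx}

def predict_with_bounds(fit, x0, z=1.96):
    """Guess for a new value at x0 plus an approximate 95% predictive
    half-width. z=1.96 is a normal approximation; with this few points
    the proper t quantile would be noticeably larger."""
    guess = fit["intercept"] + fit["slope"] * x0
    spread = fit["s"] * math.sqrt(
        1.0 + 1.0 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    return guess, z * spread

# Hypothetical overlap period: readings at B and at A for the same years.
b_overlap = [24.1, 25.3, 23.8, 26.0, 24.7, 25.5, 24.0, 25.9]
a_overlap = [22.9, 24.4, 22.5, 25.1, 23.2, 24.6, 23.1, 24.6]
fit = fit_line(b_overlap, a_overlap)

# Years where only B was recorded: guess A, keeping the bounds attached.
b_only = [23.5, 25.0, 26.4]
a_guesses = [predict_with_bounds(fit, b) for b in b_only]
for b, (guess, half_width) in zip(b_only, a_guesses):
    print(f"B = {b:.1f}   A guess = {guess:.2f} +/- {half_width:.2f}")
```

Each guess is returned as a pair, the value and its half-width, so that anything downstream has to deal with both.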

An Objection

“But hold on a minute, Briggs! Aren’t you always telling us that we don’t need to smooth time series, and isn’t fitting a trend model to A just another form of smoothing? What are you trying to pull!?”

Amen, brother skeptic. A model to assess trend—all those straight-line regressions you see on temperature plots—is smoothing a time series, a procedure that we have learned is forbidden.

“Not always forbidden. You said that if we wanted to use the trend model to forecast, we could do that.”

And so we can: about which, more in a second.

There is no point in asking if the temperature at A has increased (since some arbitrary date). We can just look at the data and tell with certainty whether or not now is hotter than then (again, barring measurement error and assuming all the values of A are actual and not guesses).

“Hold on. What if I want to know what the size of the trend was? How many degrees per century, or whatever.”

It’s the same. Look at the temperature now, subtract the temperature then, and divide by the number of years between to get the year-by-year average increase.

“What about the uncertainty of that increase?”

There is no uncertainty, unless you have used made-up numbers for A.


Look. The point is that we have a temperature series in front of us. Something caused those values. There might have existed some forcing which added a constant amount of heat per year, plus or minus a little. Or there might have existed an infinite number of other forcing mechanisms, some of which were not always present, or were only present in varying degrees of strength. We just don’t know.

The straight-line estimate implies that the constant forcing is true, the one and only certain explanation of what caused the temperature to take the values it did. We can quantify the uncertainty in the linear trend assuming it is true, even with guessed values of A, as long as those guessed values have their attached uncertainties.

“But how do we know the linear trend is true?”

We don’t. The only way we can gather evidence for that view is to skillfully forecast new values of A; values that were in no way used to assess the model.

“In other words, even if everybody played by the book and carried with them the predictive uncertainty bounds as you suggested, they are still assuming the linear trend model is true. And there is more uncertainty in guessing that it is true. Is that right?”

You bet. And since we don’t know the linear model is true, it means—once more!—that too many people are too certain of too many things.
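The forecasting check mentioned in the exchange above can be sketched as follows: fit the line on old values, forecast values the fit never saw, and compare against a simple reference forecast. The reference used here (the mean of the old values), the skill formula, and the toy numbers are all assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def skill_score(train_x, train_y, new_x, new_y):
    """1 - MSE(trend forecast) / MSE(reference forecast), computed only on
    values that were not used in the fit. Positive means the trend model
    beat the reference; negative means it did worse."""
    intercept, slope = fit_line(train_x, train_y)
    trend_forecast = [intercept + slope * x for x in new_x]
    reference = sum(train_y) / len(train_y)
    reference_forecast = [reference] * len(new_x)
    return 1.0 - mse(new_y, trend_forecast) / mse(new_y, reference_forecast)

# Hypothetical toy series: four old years that sit exactly on a line.
train_years = [0, 1, 2, 3]
train_temps = [1.0, 2.0, 3.0, 4.0]
new_years = [4, 5]

# If the new values continue the line, the trend model shows skill (1.0).
skill_if_trend_continues = skill_score(train_years, train_temps, new_years, [5.0, 6.0])

# If the new values fall back toward the old mean, the skill is negative (-35.0).
skill_if_trend_stops = skill_score(train_years, train_temps, new_years, [2.0, 3.0])
```

A perfect fit to the old values says nothing by itself; the two cases above have identical fits and opposite verdicts.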

How much patience do you have left? On and on and on about the fundamentals, and not one word about whether I believe the GHCN Darwin adjustment, as revealed by Eschenbach, is right! OK, one word: no.

There is enough shouting about this around the rest of the ‘net that you don’t need to hear more from me. What is necessary, and why I am spending so much time on this, is a serious examination of the nature of climate change evidence, particularly with regard to temperature reconstructions and homogenizations. So let’s take our time.

Scenario 3: continued

We last learned that if B and A overlap for a period of time, we can model A’s values as a function of B’s. More importantly, we learned the severe limitations and high uncertainty of this approach. If you haven’t read Part III, do so now.

If B and A do not overlap, but we have other stations C, D, E, etc., that do, even if these are far removed from A, we can use them to model A’s values. These stations will be more or less predictive depending on how correlated they are with A (I’m using the word correlated in its plain English sense).

But even if we have dozens of other stations with which to model A, the resulting predictions of A’s missing values must still come attached with healthy, predictive error bounds. These bounds must, upon the pain of ignominy, be carried forward in any application that uses A’s values. “Any”, of course includes estimates of global mean temperature (GMT) or trends at A (trends, we learned last time, are another name for assumed-to-be-true statistical models).

So far as I can tell (with the usual caveat), nobody does this: nobody, that is, carries the error bounds forward. It’s true that the older, classical statistical methods used by Mann et al. do not make carrying error simple, but when we’re talking about billions of dollars, maybe trillions, and the disruption of lives the world over, it’s a good idea not to opt for simplicity when more ideal methods are available.

Need I say what the result of the simplistic approach is?

Yes, I do. Too much certainty!

An incidental: For a while, some meteorologists/climatologists searched the world for teleconnections. They would pick an A and then search B, C, D, …, for a station with the highest correlation to A. A station in Peoria might have a high correlation with one in Tibet, for example. These statistical tea leaves were much peered over. The results were not entirely useless—some planetary-scale features will show up, well, all over the planet—but it was too easy to find something that wasn’t there.
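How easy is it to find something that isn’t there? A small simulation gives the flavor. Every series below is independent noise by construction, so any correlation found is spurious. The number of stations, the number of years, and the seed are hypothetical choices.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2009)  # fixed seed so the run is repeatable
n_years = 30
n_stations = 500

# Station A and 500 other stations: all unrelated noise.
station_a = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
others = [[rng.gauss(0.0, 1.0) for _ in range(n_years)]
          for _ in range(n_stations)]

correlations = [correlation(station_a, s) for s in others]
abs_sorted = sorted(abs(r) for r in correlations)
typical = abs_sorted[n_stations // 2]  # median absolute correlation
best = abs_sorted[-1]                  # the "teleconnection" a search would report

print(f"median |r| = {typical:.2f}, largest |r| = {best:.2f}")
```

Search enough stations and the best match will look impressive, even though nothing connects any of them.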

Scenario 4: missing values, measurement error, and changes in instrumentation

Occasionally, values at A will go missing. Thermometers break, people who record temperatures go on vacation, accidents happen. These missing values can be guessed at in exactly the same way as outlined in Scenario 3. Which is to say, they are modeled. And with models comes uncertainty, etc., etc. Enough of that.

Sometimes instruments do not pop off all at once, but degrade slowly. They work fine for a while but become miscalibrated in some manner. That is, at some locations the temperatures (and other meteorological variables) are measured with error. If we catch this error, we can quantify it, which means we can apply a model to the observed values to “correct” them.

But did you catch the word model? That’s right: more uncertainty, more error bounds, which must always, etc., etc., etc.

What’s worse is that we suspect there are many times we do not catch the measurement error, and we glibly use the observed values as if they were 100% accurate. Like a cook with the flu using day-old fish, we can’t smell the rank odor, but hope the sauce will save us. The sauces here are the data uses like GMT or trend estimates that use the mistaken observations.

(Strained metaphor, anybody? Leave me alone. You get the idea.)

A fishy smell

Now, miscalibration and measurement error are certainly less common the more recent the observations. What is bizarre is that, in the revelations so far, the “corrections” and “homogenizations” are more strongly applied to the most recent values, id est, those values in which we have the most confidence! The older, historical observations, about which we know a hell of a lot less, are hardly touched, or not adjusted at all.

Why is that?

But, wait! Don’t answer yet! Because you also get this fine empirical fact, absolutely free: The instruments used in the days of yore were many times poorer than their modern-day equivalents: they were less accurate, had slower response times, etc. Which means, of course, that they are less trustworthy. Yet, it appears, these are the most trusted in the homogenizations.

So now answer our question: why are the modern values adjusted (upwards!) more than the historical ones?

The grand finale.

If you answered “It’s the urbanization, stupid!”, then you have admitted that you did not read, or did not understand, Part I.

As others have been saying, there is evidence that some people have been diddling with the numbers, cajoling them so that they conform to certain pre-conceived views.

Maybe this is not so, and it is instead true that everybody was scrupulously honest. At the very least, then, a certain CRU has some fast talking to do.

But even if they manage to give a proper account of themselves, they must concede that there are alternate explanations for the data, such as provided in this guide. And while they might downplay the concerns outlined here, they must admit that the uncertainties are greater than what have been so far publicly stated.

Which is all we skeptics ever wanted.

Much of what we discussed—but not all—is in this picture. Right click and open it in a new window so that you can follow along.

We have two temperature series, A and B. A is incomplete and overlaps B only about 30% of the time. A and B officially stop at year “80”. We want to know one main thing: what will be the temperatures at A and B for the years 81 – 100?

Official homogenization of A commences by modeling A’s values as a function of B’s. There are no auto-correlations to worry about in this data, because there are none by design: this is entirely simulated data. A and B were generated by a multivariate normal distribution with a fixed covariance matrix. In plain English, this means the two series are correlated with each other, but not with themselves through time. Plus, any trends are entirely spurious and coincidental.
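For anybody who wants to build a similar pair of series, here is a minimal sketch of one way to draw them. The correlation (0.8), the means, the standard deviation, and the seed are hypothetical; the values behind the picture are not given here.

```python
import math
import random

def simulate_pair(n_years, rho, mean_a, mean_b, sd, rng):
    """Draw (A, B) pairs from a bivariate normal with correlation rho.
    Each year is drawn independently of every other year, so neither
    series is correlated with itself through time and no trend is built in."""
    a, b = [], []
    for _ in range(n_years):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        b.append(mean_b + sd * z1)
        a.append(mean_a + sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return a, b

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((p - xbar) * (q - ybar) for p, q in zip(x, y))
    sxx = sum((p - xbar) ** 2 for p in x)
    syy = sum((q - ybar) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
series_a, series_b = simulate_pair(80, rho=0.8, mean_a=24.0, mean_b=25.0,
                                   sd=1.0, rng=rng)
sample_r = correlation(series_a, series_b)
print(f"sample correlation between A and B over 80 years: {sample_r:.2f}")
```

Any upward or downward drift that shows up in a run of this is coincidence, since the generating process has none.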

Thus, a linear model of A predicted by B is adequate and correct. That is, we do not have to worry, as we do in real life, that the model A = f(B) is misspecified. In real life, as I mentioned in earlier posts, there is additional uncertainty due to us not knowing the real relationship between A and B.

We also are lucky that the dates are fixed and non-arbitrary. Like I said earlier, picking the starting and stopping dates in an ad hoc manner should add additional uncertainty. Most people ignore this, so we will, too. We’re team players, here! (Though doing this puts us on the losing team. But never mind.)

Step 1 is to model A as a function of B to predict the “missing” values of A, the period from year 1 – 49. The result is the (hard-to-read) dashed red line. But even somebody slapped upside the head with a hockey stick knows that these predictions are not 100% certain. There should be some kind of plus or minus bounds. The dark red shaded area is the classical 95% parametric error bounds, spit right out of the linear model. These parametric bounds are the ones (always?) found in reporting of homogenizations (technically: these are the classical predictive bounds, which I call “parametric”, because the classical method is entirely concerned with making statements about non-observable parameters; why this is so is a long story).

Problem is, like I have been saying, they are too narrow. Those black dots in the years 1 – 49 are the actual values of A. If those parametric error bounds were doing their job, then about 95% of the black dots would be inside the dark red polygon. This is not the case.

I repeat: this is not the case. I emphasize: it is never the case. Using parametric confidence bounds when you are making predictions of real observables is sending a mewling boy to do a man’s job. Incidentally, climatologists are not the only ones making this mistake: it is rampant in statistics, a probabilistic pandemic.

The predictive error bounds, also calculated from the same A = f(B), are the pinkish bounds (technically: these are the posterior-predictive credible intervals). These are doing a much better job, as you can see, and aren’t we happy. The only problem is that in real life we will never know those missing values of A. They are, after all, missing. This is another way of stating that we do not really know the best model f(B). And since, in real life, we do not know the model, we should realize our error bounds should be wider still.
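The coverage comparison described above can be checked on simulated data, where the “missing” values are known. This is a sketch only: it uses classical formulas (an interval for the fitted mean as the parametric bounds, an interval for a new observation as the predictive bounds) rather than the posterior-predictive calculation behind the picture, and every number in it is hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {"intercept": intercept, "slope": slope,
            "s": math.sqrt(rss / (n - 2)), "n": n, "xbar": xbar, "sxx": sxx}

def half_widths(fit, x0, z=1.96):
    """Approximate 95% half-widths at x0: (parametric, predictive).
    z=1.96 is a normal approximation to the t quantile."""
    core = 1.0 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
    return z * fit["s"] * math.sqrt(core), z * fit["s"] * math.sqrt(1.0 + core)

rng = random.Random(42)  # fixed seed so the run is repeatable
rho = 0.8                # hypothetical correlation between A and B
years = list(range(1, 81))
b = [25.0 + rng.gauss(0.0, 1.0) for _ in years]
a = [24.0 + rho * (bi - 25.0) + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
     for bi in b]

# A is "observed" only in years 50-80; years 1-49 are held back as the truth.
observed = [i for i, yr in enumerate(years) if yr >= 50]
missing = [i for i, yr in enumerate(years) if yr < 50]
fit = fit_line([b[i] for i in observed], [a[i] for i in observed])

inside_parametric = 0
inside_predictive = 0
for i in missing:
    guess = fit["intercept"] + fit["slope"] * b[i]
    parametric, predictive = half_widths(fit, b[i])
    error = abs(a[i] - guess)
    if error <= parametric:
        inside_parametric += 1
    if error <= predictive:
        inside_predictive += 1

coverage_parametric = inside_parametric / len(missing)
coverage_predictive = inside_predictive / len(missing)
print(f"held-back values inside parametric bounds: {coverage_parametric:.0%}")
print(f"held-back values inside predictive bounds: {coverage_predictive:.0%}")
```

Because the predictive band always contains the parametric band, its coverage can never be lower; the point of running it is to see how far short of 95% the narrower band falls.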

Our homogenization of A is complete with this model, however, by design. Just know that if we had missing data in station B, or changes in location of A or B, or “corrections” in urbanization at either, or measurement error, all our error bounds would be larger. Read the other four parts of this series for why this is so. We will be ignoring—like the climatologists working with actual data should not—all these niceties.

Next step is to assess the “trend” at B, which I have already told you is entirely spurious. That’s the overlaid black line. This is estimated from the simple (and again correct by design) model B = f(year). Our refrain: in real life, we would not know the actual model, the f(year), and etc., etc. The guess is 1.8°C per century. Baby, it’s getting hot outside! Send for Greenpeace!

Now we want to know what B will be in the years 81 – 100. We continue to apply our model B = f(year), and then mistakenly—like everybody else—apply the dark-blue parametric error bounds. Too narrow once more! They are narrow enough to induce check-writing behavior in Copenhagen bureaucrats.

The accurate, calming, light-blue predictive error bounds are vastly superior, and tell us not to panic, we just aren’t that sure of ourselves.

How about the values of A in the years 81 – 100? The mistake would be to use the observed values of A from years 50 – 80 augmented by the homogenized values in the years 1 – 49 as a function of year. Since everybody makes this error, we will too. The correct way would be to build a model using just the information we know—but where’s the fun in that?

Anyway, it’s the same story as with B, except that the predictive error bounds are even larger (percentage-wise) than with B, because I have taken into account the error in estimating A in the years 1 – 49.
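One way to take the error in the guessed values into account is by simulation: redraw the guessed values of A many times from their predictive spread, refit the trend each time, and look at how much the trend moves. This is a sketch under stated assumptions, not the calculation behind the picture. It ignores the uncertainty in the fitted intercept, slope, and residual spread, so it still understates the total, and all numbers are hypothetical.

```python
import math
import random
from statistics import stdev

def fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope, residual sd)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2))

rng = random.Random(7)  # fixed seed so the run is repeatable
years = list(range(1, 81))
b = [25.0 + rng.gauss(0.0, 1.0) for _ in years]
a_true = [24.0 + 0.8 * (bi - 25.0) + 0.6 * rng.gauss(0.0, 1.0) for bi in b]

first_obs = 49                # list index of year 50
a_obs = a_true[first_obs:]    # A is observed in years 50-80 only
intercept, slope, s = fit_line(b[first_obs:], a_obs)

# Plug-in approach: treat the guesses as data. One trend, no extra spread.
a_plug = [intercept + slope * bi for bi in b[:first_obs]] + a_obs
plug_trend = fit_line(years, a_plug)[1]

# Carry the uncertainty: redraw the guessed years and refit each time.
trends = []
for _ in range(1000):
    a_draw = [intercept + slope * bi + rng.gauss(0.0, s)
              for bi in b[:first_obs]] + a_obs
    trends.append(fit_line(years, a_draw)[1])
trend_spread = stdev(trends)

print(f"plug-in trend: {100 * plug_trend:+.2f} deg C per century")
print(f"spread of trend from guessed values alone: "
      f"{100 * trend_spread:.2f} deg C per century")
```

The plug-in approach reports a single number; the spread printed on the second line is uncertainty that number silently drops.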

Using the wrong method tells us that the trend at A is about 1.0°C per century, a worrisome number. The parametric error bounds are also tight enough to convince some that new laws are needed to restrict people’s behavior. But the predictive bounds are like the cop in the old movie: Move along; nothing to see here.

This example, while entirely realistic, doesn’t hit all of the possible uncertainties. Like those bizarre, increasing step-function corrections at Darwin, Australia.

What needs to be done is a reassessment of all climate data using the statistical principles outlined in this series. There’s more than enough money in the system to pay the existing worker bees to do this.

Our conclusion: people are way too certain of themselves!



  1. Cary D Cotterman

    I read the whole thing, with comprehension. I’m rewarding myself with a Sharing Size bag of M&Ms.

  2. Great article thank you
    It’s amazing how often the statistical smoothing ends up matching their preconceived idea of how things MUST be….!
