Read Part I
The Analysis (cont.)
Two problems arise when comparing a model’s integration (the forecast) with an analysis of new observations, which are not found when comparing the forecast to the observations themselves. Verifying the model with an analysis, we compare two equally sized “grids”; verifying the model with observations, we compare a tiny number of model grid points with reality.
Now, some kinds of screwiness in the model are also endemic in the analysis: the model and analysis are, after all, built from the same materials. Some screwiness, therefore, will remain hidden, undetectable in the model-analysis verification.
However, the model-analysis verification can reveal certain systematic errors, the knowledge of which can be used to improve the model. But the result is that the model, in its improvement cycle, is pushed towards the analysis. And always remember: the analysis is not reality, but a model of it.
Therefore, if models over time are tuned to analyses, they will reach an accuracy limit which is a function of how accurate the analyses are. In other words, a model might come to predict future analyses wonderfully, but it could still predict real-life observations badly.
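To make the accuracy limit concrete, here is a toy sketch in Python. Every number in it is invented for illustration: `truth` stands in for reality, `analysis_error` for whatever the analysis gets wrong, and `tuned_model` for the best case of a model tuned until it reproduces the analysis exactly.

```python
import math

def rmse(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical "true" temperatures (deg C) at five grid points.
truth = [10.0, 12.0, 15.0, 11.0, 9.0]

# Hypothetical analysis: reality plus the analysis's own errors.
analysis_error = [0.5, -1.0, 1.5, 0.0, -0.5]
analysis = [t + e for t, e in zip(truth, analysis_error)]

# Best case for a model tuned to analyses: it reproduces the analysis exactly.
tuned_model = list(analysis)

score_vs_analysis = rmse(tuned_model, analysis)  # what the modeler sees
score_vs_truth = rmse(tuned_model, truth)        # what actually matters
analysis_rmse = rmse(analysis, truth)            # how wrong the analysis is

print(score_vs_analysis, score_vs_truth, analysis_rmse)
```

In this toy case the model scores perfectly against the analysis, yet its error against reality is exactly the error of the analysis. The model cannot do better than the thing it was tuned to.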
Which brings us to the second major problem of model-against-analysis verification. We do not actually know how well the model is performing because it is not being checked against reality. Modelers who rely solely on the analysis model-checking method will be (they are guaranteed to be) overconfident.
The direct output of most climate and weather models is difficult to check against actual observations because models make predictions at orders of magnitude more locations than there are observations. Yet modelers are anxious to check their models at all places, even where there are no observations. They believe that analysis-verification is the only way they can do this.
This is important, so allow me a redundancy: models make predictions at wide swaths of the Earth’s surface where no observations are taken. At a point near Gilligan’s Island, the model says “17°C”, yet we can never know whether the model was right or wrong. We’ll never be able to check the model’s accuracy at that point.
We can guess accuracy at that point by using an analysis to make a guess of what the actual temperature is. But since model points—in the atmosphere, in the ocean, on the surface—outnumber actual observation locations by so much, our guess of accuracy is bound to be poor.
Actual observations can be brought into the picture by matching model forecasts to future observations and then building a statistical model between the two. This is called model output statistics, or MOS. The whole model, at all its grid points, is fed into a statistical model: luckily, many of the points in the model will be found to be non-predictive and thus are dropped. Think of it like a regression. The model’s outputs are like the Xs, and the observations are like the Ys, and we statistically model Y as a function of the Xs.
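The regression idea can be sketched in a few lines of Python. This is a deliberately stripped-down illustration, not how operational MOS is built: it uses a single predictor (one model grid point) where real MOS screens many, and the function names `fit_mos` and `apply_mos` and all the numbers are made up for the example.

```python
def fit_mos(forecasts, observations):
    """Least-squares fit of observation = intercept + slope * forecast."""
    n = len(forecasts)
    mean_x = sum(forecasts) / n
    mean_y = sum(observations) / n
    sxx = sum((x - mean_x) ** 2 for x in forecasts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(forecasts, observations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def apply_mos(intercept, slope, forecast):
    """Turn a raw model forecast into a statistically corrected one."""
    return intercept + slope * forecast

# Hypothetical history at one station: raw model forecasts (the Xs)
# and what was later observed there (the Ys), in deg C.
past_forecasts = [10.0, 12.0, 14.0, 16.0, 18.0]
past_observations = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_mos(past_forecasts, past_observations)

# A new model integration arrives; pass its forecast through the MOS.
corrected = apply_mos(intercept, slope, 15.0)
print(intercept, slope, corrected)
```

In this invented history the model runs two degrees cold at the station, and the fit picks that up as an intercept of about 2 with a slope of about 1, so a raw forecast of 15 is corrected to about 17.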
So, when a new model integration comes along, it is fed into a MOS model, and that model is used to make forecasts. Forecasters will also make reference to the physical model integrations, but the MOS will often be the starting point.
Better, MOS predictions are checked against actual observations, and it is by these checks that we know meteorological models are improving. And those checks are also fed back into the model building process, creating another avenue for model improvement. MOS techniques are common for meteorological models, but not yet for climatological models.
MOS is a good approach to correct gross model biases and inaccuracies. It is also used to give a better indication of how accurate the model—the model+MOS, actually—really is, because it tells us how the model works at actual observation locations.
But MOS verification will still give an overestimate of the accuracy of the model. This is because of measurement error in the observations.
In many cases, nowadays, measurement error of observations is small and unbiased. By “unbiased” I mean, sometimes the errors are too high, sometimes too low, and the high and low errors balance themselves out given enough time. However, measurement error is still significant enough that an analysis must be used to read data into a model; the raw data measured with error will lead to unphysical model solutions (we don’t have space to discuss why).
Measurement error is not harmless. This is especially true for the historical data that feeds climate models, especially proxy-derived data. Proxy-derived data is itself the result of a model relating some proxy (like a tree ring) to a desired observation (like temperature). The modeled (not actual) temperature is fed to an analysis, which in turn models the modeled observations, and the result is in turn physically modeled. Get it?
Measurement error is a problem in two ways. Historical measurement error can lead to built-in model biases: after all, if you’re using mistaken data to build (or, if you like, “inform”) a model, that model, while there is a chance it will be flawless, is not likely to be.
Plus, even if we use a MOS-type system for climate models, if we check the MOS against observations measured with error, and we do not account for that measurement error in the final statistics (and nobody does), then we will be too certain of the model’s accuracy in the end.
In short, the opportunity for over-certainty is everywhere.