Word is going round that Richard Muller is leading a group of physicists, statisticians, and climatologists to re-estimate the yearly global average temperature, from which we can say things like: this year was warmer than last, but not warmer than three years ago. Muller’s project is a good idea, and the team he has named is certainly up to it.
The statistician on Muller’s team is David Brillinger, an expert in time series, which is just the right genre to attack the global-temperature-average problem. Dr Brillinger certainly knows what I am about to show, but many of the climatologists who have used statistics before do not. It is for their benefit that I present this brief primer on how not to display the eventual estimate. I only want to make one major point here: that the common statistical methods produce estimates that are too certain.
I do not want to provide a simulation of every aspect of the estimation project; that would take just as long as doing the real thing. My point can be made by assuming that I have just N stations from which we have reliably measured temperature, without error, for just one year. The number at each station is the average temperature anomaly at that station (an “anomaly” takes the actual arithmetic average and subtracts a constant from it; the constant itself is not important, and to be clear, the analysis is unaffected by it).
Our “global average temperature” is to be estimated in the simplest way: by fitting a normal distribution to the N station anomalies (the actual distribution used does affect the analysis, but not the major point I wish to make). I simulate the N stations by generating numbers from a t-distribution with a central parameter of 0.3, a spread parameter of 5, and degrees of freedom equal to 20 (once again, the actual numbers used do not matter to the major point).
Assume there are N = 100 stations, simulate the data, and fit a normal distribution to them. One instance of the posterior distribution of the parameter estimating the global mean is pictured. The most likely value of the posterior is at the peak, which is (as it should be) near 0.3. The parameter almost surely lies between 0.1 and 0.6, since that is where most of the area under the curve is.
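The setup can be sketched in a few lines of Python (my own illustrative code, not from the project; the flat-prior normal approximation to the posterior of the central parameter is an assumption, and the function names are made up):

```python
import math
import random

def simulate_stations(n, center=0.3, spread=5.0, df=20, seed=1):
    """Draw n station anomalies from a t distribution with the given
    center, spread, and degrees of freedom (illustrative numbers only)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # A Student-t draw: standard normal over the square root of a
        # scaled chi-square; shift and scale it afterward.
        z = rng.gauss(0, 1)
        chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(df))
        draws.append(center + spread * z / math.sqrt(chi2 / df))
    return draws

def posterior_of_center(data):
    """With a flat prior, the posterior for the normal central parameter
    is approximately Normal(sample mean, sample sd / sqrt(n))."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    return m, s / math.sqrt(n)

anomalies = simulate_stations(100)
center_hat, center_sd = posterior_of_center(anomalies)
```

The peak of this approximate posterior sits at `center_hat`, near 0.3, with spread `center_sd`.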
Now let’s push the number of stations to N = 1000 and look at the same picture:
We are much more certain of where the parameter lies: the peak is in about the same spot, but the variability is much smaller. Obviously, if we were to continue increasing the number of stations the uncertainty in the parameter would disappear. That is, we would have a picture which looked like a spike over the true value (here 0.3). We could then confidently announce to the world that we know the parameter which estimates global average temperature with near certainty.
Are we done? Not hardly.
Although we would know, with extremely high confidence, the value of one of the parameters of the model we used to model the global average temperature, we still would not know the global average temperature. There is a world of difference between knowing the parameter and knowing the observable global average temperature.
Here then is the picture of our uncertainty in the global average temperature, given both N = 100 and N = 1000 stations.
Adding 900 more stations improved our uncertainty in the actual temperature only slightly (and here the difference in these two curves is just as likely to be because of the different simulations). But even if we were to have 1 million stations, the uncertainty would never disappear. There is a wall of uncertainty we hit and cannot breach. The curves will not narrow.
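The wall can be seen numerically (a sketch under the same simulation assumptions: for a normal model with a vague prior, the parameter’s posterior sd shrinks like 1/√N, while the predictive sd floors at the data sd):

```python
import math
import random

def sds(n, seed):
    """Posterior sd of the central parameter versus sd of the posterior
    predictive, for a normal model fit to n simulated t anomalies
    (center 0.3, spread 5, df 20, as in the post)."""
    rng = random.Random(seed)
    df = 20
    data = [0.3 + 5.0 * rng.gauss(0, 1) /
            math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(df)) / df)
            for _ in range(n)]
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    param_sd = s / math.sqrt(n)               # shrinks like 1/sqrt(n)
    predictive_sd = s * math.sqrt(1 + 1 / n)  # floors at s: the "wall"
    return param_sd, predictive_sd

p100, pred100 = sds(100, seed=2)
p1000, pred1000 = sds(1000, seed=3)
```

Going from 100 to 1000 stations collapses `param_sd` but leaves `predictive_sd` essentially where it was.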
The real, observable temperature is not the same as the parameter. The parameter can be known exactly, but the observable actual temperature can never be.
The procedure followed here (showing posterior predictive distributions) should be the same for estimating the “trend” in year-to-year global average temperatures. Do not tell us of the uncertainty in the estimate of the parameter of this trend. Tell us instead what the uncertainty in the actual temperatures is.
This is the difference between predictive statistics and parameter-based statistics. Predictive statistics gives you the full uncertainty in the thing you want to know. Parameter-based statistics only tells you about one parameter in a model; and even though you know the value of that parameter with certainty, you still do not know the value of the thing you want to know. In our case, temperature. Parameters be damned! Parameters tell us about a statistical model, not about a real thing.
Update See too the posts on temperature on my Stats/Climate page.
Update See also: Global Average Temperature: An Exceedingly Brief Introduction To Bayesian Predictive Inference
Update to the Update Read the post linked to above. Mandatory.
A couple of nit-picks:
(1) Just before the first graph, should not the “0.1 – 0.6” be “0.1 – 0.5”?
(2) In the second graph, unless I misunderstood something, the “N = 1000” curve appears to be off-center.
and a request:
The transition from the anomaly calculation to the final temperature graph is where there is most likely to be confusion, and you have essentially skipped over the explanation of how this comes about, and why the two final curves are so similar. If you have time, I would like to see an expansion of this section.
There is a little problem. The temperatures measured at these stations are the maximum and minimum, not the mean. The midpoint, halfway between max and min, is assumed to be the average. They don’t really measure or know the average. How do you estimate a global average if you don’t know the local average?
I can believe you that Dr Brillinger knows what you are showing, but, for us others, could you be more specific in your analysis? Could you define your random variables, your parameters and describe your assumptions and methodology?
In particular, to derive your graphs 1 and 2 did you assume independence of the random variables at your N stations?
I do not see any reference to dependence/spatial correlation in your text. Neglecting dependence may produce too certain estimates, which may be too wrong.
What I mean appears in section 3 of my recent paper “Two-dimensional Hurst-Kolmogorov process and its application to rainfall fields” (which contains also an example with temperature), accessible from http://dx.doi.org/10.1016/j.jhydrol.2010.12.012 (preprint in http://itia.ntua.gr/en/docinfo/1101/ ).
Terribly sorry, Matt…
Great post! I agree entirely.
In trying to relate this to statistical terminology that I’m familiar with, it appears to me that:
the “parameter-based statistics” method is a confidence interval (CI),
while predictive statistics (at least the frequentist version) is using a prediction interval (PI) (or a tolerance interval — I sometimes get confused between the two).
Is this correct? The formula for CI width has a square root of 1/N, while the PI has square root of 1+1/N, which is why it doesn’t go to zero as N (the sample size) increases.
As N increases, the CI goes to zero width, while the PI approaches a fixed interval.
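That comparison can be made concrete (a minimal sketch; the half-widths are expressed in units of the critical value times the sample sd, which is enough to show the limiting behavior):

```python
import math

# Half-width of a confidence interval for the mean versus a prediction
# interval for a new observation, in units of (critical value) x (sample sd).
def ci_halfwidth(n):
    return math.sqrt(1 / n)        # shrinks to 0 as n grows

def pi_halfwidth(n):
    return math.sqrt(1 + 1 / n)    # approaches 1: a fixed floor

widths = {n: (ci_halfwidth(n), pi_halfwidth(n)) for n in (10, 100, 1000)}
```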
Perhaps I disagree a bit. If you wanted to know whether or not there was evidence of a trend over the past 100 years (say), wouldn’t you be asking whether a certain trend parameter (in an appropriate time series model) was zero or not? You could use a CI to answer that (equivalent to a hypothesis test).
btw, I took a lot of stats classes in the early 70’s at U.Wisconsin (while working on math & engineering degrees), including time series with George Box. Just wished I remembered a bit more of it.
First time commenter. Great blog. Keep it up.
If you want to know the average temperature for a location why not monitor ground temperature.
The average global temp is often stated to be about 15 C. Understanding that absolutes aren’t needed when studying trends, this 15 C number IS important to the argument surrounding the degree of the greenhouse effect. This degree-of-greenhouse-effect notion arises because the theoretical warmth or temperature of the earth can be calculated on the basis of laboratory studies and theoretical considerations of blackbodies. These blackbody calculations, which derive from the intensity of the sun at the earth’s surface, amount to a theoretical temperature of minus 18 C. On this basis it is said that the greenhouse effect amounts to about 33 C (15 C minus minus 18 C). I believe that this number is at least 5 C too high. People overlook the amount of extreme cold in the deep ocean. That cold needs to figure into what the average surface temperature of the world is. I won’t pretend that I fully understand the calculations of climate sensitivity, but if the 33 C being referred to as the greenhouse effect is an important number, it should be thought about carefully.
In defense of the significance of this deep ocean pool of cold, recall that the temperature at the bottom of South African diamond mines is something like 65 C, while deep ocean at comparable depths is more like 4 C. So deep ocean water is not only huge and not only much colder than average surface temperatures, it’s also much colder than it would be were it solid rock. I also believe that the ultimate source of this cool pool must be polar water descending and merging into the so-called thermohaline circulation. So it’s a real part of the climate system that should figure into calculations of average temperature.
The degree to which average temperature figures into AGW arguments, and the degree to which deep ocean temps are overlooked when thinking and calculating about global average temperature, is the degree to which AGW arguments are off.
The global mean surface temperature anomaly is more like an average score from a questionnaire than an estimate of a physical quantity. There is no physical property of the Earth or its atmosphere (“global average temperature”, “unicorns per square mile”) that is being estimated by this statistic, no matter how fancy the arithmetic.
The mean temperature anomaly is just an index, kind of like the Dow Jones index or IQ, and its meagre usefulness depends on constancy in the population of measurement stations, stability of the recipe for calculating it, and other factors. It estimates nothing. It just is what it is. Assuming that those underlying factors remain constant, which they have not so far, all that trend analysis can do is track some nebulous expected value of this unphysical index. The trend goes up or down, so someone wants to say that the globe is getting “warmer” or “cooler”. But what the index has to do with “climate change” and what it can tell us about trends in important physical quantities or processes is just assumed, based on the blatherings of journalists and some of the purveyors of the index.
It would be of far more interest to focus on the difficult but at least meaningful problem of estimating variation and long-term shifts in a real physical quantity, the total heat content of the upper ocean regionally, because that would say something very useful about ocean dynamics and weather patterns. There would also be a real scientific possibility of checking and judging the merit and effectiveness of statistical methods applied.
I take a Bayesian approach and do not require “randomness.” The model is a normal with standard parameters—which I specifically call the central and spread parameters. It is a mistake to call them “mean” and “standard deviation” because these are functions of observable data, and are therefore observable themselves. Parameters are never observable.
The posterior distributions of the central parameter for N = 100 and N = 1000 (separate simulations) are the first two figures. These are the uncertainty we have in the value of the central—and only the central—parameter after we have observed the data.
The last picture is the posterior predictive distribution (PPD), which is the model times the posterior parameter distribution, integrated over the (unobservable) parameters. The PPD is thus our complete uncertainty in new observable data, given our already having observed N = 100 and N = 1000.
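One way to picture that integration is by Monte Carlo (my own sketch, with made-up values for the posterior: first draw the parameter from its posterior, then draw a new observable from the model at that value):

```python
import random

# Monte Carlo sketch of the posterior predictive: parameter uncertainty
# and model uncertainty compound in each draw. The values of m, param_sd,
# and model_sd are illustrative, not taken from the post.
def ppd_draws(m, param_sd, model_sd, k=10000, seed=4):
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        mu = rng.gauss(m, param_sd)          # uncertainty in the parameter
        out.append(rng.gauss(mu, model_sd))  # uncertainty in new data
    return out

draws = ppd_draws(m=0.3, param_sd=0.5, model_sd=5.0)
mu_hat = sum(draws) / len(draws)
sd = (sum((x - mu_hat) ** 2 for x in draws) / len(draws)) ** 0.5
```

Even with `param_sd` driven to zero, the spread of the draws cannot fall below `model_sd`.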
Classically speaking, all “stations” are independent of one another. But it would not matter what “dependence” structure you build in; the central argument remains.
The curve is, of course, not off center. It represents the distributions (densities, actually) as we have them based on this one particular simulation.
The absolute numbers I use are meaningless. I only wish to make the point about the uncertainty in a parameter versus the uncertainty in an actual observable: the latter will always be larger than the former.
True, but that difference doesn’t matter here. I take the “averages” at each station as given, and assume they are okay. If there is additional uncertainty in their measure, that uncertainty should be carried through all the way to the PPD.
While I try to wrap my head around this one, I thought I’d pass along a link to this article about Galton’s 1877 design for computing a posterior distribution (just in case it hasn’t hit your radar yet):
HT to Alex Tabarrok at Marginal Revolution.
Thanks very much for the replies. I have a few remarks/questions.
When you say that a Bayesian approach does not require randomness, how do you define randomness? Isn’t it more correct to say that the Bayesian approach enhances, instead of not requiring, the notion of randomness, by considering that the parameters are also random variables (assuming the classical Kolmogorov’s definition of a random variable), because they are unknown?
Could you clarify your statement that the mean is a function of observable data and therefore observable itself? Don’t we have here a random field over space, with a continuous 2D index set? Isn’t the mean the integral over infinitely many points? How can it thus be regarded as an observable, given that only a finite number N of points is observed (although one can imagine that at every point the field is potentially observable)?
How accurate are we when we keep “classically thinking”, i.e. neglecting dependence? I agree with you that your central argument in your analysis remains, but perhaps a relevant additional argument is missing: that the dependence enhances uncertainty substantially as indicated in the paper I linked above (by dramatically reducing the effective number of points N’, defined as the sample size that gives the same uncertainty of the mean as a number N in the classical statistical setting, i.e. N’ is much less than N).
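For the simplest dependence structure, equal pairwise correlation, the effective sample size the commenter describes can be written down directly (a sketch; under equicorrelation, Var(mean) = σ²(1 + (n − 1)ρ)/n):

```python
def effective_n(n, rho):
    """Effective sample size N' for n equally correlated observations
    (pairwise correlation rho): the number of independent observations
    that gives the same variance for the sample mean."""
    return n / (1 + (n - 1) * rho)
```

Even a modest correlation of 0.1 collapses 1000 stations to roughly 10 effectively independent ones.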
Mr. Briggs – There is no such thing as the global average temperature. How does that affect your argument?
One mans’s “of course” can be another’s “hmm…”, but thanks for the clarification. 🙂
That was a very nice article. The 1/c² = 1/σ² + 1/τ² result for the special case of two Normals helps a lot with understanding the reason why the N=1000 case doesn’t improve things much relative to N=100. Much appreciated.
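A quick numerical check of that precision-addition result (a sketch with made-up values of sigma, n, and tau; the commenter’s σ is read here as the sd of the sample mean):

```python
import math

# For two Normals the precisions add: combining a Normal prior (sd tau)
# with the Normal likelihood of the sample mean (sd sigma_n = sigma/sqrt(n))
# gives posterior precision 1/c^2 = 1/sigma_n^2 + 1/tau^2.
def posterior_sd(sigma, n, tau):
    sigma_n = sigma / math.sqrt(n)   # sd of the sample mean
    precision = 1 / sigma_n ** 2 + 1 / tau ** 2
    return math.sqrt(1 / precision)
```

With sigma = 5 and tau = 10, moving from n = 100 to n = 1000 shrinks the posterior sd of the parameter by about a factor of √10, exactly the behavior of the first two figures.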
That is an excellent question. There isn’t, as you say, a physical global average temperature (GAT). But there certainly can be an operationally defined GAT. For example, one definition is: “Take the yearly arithmetic average at each of these N stations, sum them, and divide by N.” Whether or not that operationally defined GAT has some usefulness is another question.
Not to scare away our readers, but no. To a Bayesian, “randomness” simply means “unknown” and nothing more and is defined, as all probabilities are, with respect to fixed premises. For example, the statement, “Briggs is drinking coffee” is random to you since you do not have observational evidence on that topic, but it is not random to me since I do. You may supply premises, such as “Briggs is either drinking coffee or he isn’t” and then ask the probability of the statement given those premises (it is between 0 and 1), but the answer is still random/unknown to you.
See my answer to Smoking Frog about how an operationally defined GAT defines an observable mean.
We can certainly, and we certainly should, build in known (or suspected) spatial correlations into our model. My argument does not depend on the exact model used. The spatial-correlation model simply adds more (unobservable) parameters that we have to integrate out at the end.
From what you say, I understand that the statement “Briggs is Bayesian” has probability 1 and you are happy with it. I too regard myself Bayesian (in a sense–but avoid assigning a probability 1, because I dislike labels) and I agree with your definition of randomness, although I would say it in a somewhat different manner: a random variable is an uncertain quantity to which probability is assigned. So, I do not understand why you reply “but no” when I say that the Bayesian approach (by identifying random with unknown–as you say–or uncertain–as I say) enhances, instead of not requiring, the notion of randomness.
What is randomness except that which affects things in an unknown or uncalculated manner? To admit randomness as an inherent quality effectively screams that the universe is capricious — or capricious within limits. Do you think this is so? I do not. At least I hope not, because if it is, then all of modeling is ultimately pointless.
A quick afterthought.
As far as “Briggs is Bayesian” having a probability of 1, why would you say that? I admit there’s a high probability of that, but consider: maybe he doesn’t believe a word of it but espouses it because it’s all the rage, or maybe he just likes tweaking noses? Why would you not assign a probability to alternate hypotheses?
You say you avoid assigning probability 1 (which implies avoidance of probability 0) but you apparently don’t. Not that there’s anything wrong or bad about that per se but nearly all givens have missing information IMO. So even things that would otherwise be certain (such as a mean or sd) have an element of “randomness”.
Why is everybody ignoring the Essex et al. 2007 paper and physical reality, and giving any credence to that theoretical monster called global temperature/anomaly? An average of two temperature readings is NOT a temperature, never was and never will be. Statisticians should concentrate on data-mining experimental data, which are daily temperature readings.
Dr DB, UK
Since there are other measures of central tendency and dispersion, I’d rather use the terms “mean” and “standard deviation” of a probability distribution, which are different from the sample mean and sample standard deviation.
What is the probability density function in the third graph? If it is the pdf of the posterior predictive distribution, I thought the mean and standard deviation of a posterior predictive distribution for a new observation would also depend on the posterior mean and standard deviation. Shouldn’t the mean (center) be near 0.3?
This should be true for all statisticians. Probability distributions are used to describe randomness. No randomness implies there is no need of any probability distributions, statistical modeling, or statisticians… in this case, I would’ve chosen to be an astrophysics instead.
Dr DB UK,
Isn’t a temperature reading itself just an average or index of energy content? If it is, then are you saying that an average of averages is not an average? I don’t think anyone is saying that an average temperature is the same as a single temperature reading, but it gets really cumbersome to completely define words every time they are used. I would think saying “average temperature” would convey enough.
In any case, is it truly meaningless to stick a thermometer into say a heated pot of water then declare that the temperature is such and such even if deep down it’s just an average? Or would such a pronouncement have uses?
I am quite happy with your afterthought. I just wanted to be provocative, asking if he himself assigns probability 1 to this statement.
About the question what is randomness, I have written my view in a recent paper in http://itia.ntua.gr/en/docinfo/923/
How can you separate statistical notions from physics? Isn’t temperature a statistical notion per se? (reminder: 1/temperature is, by definition, the partial derivative of entropy with respect to energy, and entropy is the expected value of the the (minus) logarithm of a probability density function, i.e. a quantity that is defined within probability theory).
I think Briggs (among others) makes the distinction explicit because some people talk as if “randomness” is an inherent property as opposed to a statement of knowledge level.
Amen to being an “astrophysicist” vs. “astrophysics”. Just as no one would want to be a “statistic” (on anyone’s chart let alone the governments). Much better to be a “statistician”.
Hahahaha…I am watching the movie Scent of a Woman for the third time and reading blogs at the same time.
A global temperature is no more “real” than the average height of four people in a room who are 3’6″, 4’6″, 6′ and 7’6″: nobody is 5’4.5″ tall. But it doesn’t matter. If what we want to know is how the average height changes as 4–8′ tall aliens enter the room, the average is useful.
The global average temperature is a one-number “proxy” for a situation in our world. Whatever it is reflects the current conditions. When that number changes, we look for what might have changed in our world – though the assumption by the warmists is that that change is bad, very bad, and should be nullified or at least limited.
The question of certainty, though, is of extreme importance as it gives Gore, Hansen and the IPCC authority in both declarations of what is going on and, in extension, to what will or might be going on. Yet what certainty is this “95%” or other number? Is it in the reproducibility of results from computer programs, the accuracy of the readings of thermometers and the like, or the correlation with what is measured with reality?
What we want to know is how well what is measured reflects what is, even if it is a non-real number like the average global temperature, and how well computer simulations reflect what is in the real world. The mathematical procedures may be 99% “certain”, in a calculating way, but the reflectance of models to reality may be only 50%, i.e. 50% what they claim to have been going on, say, in Darwin, on average, or 50% correct in what they predict to happen next year.
If a review of temperatures in New Zealand or Australia finds that adjustments are deemed inappropriate 20% of the time, the certainty of the record becomes possibly 50%, depending on where the errors are. If a review of the predictions – temperature rise, tropospheric hot spot, humidity distribution with time – show a 50% failure rate, then the certainty of the work drops more than 50%, as we would suspect errors elsewhere. But the math would still be robust, the code excellent and its conclusions, “certain”.
The certainty of Pachauri and Suzuki and the “consensus” of scientists is rooted in the statistical working of uncertain data and uncertain models. This is what the general CAGW, social scientist support group don’t understand. The certainty is all about methodology of analysis and interpretation, not the certainty of data, assumptions and model-coherence to the world. The uncertainty in those three areas is much greater but, if exposed, undermines certainty of postures, claims of authority and, most importantly, the assuredness of spending taxpayers’ money.
The cover of today’s Calgary Herald trumpets: “The end of Arctic tundra (06-March 2011)”. This “end” is certain, apparently, but why? Because a model based on adjusted, selected data fed into a computer with pre-set algorithms based on assumptions of what counts and what does not count said that, sometime in the future, locally “catastrophic” events were going to occur. Is there evidence that this is happening? No – some anecdotal that things are different today than before, but that means nothing. Fact is, there is no collapse or death of anything happening today. Not yesterday, either. And not about to tomorrow. Sometime, “soon”, but probably in terms of decades.
So why are the Herald’s writers so agitated? Because the models are CERTAIN. Yes, they are. But that is math. That is how a program runs. Which is a different thing from the progress of events in the world. In the world, we are far less certain about what has gone on, and we are far less certain about what will go on.
The N measurements of the local temperature which go into calculation of an average do not represent N measurements of a property of the same material. They are instead N measurements of a property of different materials at the same location.
I’m certain 🙂 that you stats persons have taken this difference into account. If that is necessary.
I don’t know if it helps, but the posterior predictive distribution is the distribution of unobserved observations that we predict conditional on the observed data.
So rather than being the uncertainty in the actual global average temperature, I’d have said it was the remaining uncertainty in the temperature measurements – the temperatures of all those places we didn’t measure. But I may have misunderstood the point intended.
For many years I measured radioactive effluents at many fixed locations. Dan Hughes above is correct. The measurements are not of the same stuff, but of different stuff at the same location. In my experience, the distribution of these effluents was always log-normal! If this is true for the temperature measurements, then the median of a data set is probably a better estimate than the average.
Smoking frog asked a question regarding the existence of a global average temperature. Either I misunderstand your answer or I think I disagree. Please tell me which it is or convince me you are right.
I can understand that people may disagree about the best method to determine the global average temperature, but the fact a field exists seems beyond doubt. Here’s a thought experiment. Think of two marbles, one at room temperature and one very cold having just come out of a freezer. One looks normal, but the other marble has bits of ice on it. You do not have to measure the temperature of the marbles to know one is colder than the other. You can see it. From outer space, Earth looks like the frozen marble. It has ice caps at both poles. The planet would certainly have a warmer global average temperature if the ice caps were gone.
I am convinced that AGW will not be catastrophic, but I hate to see what I think are ridiculous arguments being put forward.
If we do get a more statistically correct global average temperature from this enterprise, is it going to be a more useful metric?
It is assumed by some that it reflects the heat content of the atmosphere, but surely that is not correct unless the moisture content of the atmosphere is constant.
NASA charts record that the moisture content of measured levels has been falling for the last few decades, is this fact to be incorporated into the new figures by a reduction in temperature, to reflect the true heat content?
“From outer space, Earth looks like the frozen marble. It has ice caps at both poles. The planet would certainly have a warmer global average temperature if the ice caps were gone. ”
I have to admit to being baffled by this post. I think the problem is that you haven’t defined what you mean by GAT. If you define it as the mean temperature at 2m height (over land if you like), and if you have a set of measurement stations which record the temperature at 2m without error (or even with error, if it is random, zero-mean and independent), and if the stations are randomly distributed across the area you want the mean for (the land surface, or the whole globe), then you can certainly get an estimate of the GAT with as little uncertainty as you want, just by increasing the number of stations. This is a plausible definition of GAT and it is certainly something that exists.

The problem in reality is that the measurement stations are by no means randomly distributed (for a start they will tend to be at lower than average altitude, and so will systematically overestimate the mean). Hence the concept of “temperature anomaly”, i.e. the deviation from the mean (at that point) over some baseline period. It is an assumption that although the stations are not random in space, they do give random samples from the temperature anomaly field (a dubious assumption, but necessary). Thus you can estimate the mean global temperature anomaly (again, something that does exist) as precisely as you like by increasing the number of stations.

What you can’t do is use the anomaly to estimate the GAT, which is equal to the anomaly plus a totally unknown constant. Maybe this is what you meant?
Incidentally, I sometimes wonder why the lack of any real idea of what the global mean temperature is doesn’t seem to bother the modelers. If any aspect of climate is non-linearly related to temperature (and if it isn’t you don’t need complex models) then the absolute value of temperature (everywhere on the globe, actually) is essential, and the anomaly is no help at all.
It is invalid to average the temperatures of several things unless they are in thermodynamic equilibrium, because the average is not the temperature they would have if they were in equilibrium. If they were in equilibrium, there would be no need of averaging – you could just stick a thermometer into any of them, and the reading would be their temperature.
Then averaging the readings of something with a continuous sinusoidal heat input is invalid? How should I go about determining equilibrium? When the temperature readings stop changing or do I need to invest in one of those expensive equilibrium meters?
Absolutely. If you are unconvinced by the thought experiment, you can always do the marble experiment. If your instrumentation is precise enough, you will be able to find small variations on the surface of the frozen marble. Perhaps one side was touching some frozen peas and another was against a not yet frozen porterhouse steak. Do the variations on the surface of the frozen marble mean the marble does not have a temperature field?
You are seeking perfect conditions before taking observations. It is always good to seek to improve conditions or improve instrumentation or analysis methods, but it is completely invalid to claim a temperature field does not exist. Let’s stop the nonsense and get back to science.