Global Average Temperature: An Exceedingly Brief Introduction To Bayesian Predictive Inference

Update: This post is mandatory reading for those discussing global average temperature.

I mean it: exceedingly brief and given only with respect to a univariate time series, such as an operationally defined global average temperature (GAT). Let him that readeth understand.

GAT by year are observed (here, assumed without error). If we want to know the probability that the GAT in 1980 was lower than in 2010, then all we have to do is look. If the GAT in 1980 was less than the GAT in 2010, then the probability that the GAT in 1980 was lower than in 2010 is 1, or 100%. If you do not believe this, you are a frequentist.

Similarly, if you ask what is the probability that the GAT in the 2000s (2001-2010) was higher than in the 1940s (1941-1950), then all you have to do is (1) settle on an operational definition of higher, and (2) just look. One such operational definition: when the years of the two decades are compared, the warmer decade is the one that has more of the warmer years. If the warmer years in the 2000s outnumber the warmer years in the 1940s, then the probability that the 2000s were warmer than the 1940s is 1, or 100%.
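The two "just look" calculations above can be sketched in Python. This is a minimal illustration, not the author's code: the function names are hypothetical, the record is assumed to be a dict mapping year to observed GAT, and the decade rule assumes one particular pairing (the k-th year of one decade against the k-th year of the other). Because everything is observed, each function can only return 1 or 0.

```python
# Hypothetical helpers: `gat` is assumed to be a dict {year: observed GAT in deg C}.

def prob_earlier_lower(gat, earlier, later):
    """Probability that GAT in `earlier` was below GAT in `later`, given the record.

    Both values are observed (and assumed error-free), so the answer is 1 or 0.
    """
    return 1.0 if gat[earlier] < gat[later] else 0.0


def prob_decade_warmer(gat, decade_a, decade_b):
    """1.0 if decade_a has more warmer years than decade_b, else 0.0.

    Assumed operational definition: pair the k-th year of each decade
    (e.g. 2001 with 1941, 2002 with 1942, ...) and count who wins each pair.
    """
    pairs = list(zip(decade_a, decade_b))
    wins_a = sum(gat[a] > gat[b] for a, b in pairs)
    wins_b = sum(gat[b] > gat[a] for a, b in pairs)
    return 1.0 if wins_a > wins_b else 0.0
```

Calling `prob_decade_warmer(gat, range(2001, 2011), range(1941, 1951))` on an actual record would answer the 2000s-versus-1940s question; no model and no significance test is involved.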

There is no model needed to answer these or similar simple questions.

If you want to ask what is the probability that the GAT increased by at least X degrees C per year from 1900 to 2000, then all you have to do is look. If the GAT increased by at least X degrees C per year from 1900 to 2000, then the probability that the GAT increased by at least X degrees C per year from 1900 to 2000 is 1, or 100%. There is no need, none whatsoever, to ask whether the observed increase of at least X degrees C per year was “statistically significant.” The term is without meaning and devoid of interest.
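The same applies to rates. A minimal sketch, assuming (as one possible reading of "increased by at least X degrees C per year") the average rate between the two endpoint years; the function name is hypothetical.

```python
def prob_increase_at_least(gat, start, end, x):
    """1.0 if the observed average yearly increase from `start` to `end` is >= x, else 0.0.

    Assumes `gat` is a dict {year: observed GAT in deg C} and reads
    "increased by at least x per year" as the endpoint-to-endpoint average rate.
    """
    rate = (gat[end] - gat[start]) / (end - start)
    return 1.0 if rate >= x else 0.0
```

Nothing here is uncertain, so there is nothing for a significance test to add.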

At this writing, the year is 2011, but the year is incomplete. I have observed GATs from at least 1900 until 2010. I want to know the probability that the GAT in 2011 (when complete) will be larger than the GAT (as measured) in 2010. I cannot observe this now, but I can still compute the probability. Here is how.

I must propose a model which relates the GAT to time. The model can be fixed, meaning it assumes that the GAT increases by exactly X degrees C a year: it does not increase by X – 0.1, nor by X + 0.3, nor by any number other than X. In my model, the predicted GAT for 2011 is the GAT of 2010 plus X. Conditional on this model, and on nothing else, the probability that the GAT in 2011 is larger than the GAT in 2010 is 1, or 100% (taking X to be positive). This is not necessarily the same as the probability that the eventually observed GAT in 2011 is larger than the GAT in 2010.
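A fixed model of this kind fits in a few lines. The sketch below uses hypothetical names and makes explicit the assumption behind "probability 1": that X is positive. With X zero or negative, the same model gives probability 0.

```python
def fixed_model_prediction(gat_2010, x):
    """Fixed model: the 2011 GAT is exactly the 2010 GAT plus x (no other value is possible)."""
    return gat_2010 + x


def prob_2011_above_2010_fixed(x):
    """P(GAT_2011 > GAT_2010 | fixed model). The model allows one outcome, so this is 1 or 0."""
    return 1.0 if x > 0 else 0.0
```

Neither function looks at the eventual 2011 observation; the probability is conditional on the model alone.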

It is easy to see how I might adjust this fixed model by assigning the possible increase to be one of several values, each with a fixed (in advance) probability of occurring. I might also eschew fixing these increases and instead assume a parametric form for the possible increases. The most commonly used parametric form is a straight line (which has at least three parameters; there are different kinds of straight lines used in time series modeling). How do I know which kind of parametric model to use? I do not: I guess. Or I use the model that others have used because conformity is both pleasing and easy.
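The first adjustment, several possible increases each with a probability fixed in advance, can be sketched as follows. The increments and probabilities are placeholders chosen by the modeller, not estimates; the function name is hypothetical.

```python
def prob_2011_above_2010_discrete(increments):
    """P(GAT_2011 > GAT_2010) when the yearly change is one of several fixed values.

    `increments` maps each possible change (deg C) to its probability, fixed in advance.
    """
    if abs(sum(increments.values()) - 1.0) > 1e-9:
        raise ValueError("the probabilities must sum to 1")
    # Only the positive changes make 2011 warmer than 2010.
    return sum(p for change, p in increments.items() if change > 0)
```

For example, `prob_2011_above_2010_discrete({-0.1: 0.2, 0.0: 0.1, 0.05: 0.4, 0.1: 0.3})` adds the probabilities of the two positive changes, 0.4 and 0.3, so the answer is 0.7 (up to floating-point rounding).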

I choose the straight line which has, among its parameters, one indicating the central tendency of a probability distribution related to, but not the same as, the increase in GAT through time. To call this parameter the “trend” can only cause grief and misunderstanding. This parameter is not, and cannot be, identical with the observed GAT.

Bayesian statistics allows me to say what values this parameter (and all the other parameters) is likely to take. It will allow me to say that, if this model is true and given the past years’ GATs, then the probability the parameter is greater than 0 is y, or Y%. This is the parameter posterior distribution. Suppose that y = 0.9 (Y = 90%). Can I then answer the question what is the probability that the GAT in 2011 is larger than the GAT in 2010? NO. This is the only probability that means anything to me, but I cannot yet answer it. What if y = 0.999999, or however many 9s you like: can I then say what is the probability the GAT in 2011 is larger than the GAT in 2010? No, no, and no, with just as many “no”s as 9s. Again, “statistical significance” of some parameter (mistakenly called “trend”) is meaningless.
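To make the parameter posterior concrete, here is a minimal Monte Carlo sketch. It assumes one specific straight-line model, GAT = b0 + b1·(year − mean year) + normal noise with unknown variance, and the standard noninformative prior p(b0, b1, σ²) ∝ 1/σ²; both are assumptions for illustration, not the model any particular group uses. The function name is hypothetical. It returns the posterior probability that b1 > 0, the y of the paragraph above, which by itself says nothing about 2011.

```python
import math
import random


def prob_slope_positive(years, gats, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(b1 > 0 | data, model) for the assumed straight-line model.

    Model (assumed): gat = b0 + b1 * (year - mean_year) + Normal(0, sigma2),
    prior p(b0, b1, sigma2) proportional to 1 / sigma2.
    """
    n = len(years)
    t_bar = sum(years) / n
    y_bar = sum(gats) / n
    tc = [t - t_bar for t in years]                 # centred years: b0 and b1 decouple
    sxx = sum(c * c for c in tc)
    b1_hat = sum(c * y for c, y in zip(tc, gats)) / sxx
    resid = [y - y_bar - b1_hat * c for c, y in zip(tc, gats)]
    s2 = sum(r * r for r in resid) / (n - 2)

    rng = random.Random(seed)
    positive = 0
    for _ in range(n_draws):
        chi2 = rng.gammavariate((n - 2) / 2, 2.0)   # chi-square draw with n - 2 df
        sigma2 = (n - 2) * s2 / chi2                # sigma2 | data: scaled inverse chi-square
        b1 = rng.gauss(b1_hat, math.sqrt(sigma2 / sxx))  # b1 | sigma2, data
        positive += b1 > 0
    return positive / n_draws
```

Sampling σ² first and then b1 given σ² is one standard way to draw from this posterior; the closed-form alternative is a Student-t distribution for b1.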

However, Bayesian statistics allows me to take the parameterized model and average its predictions over every possible value of the parameters, weighting each value by its posterior probability. The end result is a prediction of the possible values of the GAT in 2011, complete with a probability that each of these possible values is the true one, assuming the model is true. This is the posterior predictive distribution; it is free of all parameters and speaks only in terms of observables, here year and GAT.

I can use the posterior predictive distribution and directly ask what is the probability that the GAT in 2011 is larger than the GAT in 2010. This probability assumes the model is true (and assumes the previous values of GAT are measured without error).
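The posterior predictive step can be sketched the same way, under an assumed model and prior (straight line in centred years, normal noise, p(b0, b1, σ²) ∝ 1/σ²); the function name is hypothetical. Each draw picks parameter values from their posterior and then a GAT for the new year from the model given those values, so the parameters are averaged out and only the observable remains. The result is a Monte Carlo estimate of P(GAT_2011 > GAT_2010 | data, model).

```python
import math
import random


def prob_next_exceeds_last(years, gats, new_year, n_draws=20000, seed=1):
    """Monte Carlo P(GAT in new_year > last observed GAT | data, assumed model).

    Assumed model: gat = b0 + b1 * (year - mean_year) + Normal(0, sigma2),
    prior p(b0, b1, sigma2) proportional to 1 / sigma2. `years` must be in order.
    """
    n = len(years)
    t_bar = sum(years) / n
    y_bar = sum(gats) / n
    tc = [t - t_bar for t in years]
    sxx = sum(c * c for c in tc)
    b1_hat = sum(c * y for c, y in zip(tc, gats)) / sxx
    s2 = sum((y - y_bar - b1_hat * c) ** 2 for c, y in zip(tc, gats)) / (n - 2)
    last = gats[-1]

    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        sigma2 = (n - 2) * s2 / rng.gammavariate((n - 2) / 2, 2.0)  # sigma2 | data
        b0 = rng.gauss(y_bar, math.sqrt(sigma2 / n))                 # b0 | sigma2, data
        b1 = rng.gauss(b1_hat, math.sqrt(sigma2 / sxx))              # b1 | sigma2, data
        y_new = rng.gauss(b0 + b1 * (new_year - t_bar), math.sqrt(sigma2))  # the observable
        exceed += y_new > last
    return exceed / n_draws
```

Note that the answer depends on where the last observed value sits relative to the fitted line, not only on the sign of b1; a high posterior probability for b1 > 0 can coexist with a modest probability that 2011 beats 2010.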

If I have more than one model, then I will have more than one probability that the GAT in 2011 is larger than the GAT in 2010. Each probability assumes that the model that generated it is true. Which model is really true? I can only judge by external evidence. This evidence (or these premises) tell me the probability each model is true. I can then use these probabilities, and the probabilities that the GAT in 2011 is larger than the GAT in 2010, to produce a final probability that the GAT in 2011 is larger than the GAT in 2010. This probability is not conditional on the truth of any of the models.
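The final combination is the law of total probability: weight each model's answer by the probability, from external evidence, that the model is true, and add. A minimal sketch with a hypothetical function name; requiring the model probabilities to sum to 1 encodes the premise that one model in the set is true.

```python
def averaged_probability(models):
    """Combine per-model answers: sum of P(model) * P(event | model).

    `models` is a list of (probability the model is true, probability of the
    event given that model). The first entries must sum to 1, i.e. the set is
    assumed to contain the true model.
    """
    if abs(sum(pm for pm, _ in models) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    return sum(pm * pe for pm, pe in models)
```

As an illustration with made-up numbers: three models judged 50%, 30% and 20% likely, giving 0.8, 0.6 and 0.1 for "2011 warmer than 2010", combine to 0.5·0.8 + 0.3·0.6 + 0.2·0.1 = 0.6.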

But it still is conditional on the premise that at least one of the models in our set is true. If none of these models in our set is true—which we could only know using external evidence—then the probability that the GAT in 2011 is larger than the GAT in 2010 is likely to be wrong (it still may be right by coincidence).

I hope you can see that I can ask any question about the observables prior to 2011 and that in 2011. For example, I can ask what is the probability that the GAT in 2011 is Z degrees C higher than in 2010. Or I can ask, what is the probability that the GAT in 2011 is W degrees C higher than the average of the years 2001-2010. And so on.
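Once draws from a posterior predictive distribution for the 2011 GAT are in hand (however they were produced), each of these questions is a simple count. A minimal sketch, reading "Z degrees C higher" as "at least Z degrees C higher"; the function names are hypothetical and `gat` is assumed to be a dict {year: observed GAT}.

```python
from statistics import mean


def prob_exceeds_by(draws, reference, margin):
    """Fraction of predictive draws that are at least `margin` deg C above `reference`."""
    return sum(d - reference >= margin for d in draws) / len(draws)


def prob_exceeds_period_mean(draws, gat, years, margin):
    """Fraction of draws at least `margin` deg C above the observed mean GAT over `years`."""
    return prob_exceeds_by(draws, mean(gat[y] for y in years), margin)
```

Here `prob_exceeds_by(draws, gat[2010], z)` answers the first question and `prob_exceeds_period_mean(draws, gat, range(2001, 2011), w)` the second, with `draws`, `z` and `w` standing in for the reader's own predictive draws and thresholds.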

This is how Richard Muller’s group should issue their statements on the GAT.


  1. Speed

    I think that there is something wrong with the sentence, “I hope you can see that I can ask any question about the observables prior to 2011 and that in 2011.” Perhaps it is only that I don’t understand it in which case the problem is mine.

  2. Mike B

    “GAT by year are observed (here, assumed without error). ”

    Aye, there’s the rub. Stations come and go. Time of observation varies. UHI (urban heat island) biases appear. And that’s just land stations. Ocean measurements have their own set of problems.

    Just a reminder that measurement errors, especially in data that have already been taken, cannot be “assumed away”. I realize you’re more at the thought experiment stage here. Just wanted to put those items out there.

  3. JH

    GAT by year are observed (here, assumed without error)… If you do not believe this, you are a frequentist.

    If someone doesn’t believe it, I’d ask the question in a different way. If the GAT in 1980 was 15 °C and the one in 2010 was 15.5 °C, is the GAT lower in 1980 than in 2010? A third-grader should know the answer once they learn about decimals. No probability involved at all. No statistical model required for this kind of question. No fun either.

    Assuming repeated observations of identical conditions in 1980 and in 2010, what a real frequentist wants to know is the limiting relative frequency (probability) that the GAT for the set of conditions in 1980 is lower than for the set of conditions in 2010. But it is of course impossible to actually run an infinite number of repetitions of the conditions in this case, so such an interpretation or definition of probability is not applicable.

    Since I am such a modern statistician, I shall let go of the frequentist/Bayesian debate. ^_^

  4. JH

    GAT by year are observed (here, assumed without error)…

    Yes, GATs by years are observed, but I don’t think your definition of error is exactly the one used in statistical modeling. Of course, I never eliminate the possibility that I have misunderstood you.

    Here is my understanding of errors in statistical modeling. The word “error” is a bit misleading. I prefer using the word “deviation”.

    The random (unknown) deviation/error in a statistical model represents the deviation between the proposed systematic function (with unknown parameters if parametric, linear or nonlinear) and observed value due to effects of various sources. What we are modeling is the unknown deviation. And there are diagnostic measures to check the validity of those distribution assumptions about the unknown deviation.

    The likelihood function used in both classical and Bayesian statistics is to describe the unknown/random deviation. Both methods make inference based on what would happen if the data were generated by random variables, as you’ve done in the simulation in your previous post.

  5. Briggs


    You are right—you are wrong. That is, you are right that you are wrong and have not understood.

    The likelihood in Bayesian statistics is not there to describe “random”, a.k.a. unknown, deviation. It is there to tell us the uncertainty we have in the observables, given a fixed value of the parameters. There is no “deviation” in it. The likelihood is just a probability that says things will happen with a certain probability. To a Bayesian, the model and the likelihood are the same thing; they are to a frequentist, too, but the frequentist, as you do, saddles it with a strange interpretation.

    Hey! I see from your first comment that you have (gracefully) abandoned infinite repetitions of (here) climates! What a great day!

  6. mt

    I’m hoping this group doesn’t make any statements about the final 2011 GAT until after 2011 is over.

  7. JH

    I have not abandoned the frequentist definition of probability. It has never been exactly my interpretation or definition of probability. Whatever probability means to you is probably what it means to me.

    If you are talking about making statistical inference, a Bayesian method to me is just one method. I have no problems with using various statistical tools, Bayesian or not. Limiting oneself to a certain method is not in fashion anymore.

    Anyway, how about helping me understand more about the Bayesian modeling process by answering the following questions. I have no problem with the mechanics/mathematics behind it.

    1) What exactly is the uncertainty? What are you uncertain about?
    I guess it’s not the deviation as I mentioned in my comments.

    2) How do you come up with a probability model for the uncertainty?
    My understanding of statistical modeling also uses probabilities.

    Or 2) How would you postulate a likelihood function for two or more observable quantities?

    The following answer is not an answer at all.

    Or I use the model that others have used because conformity is both pleasing and easy.

  8. JH

    Sorry, I was interrupted, and let me ask “Or 2)” correctly.

    Or 2) How do you postulate a relationship between two or more observable quantities?

  9. Harold Pierce Jr

    The BEST project is an absolute waste of time and money because temperature data are not that accurate and there is no uniform distribution of thermometers on the surface of the earth, in particular on the surface of the ocean and at remote sites. Moreover, there is a better method for analysis of weather station temperature data, which I shall briefly outline in the following comment.

    The late John Daly’s criteria for selection of a weather station for temperature data analysis are (1) remoteness, (2) a permanent and well-maintained site, (3) a long and continuous record, (4) meticulous record keeping and (5) compliance with and adherence to the WMO standards and protocols. He mentions in “What the Stations Say” that temperature data from stations in urban areas, from compromised sites (e.g., air and marine ports) and from any poorly sited station should not be used.

    At his website “Still Waiting for Greenhouse”, under the tab “Station Temperature Data”, there are many charts from remote surface weather stations, and these show that there is no so-called “global warming”.

    Thus there is no need to redo what has been done in the past.

    There are several websites where investigators have presented their analyses of temperature records, for example Roger Sowell’s analyses of US cities.

    NB: RS is a chemical engineer and no amateur.

    BTW, do you know of the late John Daly? I was quite surprised to learn that Judith Curry had no knowledge of him.

    RE: Climate Cycles: What the Russians say

    You should check out:

    “Cyclic Climate Changes and Fish Productivity” by L.B. Klyashtorin and A.A. Lyubushin. The English translation can be downloaded for free.

    NB: This monograph is 224 pages and is not about climate science.

    By analyzing a number of time series of data influenced by climate, they found that the earth has global climate cycles of 50-70 years with an average of about 60 years and with cool and warm phases of about 30 years each. These results are reported in the first two chapters.

    The last warm phase began in ca 1970-75 and ended in ca 2000. The global warming from ca 1975 is due in part to this warm phase. A cool phase started in 2000 and they predict it should last about 30 years. Any analyses of temperature data should take into account this climate cycle.

  10. Ken

    The GAT figure seems to me highly suspect, as it aggregates so much information into a single point value as to distort, or conceal, a lot of information. It is something like taking the average temperature in a house, with values ranging from a hot kitchen to a very cool or cold cellar and the rooms in between. The calculation may be correct by various criteria, but does it really convey “information”, and of what use is that “information” really?

    Seems to me it’s a good way of hiding more significant and informative information. A sort of mathematical soundbite that obscures more than it explains or reveals.

    An implicit assumption with such a number would seem that at every location the measured temperature values are fluctuating proportionally across all seasons. But we know this isn’t so — right? It seems I’ve read that some months have skewed the annual data, and, that some regions show no discernable trends.

    In developing an aggregate value such as the GAT, how does one demonstrate that the aggregated trend is in fact representative over broad areas over lengthy periods, rather than the result of skewing by a minority of “outliers” from a minority of locations?


    Or put another way, how does one “test” the data to show it can be aggregated in such a manner such that the figure generated really means something??

    Not to mention factoring in the uncertainty inherent in the data (e.g. there’s a similar graph, etc. on this site…and in color too!).

    This GAT figure seems to be accepted out of familiarity — but the figure itself seems misleading, if not meaningless.

  11. Briggs


    “Limiting oneself to a certain method is not in fashion anymore.” Here you speak the truth. Unfortunately.

    Uncertainty can be assessed for any statement, and is always with respect to some evidence/premises. The uncertainty may be a single number, interval, or a null, i.e. “not known” or gibberish.

    The best models use as much physics/biology/etc. as possible, and are not just out-of-the-box, as nearly every model in, say, sociology or education is.

  12. George Steiner

    There is this planet, with such variations of climate as deserts, ice-covered poles, rainforests, and seasonal variations from winters to summers over tens of thousands of kilometers. Does Mr. Briggs believe that global average temperature has any meaning other than that of an arithmetic artifact?

  13. JH

    It just occurred to me that I was only thinking of forming a statistical model between variables in the form of a mathematical equation in which an error term is included. I forgot about, e.g., the generalized linear model, a generalization of the linear modelling process to fit the data to various probability distributions. It can still be explained in terms of the systematic and error parts, though.

    You know, Mr. Briggs, a satisfactory answer to me is something like the following. ^_^

    My Dear JH, A Bayesian method usually begins with a likelihood function conjectured by classical statisticians!

  14. JH

    Oh… and the following reply to the satisfactory answer proposed in my previous comment is most welcome.

    Here you speak the truth again.

  15. Well, unlike a Paul Krugman article, I’m gonna have to read this one more than once.

  16. Harold Pierce Jr

    RE: Harold’s Method for Analysis of Weather Station Temperature Data

    RE: Method for Determination of Weather Noise.

    I outline a new method I have developed for analysis of weather station temperature data.

    1. Download the temperature data record. For each month check Tmax and Tmin data for mistakes, errors and omissions which are not uncommon in early records. I found errors of the following types: (1) wrong sign, (2) transposed numbers, (3) misplaced decimal point, (4) incorrect digit, (5) missing data and (6) tomfoolery.

    I found an example of 6 in temperature data from the Quatsino (BC) weather station. In 1926 the Tmin entries for Sept 23, 24, and 25 are -0.6, -0.6 and -0.6, resp. Yikes! The Sign of the Beast! How did this get past the data quality checker in Ottawa? The other Tmin entries for Sept ranged from 1.7 to 12.2 deg C.

    Apparently the observer in Quatsino was sending a message to the data quality checker in Ottawa, something like “all of you back there are devils.” And in those days they were.

    2. The sample interval for analysis must be small, no more than 1 month. A short interval must be used so that sunlight is essentially constant over the sample interval.

    3. For each sample interval compute mean Tmax and Tmin and the classical average deviation (AD) from the mean for each metric. The AD contains a measurement of weather noise.

    4. Examine each sample interval for step changes or cycles.

    5. After analyses of the data determine how the results relate to any so-called “global warming and climate change”.

    For my analyses of the temperature data from the weather station at Quatsino BC, where record keeping began in 1895, I use data from March, June, September and December for this practical reason: I’m a pot-boiling organic chemist and don’t know how to use databases and spreadsheets. I download and analyze the monthly records one sheet at a time and use a sample interval of 11 days centered on the equinoxes and solstices. This method is simple but very tedious and time-consuming, but I did find the Sign of the Beast!

    My study of the selected temperature data from the weather station at Quatsino has revealed that present mean Tmax’s and Tmin’s are about the same as those in the early years of the 20th century: no evidence for “global warming”.

    RE: Weather Noise.

    I propose that for a short sample interval:

    AD = WN + FRT

    where WN = weather noise and FRT = field resolution of the thermometer.

    At Quatsino the grand mean AD for all analyzed temperature data in the sample range 1895-2010 is 1.5 K for both Tmax and Tmin. Since FRT is 0.1 K, WN is 1.4 K.
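    Step 3 and the weather-noise proposal above can be sketched in Python. This illustrates the commenter's proposal and is not an established method: "average deviation" is taken to be the mean absolute deviation from the interval mean, the function names are hypothetical, and the default FRT of 0.1 K is the value quoted for Quatsino.

```python
def mean_and_average_deviation(values):
    """Mean of one sample interval's Tmax (or Tmin) values and their average absolute deviation."""
    m = sum(values) / len(values)
    ad = sum(abs(v - m) for v in values) / len(values)
    return m, ad


def weather_noise(ad, frt=0.1):
    """Solve the proposed AD = WN + FRT for WN (all in K)."""
    return ad - frt
```

    With the grand-mean AD of 1.5 K, `weather_noise(1.5)` returns the 1.4 K above, up to floating-point rounding.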

    Question for WB

    If you buy my weather noise proposal and if WN is 1.4 K, what must be the difference between means of two temperature data sets for P < 0.01? NB I'm a really picky chemist.

  17. Mike B


    One thing I haven’t heard you mention yet is the issue of selection of a representative set of “stations”. I put stations in quotes, because sea surface measurements won’t (necessarily) be taken at fixed locations.

    Having a representative sample is more important than having a large sample, and I’m curious how this group is planning to go about selecting and siting stations.
