Statistics Of Loeb’s “Observed Changes In Top-Of-The-Atmosphere Radiation And Upper-Ocean Heating Consistent Within Uncertainty”

The paper is “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty” by Norman Loeb and others in the journal Nature Geoscience. I’m pressed for time, so for background on this paper, surf to Roger Pielke Senior’s place.

From the abstract:

We combine satellite data with ocean measurements to depths of 1,800 m, and show that between January 2001 and December 2010, Earth has been steadily accumulating energy at a rate of 0.50 +/- 0.43 Wm-2 (uncertainties at the 90% confidence level). We conclude that energy storage is continuing to increase in the sub-surface ocean.

Most curiously, the authors choose the “90% confidence interval” instead of the usual 95%. Why? Skip the discussion of the meaninglessness of confidence intervals and interpret this interval in its Bayesian sense. Then this means that the coefficient of the regression associated with time is estimated at 0.5 Wm-2 with a 90% chance of being anywhere in the interval 0.07 to 0.93 Wm-2.

This is an unobservable coefficient in a model, mind. It is not an amount of “energy.” To get to the actual energy, we’d have to integrate out the uncertainty we have in the coefficients.

Anyway, the change from the usual certainty level also means (I’m estimating here) that the coefficient of the regression associated with time is estimated at 0.5 Wm-2 with a 95% chance of being anywhere in the interval -(some-number) to one-point-something Wm-2. In other words, the interval has to be widened, and probably such that its lower limit is negative, though almost certainly near 0. Like I say, I’m guessing, but with enough gusto to be willing to bet on this. Any takers?
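A rough check of that guess can be sketched in a few lines. It assumes (my assumption, nothing the paper states) that the reported interval is symmetric and Gaussian, so the half-width scales with the normal quantile. The function name is made up for illustration.

```python
from statistics import NormalDist

def rescale_half_width(half_width, from_level, to_level):
    """Rescale a symmetric Gaussian interval half-width between confidence levels."""
    z_from = NormalDist().inv_cdf(0.5 + from_level / 2)
    z_to = NormalDist().inv_cdf(0.5 + to_level / 2)
    return half_width * z_to / z_from

estimate = 0.50  # Wm-2, from the abstract
half_90 = 0.43   # Wm-2, 90% half-width from the abstract

# Assumption: Gaussian scaling; the paper does not say how its interval was built.
half_95 = rescale_half_width(half_90, 0.90, 0.95)
lower_95 = estimate - half_95
upper_95 = estimate + half_95
print(f"95% interval: {lower_95:.2f} to {upper_95:.2f} Wm-2")
```

Under that assumption the lower limit lands a hair below zero and the upper limit just above one. A different error model would move both ends.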

We have to know about the regression. Details? The authors put details in tiny print and in a supplement. Here’s the small print (bolding mine):

Global annual mean net TOA fluxes for each calendar year from 2001 through 2010 are computed from CERES monthly regional mean values. In CERES_EBAF-TOA_Ed2.6r, the global annual mean values are adjusted such that the July 2005–June 2010 mean net TOA flux is 0.58 +/- 0.38 Wm-2 (uncertainties at the 90% confidence level). The uptake of heat by the Earth for this period is estimated from the sum of: (1) 0.47 +/- 0.38 Wm-2 from the slope of weighted linear least square fit to OHCA to a depth of 1,800 m analysed following ref. 26; (2) 0.07 +/- 0.05 Wm-2 from ocean heat storage at depths below 2,000 m using data from 1981 to 2010 (ref. 22), and (3) 0.04 +/- 0.02 Wm-2 from ice warming and melt, and atmospheric and lithospheric warming (refs. 1, 27). After applying this adjustment, Earth’s energy imbalance for the period from January 2001 to December 2010 is 0.50 +/- 0.43 Wm-2. The +/- 0.43 Wm-2 uncertainty is determined by adding in quadrature each of the uncertainties listed above and a +/- 0.2 Wm-2 contribution corresponding to the standard error (at the 90% confidence level) in the mean CERES net TOA flux for January 2001–December 2010. The one standard deviation uncertainty in CERES net TOA flux for individual years (Fig. 3) is 0.31 Wm-2, determined by adding in quadrature the mean net TOA flux uncertainty and a random component from the root-mean-square difference between CERES Terra and CERES Aqua global annual mean net TOA flux values.

The same 90% intervals are used, notice. The weights mentioned are hidden in another paper (ref. 26; I didn’t track this down). There is no word on whether the authors (or the others they cite) recognized the correlation in time and thus realized that the estimates of the coefficients, especially their confidence limits, will be suboptimal (too certain). In other words, a straight line regression is not the best model, but it is a model (no probability leakage, anyway, under the evidence that these indexes have no natural boundary). The final uncertainty is “determined by adding in quadrature” some other numbers.
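For readers who haven’t met the phrase, “adding in quadrature” means taking the square root of the sum of squares, which is only appropriate when the pieces are independent. The sketch below plugs in the figures quoted in the small print. Which numbers went into the sum is my reading of that passage, and the variable names are mine.

```python
import math

def add_in_quadrature(*uncertainties):
    """Combine independent uncertainties as the root of the sum of squares."""
    return math.sqrt(sum(u * u for u in uncertainties))

# 90% uncertainties quoted in the small print, in Wm-2
ocean_upper = 0.38   # OHCA slope to 1,800 m
ocean_deep = 0.05    # ocean heat storage below 2,000 m
ice_air_land = 0.02  # ice, atmosphere, and lithosphere
ceres_mean = 0.20    # standard error of the mean CERES net TOA flux

# Assumption: these four terms are the ones the authors combined.
total = add_in_quadrature(ocean_upper, ocean_deep, ice_air_land, ceres_mean)
uptake = 0.47 + 0.07 + 0.04  # the three heat-uptake terms, summed
print(f"combined uncertainty: {total:.2f} Wm-2, summed uptake: {uptake:.2f} Wm-2")
```

The arithmetic does recover the quoted +/- 0.43 and 0.58 Wm-2. The independence that quadrature takes for granted is another matter.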

What a complex procedure! The supplementary paper is little help in reproducing the exact steps taken. That is, it is doubtful that anybody could read this paper and use it as a recipe to reproduce the results (joyfully, the authors do make the data available).

But from a scan of the procedure, and given my comments thus far, it would appear the interval is too narrow. Adding all those different sources together and properly taking into account the uncertainty in each individual procedure is enough to boost the overall uncertainty by an appreciable amount. How much is “appreciable” is unknown. The amount one would have to add to the overall uncertainty is greater than 0. This implies that the final estimate of the coefficient of the regression associated with time should be about 0.5 Wm-2 with a 95% chance of being anywhere in the interval minus-something to just-over-one Wm-2. Consistent with uncertainty indeed.
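How much correlation in time can narrow an interval is easy to see in a toy simulation. The sketch below is not the authors’ procedure and uses none of their data. It generates trendless AR(1) noise with hypothetical settings (120 steps, carry-over 0.7), fits a straight line by ordinary least squares, and counts how often the textbook 90% interval contains the true slope of zero.

```python
import math
import random

def ols_slope_and_se(y):
    """Slope of y against 0..n-1 and its textbook standard error (independent errors assumed)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def ar1_series(n, phi, rng):
    """Trendless AR(1) noise: each value carries over a fraction phi of the last."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

rng = random.Random(0)                # fixed seed so the run is repeatable
n_steps, phi, runs = 120, 0.7, 2000   # hypothetical settings, not from the paper
z90 = 1.645                           # normal quantile for a two-sided 90% interval

covered = 0
for _ in range(runs):
    slope, se = ols_slope_and_se(ar1_series(n_steps, phi, rng))
    if abs(slope) <= z90 * se:        # the true slope is zero by construction
        covered += 1
coverage = covered / runs
print(f"nominal 90% interval covered the true slope in {coverage:.0%} of runs")
```

With positively correlated noise the nominal 90% interval covers the truth far less often than 90% of the time, which is the sense in which such limits are “too certain.” Whether the real series behaves anything like this toy is not something the sketch can say.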

There is still another source of uncertainty not noticed by Loeb, or indeed by nearly all authors who use time-series regression: the arbitrariness of the starting and ending points. I am sure Loeb did not purposely do this, but it is possible to shift the start or stop point in a time-series regression to get any result you want. For example, in their main paper Loeb et al. show plots from 2001 until 2010. But in the supplement, the data is from mid-2002 through all of 2010. Changing dates like this can booger you up. I’ll prove this in another post.
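Until then, here is a toy illustration. It is neither the promised proof nor anything computed from the CERES data. The series is hypothetical: a pure 10-step cycle with no long-run trend at all. Two windows of equal length, differing only in where they start, give fitted slopes of opposite sign.

```python
import math

def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((a - t_mean) ** 2 for a in t)
    return sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y)) / sxx

# Hypothetical series: a 10-step sine cycle, so there is no trend to find
times = list(range(30))
values = [math.sin(2 * math.pi * k / 10) for k in times]

slope_a = ols_slope(times[0:10], values[0:10])   # ten points starting at step 0
slope_b = ols_slope(times[5:15], values[5:15])   # ten points starting at step 5
print(f"start at 0: {slope_a:+.3f} per step; start at 5: {slope_b:+.3f} per step")
```

Real data are noisier and less tidy than a sine wave, but the mechanism is the same: the fitted line answers to the window you hand it.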


Thanks to reader Dan Hughes for helping me find the papers.

Update: Be sure to see this post on how to cheat with time series.


  1. Ray

    “the slope of weighted linear least square fit”

    I have a least squares curve fit program which allows you to weight each data point. That came in very handy when I had to fit a straight line to very noisy data and the line had to go thru the origin. With enough adjustable parameters you can fit a nice curve to anything and make it come out correct. Also when you do a least squares curve fit using the normal equations (the usual method), there is an implicit weighting of the data points proportional to their distance from the center. The end points have more weight, so by carefully selecting the end points you can change the trend.

  2. nick


    Under what circumstances is it possible to interpret confidence intervals as credible intervals? Or, more specifically, under what conditions do confidence intervals and credible intervals coincide for a given data set? If such fortunate conditions abounded, then one main argument against confidence intervals (that they have only a tautological meaning) would be invalid: just interpret it in the Bayesian sense (as most people do, anyway)…

  3. Will

    Ray: I’ve written a number of curve fitting algorithms, in various programming languages, over the years. What I’ve learned is that there are an infinite number of knobs to twist to help refine the fit. You can aim for least-squared error, straight error, least-cubed error, etc… Even the base error term is adjustable; Euclidean distance isn’t mandatory.

    Curve fitting, and by this I mean all curve fitting, is guessing refined. The fitted curves act as lossy compression algorithms. The less loss you want, the more terms you need. Outside of the computer graphics field, there don’t seem to be many who talk about regression in that way, though.

  4. Confidence intervals are an estimate of error, but in this paper the authors failed to account for all possible sources of error. As is so typically done, they took heavily filtered, massaged, and manipulated data, and computed their reported error from those alone.

    Among the many sources of error they ignored are: measurement error of the satellite, error in averaging satellite measurements to a monthly “regional” average, error in averaging those to a “global annual mean net”. They did the same with ocean heat storage; that is, they failed to use the raw data but only averaged data. It appears that they assumed no error from those sources. And as you note, they failed to account for auto-regressive error and the bias in end point selection.

    Even from a frequentist point of view, the failure to account for measurement error is incompetent.

    It also appears that they assumed the subset of real error that they did use to be additive: “…uncertainty is determined by adding in quadrature each of the uncertainties listed above…”. That is improper. We know (from the laws of mathematics) that error aggregates multiplicatively. It inflates like a balloon.

    Of course, had they included all possible error sources, their conclusions would have been manifestly uncertain, so much so that they could not have drawn any conclusion with any degree of confidence. Sadly, if they had been honest (competent) about it, the paper would not have been published. And then they would have perished, perish the thought.
