# Demonstration of how smoothing causes inflated certainty (and egos?)

I’ve had a number of requests to show how smoothing inflates certainty, so I’ve created a couple of easy simulations that you can try in the privacy of your own home. The computer code is below, which I’ll explain later.

The idea is simple.

1. I am going to simulate two time series, each of 64 “years.” The two series have absolutely nothing to do with one another, they are just made up, wholly fictional numbers. Any association between these two series would be a coincidence (which we can quantify; more later).
2. I am then going to smooth these series using off-the-shelf smoothers. I am going to use two kinds:
    1. A k-year running mean; the bigger k is, the more smoothing there is;
    2. A simple low-pass filter with k coefficients; again, the bigger k is, the more smoothing there is.
3. I am going to let k = 2 for the first simulation, k = 3 for the second, and so on, up to k = 12. This will show that increasing smoothing dramatically increases confidence.
4. I am going to repeat the entire simulation 500 times for each k (and for each smoother) and look at the results of all of them (if we did just one, it probably wouldn’t be interesting).

Neither of the smoothers I use is in any way complicated. Fancier smoothers would just make the data smoother anyway, so we’ll start with the simplest. Make sense? Then let’s go!
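If you’d like to preview what the R code at the bottom does without running R, here is a rough Python translation of the scheme above. It is purely illustrative (the function names and the seed are my own), and it throws in k = 1, i.e. no smoothing at all, as a control:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def running_mean(x, k):
    """k-point moving average; the output loses k-1 'years'."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

n, b = 64, 500  # "years" per series, replications per k
results = {}    # fraction of replications with p < 0.05, by k

for k in (1, 2, 6, 12):  # k = 1 means no smoothing: the control
    hits = 0
    for _ in range(b):
        x0 = rng.standard_normal(n)  # two utterly unrelated series
        x1 = rng.standard_normal(n)
        s0, s1 = running_mean(x0, k), running_mean(x1, k)
        hits += stats.pearsonr(s0, s1)[1] < 0.05  # classical correlation test
    results[k] = hits / b

print(results)  # near 0.05 at k = 1, then climbing steeply with k
```

With no smoothing the rate of “significant” p-values sits near the nominal 5%; by k = 12 it is many times that.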

Here, just so you can see what is happening, are the first two series, x0 and x1, plotted together (just one simulation out of the 500). On top of each is the 12-year running mean. You can see the smoother really does smooth the bumps out of the data, right? The last panel of the plot shows the two smoothed series, now called s0 and s1, next to each other. They are shorter because you have to sacrifice some years when smoothing.

The thing to notice is that the two smoothed series look eerily like they are related! The red line looks like it trails after the black one. Could the black line be some physical process that is driving the red line? No! Remember, these numbers are utterly unrelated. Any relationship we see is in our heads, or was caused by us through poor statistical methodology, and not in the data. How can we quantify this? Through this picture:

This shows boxplots of the classical p-values in a test of correlation between the two smoothed series. Notice the log-10 y-axis. A dotted line has been drawn to show the magic value of 0.05. P-values less than this wondrous number are said to be publishable, and fame and fortune await you if you can get one of these. Boxplots show the range of the data: the solid line in the middle of the box says 50% of the 500 simulations gave p-values less than this number, and 50% gave p-values higher. The upper and lower edges of the box mark the values that 25% of the 500 simulations exceed (upper) and 25% fall below (lower). The outermost top line says 5% of the p-values were greater than this, while the bottommost line indicates that 5% of the p-values were less than this. Think about this before you read on. The colors of the boxplots have been chosen to please Don Cherry.

Now, since we did the test 500 times, we’d expect about 5% of the p-values to be less than the magic number of 0.05. That means that the bottommost line of the boxplots should sit somewhere near the horizontal dotted line. If any more of the boxplot than that bottommost whisker sticks below the dotted line, then the conclusion you make based on the p-value is too certain.

Are we too certain here? Yes! Right from the start, at the smallest lags, and hence with almost no smoothing, we are already way too sure of ourselves. By the time we reach a 10-year lag (a commonly used choice in actual data) we are finding spurious “statistically significant” results 50% of the time! The p-values are awfully small, too, which many people incorrectly use as a measure of the “strength” of the significance. Well, we can leave that error for another day. The bottom line, however, is clear: smooth, and you are way too sure of yourself.

Now for the low-pass filter. We start with a data plot and then overlay the smoothed data on top. Then we show the two series (just 1 out of the 500, of course) on top of each other. They look like they could be related too, don’t they? Don’t lie. They surely do.

And to prove it, here are the boxplots again. About the same results as for the running mean.

What can we conclude from this?

The obvious.

BORING DETAILS FOLLOW

I picked the number 64 years because it’s a power of 2, and I thought I’d play around with more complicated smoothers; a power of 2 makes computations easier for Fourier and wavelet transforms. The results popped out for the simplest smoothers, so I never ended up doing those fancier analyses (but you can; have a nut). The actual number of years used doesn’t change a thing.

The simulation was done in the free and open-source R statistical software. See this link.

And here is the code. You’ll have to learn R on your own, however. I also notice that the WordPress template I use cut out all the formatting on the code; it doesn’t hurt it in the least, but makes it harder to read.
```r
#########################################################
## simulation

# If you do not have package gregmisc installed, run the next line
# and choose a mirror from which to download it
# install.packages('gregmisc')
library(gregmisc) # contains a running mean smoother

n = 64  # number of "years"
b = 500 # number of replications
a = matrix(0, nrow=b, ncol=11) # holder of p-values
d = matrix(0, nrow=b, ncol=11) # holder of k

##########################
## running mean smoother
##
for (j in 2:12){ # loop over running means of j "years"
  for (i in 1:b){
    # the "real" data
    x0 = rnorm(n)
    x1 = rnorm(n)
    # the smoothed data
    # type "?running" to see how to use the function
    s0 = running(x0, width=j, pad=TRUE, fun=mean)
    s1 = running(x1, width=j, pad=TRUE, fun=mean)
    fit = cor.test(s0, s1)
    # store p-values and "lag"
    a[i, j-1] = fit$p.value
    d[i, j-1] = j
  }
}

# makes it easier to plot the data if we turn it into a data frame
y = data.frame(corr=a[1:(dim(a)[1]*dim(a)[2])], year=d[1:(dim(a)[1]*dim(a)[2])])
attach(y)

# uncomment the jpeg()/dev.off() lines to save the pictures
#jpeg('smooth1b.jpg')
boxplot(abs(corr)~year, log='y', xlab="Running mean lag size", ylab="P-value", col=1:11)
abline(h=0.05, lty=2)
#dev.off()

#jpeg('smooth1a.jpg')
par(mfrow=c(3,1))
plot(x0, type='l', axes=F, xlab="Year")
lines(s0, lty=2, lwd=1.5, col=3)
axis(1); axis(2)
plot(x1, type='l', axes=F, xlab="Year")
lines(s1, lty=2, lwd=1.5, col=2)
axis(1); axis(2)
plot(s0, type='l', axes=F, xlab="Year", ylab="Smooth")
lines(s1, lwd=1.5, col=2)
axis(1); axis(2)
#dev.off()

##########################
## low-pass filter smoother
##
for (j in 2:12){ # loop over filters with j coefficients
  for (i in 1:b){
    # the "real" data
    x0 = rnorm(n)
    x1 = rnorm(n)
    # the filtered data
    s0 = filter(x0, rep(1,j))
    s1 = filter(x1, rep(1,j))
    fit = cor.test(s0, s1)
    # store p-values and "lag"
    a[i, j-1] = fit$p.value
    d[i, j-1] = j
  }
}

detach(y)
y = data.frame(corr=a[1:(dim(a)[1]*dim(a)[2])], year=d[1:(dim(a)[1]*dim(a)[2])])
attach(y)

#jpeg('smooth2b.jpg')
boxplot(abs(corr)~year, log='y', xlab="Filter size", ylab="P-value", col=1:11)
abline(h=0.05, lty=2)
#dev.off()

#jpeg('smooth2a.jpg')
# these extra lines are needed because the smoother
# centers data on a new scale; this scales it back
mx = max(max(x0), max(x1))
ms = max(max(s0, na.rm=T), max(s1, na.rm=T))
par(mfrow=c(3,1))
plot(x0/mx, type='l', axes=F, xlab="Year")
lines(s0/ms, lty=2, lwd=1.5, col=3)
axis(1); axis(2)
plot(x1/mx, type='l', axes=F, xlab="Year")
lines(s1/ms, lty=2, lwd=1.5, col=2)
axis(1); axis(2)
plot(s0, type='l', axes=F, xlab="Year", ylab="Smooth")
lines(s1, lwd=1.5, col=2)
axis(1); axis(2)
#dev.off()
```

1. Pete Petrakis

I understand why smoothed data should not be used for further statistical analysis. However, isn’t it legitimate to plot smoothed data and the noisy data from which it’s derived simultaneously on the same graph to make underlying trends stand out?

2. Briggs

Pete,

Read my comment from the last post about what noise is and is not. Noise is with respect to some model. It isn’t necessarily wrong to show both, but you have to understand what you’re doing. I go on and on about it in the previous post.

3. bill r

Congratulations on a nicely done example.

4. TCO

Briggs:

1. Nice exercise. Seriously.

2. Is this a well known effect? I assume it is. What are the seminal references (textbook pages and/or journal articles)?

3. Is this basic demonstration pertinent to Mann’s article? Is it what you would present in a formal comment? Why/why not?

4. What do you think of Mann’s adjustments (ala Hu)?

5. Methodology: why don’t you show the 1 or 0 cases? The control, the “no smoothing” case?

6. Methodology: what is the structure of the time series (autocorrelation coefficients)?

7. Thanks again, man. Very nice explanation of the box plots. Except I wonder what the rainbow colors are for…

5. Sylvain

I may be mistaken, but it seems that this paper reaches exactly this conclusion:

http://weber.ucsd.edu/~mbacci/white/pub_files/hwcv-081a.pdf

ABSTRACT

Economics is primarily a non-experimental science. Typically, we cannot generate new data sets on which to test hypotheses independently of the data that may have led to a particular theory. The common practice of using the same data set to formulate and test hypotheses introduces data-snooping biases that, if not accounted for, invalidate the assumptions underlying classical statistical inference. A striking example of a data-driven discovery is the presence of calendar effects in stock returns. There appears to be very substantial evidence of systematic abnormal stock returns related to the day of the week, the week of the month, the month of the year, the turn of the month, holidays, and so forth. However, this evidence has largely been considered without accounting for the intensive search preceding it. In this paper we use 100 years of daily data and a new bootstrap procedure that allows us to explicitly measure the distortions in statistical inference induced by data-snooping. We find that although nominal P-values of individual calendar rules are extremely significant, once evaluated in the context of the full universe from which such rules were drawn, calendar effects no longer remain significant.

6. Raven

Matt,

Can you explain how smoothing during the measurement process affects the results? For example, many physical measurement devices will measure a quantity over a time and report an average value (e.g. (Tmin+Tmax)/2). Does the smoothing implied by the measurement process need to be incorporated into uncertainty estimates on the p-value?

7. Raven asks whether smoothing done while measuring is also a problem. The answer is: No.

As I see it, here is why Matt’s example leads to misleading results. Say one value, by chance, happens to be very high. The smoothing process will bring down that high value a bit, but bring up neighboring values. So now, instead of one aberrant high value, there is a series of several moderately high values. Essentially all statistical analyses are based on the assumption that each value contributes independent information. With smoothed data, that isn’t true. When random chance actually only affected one point, it ends up altering a bunch of neighboring points in the smoothed data set. Using that as an input to another analysis is misleading.
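That loss of independence is easy to measure. For an ideal k-point moving average of white noise, the lag-1 autocorrelation works out to (k-1)/k, so a 10-point smoother turns totally independent values into a series with autocorrelation near 0.9. A small Python check (illustrative only; the long series is just to stabilize the estimate):

```python
import numpy as np

rng = np.random.default_rng(0)

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

k, n = 10, 200_000  # a long series so the estimate is stable
x = rng.standard_normal(n)                        # white noise
s = np.convolve(x, np.ones(k) / k, mode="valid")  # 10-point running mean

print(round(lag1_autocorr(x), 2))  # near 0: the raw values are independent
print(round(lag1_autocorr(s), 2))  # near (k-1)/k = 0.9: the smoothed ones are not
```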

But if the smoothing is done during data collection, no problem. Each value in the data set that will be analyzed is independent. Similarly, if you averaged Matt’s data for every ten years, and then had a data set with one tenth as many points, you wouldn’t have a systematic bias. The problem occurs with rolling averages, because random events that actually only affected one point show up in the smoothed data as a set of adjacent points.
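The claim that averaging into non-overlapping blocks (one number per decade, say) keeps the values independent, so the correlation test stays honest, can be checked directly. A hypothetical Python sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k, b = 60, 10, 2000  # 60 "years", 10-year blocks, replications
hits = 0
for _ in range(b):
    # one mean per non-overlapping 10-year block: six independent values each
    x0 = rng.standard_normal(n).reshape(-1, k).mean(axis=1)
    x1 = rng.standard_normal(n).reshape(-1, k).mean(axis=1)
    hits += stats.pearsonr(x0, x1)[1] < 0.05
print(hits / b)  # stays near the nominal 0.05
```

Unlike the rolling average, the block means share no data points, so the 5% false-positive rate survives.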

8. The more I think about this the more obvious it seems. I cannot understand why anyone would want to smooth temperature proxy data before analysis. Can anyone explain the reason for it?

Imagine looking at data that you believe to be a proxy for the last 20 years of temperature changes. Before smoothing you would hope to see evidence of a few years’ decline caused by Pinatubo and especially the rise and fall caused by the big El Nino event around 1998. After a ten-year smoothing these events would be almost obliterated from your data, and your chances of finding a genuine correlation with temperature would be gone.

If there is no noticeable correlation with known global temperatures before smoothing then something must be wrong with the proxy; if there is correlation only after the smoothing then surely it must be caused by the smoothing.

9. But, Harvey, using ten year averages in effect throws away information. And what’s so magic about ten years? Why not 7 year averages, or 8, or 11-teen?

Another name for that is “blocking.” Let’s not look at all the data; let’s put it into ten year blocks. What for?

Statistically speaking, 99 percent of blocks used in statistical analysis are spurious. It is common, for instance, to divide daily weather data into weekly, monthly, or seasonal blocks and then average within the blocks. But those blocks are ARBITRARY.

Hey, let’s divide the human population into racial blocks. We can color code everybody. We can see if the red people are different from the yellow people. But the blocks are INHERENTLY BIASED. The a priori blocks taint the data. Conclusions are foregone, built into the spurious blocking system.

Rich, poor, middle class. Fat, thin, normal. Liberal, conservative, moderate. They are all bogus blocking systems.

Bad blocking, aka arbitrary binning, aka dividing of the data into a priori subsets, is the principal source of bias in any type of analysis, not just the statistical kind. Very dangerous, too. Leads to wars, Holocausts, confiscatory taxes, shoes that don’t fit, and a variety of other evils, small and large.

We don’t hand churn data anymore like Granny churned butter. We have computers now to do it for us. Give me all the data all the time. I’ll deal with the problems of autocorrelation and lack of independence. Don’t block me up.

Don’t get all jiggy about the extreme data points. There are no such things as outliers, there are only ordinary liars.

10. In the original hockey stick paper, Mann speaks of the “natural smoothing” effect of principal components analysis. If I understand what you have written above, smoothing via PCA would also make one far too certain of the results. Is this right?

11. Briggs

TCO,

1. Thanks
2. Sure, and I have no idea
3. Yes, partly, and formal work requires coming up with a new full analysis, and that takes time
4. Not much
5. Laziness
6. None, all numbers are independent
7. The shocking colors are to honor Don Cherry

Sylvain, excellent reference, thanks.

Raven, measurement error (assuming it’s symmetric) doesn’t usually change the mean results, but does—and should—widen the confidence intervals, i.e. increase the uncertainty in the final results. Sometimes measurement error is so small it can be ignored, but not always, though people usually still ignore it.

Harvey, good explanation, thanks. Harvey, over at GraphPad.com separately reminded me that I could have, if I wanted, gotten many more publishable p-values by a simple trick oft used. Can anybody guess what it is? It doesn’t involve changing the data any more (after it is smoothed).

Liberty Boy, Von Neumann beat the author of that article, who needed over 30 parameters! He wasn’t trying hard enough.

Hadley, I can. Habit and a misunderstanding of what “noise” is.

Mike D, Amen.

Your Grace, PCA smooths, too, yes.

All, I’ve got to tell you, my heart soars like a hawk when reading the discussion here. I am so proud that I could weep. There’s this other blog by some guy named Tanmino or something where the level of sophistication is a tad less than it is here. Tanino was asked to comment on my previous post and (I’m guessing he pondered long and hard) he came up with “Clearly Briggs has a stick up his ass. A hockey stick.” Actually, back in Middle School when I was still gangly and walked a bit bow-legged, the guys used to say corn cob and not hockey stick. Now that I am full grown and manfully glide like a thoroughbred down the street, his comment is less apt.

12. Bernie

Matt:
When are the issues raised by smoothing also applicable to transformations of data? The August UAH data is out and it just struck me that those signals are subjected to a significant amount of “manipulation” prior to being presented as temperature readings. What statistical conditions have to be met to allow a particular transformation? Presumably if you charted the original satellite readings you should get a graph with the same shape.

13. JH

How about generating two series from a bivariate normal distribution with a correlation of 0.5?

I don’t see what inference one can make by calculating the correlation between two series of smoothed data though.

14. Briggs

Bernie,

Transformations? Not necessarily. A linear or monotonic transform won’t, or shouldn’t, change the conclusions much. There are cases where certain priors in Bayesian statistics are dependent on the units of the observables. But it’s not especially problematic.

No, the real problem is the model where the proxy is used to predict temperature. Actually, of course, the satellite measurement, the one we civilians see, is the result of a model and not the actual temperature. This is necessarily so because the satellite is counting photons and can’t measure actual temperature. This is a genuine case of measurement error.

The uncertainty that produces those reported temperatures should be carried through any analysis that uses satellite-derived temperatures as input. It almost never is, though. And it is another reason why I say that people are too certain of themselves.

JH,

Well, I suppose you could simulate series with any correlation you want, then measure the change away from that correlation. Essentially, that’s what I did with the correlation set to 0. Just change the lines in the code to generate a bi-variate normal with the relevant correlation. You’ll see much the same thing in the final result.
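For anyone wanting to try that variant, here is a sketch of generating two series with true correlation 0.5 (in Python for illustration; in R you could use MASS::mvrnorm, or build the pair from shared and independent rnorm draws):

```python
import numpy as np

rng = np.random.default_rng(7)
rho = 0.5
cov = [[1.0, rho], [rho, 1.0]]
# draw pairs whose true correlation is rho, unlike the independent x0, x1 above
x0, x1 = rng.multivariate_normal([0.0, 0.0], cov, size=5000).T
r = np.corrcoef(x0, x1)[0, 1]
print(round(r, 2))  # sample correlation near 0.5
```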

You don’t see what inference can be made by calculating correlation between two smoothed series? Well, we agree. Me neither. Why, then, do people smooth?

I’m just showing you what happens if you smooth. And unfortunately, it’s a too common practice (in many fields, not just climatology or meteorology).

15. TCO

5. Continued. Don’t you think you should show the “no” smoothing case? Isn’t that useful to learning? Isn’t it always helpful to show differences? And isn’t it also useful to make sure that something unexpected isn’t at work that you forgot to account for?

16. MikeD: I wasn’t advocating converting annual data to decade data. I just was pointing out that that process won’t create the artifacts that Matt demonstrated. You’ll only see the false correlations if you do smoothing so one large (or small) point ends up lifting (or sinking) a series of values.

17. JM

Forgive me, but don’t these two lines of code:

x0 = rnorm(n)
x1 = rnorm(n)

define two identical series (excepting noise)?

And doesn’t that contradict the statement:

“… The two series have absolutely nothing to do with one another …”

Huh?

18. Briggs

JM, frank,

Those two lines generate random normals, n of them for each. This is the standard way to do this. It doesn’t contradict the statement at all. Knowing the value of, say, X0_1 tells you absolutely nothing about what the value of any X1 will be. There is no predictive value whatsoever in knowing any value of X0 (X1) in predicting X1 (X0). They are just made up numbers, each independent of the next.

The guy at Tanimmo who made the comment you posted, frank, was right: he doesn’t know R. Neither does the owner of that blog. A lot of people do, however, want to believe the simulation invalid, and that belief has somehow prevented them from investigating the simulation methods—which are dead simple standard.

The autocorrelation of the simulated series X0 and X1 is 0. A simple check of the R code confirms that. Just type ?rnorm to read about this. Of course, autocorrelation is falsely induced in the smoothed series S0 and S1, which is the point of the simulation. If autocorrelation was not induced, then the two smoothed series wouldn’t take on the appearance that some real signal is there.

The point of showing p-values (of which any regular reader of this blog will know I am no fan) is to show how much signal has been falsely induced with increasing smoothing. We know that, for unrelated series, the p-value should fall below 0.05 only about 5% of the time. The increases were dramatic, were they not?

The post was not about how to best estimate the correlation between two time series with autocorrelation (or some other complicated ARIMA structure). However, I invite anybody who wants to try to do so: compute a test of significance between S0 and S1 taking full account of the autocorrelation. R has a function ccf(s0,s1) useful for this (you’ll have to write a bootstrap method, probably, or do a simple counting of cross correlations “sticking out” above the confidence limits in order to calculate significance). You will probably find that, even after taking into account autocorrelation in the smoothed series, you will still have more statistically significant results than the 5% you should, though this number will not be, naturally, as many as before. Don’t forget, however, no matter what, the true answer is that there should be no correlation found in the final results because we are meant to be making statements about the actual observables (the x0 and x1) and not the smoothed series.
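One crude way to “take account of the autocorrelation” (not the ccf/bootstrap route suggested above, and only an approximation) is to shrink the sample size before testing, using an effective-sample-size formula such as n_eff = n(1 - r1 r2)/(1 + r1 r2), where r1 and r2 are the two series’ lag-1 autocorrelations. A hypothetical Python sketch; note that, as predicted, even the adjusted test can stay somewhat anti-conservative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def lag1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def adjusted_corr_test(a, b):
    """Correlation test using an effective sample size to discount autocorrelation."""
    r = stats.pearsonr(a, b)[0]
    prod = lag1(a) * lag1(b)
    n_eff = max(len(a) * (1 - prod) / (1 + prod), 3)  # guard against tiny sizes
    t = r * np.sqrt((n_eff - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n_eff - 2)

k, n, reps = 10, 64, 500
naive = adj = 0
for _ in range(reps):
    s0 = np.convolve(rng.standard_normal(n), np.ones(k) / k, mode="valid")
    s1 = np.convolve(rng.standard_normal(n), np.ones(k) / k, mode="valid")
    naive += stats.pearsonr(s0, s1)[1] < 0.05
    adj += adjusted_corr_test(s0, s1) < 0.05
naive_rate, adj_rate = naive / reps, adj / reps
print(naive_rate, adj_rate)  # naive rate far above 0.05; adjusted much closer
```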

I posted this comment just now over at that blog (we’ll see if it lasts):

Hi all,

Briggs here. Everybody’s welcome to cruise on over to my blog and see if you can catch me out on my statistics. Comments like those by Gav’s Pussycat and Lazar and others show that there is something most of you can learn about some pretty basic statistics. (Just a hint: the autocorrelation between the two simulated series is 0, folks. A simple glance at the code shows this.)

I explain everything over at my place, and correct the many errors I see here.

You pal,

Briggs

[Response: The autocorrelation of the two simulated series may indeed be zero, but that of the *smoothed* series is not. That’s pretty basic, and it’s the reason for the result of your “demonstration” — you failed to account for this when testing for correlation. Were you not aware of this? Were you deliberately trying to deceive your readers?]

Funny, but I thought that the point of your demonstration was that you can take two totally uncorrelated series, put a smoothing on them, and produce an autocorrelation? Am I missing the point? I thought that you were demonstrating that you should not smooth time series before looking for correlations, and that your example showed this.

Is there some sort of test you can do to data after smoothing that can deduce whether any correlation is just an autocorrelation produced by the smoothing, as opposed to a genuine correlation in the original data? My guess is that no such test exists (unless it involves removing the smoothing). In that case I do not see what Tammy is complaining about – you wanted to show that smoothing produces false correlations, you showed this, you explained what you had done, and proved your point. What is Tamino complaining about?

20. Briggs

Patrick,

Our boy over there is being deliberately obtuse. I posted this reply to his reply

Look here, old son, the point (the main reason) of simulating and then smoothing was to show that smoothing *induces* the autocorrelation (among other things; see below). To say that I haven’t accounted for it is then to say something silly, no?

*In fact, the smoothing induces a complicated structure, and not just an AR(1) series as you imply. The ARIMA structure induced depends on the smoothing method. Lag-k year smoothing induces an ARMA(k,1) or possibly ARMA(k,2) structure. The ‘k’ here does not mean the previous k-1 coefficients are in the model; just the kth autocorrelation is.

To answer your other question: if you had the unsmoothed series and the smoothed one you can just look: does the smoothed series have higher autocorrelation? You can always invent some bootstrap test if you enjoy classical methods. If you only have the smoothed series, then you are out of luck.

21. Bernie

Matt:
This discussion seems to be an overwhelmingly compelling argument for the archiving of original proxy data and removes absolutely any justification for only archiving data that has been smoothed or in fact transformed in any non-linear and non-monotonic way.

22. Briggs

Bernie,

Amen. Though there are some exceptions.

There are lossy data compression techniques, which are smoother-like, that can be used to compress and store massive amounts of data that would be far too expensive to store without (lossy) compression. A prime example is the use of wavelets to compress and store fingerprint images at the FBI.

Doesn’t seem to be necessary for temperature proxy data, though. Somebody else might enlighten us, but it doesn’t look like there is a massive storage need here.

23. Will J. Richardson

Dear Dr. Briggs,

Tamino responded to your second comment here: “Link” by using the “liar, liar, pants on fire” defense.

Regards,

WJR

24. Briggs

Thanks Will,

I can’t seem to induce any of his fan boys over here. But this is the last time I’m playing tag. It’s tedious. Anyway, here’s what I said:

I’m getting you there, old sweetheart. First you and others hyped “autocorrelation” and now I have you admitting an MA process but denying the AR one. I think you’ll find this amusing: if you look at the ACF/PACF plots you will see the nice little spike at lag k. (Of *course* it’s not real!)

Short-term memory loss is a problem, and often the first indication of a more severe problem. I’d have that checked out if I were you, just to be sure. But for now, let me remind you, the point of the exercise, my simulation, was to show how you could be misled, and become too confident, if you smoothed one (or two) series and then analyzed the smoothed series. J, this comment is for you too (surf over to my place and find the spot where I say that smoothing a time series to predict new values of that series is okey dokey, and the part where I discuss measurement error as an indication for smoothing, then say you are sorry).

Tell you what. We can solve this easily. There are two problems: (1) your odd claim that I don’t know statistics, and (2) smoothing doesn’t make you too confident.

For (1), I’d agree to meet you and have an independent third party devise a statistical exam covering each major area of the field. We’d both take it. May the best man win. The loser has to, as you earlier suggested, insert a hockey stick where the sun don’t shine. Deal?

For (2), let’s do this. How about we both apply a filter, say a padded (so we don’t lose n) 10-point Butterworth low pass, to two randomly generated series. We then compute the errors-in-variables regression (or ordinary, whichever you prefer) between the two. To check for significance, we can run a bootstrap procedure whereby we simulate the models (accounting for the “autocorrelation”, naturally). We’ll have to repeat the whole process, say, 1000 times to be sure. If it turns out that the confidence intervals are too narrow, I win. If not, you do. Deal?

“J” said “Briggs…draws the conclusion that you should never ever use smoothed data. That seems a bit extreme to me. ” Which is why I talked to him, too.

25. TCO

(Cross posted)

Is there a use, a benefit, to processing the two series (in Briggs’s toy example) by smoothing, then regressing, then doing a significance test (even with reduced DOF)? I mean, is it “better” to do it this way versus just regressing the data themselves? Or is it the same (and if so, why add the step)? Or is there some disadvantage (even when adjusting the DOF)?

P.s. For both, please drop the bravado (where to stick the stick). Both of you know some cool stuff. Both are willing to engage somewhat with commenters.

26. Briggs

TCO,

But he started it!

No, you’re right. I hereby drop the bravado (though, in my favor, I would have played either Game 1 or Game 2 with Tinmano).

Anyway—and this will come as no surprise—I saw that Tanmio didn’t want to play either of my two games. Well, I offered—and got called a denialist by one of his fans! So I go there no more.

Incidentally, Game 2, with the Butterworth low-pass filter etc.? I seem to recall a paper that I recently read that used this technique….

The idea, TCO, is that, as long as there is no measurement error, you are stripping away useful information in the data and introducing spurious “information” when you smooth. This is utterly uncontroversial and even trivially obvious.

The only reason people suddenly care about this dark backwater of statistics, and want to defend the use of certain sub-optimal techniques, is that some guys used these techniques to show something about global warming.

My point—actual readers of the original post will recall—was not even that those guys were wrong on their paper’s central claim, but that they would be too certain of their conclusion. Rather tepid criticism, no?

Has it gotten to the point where any valid or honestly made criticism is forbidden?

Anyway, I tried to make nice. I said:

Tamnio, my old pal, I give up. But I tell you what. Next time you are in NYC, let me know and I will be very happy to treat you to a beer. Weâ€™ll go to the Ginger Man on 36th. They always have at least two casks going.

27. TCO

Let’s please explore the concept rather than the (juicy and less brain-taxing) meta-drama.

Why does measurement error justify the smoothing-then-regression technique while confounding variables do not?

If the P-test is adjusted for induced autocorrelation, does the smooth-then-regress-then-P-test procedure become neutral in its impact?

How is the r-value affected?

28. TCO

I thought your beer invite was nice. I think it could only be good (or neutral) for you all to talk.

I actually like Jolliffe, who said the damn Mann work is too complex to even judge well without spending weeks digging through it and doing calculations and the like. So, when you bring up schoolboy examples, I push you for quantification and applicability.

Steve McI, in particular, has a bad tendency to lead with PR soundbite-type stuff (Maine=Paris proxies) and then refuse to quantify impact (he thinks he’s being honest, but I see this as sophistry). I would assume that you, with more of a generalist analyst attitude, would be less likely to play this sort of game and more interested in learning and being fair.

29. gens

Matt,

While I am generally aligned with your views, you really have lost me here. The Gingerman?? Really? Aren’t you a bit old for that meat market? At least they don’t still have the long lines they used to in the late ’90s, but it hasn’t been the same since our mayor banned cigars in bars. On the plus side, it can be a bit loud, so perhaps you won’t be able to hear Tammy.

TCO,

Are you physically incapable of making a series of comments without incorporating a cheap shot at Steve M? Ok, you guys had a falling out. Get over it.

30. TCO

Gens: You’re right.

31. Luis Dias

Darn, I missed the match. Who won?

32. Has it gotten to the point where any valid or honestly made criticism is forbidden?

Yep, been that way for a few years now. And it’s not just criticism, asking questions to gain information and understanding can easily lead to the same kinds of responses.

33. Briggs

gens, The Ginger Man is a meat market? Oh, good Lord. Don’t tell the female I pal around with. She won’t be happy. Actually, I’m never there later than 6 pm because I have to get home before my bed time.

Luis, Me. Who else?

Joy, That’s about the size of it. My suits fit a little better, but that’s the only difference!

34. JH

The simulation study indicates that analysis of smoothed data can lead to wrong and less reliable conclusions about the original series. And as the level of smoothing increases (larger k), wrong conclusions become more likely.

Am I missing something? I admit that I don’t read all comments carefully. But, why is Briggs (not) the loser? Did he (not) lose his suits to a female pal at The Ginger Man?

35. Sylvain

TCO you write:

“I actually like Jolliffe who said the damn Mann work is too complex to even judge it well without spending weeks digging through it and doing calculations and the like.”

I hope that you realize that this is the polite way to say that what Mann did was BS.

36. Sylvain

TCO,

Quoted from the paper I posted above:

“…We find it suggestive that the single most significant calendar rule, namely the Monday
effect, has indeed been identified in the empirical literature. This is probably not by
chance and it indicates that very substantial search for calendar regularities has been
carried out by the financial community.25 It is particularly noteworthy that when the
Monday effect is examined in the context of as few as 20 day-of-the-week trading rules,
and during the sample period originally used to find the Monday effect, its statistical
significance becomes questionable. Subsequent to its appearance, various theories have
attempted to explain the Monday effect without much success. Thaler (1987b) lists a
number of institutional and behavioral reasons for calendar effects. Our study suggests
that the solution to the puzzling abnormal Monday effect actually lies outside the
specificity of Mondays and rather has to do with the very large number of rules
considered besides the Monday rule.

Blame for data-snooping cannot and must not be laid on individual researchers. Data
exploration is an inevitable part of the scientific discovery process in the social sciences
and many researchers go to great lengths in attempts to avoid such practices. Ultimately,
however, it is extremely difficult for a researcher to account for the effects the cumulated
â€˜collective knowledgeâ€™ of the investment community may have had on a particular study.
Research is a sequential process through which new studies build on evidence from
earlier papers.
Once a sufficient body of research has accumulated, it is important to assess the results
not by treating the individual studies as independent observations but by explicitly
accounting for their cross-dependencies. In doing this, one should not be overwhelmed
by the sheer amount of empirical evidence. This is sometimes difficult because the
dependencies between results in different studies are unknown. For example, Michael
Jensen, in his introduction to the 1978 volume of the Journal of Financial Economics on
market anomalies writes “Taken individually many scattered pieces of evidence … don’t amount to much. Yet viewed as a whole, these pieces of evidence begin to stack up in a manner which make a much stronger case for the necessity to carefully review both our acceptance of the efficient market theory and our methodological procedures” (page 95).

Their conclusions apply just as well to climate science.

37. Sylvain

I believe a good paper on the subject:

Common Structure of Smoothing Techniques in Statistics
D. M. Titterington
International Statistical Review / Revue Internationale de Statistique, Vol. 53, No. 2 (Aug., 1985), pp. 141-170

From the conclusion:

“All the different problems have attracted considerable attention in their own right, so
they are presumably of importance. From a theoretical point of view this is certainly true,
as evidenced for example by the explosion of recent papers on density estimation and
nonparametric regression. In terms of practical application, the worth of smoothing
procedures is perhaps more questionable. Undoubtedly their nonparametric flavour is
appealing and their value will be increased by the development of interval estimation
methods (Wahba, 1983).”

38. Man, those AGW alarmists sure get testy. I was especially troubled, though, by the “signal/noise” jabber at that other site. If you measure something, then that’s your measurement. There is no “noise.” Noise is a red herring. There is only signal. The data are the data.

If you analyze the data, for instance by looking for trends, you should also specify (calculate) the error in that analysis. Mann’s hockey stick is just such a statistical trend, the result of an analysis, but he fails to quantify the error involved. Error, btw, is another word for uncertainty.

If you smooth the data first, and then do the analysis, you will naturally (mathematically) find a smaller error than if you had not smoothed the data. That is, the raw data will imply (the analyst infers) more variance about the trend line than the smoothed data.

But the data are the data, and the variance in the data is the true variance. The smoothed data give a smaller variance in the trend analysis, but that’s a chimera, a false reading of the actual true variance. The smoothing yields, as Briggs says, overconfidence in the trend.

The testy alarmists insist that “weather is not climate,” but the data is weather-related. Climate is evidently something that happens only at long time scales, and weather is what happens every day. But plants and plankton grow every day. The rain and snow fall on particular days. The proxies are affected by weather. The proxies record the weather, allegedly, if they record anything at all. The testy alarmists say they must smooth the proxies to detect climate, but that’s semantics. In fact, the data is from weather and the data are the data.

Without the smoothing, the variance about the trend is large. The trend detected in the analysis may not be a trend at all. The testy alarmists want to find a trend so badly that they utterly ignore the variance, or at best underestimate it by using smoothed data.

Some might argue that the error, aka variance, aka noise, in the data balances out. Some measurements are “wrong” because they are too high, and some because they are too low, but in the long run (in the model) they balance each other out. That is not true. In fact the opposite occurs in models: error aggregates. It piles up. Eventually it becomes unruly and chaotic and fills the room like a bad odor.

The testy alarmists wish to find trends that fit their theories so they utilize models with smoothed data, thereby failing to account for the error that is piling up and piling up. In fact, their outputs (trends) are so caked with aggregated error that they signify nothing. The true statistical confidence (after accounting for the true variance) is so huge that the trend could just as easily be downward or no trend at all.

And that’s the nub of all this stat chatter. Don’t get lost in the technicalities. Mann’s model is filled with error that he fails to report or account for. His analysis is pretty much sophistry. It’s a bummer, I know, for the testy alarmists, and they don’t want to cop to it. But them’s the facts, Jack.

PS to Harvey: Good point. Smoothing time series induces autocorrelation (dependencies between adjacent data points). Pretty obvious but I guess a lot of people don’t get it. Blocking does not induce autocorrelation. Blocking just throws away data, hides actual dependencies in neighboring data points, and reduces uncertainty unjustifiably.
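The induced autocorrelation is also simple to demonstrate. In this Python sketch (again my own illustration, not the post’s R code), the lag-1 autocorrelation of white noise is near zero, but after a k-point running mean it jumps to roughly (k-1)/k:

```python
import random

def running_mean(xs, k):
    """k-point running mean; the smoothed series is shorter by k - 1 points."""
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]

def lag1_autocorr(xs):
    """Sample autocorrelation at lag 1."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

random.seed(2)
x = [random.gauss(0, 1) for _ in range(5000)]  # independent draws: no memory
print(lag1_autocorr(x))                    # near 0 for white noise
print(lag1_autocorr(running_mean(x, 12)))  # near (k-1)/k = 11/12 ≈ 0.92
```

Adjacent smoothed values share 11 of their 12 inputs, so the strong dependence is built in by the smoother itself, not present in the data.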

PPS to Matt: If you’re ever in Oregon, look me up and I’ll gladly buy the beer.

39. Briggs

Mike D, Stop making so much sense, else we can’t continue to talk about this subject forever.

And “Oregon”? Where the hell is that? Somewhere in Jersey?

40. Bernie

Matt:
For Oregon, you take the George Washington Bridge to Jersey, then take Rt. 80 to Salt Lake City; then Rt. 84 will take you to Portland. No problem!!

41. I’m just a simple caveman radiologist, I use terms like “noise” and “SNR” and “data compression” and all of this makes perfect sense to me even though I haven’t had a statistics class since a lame biostats class in 1991.

I think this has been very elegantly demonstrated by Dr. Briggs, and I appreciate (intuitively, without completely grasping the mechanics behind) the point he’s making. Makes sense to me.

I think it comes down to this for me: there are the data, the smoothed data, and manipulations of the smoothed data. It is important to not confuse these three things.

People who disagree with Briggs on this point are the kind of people who will tell you that if it’s a really good fax machine, faxing the “Mona Lisa” makes the picture better.

There are even people who will say that if you fax it enough times, you can figure out why she’s smiling.

42. Bernie

Darren:
Excellent, excellent analogy!!

43. Joy

When was complexity of method necessarily a sign of the right answer, except to someone who expects to see it? Are there not an infinite number of complex wrong answers?

That’s a subjective measure. One might as well say that one likes the graph because it looks the right shape. It will do in Sales, but I thought the idea was to look for the truth about climate.

This is conceptual. You don’t need to see colourful graphs to grasp what is a simple idea. Its simplicity or ease of understanding does not make it less important.

Consider, when looking at anything, that the reason it might not make sense may be because it isn’t sensible. It’s easy, with aggressive and supercilious attitudes as shown on RC and Tamino’s web sites, to believe that everyone who does not swallow a given idea must be lacking; that there are a select few very clever people who have it all sewn up. That’s psychology, not rocket science.

One only has to read ‘discussions’ on these sites over the last few months to realise that open discussion is not an option. Any good question is inherently considered stupid, or an attempt is made to show this with inadequate answers. Surely any question tacitly seeks an unknown; why else ask? It makes for tedious and frustrating reading. No wonder tempers are frayed.

Imagine any other profession where original files or notes were routinely discarded. I would be struck off for such behaviour as a Physio, a profession not known for rigorous science! Why, then, cannot scientists who crave global importance be held to, and aspire to, at least this standard? If something goes wrong, you’ve got to be able to go back to the beginning. If you drop a stitch, unpick it. Too bad for granny if the error is in row one. Just admit it, cry, and start knitting again with a prettier pattern that is, this time, robust.

Although no hockey debate is truly complete without someone offering to shove a stick where the sun doesn’t shine, which makes the debate all the funnier, Steve McIntyre, Anthony Watts and William Briggs have shown an intellectual honesty that should be commended.

44. James

Great blog, Briggs. Good to see someone using a hockey stick as it should be used: whacking idiots on the head! Also thanks for the pointer to the open source R project. I’m always looking for good free tools.

During recent travels around the web I came across this interesting R code presented as a lab project for statistics students. It shows how to generate hockey sticks by manipulation of PCA.

http://web.grinnell.edu/individuals/kuipers/stat2labs/PCA%20Global%20Warming%20.html