
# How To Cheat, Or Fool Yourself, With Time Series: Climate Example

Update: This post is of such importance that it remains on top today. See below for more comments.

Presented for your satisfaction, a way to cheat either yourself or others using time series. The patter below is only a suggestion.

Presentation

Just look at these anomalies, which are related to rampant, deadly climate change. Higher anomalies are worse for all of mankind in every imaginable way.

The anomalies are presented as monthly measures, over roughly a 10-year time period. A regression was fit to them and is plotted. The increase in this not-good anomaly is 0.87 per decade. Why, after 20 years, the anomaly will be almost twice as large as it is now!

The 95% confidence interval for the decadal change is 0.44 to 1.3. That means that the anomalies are surely heading up!

But ignore these sorrowful facts, because there is good news to be had. Here are some more anomalies.

These anomalies are on the way down, thus our spirits should be on the rise. In fact, the anomalies will drop by 0.57 over the next 10 years. And after 20 years, they’ll be down more than one full point!

The 95% confidence interval for the decadal change is -1.1 to 0. That means that the anomalies are surely heading down!

How the trick works

The pictures are the same!

Even if you flash up both pictures, the audience members will never notice that they are seeing the same anomalies. Yes, it’s true. You’ll worry that somebody will catch on, but they won’t! I have seen this done many times and nobody ever notices that the pictures are identical—except, of course, for those colorful straight lines. And the starting date.

Now take a look at these anomalies, which are the same as above, and see if you can spot the difference.

Instead of one regression line, there are 24. The first one is drawn using the entire time series. The second one is drawn using the entire time series except for time point number 1. The third removes time points 1 and 2, and so on. There are 24 lines in total, showing anything from a large increase to a large decrease, and each drawn by choosing a new starting point.
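The 24-line picture is easy to sketch in code. What follows is a minimal Python illustration, not the code used to draw the figure: the helper names `ols_slope` and `slopes_by_start` are made up for this example, the stand-in series uses the AR coefficients quoted in the Statistics section, and the seed and burn-in length are arbitrary assumptions.

```python
import random

def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx

def slopes_by_start(y, n_starts=24):
    """One slope per starting point: drop 0, 1, 2, ... leading observations."""
    return [ols_slope(y[start:]) for start in range(n_starts)]

# Stand-in data (assumption): a trendless autoregressive series, 120 months long.
rng = random.Random(0)           # arbitrary seed
x = [0.0] * 12                   # start-up values, thrown away with the burn-in
for _ in range(5000 + 120):      # long burn-in because the series is very persistent
    x.append(0.64 * x[-1] + 0.35 * x[-12] + rng.gauss(0.0, 1.0))
y = x[-120:]

# Convert slope per month to slope per decade (120 months).
per_decade = [120 * s for s in slopes_by_start(y)]
print("smallest and largest decadal trend:", min(per_decade), max(per_decade))
```

Every entry in `per_decade` comes from the same series; only the first observation used differs.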

Do you get it? This is the whole trick! Nobody ever asks why you chose a particular starting point. You can tell any story you like and people will never think to ask what would happen if you were to use a slightly different data set.

Of course, very clever magicians will manipulate both starting and end points, but it’s best not to meddle with the end points until you become a master. People will (or should) naturally ask why you haven’t included the most up-to-date data, but they will absolutely never ask why you only used some of the history and not all of it.

Statistics

The time series above was generated by the R arima.sim() function, using a mean 0, standard deviation 1, AR(0.64,0,0,0,0,0,0,0,0,0,0,0.35) process, which mimics many different real-world monthly time series. But try your own model. It works for models of any kind. And it’s fun!
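For readers without R to hand, here is a rough Python sketch of the same kind of process. It is not a port of the R function: `simulate_ar` is a hypothetical helper, the standard deviation of 1 is assumed to apply to the Gaussian noise driving the process, and the burn-in length and seed are arbitrary choices.

```python
import random

# AR coefficients quoted above: lag 1 = 0.64, lags 2 to 11 = 0, lag 12 = 0.35.
PHI = [0.64] + [0.0] * 10 + [0.35]

def simulate_ar(n, phi=PHI, sd=1.0, burn_in=5000, seed=None):
    """Simulate n points of a zero-mean AR process driven by Gaussian noise."""
    rng = random.Random(seed)
    p = len(phi)
    x = [0.0] * p  # start-up values, discarded along with the burn-in
    for _ in range(burn_in + n):
        # phi[k] multiplies the value k + 1 steps back
        ar_part = sum(phi[k] * x[-1 - k] for k in range(p))
        x.append(ar_part + rng.gauss(0.0, sd))
    return x[-n:]

series = simulate_ar(120, seed=1)  # roughly ten years of monthly values
```

The coefficients sum to 0.99, so the process is stationary but only just. That persistence is why a long burn-in is used, and why a ten-year stretch can look so convincingly like a trend.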

Next thing is to show how reliable this trick is. The true answer—given our evidence E that the model is mean 0, etc.—is that the anomalies neither increase nor decrease over a decade. The slope of any regression line, in other words, should be 0. Or the confidence intervals of any line drawn should include 0. Of course the actual results will vary.

It’s your confidence intervals which are the real convincers in the trick. Did you notice that both confidence intervals (for the first two figures) confirm the hypothesis that things are getting better and things are getting worse? Isn’t that great!

To show the reliability of this, suppose your funding depends on things getting worse: you need the anomalies to increase. Therefore, you’ll pick a starting date which gives you the best evidence. Not every time series that is truly unchanging (as our E says it is) will cooperate such that you can definitely show an increase. But you can limit the damage against yourself by showing the smallest possible decrease.

I simulated 1000 different time series, each time picking the best starting point (to show the largest possible increase). Remember: if no cheating occurred, the mean of these samples should be 0. It isn’t. It’s much higher at 0.21—with a 95% confidence interval of -0.88 to 1.31.
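An experiment along these lines might look like the sketch below. It is an illustration under stated assumptions, not the original simulation: the number of candidate starting points (24), the series length (120 months), the percentile-style interval, the seed, and all function names are choices made for this example, so the numbers it prints need not match those quoted above.

```python
import random
import statistics

def simulate_ar(n, rng, burn_in=3000):
    """Trendless series: x[t] = 0.64 x[t-1] + 0.35 x[t-12] + N(0, 1) noise."""
    x = [0.0] * 12
    for _ in range(burn_in + n):
        x.append(0.64 * x[-1] + 0.35 * x[-12] + rng.gauss(0.0, 1.0))
    return x[-n:]

def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx

def best_start_slope(y, n_starts=24):
    """The cheat: try each starting point and keep the largest slope."""
    return max(ols_slope(y[s:]) for s in range(n_starts))

def cheat_experiment(n_series=1000, n=120, seed=0):
    """Decadal trends with the full series (honest) and the best start (cheated)."""
    rng = random.Random(seed)
    honest, cheated = [], []
    for _ in range(n_series):
        y = simulate_ar(n, rng)
        honest.append(120 * ols_slope(y))
        cheated.append(120 * best_start_slope(y))
    return honest, cheated

honest, cheated = cheat_experiment(n_series=200)  # fewer series, for speed
cuts = statistics.quantiles(cheated, n=40)        # cut points every 2.5%
print("mean honest trend: ", statistics.mean(honest))
print("mean cheated trend:", statistics.mean(cheated))
print("middle 95% of cheated trends:", cuts[0], "to", cuts[-1])
```

Because the full series is always one of the candidates, the cheated trend can never be smaller than the honest one for the same data, which is what pushes the average above zero.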

Notice how much wider this (better) interval is. It’s better because it takes into account cheating.

What if you don’t want to cheat? Well, your interval will still be wider than if you just ran the regression on the data at hand. Except if the data at hand is all the data that will ever occur (and if it is, there is no need to run a time series), the arbitrariness of the starting (and ending) point must be accounted for. If it isn’t, then you will go away too confident of yourself.

The lesson is, of course, that straight lines should not be fit to time series.

Question: why fit a straight (or any shaped) line to a time series like this? There are three reasons: (1) to discover whether there was a trend, (2) to predict the future, and (3) to use the analysis as part of a larger analysis.

(2) is a respectable goal, and should be encouraged. Most who fit lines to time series have this goal in mind, at least tacitly; that is, they at least imply that the line they have fitted will “continue” into the future. Therein lies a problem. For that line is an all-too-sure guess of what the future will be.

Notice that we stated specifics of the line in terms of the “trend”, i.e. the unobservable parameter of the model. The confidence interval was also for this parameter. It most certainly was not a confidence interval on the actual anomalies we expect to see.

If we use the confidence interval to supply a guess of the certainty in future values, we will be about 5 to 10 times too sure of ourselves. That is, the actual, real, should-be-used confidence interval should be the interval on the anomalies themselves, not the parameter.

In statistical parlance, we say that the parameter(s) should be “integrated out.” So when you see a line fit to time series, and words about the confidence interval, the results will be too certain. This is an inescapable fact.
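The gap between the two kinds of interval can be written down in the textbook case of a straight line fit by least squares with independent, constant-variance errors. That independence is an assumption made here for illustration only; it does not hold for an autocorrelated series like the one above, where both intervals ought to be wider still. With $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $s^2$ the residual variance on $n-2$ degrees of freedom, the 95% interval for the slope parameter is

$$
\hat{\beta}_1 \;\pm\; t_{n-2,\,0.975}\,\frac{s}{\sqrt{S_{xx}}},
$$

while the classical 95% interval for a new observation at time $x_0$ is

$$
\hat{y}_0 \;\pm\; t_{n-2,\,0.975}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.
$$

The first interval shrinks toward zero width as more data come in. The second can never be narrower than about $\pm\, t\,s$, and it widens as $x_0$ moves away from the middle of the data, which is exactly where the future lives.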

(1) is also a goal, but a shady one. If we want to know if there has been a change from the start to the end dates, all we have to do is look! I’m tempted to add a dozen more exclamation points to that sentence, it is that important. We do not have to model what we can see. No statistical test is needed to say whether the data has changed. We can just look.

I have to stop, lest I become exasperated. We statisticians have pointed out this fact until we have all, one by one, turned blue in the face and passed out, the next statistician in line taking the place of his fallen comrade.

It is true that you can look at the data and ponder a “null hypothesis” of “no change” and then fit a model to kill off this straw man. But why? If the model you fit is any good, it will be able to skillfully predict new data (see point (2)). And if it’s a bad model, why clutter up the picture with spurious, misleading lines?

Why should you trust any statistical model (by “any” I mean “any”) unless it can skillfully predict new data?

Again, if you want to claim that the data has gone up, down, did a swirl, or any other damn thing, just look at it!

(3) If you fit a line and then use the parameter estimates of that line as input into other analysis (as was done in our sample paper, referenced below), your results will be too certain. We all know the dangers of smoothing time series. If you’ve forgotten, I, II, III.

———————————————————————————

This post was inspired by an actual paper—where I do not accuse the authors of cheating; but they do use time series with different starting and ending dates and then combine those time series to make a conclusion. We can see now that they will be too sure of themselves.

Update: See this cartoon, which shows that the IPCC has been known to employ the technique of variable start dates.

Categories: Statistics

### 42 replies »

1. Speed says:

For further examples of start point shifting see, Marketing, Mutual Funds.

2. Andy says:

Can you do the same on the national debt trend to make it go down?

3. Will says:

This is one of the best posts you have written. Thank you Professor!

4. Gary says:

So the next question then is: What curved line reasonably may be fit to a time series?

5. Briggs says:

Gary,

Excellent question. There are books upon books about how to properly model time series. Look up Mike West’s or Brockwell & Davis’s.

6. DAV says:

“The lesson is, of course, that straight lines should not be fit to time series. ”

Seems the lesson is to avoid straight lines for non-linear data. Why should one axis being time have anything to do with using linear regressions? The real problem though: the soft sciences (including climate) rely too much on a meaningless number for validation.

7. Ray says:

Anytime you fit a straight line to a finite segment of data you will have a trend even if the process which generates the data has no trend. Depending on the method of curve fitting you can get different trends for the same data. In the curve fit you can minimize the absolute error, the squared error or the maximum error and obtain different trends all depending on the norm you are minimizing.

As a practical example of this, years ago I worked on a project where we fabricated waveguide filters. The filters consisted of a length of waveguide with cylindrical posts dip brazed to the waveguide wall, and once they were dip brazed the filters were unadjustable. We had a specification for so much tilt and curvature and tolerance across the bandpass so the problem was how to fit a curve to the data points to minimize the number of rejects, i.e. out of spec filters. The customer did not specify the type of curve fit to be used. The minimax curve fit was best because the extreme data points were equidistant from the fitted curve. I translated a Remez algorithm into BASIC, which would run on the instrument controller and automated the whole test process. Amazingly, we didn’t have to scrap a single filter for being out of spec. We didn’t cheat because the customer didn’t tell us what curve fit to use. Probably didn’t know there are all kinds of curve fits.

8. John Brookes says:

Very nice. Which is why climate scientists get sick of “skeptics” trying to show trends from 10 years of data. The more years you have, the more difficult it is to cherry pick your end points – provided there is a genuine trend to the data.

9. Steve E says:

This is precisely what made me a climate skeptic. I’m in financial services and this type of statistical chicanery is used to sell mutual funds every day.

Briggs you are right! Very few people–professionals included–understand start and end date bias. I’ve seen fund sales go through the roof as a fund’s return numbers back on to a disastrous and often historic low point in the fund’s history with little more than mediocre results at the front end. The result is very attractive peer leading results that draw the suckers in.

What’s the expression? Torture the data and it will eventually tell you what you want to hear…

10. Will says:

“Why should you trust any statistical model (by “any” I mean “any”) unless it can skillfully predict new data?”

If the only test that matters is ‘does it work against unseen data’ then why isn’t that the _only_ test?

Why are there umpteen different tests to ‘reject’ a null hypothesis? Why do people care if something is heteroscedastic or variable X is normally distributed?

11. Doug M says:

“Why should you trust any statistical model (by “any” I mean “any”) unless it can skillfully predict new data?”

I am selling mutual funds. Sometimes I am called on to explain what happened. There is so much data, it is easier to create a model that describes the data, and then describe the model. This model is not expected to have any predictive power. For example, the story of the last 6 months is the European debt crisis. Why is my Asia fund down? If I can show that daily fluctuations in Korean interest rates are correlated to the dollar Euro exchange rate, it is an easier narrative to write. It doesn’t have to be true, just plausible. And, if the correlation breaks down tomorrow, that is okay, too.

12. DAV says:

John Brookes, “Which is why climate scientists get sick of “skeptics” trying to show trends from 10 years of data.”

Actually, if the data are cyclic it’s more the starting and ending points than the number of cycles. Try it with a constant sine wave. Climate has apparent cycles so the slope of a straight line applied to it depends a lot on where you begin and end.

If you’re looking for a trend you need to plot through the same point on each cycle — say the peaks. With multi-frequency data (like climate) identifying them becomes a problem. Using climate, for instance, should a trend plot start at the bottom of the Little Ice Age? Why not start at the top of the Medieval Warm Period?

13. jack mosevich says:

Great post Matt. Any chance you could explain the approach Mann took to get the hockey stick and that Steve M was right in that random inputs would give a similar graph?

14. Will says:

Doug M: So you use a statistical model to help you invent stories for your clients? Sorry; couldn’t resist. 😉

15. Steve E says:

Will says:
26 January 2012 at 9:59 pm
Doug M: So you use a statistical model to help you invent stories for your clients? Sorry; couldn’t resist.

My point exactly!

This one is the “excuse” use (after sale), which excuses the original “plausible” use (pre-sale) about why this mediocre fund was the greatest, trend-breaking, sure-thing, different-than-anything-that-came before it when it was “sold.” Even the best funds measured by biased time series return to the mean, usually as a result of extended disastrous results.

Unfortunately with mutual funds, not everyone has the same experience over any time series. Most buy at or near a peak and sell at or near a trough.

DAV nails it above, “it’s more the starting and ending points than the number of cycles.”

16. Steve E says:

Doug M,

Sorry, I was talking about the mutual fund industry and how it misuses statistics, not you specifically. 🙂

17. Once I was asked to evaluate the stats used in a project that compared two maps (one a fire history map and the other a current vegetation map). The researchers had randomly selected points in the spatial plane, and then compared those points on both maps. What stat method to use when crunching the data?

I pointed out that the sampling was merely a subset of the actual data. They knew the “values” at every point on both maps. Why not use all the data? Compare every possible point? They had a computer, after all. The maps were fully digitized.

They were stunned and very upset at my suggestion.

Furthermore, I pointed out, they were studying maps, not the real world. They hadn’t measured anything “in the field”. They were merely comparing symbolic representations. They had no idea whether the maps were accurate or not, or to what degree of accuracy. They had not created the maps; somebody else did and nobody knew how they did it. Get out in the woods and do some real measuring, I told them.

That second suggestion really fried them, and they sent me away. Later I learned that it was a PhD thesis study, and they awarded the poor sap a doctorate anyway. Now we are supposed to call him “doctor”.

Hahahahahahaha!!!!!!

18. Briggs says:

Uncle Mike,

I had to check the calendar to see if it was Halloween after reading your anecdote. What a great statistical campfire story!

19. If you want a trend from monthly data, the only one that has any sort of real statistical meaning is the trend component of the seasonal adjustment process, be it X11 or X12. Both generate excellent trend series from monthly data, ones that follow best statistical practices and do not allow for screwing around with straight-line trends.

Of course, they don’t generate a straight line trend. But then again, they shouldn’t.

🙂

20. Arfur Bryant says:

“The lesson is, of course, that straight lines should not be fit to time series.”

Would I be right in thinking that this only applies to a segment of a time series but not if the straight line covers the entire time series? Provided the origin of the straight line is always the start point of the series, then I would think that the line drawn will indicate a reasonable ‘trend’ up to a chosen point (lack of any reasonable smoothing notwithstanding) along the x-axis. In which case, any subsequent straight line drawn from the origin to subsequent points further along the x-axis would therefore indicate a changing trend – eg if the subsequent straight lines are steeper then the trend is greater. Is that fair?

21. Brian H says:

De-obfuscate this sentence pls?:
“(and if it is, there is no real to run a time series)”.
I think the adjective “real” needs a noun to lean on. Unless it’s implied and I just don’t git it!
“need”, perhaps???

22. Robin says:

I believe we have corresponded before (emails, now lost on my old machine :-(( ).
I am very interested by the subject of this article, which is closely related to my own notions on time series. In particular, I’m interested because you talk about confidence intervals round regression lines. The software I use is “home grown” but reliable, and I have sold many hundreds of copies, often to rather “up-market” clients. However, although its simple linear regression analysis facilities (including polynomials) automatically provide the option for plotting the fit with CIs for the fitted line/curve and for future observations from the same population, I am unsure about these intervals for autocorrelated data – typically time series. I am NOT a time series expert! What I have read is that CIs need an adjustment if the data are autocorrelated, and I assume that this means that they will be wider, probably dependent on the degree and structure of the autocorrelation. I have absolutely no idea of how large this adjustment might be. I’ve read (and partially followed) some of the literature but have never found a numerical example of the principles that are so eruditely expounded. Thus I would like to ask whether you have a simple “recipe” that would provide a sort of safety net by introducing a multiplier (possibly based on the degrees of freedom) that could be regarded as reasonable. The series I work with typically consist of 100 to 5000 observations, but are mainly in the region of several hundreds.
Any advice (or discussion) will be most welcome.

Regards, Robin (Bromsgrove)

23. Briggs says:

Arfur, Brian,

Well, like I said above, plenty of books on this subject. Short answer: if you have a time series, model it as one.

Brian H,

Typo! Should be “need.”

24. Annabelle says:

Mr Briggs, I would love it if you would comment on the current kerfuffle between Tamino and Bob Tisdale over their attempts to find a trend in ocean heat content.

25. Outlier says:

(1) is also a goal, but a shady one. If we want to know if there has been a change from the start to the end dates, all we have to do is look! I’m tempted to add a dozen more exclamation points to that sentence, it is that important. We do not have to model what we can see. No statistical test is needed to say whether the data has changed. We can just look.

As a generality: people are very good at seeing patterns, even ones which are not there.
Statistics has a legitimate use in saving us from ourselves in those cases. Most people (statisticians or not) should look at climate time series and say “I see lots of up and down noise”.

26. As part of my BSME, I had to attend some master’s presentations. We were supposed to ask questions. Participation you know.

One candidate got up and made her presentation on “Measuring using digital cameras”. Endless slides of different techniques to gauge a measurement. There was a digitally measured value she put up at the end. I thought I had the biggest soft ball question of them all, so I asked it.

“What was the measurement of the object using standard means?” (you know : A ruler, micrometer, calipers, some measuring device other than the digital camera. There has to be calibration somehow!)

She looked at me dumbfounded. “I don’t know” and acted like I was being some kind of smart ass.

27. Max Beran says:

A while back I looked at a comparable issue in terms of significance testing. The specific question asked was how easy is it for a trend to pass the significance test (reject null hypothesis of no trend) if that test were prompted by a record breaking event. As I recall there was a factor of about 4 – i.e. the trend that would be exceeded by 2.5% of unprompted random samples is exceeded by 10% of samples following a record breaker.

One might expect a similar lowering of the bar to apply in the case here which is of course the mirror image of the case I looked at – record breaking low value marking the start of the analysis period being analogous to record breaking high marking the end of the period analysed.

28. Max Beran says:

Just checked my old workings, and sorry, the ratio is nearer 3 than 4. I can provide the actual summarising diagram if anyone wishes.

More to the point of William Briggs’ article though, I cannot see why a conclusion should be that one should not perform linear trend tests on data. Surely the conclusion is that one should not apply significance criteria designed to deal with randomly drawn samples to samples that do not conform to that standard, such as, like here, where the selection of period is dictated by some feature of the data themselves. If the criteria can be objectively stated it will be possible to construct appropriate confidence interval curves to cover the particular non-randomness.

As to the distinctions between standard errors of (a) trend coefficient, (b) a point on the trend line used to determine the trend line, and (c) a new point, these are all well known in statistics and appear in any text on regression analysis. The “exploding” nature of the third of these would surely also cover most concerns about what can happen in the far future (or past) if the line is projected beyond the range of x values.

29. Dikran Marsupial says:

Great, if we can get the skeptics to understand why the “no significant warming since “, and quit cherry picking, it will be a victory for BOTH sides of the climate debate.

Generally the only time “warmists” talk about short term (e.g. decadal) trends in historical temperature series is when explaining why the “no significant warming since ” argument is bogus (e.g. the escalator http://www.skepticalscience.com/going-down-the-up-escalator-part-1.html ). Conclusions however are drawn from short term trends by the skeptics on a regular basis, and it is to their own detriment in the long run.

30. Dikran Marsupial says:

sorry, should have been:

Great, if we can get the skeptics to understand why the “no significant warming since [date]” argument is bogus, and quit cherry picking, it will be a victory for BOTH sides of the climate debate.

Generally the only time “warmists” talk about short term (e.g. decadal) trends in historical temperature series is when explaining why the “no significant warming since [date]” argument is bogus (e.g. the escalator http://www.skepticalscience.com/going-down-the-up-escalator-part-1.html ). Conclusions however are drawn from short term trends by the skeptics on a regular basis, and it is to their own detriment in the long run.

31. barry says:

A <10 year trend in surface temperature data? Well of course shifting the start (and end) points is going to make a difference to the slope. The problem is simple – the time period is way too short. Nothing more than that.

20 years is a good minimum. Anyone pitching <15 year trends for global surface temps is trying to sell something.

32. NJTom says:

@Dikran Marsupial I think you missed one of the main points of this exercise, which is that “the data speak for themselves”. Whether there has been “significant warming” over some time period is a matter of observation, not statistical analysis. If it’s not warmer now than it was in 2002, then there has been no significant warming since 2002.

33. Yuri Broze says:

“No statistical test is needed to say whether the data has changed. We can just look.”

That’s silly. Nobody cares whether the “data changed.” If they ask this question with these words, then it’s simply because they don’t have the more precise ones to communicate what they want to know. OBVIOUSLY, they’re interested in knowing if there’s statistically significant evidence to conclude the linear trend — the parameter estimate — is upward / positive. If they just want to know what the data are, then they’ll look themselves. But folks will consult with a statistician to interpret the data, which is exactly what a model does.

You obviously know your stats, I’m just encouraging you to interpret what people really want when they don’t happen to know the jargon.

“If you fit a line and then use the parameter estimates of that line as input into other analysis (as was done in our sample paper, referenced below), your results will be too certain.”

Are there really papers which plug an estimated model parameter into a larger model without carrying over its estimated variance? I admit I haven’t checked. If so — scary indeed.

34. The first one is drawn using the entire time series. The second one is drawn using the entire time series except for time point number 1. The third removes time points 1 and 2, and so on. There are 24 lines in total, showing anything from a large increase to a large decrease, and each drawn by choosing a new starting point.

Do you get it? This is the whole trick! Nobody ever asks why you chose a particular starting point.

Your “nobody” was quite a silly assumption that you ran with. I literally get asked that all the time in seasonal adjustment work I do.

Now, are you seriously saying people would present a trend analysis and intentionally leave out datapoints when estimating model parameters? If so, what is actual evidence of that happening? You are, as is more probable, just casting ‘creationist-like’ doubt on models/data/science, because something like this could possibly happen (maybe because you don’t like/deny the implications of some science – Good! The cure is for you to do better science, not just rant about QRP boogeymen).

Any trickery like this would be quite easily spotted in a data/code audit. For papers I review, these things are required. This is also why no religious claims make it into science journals. ‘My god made the clouds’ cannot be investigated any further.

Justin