
Heartland Climate Conference: Day 1. Everybody Gets Time Series Wrong


As predicted, there were cigars. But I missed them. The smoking party left without me while I listened to Ireland’s version of the Mr Wizard brothers. A science experiment involving a syringe, 100 meters of day-glo green tubing, a reservoir of water, and a rush of explanation. And something to do with oxygen dimers. And multimers.

So no cigars, but whisky (not whiskey) there was. And plenty of good food. After an excellent meal, with plenty of sauce, everybody was juiced and raring to go.

The only negative is the Washington Court Hotel. The wall “art” is festooned about, like troops ready to attack. It’s ugly like most “art” is these days. But it’s worse. It’s aggressively ugly. This “art” hates you. I can only imagine it won some sort of award.

Anyway, while trying to explain that no statistical model or test is necessary when looking at time series to discover whether there was a “trend”, I hit upon the following simplification.

I don’t have the facilities, so draw for yourself a standard x-y plot, with x the time and y the measure of interest, say temperature. At some early time point, place a dot for the first “temperature”. And at some later point, place a second dot higher than the first.

Now I ask you: was there a trend in the data?

This question causes distress in anybody who has had classical statistics training. They want to answer, but feel (and I do mean feel, not think) that they cannot. The objections will be “There’s not enough data to tell” or “I can’t fit a model to that” and the like.

It is very difficult, almost impossible, for people with training in classical statistics to look at data without reflexively wondering what model “best explains” the data. This is why classical statistics, especially hypothesis testing, has to go. Put it in the same place as the Hotel’s “art.”

Firstly, the probability models in the classical quiver do not say word one about what caused the data. If we knew what caused the data, we would not need probability models. We would just point to the cause! Probability models are used in the absence of knowledge of cause. And they should never be used to say what happened.

Let me repeat that, and let me shout it: probability models should never be used to say what happened. We can simply look at the data and it can tell us what happened.

So why does everybody who “fits” a, say, straight line to time series data think that the straight line explains the data? Well, that’s what they’re taught. Sort of. The concept of causality is vague in probability and statistics, so vague that people are allowed to take away from any analysis whatever they want.

This is why hypothesis testing is so toxic. Once a wee p-value is spotted, “randomness” or “chance” is rejected as the cause and whatever other idea the researcher had in mind is said to be the cause. This is wrong in every possible way. Randomness and chance are never causes, and to assume a cause is not to prove that it was the sole correct cause.

Secondly, the answer to our question is: there is no way to tell because there is no definition of trend.

We’ve talked about this many times. Trend is analogical. My idea of the word might not match yours. Thus in order to say whether there is a “trend”, we need a definition. If that is “any increase”, then, yes, unambiguously, there was a trend. If the definition is “at least three increases in a row”, then there was no trend.
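To make the point concrete, here is a minimal Python sketch that writes two of the definitions above as functions and applies them to the same two readings. The readings (10.0 and 12.0) and the function names are hypothetical, made up for illustration. The data never change; only the verdict does.

```python
def increases(values):
    """Count the consecutive steps where the value went up."""
    return sum(1 for a, b in zip(values, values[1:]) if b > a)

def longest_run_of_increases(values):
    """Length of the longest unbroken run of step-ups."""
    run = best = 0
    for a, b in zip(values, values[1:]):
        run = run + 1 if b > a else 0
        best = max(best, run)
    return best

# Definition 1 (hypothetical name): "trend" means any increase at all.
def trend_any_increase(values):
    return increases(values) >= 1

# Definition 2 (hypothetical name): "trend" means at least three increases in a row.
def trend_three_in_a_row(values):
    return longest_run_of_increases(values) >= 3

temps = [10.0, 12.0]  # the two dots, earlier and later (made-up values)
print(trend_any_increase(temps))    # True
print(trend_three_in_a_row(temps))  # False
```

Nothing in the code is a model of what caused the readings; each function is just a definition, and a different definition would give a different answer.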

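If the definition was “the slope of a regression line fit to the data is greater than 0”, then there was a trend, but only trivially: a line “fit” to two points is just the line joining them. If the definition adds “and the slope is statistically significant”, then the question cannot even be answered, because the line’s two parameters use up both points and leave nothing from which to estimate the error, so no p-value can be computed. Frequentist statistics in particular often fails when the data are not plentiful.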
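The regression definition can be checked the same way. The sketch below, in plain Python with hypothetical data, fits ordinary least squares by hand (the function name `ols_fit` is illustrative, not from any library). With two points the fitted line passes exactly through both, so the slope is simply rise over run; but the residual degrees of freedom are n - 2 = 0, so the usual standard error of the slope, and with it any t-statistic or p-value, is undefined.

```python
import math

def ols_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, residual_df, slope_se). slope_se is None
    when no residual degrees of freedom are left to estimate it.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Two parameters (slope, intercept) are estimated, so n - 2 are left over.
    residual_df = n - 2
    if residual_df == 0:
        return slope, intercept, residual_df, None  # line hits every point exactly

    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(sse / residual_df / sxx)
    return slope, intercept, residual_df, slope_se

# Hypothetical times and temperatures for the two dots.
slope, intercept, df, se = ols_fit([1, 2], [10.0, 12.0])
print(slope, df, se)  # 2.0 0 None
```

Adding a third point gives one residual degree of freedom and a finite standard error; whether the result is then called a “trend” still depends on which definition was chosen beforehand.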

That’s it. All typos today free on charge. When this thing publishes, I’ll already be in today’s meetings. I speak tomorrow morning. The whole thing will be streamed: click here.

Categories: Statistics


  1. Another scary study announced: http://www.livescience.com/51159-birth-month-disease-data-correlation.html
    Some of the stories actually noted correlation, not causation, but your average reader won’t notice the difference.

    As for time lines, what if we used those very useless but always-dragged-out charts for child height and weight as an example of lack of prediction? As far as I can tell, they are designed to terrify parents that their child will be starring in a little person show or playing professional basketball, modeling, or starring on The Biggest Loser. Draw a trend line between any two or three points. How often are the charts right? I doubt it’s very often, since they are based on percentiles for each age. There are also formulas for calculating this, none of which actually came true in my family. Yet we keep right on measuring and weighing and comparing as if there were any possible prediction available. (Note: My parents were 5′ 3″ and 6′ 2″. Kids were 5′ 4″, 5′ 8″, 6′ and 6′. So much for prediction.)

    Many times I think consulting a psychic has to be at least as accurate as so-called predictive models and trend lines.

  2. Approximately 75% of children conceived in months lacking an “r” are born in a month with an “r”.

  3. All typos today free on charge.

    Tell me you didn’t do this on purpose.

    One quibble with the minimalist two point trend determination is the accusation of cherry-picking. Yes, a regression line through a series of points has chosen start and end points too, but the line represents a synthesis of more information than just the ends of the series. Yes, we can see all the points, but our minds are less able to integrate them as uniformly as the regression equation does. And a final yes, we always need to keep in mind the limitations and assumptions of the synthesizing method so we don’t extrapolate beyond the data.

  4. Gary (not in Erko): Agreed that we see the information provided by the regression line, which we probably would not see if the line were not there. However, what other trends do we miss by having our attention directed to a regression line?

  5. Sheri,
    Hidden in plain sight — the Purloined Letter effect. We must work hard to overcome our biases, especially the ones imposed on us unaware.

  6. This is why I hate graphs where people “connect the dots”. As if there are numbers “in between” the observed values. It implies more information than there really is, and also implies some kind of process that slowly increases the value from Time X to Time X+1.

  7. Nate,
    There usually is a process occurring in between the dots in a time series (quantum physics excepted I suppose) so the lines do imply a caused change. Temperature charts are a good example regardless of the time scale. But you’re right that the lines imply more than they ought, or rather people assume more than they ought from the lines.

    The argument can be made that graphs are metaphorical representations of reality and some poetic license is allowed as long as everybody tacitly acknowledges it. Alas, they don’t; and so Briggs has fodder for a completely justified rant.

  8. @Gary,

    I guess I see it more often in business visualizations. We’ll have a graph of “sales by month”, and the lines will be nice and straight from October to November. Never mind that the actual sales occurred mostly on “Black Friday” at the end of November. Which is why I try to encourage people to use data-dense visualizations. Alas, executives are simple people who don’t want to be “bogged down in the details.” They just want to hear a good story.

  9. It seems things are worse than we thought (for the state of science): chocolate, coffee and, worst of all, beer are now on the climate change endangerment list:

    http://www.theguardian.com/environment/2015/jun/09/no-beer-chocolate-coffee-how-climate-change-ruin-your-weekend

    What greater appeal to authority can there be, than an appeal from 42 breweries that climate change threatens the brown bubbly? This stuff strikes at the very core of my being…

  10. Bulldust: One such brewery claimed to be using 100% renewable energy. A complete and utter lie: they paid for the electricity from a wind plant but used grid energy. They were no more using 100% renewables than putting solar panels on the White House helps avert global warming. If they lied about the energy, who knows what went into that beer. Just saying.

  11. I was berated HEAVILY for proposing the same thing about a decade ago. With a background in the International Steel Statistics Bureau, doing Statistical Process Control for Ford, Jaguar, Rolls-Royce, DuPont, spending time in the Australian Bureau of Statistics AND an ongoing interest in Statistics in Life sciences – Epidemiology, I thought I had a decent grasp of data.

    I could not for the life of me see a “steep upward trend” in Australia’s rural temperature records. In frustration I asked the “experts” – “Just LOOK at THE DATA!”

    It seems one is NEVER to use this method in the getting and cleaning of data.

    There IS an upward trend now, however, as the temperature records (observations) have been replaced by a more fitting (sic) scientificalish method known as “making stuff up”.

    Regards,
    Addinall.
