Lewandowsky’s Confusion About Statistics

Still at the conference, so just a short plug for learning about that of which you speak.

Stephan Lewandowsky, who believes JFK shot at the moon landings and that’s why the globe has passed the tippling point, or something like that, has said a few words about statistics:

However, our conclusion that the effect [in yet another silly study] is “real” and not due to chance is inevitably accompanied by some uncertainty.

Here is the rub: if the significance level is .05 (5%), then there is still a 1 in 20 chance that we erroneously concluded the effect was real even when it was due to chance—or put another way, out of 20 experiments, there may be 1 that reports an effect when in fact that effect does not exist. This possibility can never be ruled out (although the probability can be minimized by various means).

In his favor, a lot of people who publish too many papers aimed at audiences who are eager to nod their heads sagely at the foibles of their inferiors make the same errors Lewandowsky does. They are widely replicated errors. Which proves that replication in science can often reinforce distortions.

If the significance level is 0.05, it only means that if the p-value is less than or equal to that number, then you are allowed to declare “success” for your experiment, no matter how silly it is (see the Statistics section on this page for some doozies). What is a p-value? Unfortunately, the definition of this destructive beast is very difficult to remember, so difficult that it is easier to remember what it isn’t.
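The decision rule described here is purely mechanical. Below is a minimal Python sketch, assuming a hypothetical toy model in which each observation is Normal with mean mu and known spread 1, with mu set to 0; the function names are illustrative only. When that model is exactly true and mu really is 0, the rule declares “success” in roughly 5% of replicated experiments. That long-run rate, computed by assuming there is no effect, is all the .05 describes; it is not the probability that any particular “success” is real.

```python
import math
import random
import statistics

ALPHA = 0.05  # the significance level


def p_value(xs):
    """Two-sided p-value for the sample mean under a hypothetical model:
    observations are Normal(mu, 1) with mu set to 0."""
    z = abs(statistics.fmean(xs)) * math.sqrt(len(xs))
    return math.erfc(z / math.sqrt(2))


def declare_success(xs, alpha=ALPHA):
    # The only thing the significance level licenses: "success" if p <= alpha.
    return p_value(xs) <= alpha


def success_rate_when_null_true(n_experiments=10_000, n=25, seed=42):
    """Replicate the experiment many times with the model true and mu = 0,
    and count how often "success" is declared anyway."""
    rng = random.Random(seed)
    wins = sum(
        declare_success([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_experiments)
    )
    return wins / n_experiments


print(success_rate_when_null_true())  # roughly 0.05
```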

The p-value is the probability of seeing a statistic as large (in absolute value) as the one you actually did see, given: (1) the values of certain parameters in a model you are using to quantify uncertainty in the numbers are set to a pre-specified number (usually 0), (2) the model itself is unambiguously true, (3) the experiment that generated the data is replicated indefinitely, and (4) the data at hand is measured without error (or if it is measured with error, this error is modeled).

Each word of this cumbrous definition counts, which is why it is so difficult to memorize and to use properly.
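To make the definition concrete, here is a hedged Python sketch that computes a p-value the long way, by following the four conditions literally. It assumes a hypothetical model (observations Normal with mean mu and spread 1), sets mu to 0, takes the absolute sample mean as the statistic, replicates the experiment many times under that model, and treats the numbers as measured without error. The data values and function names are made up for illustration. Because this toy model is simple, the same probability also has a closed form, which the simulation should match to within simulation error.

```python
import math
import random
import statistics


def mc_p_value(data, n_reps=20_000, seed=1):
    """Monte Carlo p-value for |sample mean|, under a hypothetical model:
    (1) the parameter mu is set to 0, (2) the model Normal(mu, 1) is taken
    as true, (3) the experiment is replicated n_reps times, and
    (4) the data are treated as measured without error."""
    rng = random.Random(seed)
    n = len(data)
    observed = abs(statistics.fmean(data))
    as_large = 0
    for _ in range(n_reps):
        # One imaginary replication of the experiment under the model.
        replicate = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(statistics.fmean(replicate)) >= observed:
            as_large += 1
    return as_large / n_reps


def exact_p_value(data):
    # Same model and statistic in closed form: the mean is Normal(0, 1/n).
    z = abs(statistics.fmean(data)) * math.sqrt(len(data))
    return math.erfc(z / math.sqrt(2))


data = [0.3, -0.1, 0.5, 0.2, 0.4, -0.2, 0.6, 0.1, 0.0, 0.3]  # made-up numbers
print(mc_p_value(data), exact_p_value(data))  # both about 0.5
```

Change any one of the four conditions (a different model, a different parameter value, a different statistic) and the number changes with it.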

You are free to choose the model, the truth of which is usually unknown. For example, you are free to model your data using a hockey stick, even when that’s absurd. You will get a different p-value for every model. One model can give a non-publishable (i.e. non-significant) p-value, while a second model can give a publishable one. In statistics, there are many models one may choose in any situation. Their name is legion. Many scientists, psychologists in particular, tend to choose poorly.
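A toy illustration of how the choice of model moves the p-value, as a minimal Python sketch. Both models here are hypothetical: each treats the observations as Normal with mean mu set to 0, but one assumes a spread of 1 and the other a spread of 0.25. The made-up data and the statistic (the sample mean) stay the same; only the model changes, and the verdict flips from non-publishable to publishable.

```python
import math
import statistics


def p_value_normal(data, sigma):
    """Two-sided p-value for the sample mean under a hypothetical model:
    observations are Normal(mu, sigma), mu set to 0, sigma assumed known."""
    z = abs(statistics.fmean(data)) * math.sqrt(len(data)) / sigma
    return math.erfc(z / math.sqrt(2))


data = [0.1, 0.5, 0.3, 0.2, 0.4, 0.3, 0.1, 0.5, 0.3, 0.3]  # made-up numbers

for sigma in (1.0, 0.25):  # two hypothetical models for the same data
    p = p_value_normal(data, sigma)
    verdict = "publishable" if p <= 0.05 else "not publishable"
    print(f"sigma={sigma}: p={p:.4f} ({verdict})")
```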

Now, once you have the model in hand, you still have to pick a statistic. For any given model, there are many. Each statistic will give a different p-value. One statistic (inside a model) will give a non-publishable p-value; another statistic will give a publishable one.
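Likewise for the statistic. The hedged Python sketch below keeps one hypothetical model fixed (observations Normal with mean mu set to 0 and spread 1) and scores the same made-up data with two statistics: the absolute sample mean, and how far the count of positive observations strays from half the sample. Under this model each observation is positive with probability 1/2, so the count is Binomial(n, 1/2) and its p-value can be computed exactly with `math.comb`. The function names are illustrative only.

```python
import math
import statistics


def p_mean(data):
    # Statistic 1: |sample mean|. Under Normal(0, 1) the mean is Normal(0, 1/n).
    z = abs(statistics.fmean(data)) * math.sqrt(len(data))
    return math.erfc(z / math.sqrt(2))


def p_sign_count(data):
    """Statistic 2: |k - n/2|, where k counts positive observations.
    Under the same model k ~ Binomial(n, 1/2), so sum the exact probabilities
    of all counts at least as far from n/2 as the observed one."""
    n = len(data)
    k = sum(x > 0 for x in data)
    observed = abs(k - n / 2)
    tail = sum(math.comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= observed)
    return tail / 2 ** n


data = [0.4, 0.3, 0.5, 0.2, 0.6, 0.3, 0.4, 0.5, 0.2, -3.0]  # made-up numbers
print(round(p_mean(data), 3), round(p_sign_count(data), 4))
```

Same model, same data: the mean says “nothing here,” the sign count says “publish.”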

On top of all this is the enormous latitude the scientist has to call the model/statistic pair he used relevant to the hypothesis he announces. It could be, and often is, that this relationship is tenuous and that a direct reading of the model has little bearing on the “public” hypothesis. Almost always, the hypothesis about the real-life thing is confused and conflated with different hypotheses about the parameters of the model picked. This is not a small error: it is enormous and leads to wild over-certainty. Again, see that page for examples.

And then you are free to manipulate the data itself, tossing away “outliers”, usually defined as data that does not fit your preconceptions. You can do “sub group” analysis. You can say your hypothesis is true only for certain parts of your data. Oh my, it goes on and on.
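How much does this latitude matter? A rough Python sketch, under stated assumptions: the data are pure noise (hypothetical Normal observations with mean 0 and spread 1, so there is no effect to find), every observation carries an arbitrary label from four made-up subgroups, and the analyst declares success if any of six analyses reaches p <= 0.05: the full sample, each subgroup, or the full sample after tossing out the two observations least friendly to the observed direction as “outliers.” The honest single test succeeds about 5% of the time; the fiddled version succeeds far more often. The exact figure depends on these invented choices, so treat it as illustrative only.

```python
import math
import random
import statistics


def z_p_value(xs):
    # Two-sided p-value for the mean, pretending the Normal(0, 1) model holds.
    z = abs(statistics.fmean(xs)) * math.sqrt(len(xs))
    return math.erfc(z / math.sqrt(2))


def fiddled_success(xs, groups, n_groups=4, n_trim=2, alpha=0.05):
    """Hypothetical 'flexible' analysis: success if ANY analysis gives p <= alpha."""
    p_values = [z_p_value(xs)]                      # the full sample
    for g in range(n_groups):                       # "sub group" analyses
        sub = [x for x, label in zip(xs, groups) if label == g]
        if len(sub) >= 2:
            p_values.append(z_p_value(sub))
    # Toss the n_trim observations that most oppose the observed direction.
    direction = 1.0 if statistics.fmean(xs) >= 0 else -1.0
    trimmed = sorted(xs, key=lambda x: direction * x)[n_trim:]
    p_values.append(z_p_value(trimmed))
    return min(p_values) <= alpha


def false_positive_rates(n_experiments=5_000, n=40, seed=7):
    rng = random.Random(seed)
    honest = fiddled = 0
    for _ in range(n_experiments):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # no real effect
        groups = [rng.randrange(4) for _ in range(n)]  # arbitrary labels
        honest += z_p_value(xs) <= 0.05
        fiddled += fiddled_success(xs, groups)
    return honest / n_experiments, fiddled / n_experiments


print(false_positive_rates())  # (honest rate, fiddled rate)
```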

So in addition to getting the technical definition wrong, Lewandowsky got the practical, boots-on-the-ground definition wrong. He would do well to read “Inappropriate Fiddling with Statistical Analyses to Obtain a Desirable P-value: Tests to Detect its Presence in Published Literature” by Gadbury and Allison for wisdom on this topic.

Conclusion: Especially in dicey areas, and psychology is certainly one of them, there is much more than a 1 in 20 chance that the finding does not confirm the stated hypothesis (about the real-life thing).
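The arithmetic behind that conclusion can be sketched even before any fiddling, in the most generous case where the model is right. A minimal Python sketch, assuming two hypothetical inputs: the fraction of tested hypotheses in a field whose effect is real, and the power (the chance a real effect reaches p <= 0.05). The fraction of “significant” findings whose effect does not exist then follows from Bayes’ theorem, and it need not be anywhere near 1 in 20.

```python
def false_finding_fraction(prior_real, power, alpha=0.05):
    """Share of 'significant' results for which the effect is not real.

    prior_real: hypothetical fraction of tested effects that are real.
    power:      hypothetical probability a real effect gives p <= alpha.
    Assumes the model is correct, so alpha really is the false-alarm rate.
    """
    false_hits = alpha * (1.0 - prior_real)   # null effects declared "real"
    true_hits = power * prior_real            # real effects detected
    return false_hits / (false_hits + true_hits)


# Made-up inputs for a "dicey" field: 1 in 10 tested effects real, power 0.5.
print(round(false_finding_fraction(prior_real=0.1, power=0.5), 3))  # 0.474
```

With these invented inputs nearly half of the “successes” are false, and that is before any choice of model, statistic, or outlier is exploited.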

Epilogue: Lewandowsky advocates, as do we all, replication to smoke out queer p-values. As an example, Lewandowsky indicates the infamous climate hockey stick has been “replicated,” a sure sign, he claims, that the p-values are not leading us down a flowery path. Unfortunately, our man has forgotten to include the multiple studies that show the hockey stick is malarkey, as crazy Uncle Joe would say.

There is a psychological term for emphasizing only the evidence which supports your belief and ignoring everything else, but I’ve forgotten what it is.


Thanks to Dr K.A. Rodgers for alerting me to this topic.


  1. steve

    This may be one of the best posts I’ve seen in a while – getting ready for my comps, so this was great… now, if I can only remember the salient tenets of Game Theory or who advanced the concept of emergent norm behavior!

    thanks and I enjoy your blog,

  2. Great thrashing of that pesky p-value*, but jeez, you didn’t even come up for air long enough to hit one of the biggies. Even if the model is appropriate and well-defined, only a carefully designed test will provide strong evidence that an effect is not just existent, but BIG ENOUGH to be of practical significance. This flaw alone provides hundreds of “significant” results that don’t mean diddly squat.

    On the other hand, p-values are a seductive Gateway Drug to the statistician’s version of the counter-culture: meta-analysis. Now we can pile study upon study, and launder the failed tests with the successful until we get a p-value less than 0.05, suitable for printing on some dingy gray paper. Embrace the absurdity, expand your consciousness. Hit me with another p-value, dude!

  3. TMI

    “There is a psychological term for emphasizing only the evidence which supports your belief and ignoring everything else…”

    Is it an M.A. in Sociology?

  4. DAV

    Now. Now.

    Conspiracy Theory is one of the Universals in the Active Warmista n. Lewandowsky has found this universal simplification fits his data well. Deniers are imperfect instantiations of conspirators. It must be real for it cannot just exist in his mind or in the collective minds of other Warmists.

  5. DAV

    Speaking of conspiracies: I see your enemies messed with my first sentence. Why n was substituted for world is beyond me.

  6. Ray

    That .05 p-value is the philosopher’s stone of statistics. It turns dross into the gold of publishable material.

  7. Ye Olde Statistician

    Even on its own accounting, if twenty researchers conduct the same study and there is (really) no effect, one will think he saw it and publish. The other nineteen will remain silent and try something else.

    My old boss, Ed Schrock, used to talk about “Errors of the Third Kind,” in which one carefully collects and extensively analyzes the wrong data for the problem at hand.

  8. Matt


    Do those kinds of errors come with large blinking lights and play music? 🙂

  9. JH

    The fact is that p-values won’t get your papers published in a (reputable) statistical journal that values statistical merits and original research.

  10. I suspect that statistics is “used” by many when they have a bunch of data that has “n” variables and a formula that has “n” parameters, so they will use that one. That is, after all, how science/maths is, in effect, taught in schools.

    Like the assembly line worker who meticulously positions and fastens the various components at hand, they don’t need to understand how or why the parts are installed in that way and order. Their reward is for activity. And the plant’s reward comes with the product churned out at the end … if somebody buys it.

    Production continues as long as the formula/process can be exercised to the benefit of the participants. Nobody needs to understand how things work or their realistic benefit outside of their sphere of perception.

  11. Mike Ozanne

    One wonders why this is as difficult as it seems to be. Most of this research is done by university fellows, and a university typically has a maths faculty that includes a proper statistician who could be asked about the most appropriate models, methods and sampling methods. Or is it the case that if they asked, they would be told that what they were doing was bullsh*t?

  12. Tom

    Mike Ozanne, let’s start with asking Briggs to offer statistical solutions instead of the recurrent rhetoric of p-values.

  13. Speed

    If the significance level is 0.05 it only means that if the p-value is less than or equal to that number, and that you are allowed to declare “success” for your experiment, no matter how silly it is (see the Statistics section on this page for some doozies).

    This sentence is in the form of an “If … then …” statement except for the “then.”

  14. Barry Woods

    Another mystery to solve: the Lewandowsky et al. paper was announced and circulated to fellow psychologists, news wires, journals and the media (especially the Guardian / Telegraph) as pending publication back in July.

    As there is no sign of it now, in October, does anybody know what is happening to it? The press releases and media coverage went far and wide.

    Dr Adam Corner (a psychologist who perhaps needs to be a bit more sceptical, because of his own confirmation bias), carrying a banner at Copenhagen with ‘Act Now’ on it, and writing at the time as a Green Party candidate.

    Photo, and write-up in the Green Party magazine it came from (p. 5):

    Adam in the Guardian (in July, having been sent it by Lewandowsky):

    “But new research to be published in a forthcoming issue of Psychological Science has found a link between the endorsement of conspiracy theories and the rejection of established facts about climate science.”

    Could someone (who is likely to get a response) ask the journal about the status of this?

  15. Stephan Lewandowsky, “the gift that keeps on giving”

    One of the most successful promotions of 1926 was based on
    a picture of a sinister cluster of masked & gowned surgeons,
    looking down at something unspeakable, over the caption:
    “And it all began with harsh toilet tissue.”
    — Nature v 383, 17 Oct 96. p 589

  16. Alexandre

    »There is a psychological term for emphasizing only the evidence which supports your belief and ignoring everything else, but I’ve forgotten what it is.«

    It is called ‘confirmation bias’.
