Death Blow To Statistical Significance! — Bonus: Here’s The Replacement

Don’t miss the Nature article below! (I stole the graphic from it.)

Am Stat

Here are the opening lines from a press release (unfortunately sounding like every other press release in existence) from The American Statistician.

Calling time on ‘statistical significance’ in science research

Scientists should stop using the term ‘statistically significant’ in their research, urges this editorial in a special issue of The American Statistician published today.

The issue, Statistical Inference in the 21st Century: A World Beyond P<0.05, calls for an end to the practice of using a probability value (p-value) of less than 0.05 as strong evidence against a null hypothesis or a value greater than 0.05 as strong evidence favoring a null hypothesis. Instead, p-values should be reported as continuous quantities and described in language stating what the value means in the scientific context.

Containing 43 papers by statisticians from around the world, the special issue is expected to lead to a major rethinking of statistical inference by initiating a process that ultimately moves statistical science – and science itself – into a new age.

My paper is not in there. I didn’t hear about the special issue until it was too late. Do not despair, for it is here:

     Everything Wrong With P-values Under One Roof

Now if only I could get people to read it! Especially those who say there are good uses for p-values. I say there are not. I say that every use to which they are put is fallacious. I prove this. I use the word prove in its usual sense. As in prove. Read it.

The ASA, being a bureaucracy, does not go far enough and call for a ban. I do. Let’s push on to new discussions of what to do instead. Here is the link (thanks for the reminder Dan Hughes!) to the AS special issue. All papers are open access.

     The Replacement For Hypothesis Testing

The second paper is only a summary of material that I dearly wish I could get statisticians to read.

What’s more important than hypothesis testing? Understanding cause.

Now I have an invited paper coming out very soon (today maybe?), and I’ll link to it when it’s up: Reality-Based Probability & Statistics. Meanwhile, peruse these posts.

There is even a complete, on-line class, with free code! (I will probably do something more with this: stay tuned.)

In short, we have to join the computer scientists who have abandoned significance and think they have cause figured out. Well, they do, partially. But they’re computer scientists so, as is not infrequent in this fine body of men, they’re over-promising. I show some of the ways in the new paper.

Back to the (surely) AI-written press release:

[Executive Director of ASA Ron Wasserstein said] “No p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical non-significance lead to the association or effect being improbable, absent, false, or unimportant.”

Just so. And Amen.


Even better, here from friends of ours (Valentin Amrhein, Sander Greenland & Blake McShane) is a note in Nature: “Scientists rise up against statistical significance”.

When was the last time you heard a seminar speaker claim there was ‘no difference’ between two groups because the difference was ‘statistically non-significant’?

If your experience matches ours, there’s a good chance that this happened at the last talk you attended. We hope that at least someone in the audience was perplexed if, as frequently happens, a plot or table showed that there actually was a difference.

How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see? For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome). Nor do statistically significant results ‘prove’ some other hypothesis. Such misconceptions have famously warped the literature with overstated claims and, less famously, led to claims of conflicts between studies where none exists.

And a big AMEN to this:

Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.
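The second error in that passage is easy to see with a toy calculation. What follows is only a minimal sketch with invented numbers: the two "studies", their estimates and standard errors, and the helper `two_sided_p` are all hypothetical, and the normal approximation is an assumption made for simplicity. Both studies report the identical effect estimate; they differ only in precision.

```python
import math

def two_sided_p(estimate, std_error):
    """Two-sided p-value for 'effect = 0' under a normal approximation (an assumption)."""
    z = abs(estimate) / std_error
    return math.erfc(z / math.sqrt(2))

# Hypothetical numbers, invented purely for illustration.
study_a = {"estimate": 2.0, "std_error": 0.8}  # the more precise study
study_b = {"estimate": 2.0, "std_error": 1.5}  # same estimate, less precise

p_a = two_sided_p(**study_a)  # falls below 0.05
p_b = two_sided_p(**study_b)  # falls above 0.05

# Compare the two studies with each other instead of comparing their labels.
diff = study_a["estimate"] - study_b["estimate"]
diff_se = math.sqrt(study_a["std_error"] ** 2 + study_b["std_error"] ** 2)
p_diff = two_sided_p(diff, diff_se)  # the estimates are identical

print(f"study A: p = {p_a:.3f}")
print(f"study B: p = {p_b:.3f}")
print(f"A versus B: p = {p_diff:.3f}")
```

Under these made-up inputs study A lands on the "significant" side of 0.05 and study B on the other, yet the two estimates are exactly the same. Calling that a conflict between studies is a product of the threshold and nothing else.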

“We…call for the entire concept of statistical significance to be abandoned.”

YES YES YES YES YES and (can you guess) YES!

Now these fine gentlemen, like the ASA, do not call for a ban. I do. A complete ban. Read the paper. Do not skim it. Read it.

There you have it, friends. We were not alone lo these many years. We had allies. And we have finally reached sufficient strength to cry out and declare war.


  1. You have not yet won the war, but a win in a skirmish brings hope to those fighting on your side. As does finding unexpected allies.

  2. Gary

    It’s not too early to be planning for a task more difficult than convincing statisticians. How will you convince SPSS and SAS to adopt the idea? Their clients are less open to the arguments.

  3. Anon

    Very good news, but there has to be a plethora of material developed for stats departments to use in the meantime. As you may be aware, stats in many colleges/unis draw heavily on online material supplemented by lectures or they give the instructors the “book” and the material to use in class–and any deviation or creativity is strongly discouraged. There needs to be material that lazy stats departments can just slip into place. If there is no material espousing an alternate, or 3rd way, then there will be no fundamental change at the undergraduate level. It is one thing to tell people to “stop” doing something but there needs to be a readily available alternative.

  4. DAV

    Let’s see.

    We had a recent posting about killing without detection and now we have one with statistically significant death threats.

    Place is getting a bit morbid.

  5. bill_R

    key lines from the nature article:
    “We are not calling for a ban on P values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis.”

    For those of us in the real world, applying statistics as part of a process when there is real money on the line (say $10Bn annually), the “statistical significance” of a single test on one of the multitude of attributes being examined just means “Well that might be worth following up on. Any supporting data?” It’s replication, replication, replication, across multiple populations.

  6. Not being a statistician, I am struggling somewhat with the deeper implications but I think I grasp the boundaries of understanding: p-value is a logical fallacy used to reinforce an argument without substance; it has the mark of finality without being, actually, final. I am interested to read your papers because I enjoy logic, and the existence of a proof surely implies, beyond the math, there is a logical sequence. Given [parameters], because [premises], therefore [conclusion].

    My mind immediately leapt to the idea of ‘quantum mechanics’ which I consider somewhat bunkum but haven’t been able to adequately express due to my limited scientific aptitude. My layman’s understanding is that it relies heavily on probabilities to impute some activity going on below observational levels. Because probabilities imply some non-zero suggestion that they are wrong, probabilities are a weak argument being treated like a strong one.

    Am I close to the mark, at all? I concede lack of expertise on all levels so correction is welcome!
