A new paper has been submitted to a well-known journal: “Manipulating the Alpha Level Cannot Cure Significance Testing: Comments on ‘Redefine Statistical Significance’”, by David Trafimow, Valentin Amrhein, Fernando Marmolejo-Ramos, and — count ’em! — fifty-one others, one of whom is Yours Truly. The other authors kindly and graciously allowed me to add my Amen, for which I am most grateful.
The “comments” refer to the paper by DJ Benjamin, Jim Berger, and a slew of others, “Redefine statistical significance”, in Nature Human Behaviour 1, 0189. Our submission is to the same journal, obviously as a rebuttal.
We looked at Benjamin before, in the post Eliminate The P-Value (and Bayes Factor) Altogether & Replace It With This. The replacement is predictive modeling, which I wrote about extensively in Uncertainty and briefly in the JASA paper The Substitute for P-Values.
From the new paper, the one-sentence summary: “We argue that depending on p-values to reject null hypotheses, including a recent call for changing the canonical alpha level for statistical significance from .05 to .005, is deleterious for the finding of new discoveries and the progress of cumulative science.”
You may download the entire paper as a PDF preprint at Peer J Preprints.
Here (not set in blockquote, to avoid the italics) is the entire Conclusion. Help spread the word! It’s time to kill off p-values and “null hypothesis” significance testing once and for all — and restore a great portion of Uncertainty that has falsely been killed off. (Yes, Uncertainty.)
It seems appropriate to conclude with the basic issue that has been with us from the beginning. Should p-values and p-value thresholds be used as the main criterion for making publication decisions? The mere fact that researchers are concerned with replication, however it is conceptualized, indicates an appreciation that single studies are rarely definitive and rarely justify a final decision. Thus, p-value criteria may not be very sensible. A counterargument might be that researchers often make decisions about what to believe, and using p-value criteria formalizes what otherwise would be an informal process. But this counterargument is too simplistic. When evaluating the strength of the evidence, sophisticated researchers consider, in an admittedly subjective way, theoretical considerations such as scope, explanatory breadth, and predictive power; the worth of the auxiliary assumptions connecting nonobservational terms in theories to observational terms in empirical hypotheses; the strength of the experimental design; or implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else, is not acceptable.
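To see how arbitrary a binary threshold is in practice, here is a minimal sketch (not from the paper; the studies and numbers are hypothetical) showing two nearly identical results landing on opposite sides of the .05 line, while neither survives the proposed .005 line:

```python
import math

def z_test_p_value(z):
    """Two-sided p-value for a z statistic under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Two hypothetical studies with nearly identical evidence:
p1 = z_test_p_value(1.95)  # p ≈ 0.051
p2 = z_test_p_value(1.97)  # p ≈ 0.049

# A binary threshold treats these almost-identical results oppositely.
for alpha in (0.05, 0.005):
    print(f"alpha={alpha}: study 1 'significant'? {p1 < alpha}; "
          f"study 2 'significant'? {p2 < alpha}")
```

The evidence in the two hypothetical studies is practically indistinguishable, yet one “passes” at .05 and the other does not — which is exactly the kind of dichotomization the Conclusion objects to, at any alpha.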
UPDATE PeerJ says “This manuscript has been submitted and is being checked by PeerJ staff.” I thought it would have cleared by now. It hasn’t, so the link above won’t work yet, as John discovered. Once the paper clears, I’ll update again. Sorry for the confusion.
UPDATE The difficulty is that PeerJ says all authors have to confirm authorship, which means 54 people have to sign up for an account, etc., etc. Stay tuned.
UPDATE The paper is finally live! Follow this link!