Because some readers may think the crusade against the wee p is mine alone, sinner that I am, or that I somehow represent an obscure, shady, suspect, intellectually insufficient movement, I present to you quotations from three prominent individuals who recognize that the time for p-ing is over.
I do not, of course, necessarily agree with the solutions proposed by these fine gentlemen. For the solution that avoids all the problems that can be avoided (many cannot), see Chapters 8-10 of Uncertainty: The Soul of Probability, Modeling & Statistics. See also the book page for details, and the on-going classes where examples are given. I recommend the predictive approach, which is pure probability from start to finish and in which each user makes their own decisions.
Valentin Amrhein, Professor of Zoology, University of Basel, writing that Inferential Statistics is not Inferential:
Statistical significance and hypothesis testing are not really helpful when it comes to testing our hypotheses.
But I have increasingly come to believe that science was and is largely a story of success in spite of, and not because of, the use of this method. The method is called inferential statistics. Or more precisely, hypothesis testing.
The method I consider flawed and deleterious involves taking sample data, then applying some mathematical procedure, and taking the result of that procedure as showing whether or not a hypothesis about a larger population is correct…
In 2011, researchers at CERN worked on the so-called OPERA experiment and sent neutrinos through the Alps to be detected in central Italy. The neutrinos were found to be faster than light, even when the experiment was repeated. This was surprising, to say the least, and the p-value attached to the observation was smaller than the alpha level of p=0.0000003 that is required to announce a discovery in particle physics experiments involving collision data.
Although the researchers made clear that they were still searching for possible unknown systematic effects that might explain the finding, the news hit the media as: “Was Einstein wrong?”
A few months later, the researchers announced the explanation for the surprising measurements: a cable had not been fully screwed in during data collection.
Bad ps found in bad plumbing?
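For readers wondering where that alpha level of 0.0000003 comes from: it is, as far as I can tell, the particle physicists' "five sigma" convention, the upper tail of a standard normal beyond five standard deviations. Here is a minimal sketch of the arithmetic, assuming a one-sided tail; the function name `upper_tail` is mine, invented for illustration.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Assumption: "five sigma" means the one-sided tail area beyond z = 5.
five_sigma_alpha = upper_tail(5.0)
print(f"{five_sigma_alpha:.2e}")  # about 2.87e-07, which rounds to 0.0000003
```

Note what the calculation conditions on: a model in which the only thing that can go wrong is chance. It has nothing to say about loose cables.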
Frank Harrell, Professor of Biostatistics, Vanderbilt University, A Litany of Problems With p-values:
In my opinion, null hypothesis testing and p-values have done significant harm to science. The purpose of this note is to catalog the many problems caused by p-values. As readers post new problems in their comments, more will be incorporated into the list, so this is a work in progress.
The American Statistical Association has done a great service by issuing its Statement on Statistical Significance and P-values. Now it’s time to act. To create the needed motivation to change, we need to fully describe the depth of the problem….
A. Problems With Conditioning
p-values condition on what is unknown (the assertion of interest; [null hypothesis]) and do not condition on what is known (the data).
This conditioning does not respect the flow of time and information; p-values are backward probabilities.
I cut Harrell off at the p. He has many, many, many objections.
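The conditioning complaint is easier to see with numbers. What follows is a toy sketch, not Harrell's example: the data (60 heads in 100 flips), the single rival hypothesis (heads probability 0.7), and the 50/50 prior are all invented for illustration. The p-value conditions on the null and asks about data at least as extreme as what was seen; the "forward" probability conditions on the data actually seen and asks about the null.

```python
import math

def binom_pmf(k: int, n: int, theta: float) -> float:
    """P(K = k) for K ~ Binomial(n, theta)."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)

# Invented data and hypotheses, for illustration only.
n, k = 100, 60                    # 60 heads observed in 100 flips
theta_null, theta_alt = 0.5, 0.7  # null, and one hypothetical rival
prior_null = 0.5                  # assumed prior weight on the null

# Backward: probability of data this extreme or more, GIVEN the null.
p_value = sum(binom_pmf(j, n, theta_null) for j in range(k, n + 1))

# Forward: probability of the null, GIVEN the observed data (Bayes's theorem).
like_null = binom_pmf(k, n, theta_null)
like_alt = binom_pmf(k, n, theta_alt)
posterior_null = (like_null * prior_null) / (
    like_null * prior_null + like_alt * (1.0 - prior_null)
)

print(f"p-value          = {p_value:.4f}")
print(f"P(null | data)   = {posterior_null:.4f}")
```

Under these made-up assumptions the p-value falls below the magic 0.05 while the null remains more likely than not given the data. Change the rival hypothesis or the prior and the second number moves; the p-value never notices, because it never looks.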
John P. A. Ioannidis, Physician, Stanford, The Proposal to Lower P Value Thresholds to .005:
P values and accompanying methods of statistical significance testing are creating challenges in biomedical science and other disciplines. The vast majority (96%) of articles that report P values in the abstract, full text, or both include some values of .05 or less. However, many of the claims that these reports highlight are likely false…
P values are misinterpreted, overtrusted, and misused. The language of the ASA statement enables the dissection of these 3 problems. Multiple misinterpretations of P values exist, but the most common one is that they represent the “probability that the studied hypothesis is true.”…Better-looking (smaller) P values alone do not guarantee full reporting and transparency. In fact, smaller P values may hint to selective reporting and nontransparency. The most common misuse of the P value is to make “scientific conclusions and business or policy decisions” based on “whether a P value passes a specific threshold” even though “a P value, or statistical significance, does not measure the size of an effect or the importance of a result,” and “by itself, a P value does not provide a good measure of evidence.”
It goes p-p-p-ing along like this for some length.
I have the solution (it’s not mine: it’s old). A glance is here in the JASA paper A Substitute for P-values; Uncertainty has all the proofs and philosophical arguments, and I’ll have more papers coming out soon with expansions and clarifications.
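To give the flavor of what "predictive" means, here is a minimal sketch. It is not lifted from the paper or the book; it assumes the simplest possible setup, a binary observable with a Beta(1, 1) prior, and the data (14 successes in 20 trials) are invented. The function names `log_beta` and `predictive_pmf` are mine. The point is only that the output is a probability of future observables given past ones, with no parameter and no threshold left standing at the end.

```python
import math

def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive_pmf(j: int, m: int, k: int, n: int,
                   a: float = 1.0, b: float = 1.0) -> float:
    """P(j successes in m future trials | k successes in n past trials).

    Beta-binomial predictive distribution under an assumed Beta(a, b) prior.
    """
    log_p = (math.log(math.comb(m, j))
             + log_beta(a + k + j, b + n - k + m - j)
             - log_beta(a + k, b + n - k))
    return math.exp(log_p)

# Invented data, for illustration only.
k, n = 14, 20   # past: 14 successes in 20 trials
m = 10          # future: the next 10 trials

# Probability of seeing at least 7 successes in the next 10 trials.
prob_at_least_7 = sum(predictive_pmf(j, m, k, n) for j in range(7, m + 1))
print(f"P(at least 7 of next {m} | {k} of {n}) = {prob_at_least_7:.4f}")
```

Whether that probability is high enough to act on is up to whoever has to make the decision, which is the whole idea.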