Regular readers will know the arguments and evidence proving P-values should no longer be used for any reason.
If you still doubt, which I doubt, read this paper. Every decision of interest made with a P-value is the result of a fallacy or a mistake.
There is a move afoot to end the tyranny of statistical “significance” and hypothesis testing. It’s not only Yours Truly shouting in the darkness, but a large and growing movement. P-eep at this:
Please retweet: Already 420 people have signed the forthcoming Nature comment 'Retire statistical significance'! The new deadline to send an E-mail to firstname.lastname@example.org and add your name to the list is Friday 8 March. Thanks a lot!
— Valentin Amrhein (@vamrhein) March 7, 2019
It is obviously time to move past P-values. Obvious to statisticians, that is. Not necessarily to non-statisticians:
JAMA rejected this letter from my colleagues & me ("low priority"), so we're publishing on twitter, hoping JAMA will take it more seriously. pic.twitter.com/bh7byo4CrR
— Ken Rothman (@ken_rothman) May 9, 2017
Sad story, no? Though one more common than bedbugs. The P-value was not wee, and so there was weeping and the gnashing of teeth. Worse, there was the false conclusion that “no difference” was found because the P wasn’t wee.
This kind of thing happens so often there ought to be a name for it. P-envy, perhaps. Or Wee-P envy.
All statisticians know that disappointment among clients is palpable when the Ps aren’t wee. We (some of us, anyway) try to explain the weaknesses of P-values, about how “significance” has no bearing in real life, how P-values mislead, how useless they are, how there are better methods. Some of us provide the better methods—the predictive methods. And these often show interesting things, even (or even often) with non-wee Ps.
But no good. If the P isn’t wee, we are often asked to recut, reanalyze, redo the analysis to find the hidden wee P that is surely there. Of course, this does work. Wee Ps lurk everywhere, and all it takes is time to find them. Judging by all the weak papers we see flooding the scienceosphere, they are found.
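That wee Ps can always be found by recutting is easy to demonstrate by simulation. The sketch below (a minimal illustration, with made-up sample sizes and a hypothetical number of "recuts") compares pure noise against pure noise twenty times and reports the smallest P found; with twenty tries, the chance at least one P dips below 0.05 is already about 64%, even though nothing whatsoever is going on.

```python
import math
import random

def z_test_p_value(xs, ys):
    """Two-sided p-value for a difference in means, using a normal
    approximation (sample variances treated as known)."""
    n, m = len(xs), len(ys)
    mx, my = sum(xs) / n, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (m - 1)
    z = (mx - my) / math.sqrt(vx / n + vy / m)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
n_recuts = 20  # hypothetical number of ways the data gets "recut"

# Every comparison is noise vs. noise: there is no real difference.
smallest_p = min(
    z_test_p_value([random.gauss(0, 1) for _ in range(30)],
                   [random.gauss(0, 1) for _ in range(30)])
    for _ in range(n_recuts)
)
print(f"Smallest P over {n_recuts} recuts of pure noise: {smallest_p:.3f}")
```

Report only the winning recut, as so many papers do, and the wee P looks like a discovery.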
P-values work like magic. People—not you, dear readers, no, not you—really do think that when a wee P has been found, a real cause has been discovered, a real theory has been proved. Arguments proving these ideas false are not heard, not heeded.
That deafness will be the problem.
As the tweet above says, there will soon be news of several hundred statisticians and scientists moving in a very public way against P-values. The computer science crowd has already beaten us to it, it’s true; but better now than never. Things cannot go on as they have in the past. Hence the call to make the change.
What will be the result? The Revenge of Inertia.
The biggest impediment will not be other statisticians, who in reasonable order will follow the proofs. It will be users of statistics. It will be editors and referees of medical and sociology journals and the like. All academic statisticians have experienced being lectured by non-statistician journal referees on things like “You really should use this test and not that; they give better P-values.”
The fault, as I have said over and again, is ours. We taught people to use hypothesis testing. We winked at the caveats and weaknesses, ignored the philosophy, and then plunged ahead to the more fun math. Math is easy to test, and math is true. That doesn’t mean, of course, that the math is applicable to real questions, but never mind. The math is what counted (I hope this hilarious pun does not go unnoticed).
P-values, since they offer an easy way out of the heavy labor of hard thought, won’t be given up without a fight. Users will say, and say truthfully, “P-values have worked for us!” They do work, all right. They make decisions for users, decisions which are made far too easily. Yet it is also true that sometimes P-values lead to the right decisions.
I give a sketch in the paper (linked above) why this is so, but it’s usually when smart people set up good experiments based on plenty of foreknowledge. In these cases, the P-value can be a sort of proxy for the predictive probability. This is not a good reason to use P-values, though, when you can get the more productive and non-fallacious predictive probabilities instead.
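To see how a non-wee P and a useful predictive probability can coexist, here is a minimal sketch with invented summary numbers from a hypothetical two-arm trial. The p-value uses a normal approximation; the predictive probability (that a new treatment patient outperforms a new control patient) is a crude plug-in estimate that ignores parameter uncertainty, so treat it as an illustration, not a full predictive analysis.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical summary data from a two-arm trial (made-up numbers).
mean_a, sd_a, n_a = 5.2, 2.0, 50   # treatment arm
mean_b, sd_b, n_b = 4.5, 2.0, 50   # control arm

# Classical two-sided p-value for the difference in means
# (normal approximation, variances treated as known).
z = (mean_a - mean_b) / math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
p_value = 2 * (1 - normal_cdf(abs(z)))

# Predictive probability that a NEW treatment patient beats a NEW
# control patient, plugging in the estimates (ignoring the
# uncertainty in the estimated means and variances).
pred_prob = normal_cdf((mean_a - mean_b) / math.sqrt(sd_a**2 + sd_b**2))

print(f"p-value: {p_value:.3f}")
print(f"Pr(new treatment outcome > new control outcome): {pred_prob:.3f}")
```

With these numbers the P is not wee (about 0.08), so the testing ritual says “no difference”; yet the predictive probability says a new treatment patient beats a new control patient roughly 60% of the time, which is the kind of directly usable statement decision makers actually want.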
No more P-values, please.