Editorial Note: I originally had this scheduled to go Monday, but because Stream’s Creatorgate piece showed up yesterday, I delayed it by a day.
I think I only have the email announcing the statement, which is incomplete. I got it before the embargo date, but you are seeing it after. Hat tip to Steve Malloy for alerting us to it. In any case, here we go, interspersed with my comments.
For the final cut, and the much-awaited death of p-values, see my upcoming book Uncertainty. (Last week my editor said it went into production, which means copy editing first, etc.)
“The p-value was never intended to be a substitute for scientific reasoning,” said Ron Wasserstein, the ASA’s executive director. “Well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold. The ASA statement is intended to steer research into a ‘post p<0.05 era.’”
Amen, brother Ron, amen. But wee p-values are in most places taken as magic. I mean that word in its literal sense. Studies with wee p-values are blessed; those without, cursed. Studies which produce wee p-values are “significant”. And what is “significance”? Wee p-values. Hello, Mr Circular.
“Over time it appears the p-value has become a gatekeeper for whether work is publishable, at least in some fields,” said Jessica Utts, ASA president. “This apparent editorial bias leads to the ‘file-drawer effect,’ in which research with statistically significant outcomes are much more likely to get published, while other work that might well be just as important scientifically is never seen in print. It also leads to practices called by such names as ‘p-hacking’ and ‘data dredging’ that emphasize the search for small p-values over other statistical and scientific reasoning.”
Amen, sister Jessica, amen. Again, p-values are magic. I have seen grown men cry and grown women grunt when their study does not produce a wee p-value.
My comments for this next block are inside each bullet point [inside square brackets like this.]
The statement’s six principles, many of which address misconceptions and misuse of the p-value, are the following:
- P-values can indicate how incompatible the data are with a specified statistical model. [No, they can’t. They can only say what the probability is that some statistic takes some value, conditional on accepting the model and (usually) on setting certain parameters of that model to fixed values.]
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. [The first half of the sentence is true, the second half is wrong. Nothing in the universe is “produced by random chance.” “Random chance” isn’t actual and cannot actualize potentials, i.e. it can’t be a cause.]
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. [Not only that, they should not be based on p-values at all. Unless you’re a urologist, ignore all p-values.]
- Proper inference requires full reporting and transparency. [Amen. Which is why the Third Way I advocate is the only way to report uncertainty. I have the full theory in my book. I have an abstract here.]
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. [Exactly so. And what it does measure is of no interest to man or beast.]
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. [Indeed, by itself, it provides no measure of evidence regarding a model. It assumes not only that a model is true, but that some parameters of that model take fixed values.]
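Two of these points can be made concrete with a toy calculation. This is a minimal sketch with made-up numbers (the function, data, and thresholds are mine, not the ASA’s): the p-value below is computed entirely conditional on an assumed model (normal data, known sigma, the parameter mu fixed at 0), and it plainly does not measure effect size, since a trivial effect with an enormous sample yields a wee p-value while a large effect with a small sample does not.

```python
# Sketch only: a one-sample z-test under an ASSUMED model (normal data,
# known sigma = 1, null parameter mu fixed at 0). The numbers are
# hypothetical, chosen to show that the p-value tracks sample size,
# not effect size.
import math

def z_test_p(mean, sigma, n, mu0=0.0):
    """Two-sided p-value: P(|Z| >= |z|) conditional on the assumed model."""
    z = (mean - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

# Tiny effect (0.01), enormous sample: wee p-value, hence "significant"
p_tiny = z_test_p(mean=0.01, sigma=1.0, n=1_000_000)

# Large effect (0.5), small sample: large p-value, hence "not significant"
p_big = z_test_p(mean=0.5, sigma=1.0, n=10)

print(p_tiny < 0.05, p_big < 0.05)
```

Everything here is downstream of accepting the normal model and fixing mu at 0; change those assumptions and the p-value changes with them, which is the point of the first principle.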
In light of misuses of and misconceptions concerning p-values, the statement notes that statisticians often supplement or even replace p-values with other approaches. These include methods “that emphasize estimation over testing such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence such as likelihood ratios or Bayes factors; and other approaches such as decision-theoretic modeling and false discovery rates.”
Likelihood ratios and Bayes factors make some of the same mistakes p-values do, and also should not be used. All these methods are parameter-centric, not observable-centric. Thus at a minimum they all produce over-certainty, and at a maximum outright fallacy. The fallacies are usually those that ascribe cause based on statistical measures. See that Third Way paper linked above, or see the paper “The Crisis Of Evidence: Why Probability And Statistics Cannot Discover Cause”.
Or see my book! (Maybe May, June?)
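The parameter-centric versus observable-centric contrast can be sketched numerically. This is my own toy illustration, not the Third Way computation itself: I assume normal data with known sigma, a flat prior on mu, and made-up summary numbers. The parameter-centric output is a p-value about the unobservable parameter mu; the observable-centric output is the probability that the next observation exceeds a threshold, a statement about something that can actually be checked against reality.

```python
# Hedged sketch of parameter-centric vs observable-centric summaries.
# Toy setup (my numbers): normal data, known sigma, flat prior on mu.
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

sigma, n, ybar = 1.0, 25, 0.3   # hypothetical data summary

# Parameter-centric: two-sided p-value for H0: mu = 0
z = ybar / (sigma / math.sqrt(n))
p_value = 2 * (1 - phi(abs(z)))

# Observable-centric: under a flat prior the predictive distribution of a
# new observation is Normal(ybar, sigma^2 * (1 + 1/n)); report the
# probability that the NEXT y exceeds 0.
pred_sd = sigma * math.sqrt(1 + 1 / n)
pr_next_positive = 1 - phi((0 - ybar) / pred_sd)

print(round(p_value, 3), round(pr_next_positive, 3))
```

On these made-up numbers the p-value is "not significant", yet the predictive probability that the next observation is positive is well above one half: the second number speaks directly about an observable, which is the sense in which it is the more honest summary.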