Nature ran “Five ways to fix statistics”, which asked a few folks, some of whom are our friends, how to shore up the sandy soil of statistics. Thanks to Al Perrella for the tip.
They left out the way, the one that puts all the other ways, fine as they are, in the second tier of patches. Here is the way. Ready? (All of this is written about in more detail in Uncertainty.)
Eschew parameter-based methods
Eschew parameters! As we saw last week, they don’t mean what you think, they don’t exist, they can’t show cause or “links”. They are mere crutches, mathematical short-hand on the path to prediction.
Here’s what everybody wants to know:
(1) Pr(Y | X, old data, E),
where Y is some proposition of interest, the old data is the old data, X are guesses of the new correlates (the old X are in the old data), and E is a joint proposition of all the other assumptions, beliefs, measurements and so on that gave the model. (Models are prior information!)
What don’t you see? Parameters.
No civilian cares about parameters. Yet all classical statistical procedures make Parameter king.
Hypothesis tests? Statements about parameters. Bayes factors? Statements about parameters. Posteriors? Statements about parameters. Point estimates? Statements about parameters. Confidence intervals? Statements about parameters. Asymptotics? Statements about parameters. Inference? Statements about parameters. Paradoxes? Statements about parameters on the continuum.
The theme, in case you missed it, was statements about parameters. Statistical practices that give (1) are about as rare as p-values are wee. If you are wondering how to do (1), follow our Free Data Science Class.
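Here is a minimal sketch of what a direct answer to (1) can look like under one assumed E: Y is a yes/no outcome, X is a single category, and within each category old and new outcomes are treated as exchangeable with a uniform prior. Under that E the parameter integrates out entirely, and Pr(Y = 1 | X = x, old data, E) = (k + 1)/(n + 2), Laplace's rule of succession, where n is the number of old observations with X = x and k is how many of those had Y = 1. The function name and the toy data are hypothetical, chosen only for illustration.

```python
def predictive_prob(old_data, x):
    """Pr(Y = 1 | X = x, old data, E) under the assumed E:
    exchangeable yes/no outcomes within each category, uniform prior.
    old_data is a list of (x, y) pairs with y in {0, 1}.
    Returns Laplace's rule of succession, (k + 1) / (n + 2)."""
    n = sum(1 for xi, _ in old_data if xi == x)               # old cases with this X
    k = sum(1 for xi, yi in old_data if xi == x and yi == 1)  # of those, how many had Y = 1
    return (k + 1) / (n + 2)

# Hypothetical old data: (treatment, recovered?)
old_data = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
            ("B", 0), ("B", 1), ("B", 0)]

for x in ("A", "B", "C"):
    print(x, round(predictive_prob(old_data, x), 3))
```

This prints 0.667 for A, 0.4 for B, and 0.5 for the never-seen C. Each number is a statement about Y that anyone can check against the next patient; no parameter estimate is reported. Change E and the answer changes, as it should.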
The real sin of parameters is that whatever certainty we have in them does not translate into certainty in Y or in veracity of the model. Relying on parameters almost always commits the Deadly Sin of Reification.
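A small simulation makes this concrete. It assumes, purely for illustration, that Y is normal with unit spread, and it uses the usual large-sample formulas with a 1.96 multiplier rather than exact t quantiles. The interval for the mean parameter shrinks like 1/√n, while the interval for a new Y never gets narrower than roughly ±1.96 times the spread. The helper name half_widths is hypothetical.

```python
import math
import random
import statistics

Z = 1.96  # normal 95% multiplier (large-sample approximation)

def half_widths(ys, z=Z):
    """Return (half-width for the mean parameter, half-width for a new Y)."""
    n = len(ys)
    s = statistics.stdev(ys)
    param_half = z * s / math.sqrt(n)         # shrinks toward 0 as n grows
    pred_half = z * s * math.sqrt(1 + 1 / n)  # stays near z * s however large n is
    return param_half, pred_half

rng = random.Random(1)
for n in (10, 100, 10_000):
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p, q = half_widths(ys)
    print(f"n={n:>6}  mean ±{p:.3f}   new Y ±{q:.3f}")
```

With ten thousand observations the parameter is pinned down to a few hundredths, yet a new Y is as uncertain as it ever was. Certainty in the parameter is not certainty in Y.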
Note: in some fields a parameter is a physical constant, such as the speed of light. The values of these can be predicted; they are not the same as model parameters.
I can hear the complaints now. “I’ve been using this screwdriver to pound in nails for years.” “There are good ways to use screwdrivers to pound nails if you are careful.” “Look at all that screwdriver math and code. We’ve gone too far to develop so-called hammer math and code.”
And statisticians wonder why computer scientists are kicking their keisters.
Now the other fixes.
JEFF LEEK: Adjust for human cognition
In the past couple of decades, many fields have shifted from data sets with a dozen measurements to data sets with millions. Methods that were developed for a world with sparse and hard-to-collect information have been jury-rigged to handle bigger, more-diverse and more-complex data sets. No wonder the literature is now full of papers that use outdated statistics, misapply statistical tests and misinterpret results. The application of P values to determine whether an analysis is interesting is just one of the most visible of many shortcomings.
The ability to make accurate, verifiable-by-anyone predictions is the best indicator of interestingness.
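One simple way anyone can verify probabilistic predictions is with a proper score. The Brier score, the mean squared difference between the forecast probability and the 0/1 outcome, is a standard choice. This sketch compares a model's forecasts against a naive forecast that always gives the base rate; the forecasts and outcomes are made up for illustration, and computing the base rate from the same outcomes is a simplification (in practice it would come from old data).

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; 0 is perfect."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

outcomes = [1, 0, 1, 1, 0, 0, 1, 0]                  # what actually happened
model    = [0.8, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6, 0.1]  # hypothetical model forecasts
base     = [sum(outcomes) / len(outcomes)] * len(outcomes)  # always predict the base rate

print("model:", round(brier(model, outcomes), 4))
print("base :", round(brier(base, outcomes), 4))
```

Here the model scores 0.075 against 0.25 for the base rate. A model that cannot beat the naive forecast has no claim to being interesting, whatever its p-values.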
BLAKELEY B. MCSHANE & ANDREW GELMAN: Abandon statistical significance
Worse, NHST is often taken to mean that any data can be used to decide between two inverse claims: either ‘an effect’ that posits a relationship between, say, a treatment and an outcome (typically the favoured hypothesis) or ‘no effect’ (defined as the null hypothesis).
In practice, this often amounts to uncertainty laundering. Any study, no matter how poorly designed and conducted, can lead to statistical significance and thus a declaration of truth or falsity. NHST was supposed to protect researchers from over-interpreting noisy data. Now it has the opposite effect.
To which we can only say Amen.
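A quick simulation shows how the laundering works. It assumes there is no effect at all: treatment and control are pure noise, but the analyst measures k unrelated outcomes and tests each one with a two-sample z-test. Known unit variance is assumed so that no statistics library is needed; the helper names are hypothetical. The fraction of null studies reporting at least one "significant" outcome tracks 1 - 0.95^k.

```python
import math
import random

def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming unit variance."""
    z = (sum(a) / len(a) - sum(b) / len(b)) / math.sqrt(1 / len(a) + 1 / len(b))
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for standard normal Z

def any_significant(k, n=30, alpha=0.05, rng=random):
    """Simulate one null study with k outcomes; True if any test gives p < alpha."""
    for _ in range(k):
        treated = [rng.gauss(0, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p(treated, control) < alpha:
            return True
    return False

rng = random.Random(2024)
studies = 1000
for k in (1, 5, 20):
    hits = sum(any_significant(k, rng=rng) for _ in range(studies))
    print(f"k={k:>2}: simulated {hits / studies:.2f}, theory {1 - 0.95 ** k:.2f}")
```

With twenty outcomes, roughly two null studies in three find "an effect". Nothing in the data changed; only the number of looks did.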
DAVID COLQUHOUN: State false-positive risk, too
To demote P values to their rightful place, researchers need better ways to interpret them. What matters is the probability that a result that has been labelled as ‘statistically significant’ turns out to be a false positive. This false-positive risk (FPR) is always bigger than the P value.
Yet the rightful place of p-values is in the museum next to lobotomies, phlogiston, and empiricism.
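Colquhoun's false-positive risk can be sketched with the simplest version of the calculation, which conditions on p < alpha. If a fraction prior of tested hypotheses are real effects and tests run at level alpha with the given power, then among results labelled significant the false-positive fraction is (1 - prior)·alpha / ((1 - prior)·alpha + prior·power). The default values (power 0.8, prior 0.1) are illustrative assumptions, not numbers from the Nature piece, and Colquhoun's own preferred calculation conditions on the observed p-value instead; this sketch does not reproduce that refinement.

```python
def false_positive_risk(alpha=0.05, power=0.8, prior=0.1):
    """Fraction of 'significant' results that are false positives
    (conditioning on p < alpha). prior = proportion of tested hypotheses that are real."""
    false_pos = (1 - prior) * alpha  # null true, yet p < alpha
    true_pos = prior * power         # effect real, and detected
    return false_pos / (false_pos + true_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"prior={prior:<5} FPR={false_positive_risk(prior=prior):.3f}")
```

Under these assumptions the risk runs from about 6% at a prior of 0.5 to 36% at 0.1 and 86% at 0.01. The comparison with a 5% P value assumes the prior is not above one half.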
MICHÈLE B. NUIJTEN: Share analysis plans and results
…Even a seemingly simple research question (does drug A work better than drug B?) can lead to a surfeit of different analyses…
The next step is to share all data and results of all analyses as well as any relevant syntax or code. That way, people can judge for themselves if they agree with the analytical choices, identify innocent mistakes and try other routes.
Can I get an Amen?
STEVEN N. GOODMAN: Change norms from within
The time is ripe for reform. The ‘reproducibility crisis’ has shown the cost of inattention to proper design and analysis. Many young scientists today are demanding change; field leaders must champion efforts to properly train the next generation and re-train the existing one. Statisticians have an important, but secondary role. Norms of practice must be changed from within.
God bless Goodman, but tain’t gonna happen. Science has grown too big and none have the courage to do the necessary culling. Even moving to predictive methods is only a tweak.
Every single one of these posts drives me closer to buying a paycheck-busting copy of Uncertainty. Roughly speaking, your discussions of uncertainty, referencing uncertainty, are driving up the odds and down the uncertainty of my planned acquisition of Uncertainty.
(No, I have not been drinking, this is just what I’m like in the morning after too much coffee…)
Thanks for this review and quite glad to see the amen’ing and Hail Mary’ing of Gelman’s position. FWIW, this particular member of the hoi polloi agrees emphatically.
(By the way, Harry Crane, a statistician up at Rutgers, is writing some interesting plain-spoken stuff against Benjamin et al. and their recent last-gasp suggestion to lower the significance threshold to p < .005 as a way to solve the problem.)
What would you call the estimands in sample surveys and process control, where the estimand is quite observable, but much more expensive to observe? If you are estimating yields (bushels/acre, or i.u.s per lot) you do get the observable.
A slope or a variance is just a (weighted) average of U-statistics and is quite observable in some circumstances, too (e.g., putting bounds on the amount of chemical or drug in a vial or pill). If I am using a reagent I am quite interested in the activity I can expect and the difference between two concentrations.
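The pairwise-average view can be checked numerically with two standard identities. The sample variance equals the average over all pairs of (x_i - x_j)²/2, which is a U-statistic, and the least-squares slope equals a weighted average of the pairwise slopes (y_j - y_i)/(x_j - x_i) with weights (x_j - x_i)². The function names and data below are hypothetical.

```python
import statistics
from itertools import combinations

def pairwise_variance(xs):
    """Sample variance as a U-statistic: mean of (xi - xj)^2 / 2 over all pairs."""
    pairs = list(combinations(xs, 2))
    return sum((a - b) ** 2 / 2 for a, b in pairs) / len(pairs)

def pairwise_slope(xs, ys):
    """Least-squares slope as a weighted average of pairwise slopes,
    weights (xj - xi)^2; pairs with equal x carry zero weight and are skipped."""
    num = den = 0.0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx = x2 - x1
        if dx == 0:
            continue
        w = dx ** 2
        num += w * (y2 - y1) / dx
        den += w
    return num / den

def ols_slope(xs, ys):
    """Ordinary least-squares slope, for comparison."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1.0, 2.0, 3.0, 5.0, 8.0]
ys = [2.1, 3.9, 6.2, 9.8, 16.5]
print(pairwise_variance(xs), statistics.variance(xs))
print(pairwise_slope(xs, ys), ols_slope(xs, ys))
```

Each pair of rows agrees up to floating-point rounding.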
I will admit to being a mere layman with broad interests. It flabbergasts me to hear that statistics isn’t being done in this simple, straightforward way. No wonder Science! has been failing us so spectacularly recently. There’s no there there, so to speak.