Or, rather, wee p-values are idolized. And it isn’t just me saying so. Reader Dan Hughes points us to Gerd Gigerenzer and Julian Marewski’s peer-reviewed paper “Surrogate Science: The Idol of a Universal Method for Scientific Inference” (pdf) in the Journal of Management.
The paper can be read by anybody (well, you get the idea), but here are the juicy quotes and my comments. It’s long, but boy oh boy is it fun!
Determining significance has become a surrogate for good research.
Amen! Preach it, brother. Sing it loud. Hallelujah.
One of us reviewed an article in which the number of subjects was reported as 57. The authors calculated that the 95% confidence interval was between 47.3 and 66.7 subjects. Every figure was scrutinized in the same way, resulting in three dozen statistical tests. The only numbers with no confidence intervals or p values attached were the page numbers.
Those authors were nuts and forgot that statistics are never needed to tell us what happened. Even though, yes, this unnecessary duplication and absurd quantification are the lifeblood of frequentism.
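The paper does not say how those authors computed a "confidence interval" on a count they knew exactly, but here is one way such a meaningless number can be manufactured: pretend the fixed count is a Poisson draw and apply the usual normal approximation. (The modeling choice and the code are mine, not the paper's; a different arbitrary choice would give a different interval, which only underscores the arbitrariness.)

```python
import math

n = 57  # the reported, exactly known, number of subjects

# Pretend the known count is a Poisson observation (variance = mean),
# then wrap a standard 95% normal-approximation interval around it.
half_width = 1.96 * math.sqrt(n)
lo, hi = n - half_width, n + half_width
print(f"'95% CI' for a number we already know: ({lo:.1f}, {hi:.1f})")
```

Whatever recipe the reviewed authors used, the output is equally useless: the number of subjects was 57, full stop.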
…in physics, Newton’s theory of simple cause and effect was replaced by the probabilistic causes in statistical mechanics and, eventually, by quantum theory.
The consequence is that cause has long been forgotten. Or, rather, cause is whatever the researcher says it is. Terrible harm has been done because of this.
To understand how deeply the inference revolution changed the social sciences, it is helpful to realize that routine statistical tests, such as calculations of p values or other inferential statistics, are not common in the natural sciences. Moreover, they have played no role in any major discoveries in the social sciences. [emphasis mine]
Nor in any other science. P-values only prove—or “prove”—(a) what is already known (proof), or (b) what is probably false (“proof”).
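Point (b) is easy to demonstrate by running the ritual on pure noise. The simulation below (my illustration, not from the paper) tests "the mean is zero" on data where the mean really is zero: about one experiment in twenty comes out "significant" anyway, i.e., "proof" of what is probably false.

```python
import math
import random

random.seed(42)

def p_value(xs):
    """Two-sided z-test of 'mean = 0', assuming known variance 1."""
    z = sum(xs) / math.sqrt(len(xs))
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))); p = 2 * (1 - Phi(|z|))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 2,000 experiments in which the null hypothesis is TRUE by construction.
trials = 2000
hits = sum(p_value([random.gauss(0, 1) for _ in range(30)]) < 0.05
           for _ in range(trials))
print(f"'Significant' discoveries from pure noise: {hits / trials:.1%}")
```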
The Null Ritual
The null ritual is an invention of statistical textbook writers in the social sciences.
…spearheaded by humble nonstatisticians who composed statistical textbooks for education, psychology, and other fields and by the editors of journals who found in “significance” a simple, “objective” criterion for deciding whether or not to accept a manuscript.
Thus has laziness triumphed and become ingrained in science. Gigerenzer is right: statistics is pagan ritual. And it is now about as effective as offering sacrifices to a volcano.
Some of the most prominent psychologists of their time vehemently objected…the founder of modern psychophysics, complained about a “meaningless ordeal of pedantic computations.” …one of the architects of mathematical psychology, spoke of a “wrongheaded view about what constituted scientific progress,”…
Not that it mattered. The Wee P-value is triumphant.
Unlike many of his followers, Savage carefully limited Bayesian decision theory to “small worlds” in which all alternatives, consequences, and probabilities are known. And he warned that it would be “utterly ridiculous” to apply Bayesian theory outside a well-defined world—for him, “to plan a picnic” was already outside because the planners cannot know all consequences in advance (Savage, 1954/1972: 16).
Amen again! Decision analysis was pushed far, far past the breaking point years ago. The EPA, and pretty much every other agency that wants to “prove” pre-decided conclusions, never remember that (unknown probability) x (unknown costs) = (who the hell knows what’s best). Instead, scientism and false quantification run amok.
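To see how hollow the arithmetic is, here is a toy expected-cost "analysis" of the kind such agencies run. Every number in it is a made-up guess (mine, not from the post or the paper); the point is that equally defensible guesses flip the recommendation.

```python
# All inputs below are invented guesses: neither the probability of
# harm nor its cost is actually known in problems like these.
def choose_action(p_harm, cost_harm, cost_action):
    """Pick 'act' if acting is cheaper than the expected cost of waiting."""
    expected_wait = p_harm * cost_harm
    if cost_action < expected_wait:
        return ("act", cost_action)
    return ("wait", expected_wait)

# Guess set A says act; guess set B, just as defensible, says wait.
print(choose_action(p_harm=0.10, cost_harm=1e9, cost_action=5e7))
print(choose_action(p_harm=0.02, cost_harm=1e9, cost_action=5e7))
```

Unknown probability times unknown cost is still unknown, no matter how many decimal places the spreadsheet prints.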
A second version of Automatic Bayes can be found in the heuristics-and-biases research program—a program that is widely taught in business education courses. One of its conclusions is that the mind “is not Bayesian at all” (Kahneman & Tversky, 1972: 450). Instead, people are said to ignore base rates, which is called the base rate fallacy and attributed to cognitive limitations. According to these authors, all one has to do to find the correct answer to a textbook problem is to insert the numbers in the problem into Bayes’ rule—the content of the problem and content-related assumptions are immaterial. The consequence is a “schizophrenic” split between two standards of rationality: If experimental participants failed to use Bayes’ rule to make an inference from a sample, this was considered irrational. But when the researchers themselves made an inference about whether their participants were Bayesians, they did not use Bayes’ rule either. Instead, they went through the null ritual, relying only on the p value. In doing so, they themselves committed the base rate fallacy.
…an automatic use of Bayes’ rule is a dangerously beautiful idol. But even for a devoted Bayesian, it is not a reality: Like frequentism, Bayesianism does not exist in the singular.
This isn’t so. But Gig and pal assume, as is natural, that Bayes means subjective probability. Logical probability does not suffer from this singularity. And any statistical method which is part of the Cult of the Parameter must eventually fall to ritual.
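For the record, the Bayes' rule calculation at issue in that research program is trivial to write down. The numbers below are the standard textbook sort (mine, not the paper's): a rare condition and a decent test. Ignoring the 1% base rate and answering "about 90%" is the base rate fallacy.

```python
def posterior(base_rate, hit_rate, false_alarm):
    """Bayes' rule: P(hypothesis | positive result)."""
    evidence = base_rate * hit_rate + (1 - base_rate) * false_alarm
    return base_rate * hit_rate / evidence

# A 1% base rate, a 90% hit rate, a 5% false alarm rate: the posterior
# is only about 15%, far below the naive "90%" answer.
print(posterior(base_rate=0.01, hit_rate=0.9, false_alarm=0.05))
```

Plugging numbers into this formula is the easy part; whether the formula applies to the problem at hand is the part the ritualists skip.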
We use the term surrogate science in a more general sense, indicating the attempt to infer the quality of research using a single number or benchmark. The introduction of surrogates shifts researchers’ goal away from doing innovative science and redirects their efforts toward meeting the surrogate goal.
Laziness again. It’s everywhere—and government sponsored.
SPSS and other user-friendly software packages that automatically run tests facilitate this form of scientific misconduct: A hypothesis should not be tested with the same data from which it was derived…
A similarly bad practice, common in management, education, and sociology, is to routinely fit regressions and other statistical models to data, report R2 and significance, and stop there
The first point should be shouted at every PhD defense. It is the key—really the only—difference between good and bad science. It is a point so important that you should read it twice. Don’t forget to visit the Classic Posts page to see the common abuses of regression.
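Both abuses can be shown at once with a sketch (my construction, not the paper's): dredge many pure-noise "predictors" for the one that best fits the data, report its in-sample fit, and then, unlike the ritualists, check it against fresh data from the same source.

```python
import random

random.seed(1)

n, k = 50, 200  # 50 observations, 200 candidate noise "predictors"
y = [random.gauss(0, 1) for _ in range(n)]
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Snoop: pick the noise predictor that best "explains" y, then report
# its fit on the very data used to pick it.
best = max(X, key=lambda x: r_squared(x, y))
print(f"In-sample R^2 of the winner: {r_squared(best, y):.2f}")

# Honest check: the same predictor against new data it never saw.
y_new = [random.gauss(0, 1) for _ in range(n)]
print(f"Same predictor on new data:  {r_squared(best, y_new):.2f}")
```

The in-sample fit looks publishable; the out-of-sample fit collapses toward zero, because there was never anything there to find.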
Surrogate science does not end with statistical tests. Research assessment exercises tend to create surrogates as well. Citation counts, impact factors, and h-indices are also “inferential statistics” that administrators and search committees may (ab)use to infer the quality of research. …hiring committees and advisory boards study these surrogate numbers rather than the papers written by job candidates and faculty members.
Did somebody say laziness and pseudo-quantification again? Yes: somebody did.
An even greater danger is that surrogates transform science by warping researchers’ goals. If a university demands publication of X journal articles for promotion, this number provides an incentive for researchers to dissect a coherent paper into small pieces for several journals. These pieces are aptly called just publishable units. Peter Higgs, the 2013 Nobel Prize winner in physics, once said in an interview, “Today I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough” (Aitkenhead, 2013). He added that because he was not churning out papers as expected at Edinburgh University, he had become “an embarrassment to the department when they did research assessment exercises” (Aitkenhead, 2013).
Did somebody say laziness and pseudo-quantification again, even though he just said it? Yes.
Update: I forgot to include the popular press article which highlighted the paper: Science is heroic, with a tragic (statistical) flaw: Mindless use of statistical testing erodes confidence in research.
Update: I also forgot to give you the current status of my book, which talks about all these kinds of things and gives a solution. It’s thisclose to being done.