Daryl Bem, a Cornell professor of psychology, once again believes he has proven the validity of ESP (extra-sensory perception). Bem is a long-time researcher of the paranormal who once found notoriety by gluing ping-pong ball halves to people’s eyes.
Yes, and on these ping-pong balls, he shined a red light. And in the ears of those so afflicted, he piped a gentle hiss. This was to create the ganzfeld, or “total field”, a state sort of like sensory deprivation in which a person would be maximally open to psychic vibrations (it’s always vibrations).
Bem used statistics, p-values in particular, to prove that the ganzfeld worked. Trouble arose when other workers tried to replicate Bem’s success. None could, and the ganzfeld was abandoned. (I write more about this in my So You Think You’re Psychic?)
Now Bem is back with a new, peer-reviewed paper in The Journal of Personality and Social Psychology. Once more Bem is able to display wee p-values in support of his theory that people can see, as through a glass darkly, a short distance into the future. (We can discuss the specifics of this kind of ESP at another date.)
It is true Bem’s p-values are all publishable, but it is not true that p-values are what he thinks they are. In this, Bem only makes the same mistakes that plague everybody who relies on frequentist statistical methods. These misperceptions are so rife that even the New York Times has noticed them, using Bem’s paper to discuss “one of the longest-running debates in science.” (Thanks to Bob Ludwick for the link.)
The paper quotes Jim Berger (Duke, one of the men responsible for the Bayesian revolution) as saying, “I was on a mini-crusade about this 20 years ago and realized that I could devote my entire life to it and never make a dent in the problem.” I know exactly how he feels. In my own class, I teach Bayesian and frequentist methods, emphasizing Bayesian. By the time we arrive at frequentist methods, students are already skeptical of frequentism because of the hints I have given about it.
I lay the theory out, and lay it out fairly, and students become especially wary once they learn the precise definitions of p-values, confidence intervals, and so forth. But then comes the show-and-tell portion of the class, where we touch actual data. Before I let them have at it, I give them this favorite speech:
Even though you know the proper definition of a p-value—how it tells you nothing about what you really want to know, how it is silent on whether your hypothesis is true or false, how it is mute on whether your model is appropriate or not—you will not be able to resist it. When you see a publishable (less than the magic number of 0.05) p-value you too, like everybody else, will not be able to help yourselves. You will believe you have proved your theory. You will be unable to ignore the call of the p-value.
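To see why the speech is necessary, consider a minimal simulation (my own illustration, nothing to do with Bem’s data): run thousands of experiments in which, by construction, there is no effect whatsoever, and the magic number still arrives on schedule.

```python
# Sketch: "significant" p-values appear even when nothing is going on.
# The seed and sample sizes are arbitrary choices for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_experiments, n_subjects = 10_000, 50

hits = 0
for _ in range(n_experiments):
    # Scores are pure noise: the null hypothesis (mean = 0) is exactly true.
    scores = rng.normal(loc=0.0, scale=1.0, size=n_subjects)
    p = stats.ttest_1samp(scores, popmean=0.0).pvalue
    hits += p < 0.05

print(f"Publishable results with no effect at all: {hits / n_experiments:.1%}")
# Prints roughly 5%. The p-value describes the data assuming the null is
# true; it says nothing about whether the hypothesis itself is true.
```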
Right after this, we launch into regression examples in which the software spits out tables of, inter alia, p-values. By the second or third iteration, students are already pointing at their screens saying, “Why can’t I keep this variable? The p-value is low.” And when they describe their data, they readily slip into the same kind of inappropriate causative language Bem does.
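The classroom scene is easy to reproduce. Here is a hedged sketch (my example, not one from the course): regress pure noise on pure noise, and some variable will almost always beg to be kept.

```python
# Sketch: spuriously low p-values in regression on random predictors.
# Seed, sample size, and number of predictors are arbitrary.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, k = 100, 20  # 100 observations, 20 candidate predictors

X = rng.normal(size=(n, k))  # predictors: pure noise
y = rng.normal(size=n)       # response: unrelated pure noise

fit = sm.OLS(y, sm.add_constant(X)).fit()
# Skip the intercept; list any predictor that clears the magic bar.
low = [i for i, p in enumerate(fit.pvalues[1:], start=1) if p < 0.05]
print("Variables a student would beg to keep:", low)
# With 20 noise variables, about one clears 0.05 on average, every time.
```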
Solution? Eliminate the teaching of frequentist statistics to all but specialists: mathematical Masters and PhD students and so forth. Do not expose undergraduates in any field, or graduates in non-mathematical fields, to the ideas of p-values, confidence intervals, or hypothesis testing.
These tools have been in the hands of scientists for nearly a century, and all experience has shown that they are subject to regular, even ritualized, abuse. It is far too easy to “prove” what isn’t so using frequentist methods; at the least, the answers this form of statistics gives are answers to questions researchers did not ask.
As the Times says (and correctly), “a team of statisticians led by Leonard Savage at the University of Michigan showed that the classical approach could overstate the significance of the finding by a factor of 10 or more.”
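For a flavor of where such factors come from, here is a back-of-envelope check using the well-known Sellke–Bayarri–Berger lower bound on the Bayes factor, BF ≥ −e·p·ln(p). This is my illustration, not necessarily the Savage team’s original calculation, but it lands in rough agreement with the factor the Times reports.

```python
# Sketch: how much a p-value overstates the evidence against the null,
# using the Sellke-Bayarri-Berger bound (best case FOR the alternative).
import math

for p in (0.05, 0.01):
    bf_bound = -math.e * p * math.log(p)  # minimum Bayes factor for the null
    naive_odds = (1 - p) / p              # what p is commonly misread as
    honest_odds = 1 / bf_bound            # maximum odds against the null
    print(f"p = {p}: misread as {naive_odds:.0f}:1 against the null, "
          f"but the evidence is at most {honest_odds:.1f}:1 "
          f"(overstated by roughly {naive_odds / honest_odds:.0f}x)")
```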
Switching to Bayesian statistics as the standard won’t eliminate biases and mistakes—no statistical procedure can—but it will reduce a vast amount of over-certainty.
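What does the Bayesian answer even look like? A minimal sketch, with toy data I invent for illustration (53 correct guesses in 100 binary trials; these are not Bem’s numbers):

```python
# Sketch: a beta-binomial posterior for a hypothetical guessing experiment.
from scipy import stats

trials, hits = 100, 53               # invented data, for illustration only
posterior = stats.beta(1 + hits, 1 + trials - hits)  # flat Beta(1,1) prior

# The question the researcher actually asked: given the data, how probable
# is it that the hit rate exceeds chance (50%)?
print(f"P(hit rate > 0.5 | data) = {posterior.sf(0.5):.2f}")
```

The output is a probability of the hypothesis given the data. It can still be misused, but at least it is an answer to the question the researcher asked.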