As reader Nate Winchester surmised, today brings the biggest I Told You So yet.
Listen, dear reader, when your Uncle Matt tells you there’s bad statistics, there’s bad statistics. And when I warned that fMRI applied to personality and “free will” research was little better than electronic phrenology, I hope you were paying attention, because I told you so.
A whole pile of “this is what your brain looks like” fMRI-based science has been invalidated because someone finally got around to checking the data.
The problem is simple: to get from a high-resolution magnetic resonance imaging scan of the brain to a scientific conclusion, the brain is divided into tiny volume elements, “voxels”. Software, rather than humans, then scans the voxels looking for clusters of activity.
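For the curious, here is a minimal sketch of that cluster-hunting step in Python; the array size and the 2.3 cutoff are my stand-ins, not any particular lab’s pipeline. Feed the machinery pure noise and it dutifully finds clusters anyway:

```python
# A toy version of the cluster-finding step: threshold a voxelwise
# statistic map, then group surviving voxels into connected clusters.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
stat_map = rng.normal(size=(64, 64, 40))   # stand-in for a voxelwise z-map

threshold = 2.3                            # a common voxel-forming cutoff (~p < 0.01)
above = stat_map > threshold               # voxels that survive the threshold

# Each group of touching suprathreshold voxels gets an integer label
labels, n_clusters = ndimage.label(above)
sizes = ndimage.sum(above, labels, index=range(1, n_clusters + 1))

print(f"{n_clusters} clusters found in pure noise; largest spans {int(sizes.max())} voxels")
```

Everything downstream, the colorful pictures and the press releases alike, rests on deciding which of those clusters are “real”.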
When you see a claim that “scientists know when you’re about to move an arm: these images prove it”, the scientists are interpreting what the statistical software told them.
Now, boffins from Sweden and the UK have cast doubt on the quality of the science, because of problems with the statistical software: it produces way too many false positives.
In this paper at PNAS, they write: “the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”
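You do not need a scanner to see how that happens. Here is a toy Monte Carlo, my own deliberately mis-specified illustration and not the paper’s method: calibrate a cluster-size cutoff under one noise model, apply it to noise of a different character, and pure noise “lights up” far more often than the advertised 5%:

```python
# Toy demonstration: a cluster-extent cutoff calibrated under the wrong
# noise model lets pure noise pass as "significant" far too often.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)

def largest_cluster(shape=(48, 48, 24), smooth=2.0, z_cut=2.3):
    """Largest suprathreshold cluster in a pure-noise volume."""
    noise = ndimage.gaussian_filter(rng.normal(size=shape), smooth)
    noise /= noise.std()                       # re-standardize after smoothing
    above = noise > z_cut
    labels, n = ndimage.label(above)
    if n == 0:
        return 0
    return ndimage.sum(above, labels, index=range(1, n + 1)).max()

# Calibrate the cluster-size cutoff on (nearly) unsmoothed noise...
null = [largest_cluster(smooth=0.01) for _ in range(200)]
k = np.quantile(null, 0.95)                    # nominal 5% familywise cutoff

# ...then apply it to spatially smooth noise, as real scans are.
hits = sum(largest_cluster(smooth=2.0) > k for _ in range(200))
print(f"cutoff k = {k:.0f} voxels; actual familywise false-positive rate ≈ {hits/200:.0%}")
```

The paper’s diagnosis is subtler (real resting-state data have heavier-tailed spatial correlation than the Gaussian shape the packages assume), but the arithmetic of the failure is the same: wrong noise model in, confident nonsense out.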
The earthy title of the peer-reviewed paper is “Cluster F—: Why fMRI inferences for spatial extent have inflated false-positive rates”.
No, it’s “Cluster failure”. But one is tempted… One is sorely tempted.
The authors are Anders Eklund, Thomas Nichols, and Hans Knutsson. They open the abstract with, “Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data.”
Surprisingly not been validated? Not surprising to us. We know how greedy “researchers” are for results, how willing they are to cut corners, how delighted they are to jump to theory, how apt they are to latch onto the bandwagon and not let go. Scientists, regular readers know, are people too.
Here’s a small sample of what I said in the past (for fuller results, use this search, which will also turn up this article).
Item: “Regular readers will know my opinion on fMRI research. Nothing but newfangled electronic phrenologic theory-discovering machines.” fMRI Discovers Freud, Distribution Plushies Lurking In Brain.
Item: “Can fMRI Predict Who Believes In God?” No. This series takes apart Sam Harris’s “celebrated” study.
Item: Nonpolitical Images Evoke Neural Predictors Of Political Ideology?
Is this a good point to remind us that fMRI data are not pictures of the brain, but are themselves the output of models and heuristics (“Functional data were first spike-corrected to reduce the impact of artifacts using AFNI’s 3dDespike”, etc., etc.), models which carry their own uncertainty, uncertainty which should be propagated through any analysis but usually isn’t, and wasn’t here? If not, let me know when is.
That 3dDespike routine belongs to AFNI, one of the three packages whose statistical routines Cluster Failure found faulty.
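To see why the preprocessing matters, here is a toy despiker. It is nothing like AFNI’s actual algorithm, which fits a smooth curve to each voxel’s time series and shrinks outliers toward it; mine merely clips. But the moral survives the simplification: the “data” handed to the statistics depend on an arbitrary modeling choice whose uncertainty nobody carries forward:

```python
# A toy despiker (not AFNI's 3dDespike): clip samples that sit more than
# `cut` robust standard deviations from the median of the time series.
import numpy as np

rng = np.random.default_rng(2)
ts = np.sin(np.linspace(0, 6 * np.pi, 200)) + 0.3 * rng.normal(size=200)
ts[[40, 90, 150]] += 4.0                       # inject artificial "artifact" spikes

def despike(series, cut=3.0):
    med = np.median(series)
    mad = 1.4826 * np.median(np.abs(series - med))   # robust SD estimate
    return np.clip(series, med - cut * mad, med + cut * mad)

# Two defensible cutoffs yield two different "raw" datasets:
diff = despike(ts, 2.5) != despike(ts, 3.5)
print(f"the 'data' differ at {diff.sum()} of {ts.size} time points")
```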
Item: Brain-Scan Lie Detectors Don’t Work
Item: Our Brains Are Not Us: Review of Brainwashed. It’s more than just bad statistics.
The lights were from a functional magnetic resonance imaging device, or fMRI, an instrument which Sally Satel (psychiatrist) and Scott Lilienfeld (psychologist), in their terrific Brainwashed: The Seductive Appeal of Mindless Neuroscience, compare to an automated phrenological machine: a contrivance which, when placed in proximity to the skull, is purported to reveal all secrets, desires, and motivations; even to expose lies and to prove that we are nothing but wet meat machines, mere automatons…
Studies also rely on those colorful brain scans which are not, as many think, “photographs of the brain in action in real time. Scientists can’t just look ‘in’ the brain and see what it does. Those beautiful color-dappled images are actually representations of particular areas in the brain that are working the hardest—as measured by oxygen consumption—when a subject performs a task such as reading a passage or reacting to stimuli” or when they go off script and wonder why they volunteered to be squeezed into a claustrophobia-inducing tube and told to lie as “still as a corpse” for over an hour.
This distinction is important because there is no (non-circular) way to check whether a person is thinking what he is told to think, thus it is only a possibility that the heavy oxygen-using regions are directed toward the specified experimental tasks. The best that can be said is that the areas which glow brightly are correlated with the emotional states said to be under investigation; never mind that emotions are difficult to define, extraordinarily complex things. Is the “hate” center of the brain found in one experiment the same “hate” found in another?
So why are fMRI statistical analyses so bad? What went wrong? The standard suspects: wee p-values, hypothesis testing, etc., etc., etc. No hypothesis test (whether by wee p-value or Bayes factor) should ever be used again. I mean that “No” as in “No.” Eklund and pals also use hypothesis tests in their demonstration that past research is too certain, which means even their suggested corrective methods will produce results which are too certain. You have been warned. To see the trap in miniature, consider the sketch below; then I’ll let Eklund have the last word.
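The trap: test enough null voxels and wee p-values arrive on schedule, no real effect required. A minimal sketch, with voxel and subject counts chosen arbitrarily by me:

```python
# One t-test per voxel on pure noise: wee p-values show up by the hundreds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_voxels, n_subjects = 100_000, 20
data = rng.normal(size=(n_voxels, n_subjects))   # no signal anywhere

t, p = stats.ttest_1samp(data, popmean=0.0, axis=1)
print(f"voxels with p < 0.001: {(p < 0.001).sum()} "
      f"(chance alone predicts ~{n_voxels // 1000})")
```

Now, as promised, the last word: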
It is not feasible to redo 40,000 fMRI studies, and lamentable archiving and data-sharing practices mean most could not be reanalyzed either…
Finally, we point out the key role that data sharing played in this work and its impact in the future. Although our massive empirical study depended on shared data, it is disappointing that almost none of the published studies have shared their data, neither the original data nor even the 3D statistical maps.
On the other hand, I’ll take the last word. I told you so!