Read Part I, Part II, Part III, Part IV, Part V, Part VI, Part VII
The description of the use of the fMRI (how it works, voltages, etc.) appears, to my untutored eyes, boilerplate. I’ll take it as read. But, as you might not know, the raw numbers which arise from these machines are not used: they are first massaged, kneaded, perhaps even molded. Then they are used. Here’s what Harris did:
We performed standard preprocessing—slice timing correction, motion correction, brain extraction, spatial smoothing (using a 5 mm kernel), high-pass filtering, and pre-whitening—prior to contrast modeling. Individual responses were analyzed in an event-related manner. We modeled four types of trials with separate regressors: nonreligious true, nonreligious false, religious true, and religious false. Since response time varied among conditions, we also included in our model an additional regressor to account for the effects of response time. This regressor had a height equal to the response time for each trial, and was orthogonalized with respect to the other four regressors. The six motion correction parameters were also included as additional regressors. Our maps of blood oxygen level dependant (BOLD) signal changes were the result of pairwise contrasts between each of the task conditions. Statistical images were thresholded using clusters determined by Z >2.3 and a corrected cluster size significance threshold of p = 0.05.
That, my dear reader, is a full, deep-tissue, hands-on data massage. The raw numbers are smoothed, a process which necessarily increases correlation in any subsequent results (i.e. it gives smaller p-values, etc.). The smoothed “individual responses” were then stuck into regressions with response time as an “independent” variable, which was somehow orthogonalized. Were all the regressors orthogonalized, or just this one? How were they orthogonalized? Are we talking principal components? B-splines? What? Who knows. Harris stays silent.
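One common reading of “orthogonalized with respect to the other four regressors” is plain least-squares residualization: regress the response-time column on the four condition columns and keep only the residual. The sketch below assumes that reading. The paper does not say which method was used, and the design matrix, trial count, and response times here are invented for illustration (a real design would also convolve each column with a haemodynamic response).

```python
import numpy as np

def orthogonalize(target, others):
    """Return the part of `target` orthogonal to the columns of `others`.

    Plain least-squares residualization. This is an ASSUMED reading of
    "orthogonalized", not the paper's documented procedure.
    """
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    return target - others @ coef

rng = np.random.default_rng(0)
n_trials = 200  # hypothetical

# Hypothetical design: one indicator column per trial type
# (nonreligious true, nonreligious false, religious true, religious false).
condition = rng.integers(0, 4, size=n_trials)
X = np.eye(4)[condition]

# Hypothetical response times whose mean differs by condition.
rt = 2.0 + 0.5 * condition + rng.normal(0.0, 0.3, size=n_trials)

rt_orth = orthogonalize(rt, X)

# Largest inner product between the residualized RT column and any
# condition column; it should be zero up to rounding error.
print(np.abs(X.T @ rt_orth).max())
```

If this is what was done, any response-time difference between conditions is credited to the condition regressors, and the response-time regressor only soaks up trial-to-trial variation within a condition.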
Recall that response time is already dicey because of the asymmetrical nature of the questions. Plus we already have learned (or we guessed) that response times were averages within individuals. Was that also done here? Even if it wasn’t, won’t that bias in the questions bias the fMRI results?
Finally, ignoring all the difficulties, the images were used in a cluster analysis.
Now, what’s important to understand, even ignoring the smoothing and orthogonalization and question-time bias, is that all these manipulations are uncertain statistical procedures. The fMRI results are not raw results but are themselves models, or outputs from models. The uncertainty inherent in these models—which is not insignificant—is ignored in all subsequent analyses Harris did. It is thus inescapably true that the final results are stated in terms that are too certain (they are stated conditional on assuming the manipulations are error free and without uncertainty). What Harris should have done is to carry the uncertainty in these smoothing/clustering/regression models forward into the tests of difference between groups. He did not. His results, even accepting the utility of the “stimuli” and the small, hand-picked sample, are too certain.
Before reading further, if you have not seen the fMRI dead salmon study, now is your chance. A dead salmon was placed in an fMRI scanner and shown “a series of photographs depicting human individuals in social situations.” Various areas glowed to indicate how the dead fish responded to these “stimuli.” Now, well armed with a sense of the kind of absurdity possible using fMRI, read of Harris’s results.
First fMRI claim:
For both groups, and in both categories of stimuli, belief was associated with greater blood-oxygen-level-dependent (BOLD) signal in the ventromedial prefrontal cortex (VMPFC, see Fig. 1, Table 1), an area important for self-representation, emotional associations, reward, and goal-driven behavior.
Figure 1 is pictured above (but shrunk; see the main paper for the full-sized image). The first thing that strikes amateurs like your author is how crude these pictures are; bare, blunt sketches. We’re inferring subtle, fractional gradations in the reasoning powers of humans as they ponder difficult questions from these cartoons? Of course, it’s possible that we can, but how seriously are we to take these amorphous orange blobs¹?
The orange areas highlight the largest differences between the average of the averages of “true” responses versus “false” responses. The VMPFC average of averages is plotted for Christians who answered “true”, Christians who answered “false”, and non-Christians (called “nonbelievers”) in both categories. That is, each bar is the average of averages; the whisker inside it is one standard error of this average of averages. Since, in classical statistics, it takes roughly a difference of two standard errors to arrive at “significance”, we can see that there’s no real story in these numbers. (The other bars are “left superior frontal gyrus and in both lateral occipital cortices”, with similar results.)
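The two-standard-error rule of thumb can be made concrete with a little arithmetic. The sketch below treats the two bars as independent averages, which is an assumption (a within-subject comparison would need the standard error of the paired differences instead), and the bar heights and whiskers are made-up numbers, not values read from Figure 1.

```python
import math

def z_for_difference(mean_a, se_a, mean_b, se_b):
    """z-score for the difference of two means, ASSUMING independence."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff

# Hypothetical bar heights and whiskers, in percent signal change.
z = z_for_difference(mean_a=0.10, se_a=0.05, mean_b=0.02, se_b=0.05)

print(round(z, 2))   # well short of the conventional 1.96
print(abs(z) > 1.96)
```

With equal whiskers, the standard error of the difference is about 1.4 times one whisker, so two bars need to sit nearly three whisker-lengths apart before the rule of thumb is met.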
The meat is in the claim that the VMPFC (a region near, but not completely in or encompassing, the amorphous orange blobs) is “an area important for self-representation, emotional associations, reward, and goal-driven behavior.” What does this mean?
Just this: that other scientists have hooked up people to fMRIs and asked them questions which the scientists thought to be somewhat related, but imperfectly, to ‘self-representation’, ‘emotional associations’ and so forth. These are vague enough terms, but accept them as utterly unambiguous. Then, only sometimes, broad, imperfectly defined areas glowed orange in the person’s brain while operating under these emotions; sometimes the areas did not glow. Therefore, because in this experiment these areas exhibited weak differences in scans between average of averages for 30 people who answered ‘true’ and ‘false’ to some biased questions, we can’t learn much.
But we can speak as if the results are authoritative, which makes all the difference. Especially to reporters.
Notice that the differences in glowiness of which we speak are mere fractions of percentages. Nearly all the responses (or average of averages, anyway), were less than 0.2% different from one another, and most were less than 0.1% different. And this is after smoothing, regressing, and clustering. There is a good chance these results are spurious.
A good test, but one I’ve not seen in any fMRI literature (except for the dead salmon study, noted above), though I admit to ignorance of much in this field, is to submit wholly random numbers (of the same character as the real responses) to these statistical algorithms and see what they produce. My guess is that the algorithms will yield structures (spurious, of course) aplenty. If there are any fMRIers out there who want to give it a go, contact me. (You’ll have to come up with the funds, because I am an unaffiliated researcher, i.e. broke.)
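A toy version of this test is easy to sketch. The code below feeds pure noise (no signal anywhere) through Gaussian smoothing, a per-pixel one-sample t-test across subjects, and a simple height threshold, then compares the raw and smoothed results. Everything in it is an assumption made for illustration: 30 hypothetical subjects, 64 by 64 two-dimensional “images”, a smoothing width of 2 pixels, and a t threshold of 2.3 standing in for Z > 2.3. It deliberately skips the cluster-size correction the paper reports, so it shows what the uncorrected steps do to noise, not what the paper’s full pipeline would do.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth2d(img, sigma):
    """Separable Gaussian smoothing of a 2-D array (zero-padded edges)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def t_map(stack):
    """One-sample t statistic across subjects at every pixel."""
    n = stack.shape[0]
    return stack.mean(axis=0) / (stack.std(axis=0, ddof=1) / np.sqrt(n))

def neighbor_corr(stack):
    """Correlation between horizontally adjacent pixels."""
    left, right = stack[:, :, :-1].ravel(), stack[:, :, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]

def largest_cluster(mask):
    """Size of the largest 4-connected group of True pixels (flood fill)."""
    seen = np.zeros_like(mask, dtype=bool)
    best = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        stack, size = [start], 0
        seen[start] = True
        while stack:
            r, c = stack.pop()
            size += 1
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    stack.append((nr, nc))
        best = max(best, size)
    return best

rng = np.random.default_rng(42)
n_subjects, size, threshold = 30, 64, 2.3  # all hypothetical

noise = rng.normal(size=(n_subjects, size, size))  # no signal anywhere
smoothed = np.stack([smooth2d(img, sigma=2.0) for img in noise])

for name, data in (("raw", noise), ("smoothed", smoothed)):
    mask = t_map(data) > threshold
    print(name,
          "| neighbor correlation:", round(float(neighbor_corr(data)), 2),
          "| pixels above threshold:", int(mask.sum()),
          "| largest blob:", largest_cluster(mask))
```

In the raw noise the pixels that cross the threshold are scattered specks. After smoothing, neighboring pixels are strongly correlated, so the same kind of chance exceedance shows up as contiguous blobs. That is the reason a cluster-size correction is needed at all; whether a given correction is adequate is a separate question this sketch does not answer.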
Second fMRI claim:
A direct comparison of belief minus disbelief in Christians and nonbelievers did not show any significant group differences for nonreligious stimuli…The opposite contrast, disbelief minus belief, yielded increased signal in the superior frontal sulcus and the precentral gyrus. The engagement of these areas is not readily explained on the basis of prior work (see Table 2).
This is perplexing: belief minus disbelief produced no (statistical) differences, but disbelief minus belief did? How? Some kind of signed, one-sided test? Perhaps Harris means that it was Christians answering “true” to religious questions minus non-Christians answering “false” on the same questions versus Christians answering “false” on neutral questions minus non-Christians answering “true”? Who knows? The wording here is confusing: it is also far from clear whether the same areas of the brain are being referenced.
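The one-sided reading is at least arithmetically possible. If each contrast is tested only for positive values above a threshold, then a contrast and its opposite have t statistics of equal size and opposite sign, and one can clear the bar while the other cannot. The sketch below assumes that one-sided thresholding; the per-subject values are invented and deterministic, and the threshold of 2.3 is applied to a t statistic as a stand-in for Z > 2.3.

```python
import numpy as np

def one_sample_t(values):
    """t statistic for the null hypothesis that the mean is zero."""
    values = np.asarray(values, dtype=float)
    return values.mean() / (values.std(ddof=1) / np.sqrt(values.size))

# Hypothetical per-subject (belief - disbelief) values at one voxel,
# evenly spread with a negative mean.
belief_minus_disbelief = np.linspace(-0.3, 0.1, 30)
disbelief_minus_belief = -belief_minus_disbelief

t_forward = one_sample_t(belief_minus_disbelief)
t_reverse = one_sample_t(disbelief_minus_belief)

threshold = 2.3  # one-sided: only positive values count
print("belief - disbelief passes:", bool(t_forward > threshold))
print("disbelief - belief passes:", bool(t_reverse > threshold))
```

Under this reading the two sentences are not contradictory, only two halves of a two-sided question reported separately. It says nothing about whether the quoted passage compares the same groups, stimuli, or brain areas in both halves.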
Another fMRI claim:
The contrast of religious stimuli minus nonreligious stimuli (see Fig. 2A, Table 3.) revealed greater signal in many regions, including the anterior insula and the ventral striatum. The anterior insula has been regularly linked to pain perception and even to the perception of pain in others. This region is also widely believed to mediate negatively valenced feelings like disgust.
This is unbearably loose; one casual inference follows upon another. So the anterior insula is “linked” to the perception of pain in others. How strongly? What does “linked” mean? How many different tests are we doing? Are adjustments being made for the multiplicity of tests? Doesn’t look like it. Were these brain areas noted beforehand as areas to watch? Or did Harris seek to tell a story that fit his observations after the fact? The latter appears true. This increases the chance of confirmation bias, a failing to which even neuroscientists can fall prey. Yes. This whole study is one large fishing expedition.
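The multiplicity worry is simple to quantify in the idealized case. Assuming independent tests each run at the same level, the chance of at least one false alarm grows quickly with the number of tests. The counts below are hypothetical; the number of contrasts actually examined is not stated here, and real contrasts on the same data are not independent.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

for m in (1, 5, 20, 100):  # hypothetical numbers of tests
    print(m,
          round(familywise_error(0.05, m), 3),
          round(bonferroni_alpha(0.05, m), 4))
```

At twenty independent looks the chance of a spurious “finding” somewhere is already well over one half.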
Final fMRI claim:
[Certain named] regions showed greater signal both when Christians rejected stimuli contrary to their doctrine (e.g. “The Biblical god is a myth”) and when nonbelievers affirmed the truth of those same statements. In other words, these brain areas responded preferentially to “blasphemous” statements in both subject groups. This contrast is the result of a double subtraction on religious trials: (Nonbeliever True − Nonbeliever False) − (Christian True − Christian False) = NT − NF − CT + CF = NT + CF − NF − CT = (NT + CF) − (NF + CT). The opposite contrast: (NF − NT) − (CF − CT) produced a null result.
Good grief! Now we have Harris telling us he has unambiguously identified which “stimuli” are blasphemous and which not, which implies he has discovered a way to measure the strength of these questions’ blasphemousisty (I like this neologism). We have already examined the questions and noted their difficulties in interpretation. Which questions—which exact questions—are the blasphemous ones? Harris never bothers to tell us. And just look at all the manipulation that went into producing pleasing p-values! This is subtracted from that, which is further subtracted from the other, all under the presumption that the answers are symmetric, that people’s behavior would conform to just what Harris thought it would conform to, etc.
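The double subtraction can be written as a set of weights on the four cells, which makes two things easy to check: the regrouping in the quoted algebra is correct, and the “opposite contrast” is nothing more than the same weights with the signs flipped. The sketch also shows that quite different patterns of cell averages give the identical contrast value. The cell averages are invented for illustration.

```python
import numpy as np

# Weights on the four cells, in the order NT, NF, CT, CF.
double_subtraction = np.array([1, -1, 0, 0]) - np.array([0, 0, 1, -1])  # (NT-NF)-(CT-CF)
regrouped = np.array([1, 0, 0, 1]) - np.array([0, 1, 1, 0])             # (NT+CF)-(NF+CT)
opposite = np.array([-1, 1, 0, 0]) - np.array([0, 0, -1, 1])            # (NF-NT)-(CF-CT)

print(double_subtraction, regrouped, opposite)

# Hypothetical cell averages (percent signal change), three different stories:
only_nonbelievers = np.array([0.10, 0.0, 0.0, 0.00])  # only NT is raised
only_christians = np.array([0.00, 0.0, 0.0, 0.10])    # only CF is raised
both_groups = np.array([0.05, 0.0, 0.0, 0.05])        # NT and CF raised equally

for cells in (only_nonbelievers, only_christians, both_groups):
    print(float(double_subtraction @ cells))
```

All three patterns produce the same number, so a positive contrast by itself cannot say that both groups responded to the same statements in the same way.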
We’re nearly finished. All that is left is the discussion, the portion of the article where we learn if Harris understood all the difficulties which we have brought up.
¹ “Glowed orange” is my shorthand to describe the false color process by which the brain images were produced. I am aware of how these machines work, having played with data from them.
As this series has progressed I find myself more and more curious as to Harris’s response to it. Any hope of that?
I doubt it. Harris does not have a history of responding to critics, and I can’t really blame him. Doing so would be a full-time job.
I created the mini picture above and have no idea how that blue stripe with words in it appeared. Apologies to all. Go to the original to view a better rendition.
There is only one more post (thank God!) in this series, coming tomorrow.
A “brain extraction”, eh? You have to wonder whose. Even knowing what it means doesn’t stop it from looking strange in print.
One thing that’s very discouraging in the medical field is the tendency to overgeneralize from experiments done with ridiculously small sample sizes. OTOH, the salmon experiment yielded a significant result with a sample size of one. In accord with the field’s propensity: dead salmon have empathy with photographed humans.
The “massaging” of fMRI data in this paper is pretty standard for fMRI preprocessing. While it may be a fair criticism of fMRI in general (though there are hundreds, if not thousands, of studies validating the technique), it is unfair to criticise Sam and his team for using ubiquitous preprocessing techniques. These techniques try to address the inherently low temporal resolution of the BOLD signal.