Watch out, Sam Harris: Gordon Hodson and Michael A. Busseri of Brock University are giving you competition for the worst use of statistics in an original paper.
Their “Bright Minds and Dark Attitudes: Lower Cognitive Ability Predicts Greater Prejudice Through Right-Wing Ideology and Low Intergroup Contact,” published in Psychological Science¹ and headlined in the press as “Low IQ & Conservative Beliefs Linked to Prejudice,” is a textbook example of confused data, unrecognized bias, and ignorance of statistics.
Hodson and Busseri are on track to beat out Harris’s magnificent effort, and they might also triumph over the paper which “proved” that brief exposure to the American flag turns one into a Republican, and the peer-reviewed work “proving” that exposure to 4th of July parades turns one into a Republican.
Let’s see how Hodson and Busseri put themselves into the running.
The authors intimate that “individuals with lower cognitive abilities may gravitate toward more socially conservative right-wing ideologies that maintain the status quo and provide psychological stability and a sense of order”. They say that this “is consistent with findings that less intelligent children come to endorse more socially conservative ideologies as adults”.
How did they prove that idiots and conservatives are racists? They gathered two large data sets from the UK, one begun in 1958 (NCDS), the other in 1970 (BCS): about 16,000 individuals in total, roughly equal numbers of males and females. They quizzed the two groups on their “intelligence” at ages 11 and 10 respectively; they then came back to these individuals at ages 33 and 30 and asked them about their “socially conservative ideology and racism.”
The authors do not say how many people they used in their analysis; how many individuals were lost in the 20 years between surveys is not noted in their paper. My read of the NCDS website (pdf) makes the loss about 30%. That leaves about 11,000.
Intelligence was defined in one database as scoring well on matching the similarity between 40 pairs of words and between 40 pairs of shapes and symbols. In the other database, it was instead drawing 28 missing shapes, recalling digits from 34 number series, identifying the definitions of 37 words, and “generating words that are semantically consistent with presented words” 42 times.
Thus the two samples measure similar but different abilities. The NCDS (pdf) also had available the Peabody Individual Achievement Test Math and Reading subscales, but these were not used as intelligence measures. Why not?
When the kids reached 33 and 30, they were asked whether they agreed with 13 or 16 questions such as “Schools should teach children to obey authority” and “Family life suffers if mum is working full-time.”
Another was, “People who break the law should be rehabilitated.” Just kidding! It’s actually, “People who break the law should be given stiffer sentences.” The bias in the question wording is ignored.
Another question was, “None of the political parties would do anything to benefit me.” Is agreeing or disagreeing with that a “conservative” position? What would the Occupy people say? Another, “Being single provides more time to experience life and find out about yourself.” Conservative or liberal?
According to the NCDS (pdf), there were about 50 questions, of which only 13 were used. A “conservative”, then, is whatever Hodson and Busseri say it is. The same thing goes for what a “racist” is.
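To make the point concrete, here is a hypothetical Python sketch of how a summed Likert index behaves when the direction of an ambiguous item is a judgment call. The item names, the 1-to-5 scale, the responses, and both codings are invented for illustration; I do not know how the authors actually scored these items.

```python
# Hypothetical sketch: a summed Likert index whose value depends on how
# ambiguous items are scored. Item names, responses, and codings are
# invented for illustration; they are NOT the NCDS/BCS scoring scheme.

LIKERT_MAX = 5  # assumed scale: 1 = strongly disagree ... 5 = strongly agree


def index_score(responses, directions):
    """Mean item score, reverse-coding items whose direction is -1."""
    total = 0
    for item, value in responses.items():
        if directions[item] == 1:
            total += value                     # agreement counts as "conservative"
        else:
            total += (LIKERT_MAX + 1) - value  # agreement counts as "liberal"
    return total / len(responses)


# One respondent's answers to four of the statements quoted above.
respondent = {
    "obey_authority": 4,
    "stiffer_sentences": 4,
    "no_party_benefits_me": 5,
    "single_more_time": 5,
}

# Two defensible readings of the two ambiguous statements.
conservative_reading = {"obey_authority": 1, "stiffer_sentences": 1,
                        "no_party_benefits_me": 1, "single_more_time": 1}
liberal_reading = {"obey_authority": 1, "stiffer_sentences": 1,
                   "no_party_benefits_me": -1, "single_more_time": -1}

print(index_score(respondent, conservative_reading))
print(index_score(respondent, liberal_reading))
```

The same answers give an index of 4.5 under one reading and 2.5 under the other, a swing of two points on a five-point scale produced by nothing but the analyst’s choice of direction for two items.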
For these questions “reliabilities ranged from .63 to .68.” This means the questions are imprecise and imperfect, so that if you use the raw results in subsequent analysis, you must “carry forward” the uncertainty in reliability. Did Hodson and Busseri do this? No.
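Under classical test theory, reliability is the share of observed-score variance that is true-score variance, so reliabilities of .63 to .68 imply that roughly a third of the variance in these scores is measurement error. The simulation below is a minimal sketch under that textbook model (normal true scores and independent normal errors, which are my assumptions, not the authors’ data): two parallel administrations of a scale with reliability .65 agree with each other only to a correlation of about .65.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(reliability, n=20_000, seed=1):
    """Classical test theory: observed = true + error.

    With var(true) = 1, choosing var(error) = 1/reliability - 1 makes
    var(true) / var(observed) equal the stated reliability.
    Returns (share of observed variance that is true-score variance,
             correlation between two parallel administrations).
    """
    rng = random.Random(seed)
    err_sd = math.sqrt(1 / reliability - 1)
    true_scores = [rng.gauss(0, 1) for _ in range(n)]
    form1 = [t + rng.gauss(0, err_sd) for t in true_scores]
    form2 = [t + rng.gauss(0, err_sd) for t in true_scores]
    return corr(form1, true_scores) ** 2, corr(form1, form2)


true_share, retest = simulate(0.65)
print(f"true-score share ~ {true_share:.2f}, parallel-form r ~ {retest:.2f}")
print(f"standard error of measurement (SD units): {math.sqrt(1 - 0.65):.2f}")
```

The last line is the classical standard error of measurement, σ√(1 − reliability): on a standardized scale, any one person’s score is uncertain by about 0.59 SD.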
You would guess from the title that the authors looked at how scores on the intelligence questions correlated with scores on the attitude and racism questions, taking into account the uncertainty in the reliability. You would be wrong.
They first modeled the intelligence questions to create one “latent” (unobserved) measure, called “g”. The uncertainty in creating “g” is then ignored in all subsequent analysis. They did the same for the attitude questions, creating a latent variable called “conservative ideology”; uncertainty in its creation is also ignored. Then the individuals’ education and socioeconomic status, and separately their parents’ socioeconomic status (which again were the results of models), were put into a model with “g” and “conservative ideology” to predict “racism” (whose uncertainty, as already said, was ignored). The picture below summarizes their findings.
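For readers unfamiliar with this kind of model, here is a generic sketch of a latent-variable mediation model of the sort just described. The symbols are my own labels, and this is an assumption about the general form, not the authors’ exact specification: g is latent cognitive ability measured by items x_j, C is latent conservative ideology measured by items z_k, R is the racism score, and w collects the education and socioeconomic covariates.

```latex
\begin{aligned}
x_j &= \lambda_j\, g + \delta_j, && j = 1,\dots,J \quad \text{(intelligence items)}\\
z_k &= \kappa_k\, C + \varepsilon_k, && k = 1,\dots,K \quad \text{(attitude items)}\\
C &= a\, g + \boldsymbol{\gamma}_C^{\top}\mathbf{w} + \zeta_C\\
R &= c'\, g + b\, C + \boldsymbol{\gamma}_R^{\top}\mathbf{w} + \zeta_R
\end{aligned}
```

Substituting the third line into the fourth gives R = (c′ + ab) g + …, so in this notation the “direct effect” of g is c′, the indirect effect through conservative ideology is ab, and the “total effect” is c′ + ab. The loadings λ and κ and the error terms δ and ε are where the measurement uncertainty in “g” and “conservative ideology” lives.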
Lo, they found small p-values. The authors appear unaware that samples of this size are practically guaranteed to spit out small p-values for even trivially small effects.
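To see why sample size matters, here is a minimal Python sketch computing the two-sided p-value for a fixed, hypothetical correlation of r = 0.05 at several sample sizes. It uses the usual t statistic for a correlation and approximates the t distribution by the normal, an assumption that is reasonable only for samples in the hundreds or more; the sizes are illustrative, with 11,000 matching the rough figure estimated earlier.

```python
import math


def two_sided_p_for_r(r, n):
    """Approximate two-sided p-value for testing rho = 0 given sample r.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal,
    a large-sample approximation to the t distribution with n - 2 df.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))  # equals 2 * P(Z > |t|)


R = 0.05  # hypothetical, deliberately weak correlation
for n in (100, 1_000, 11_000):
    print(f"n = {n:>6}: p = {two_sided_p_for_r(R, n):.2g}")
```

The same weak correlation that is nowhere near “significant” with 100 people produces a p-value far below .001 with 11,000.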
What makes the study ludicrous, even ignoring the biases, manipulations, and qualifications just outlined, is that by the authors’ own admission the direct effect size for “g” on “racism” is only -0.01 for men and 0.02 for women. That is utterly trivial: close enough to no effect to be no effect, and statistically “significant” only because of the massive sample size.
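A back-of-the-envelope check, under the simplifying assumption (mine, not the paper’s) that each standardized direct effect can be treated as a simple correlation between “g” and the “racism” score. In the actual structural model the standard errors also depend on the other paths and covariates, so these numbers are only indicative.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def variance_explained(r):
    """Share of outcome variance accounted for by a correlation r."""
    return r * r


def n_for_significance(r, z=Z_975):
    """Smallest n at which |r| gives two-sided p < .05 (normal approximation).

    Solves |r| * sqrt((n - 2) / (1 - r^2)) >= z for n.
    """
    return math.ceil(2 + (z / abs(r)) ** 2 * (1 - r * r))


for r in (-0.01, 0.02):
    print(r, variance_explained(r), n_for_significance(r))
```

Under that approximation an effect of 0.02 accounts for 0.04% of the variance in the outcome and needs roughly 9,600 people before it clears p < .05; an effect of -0.01 needs roughly 38,400. Effects this small can reach “significance” only in samples of many thousands.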
The effect size for “conservative ideology” directly predicting “racism” is higher (0.69 and 0.51). But all that means is that the questions the authors picked for these two attitudes are roughly correlated with one another. In other words, “None of the political parties would do anything to benefit me” is crudely correlated with “I wouldn’t mind working with people from other races” and so forth.
Yet the authors have the temerity to conclude, “These results from large, nationally representative data sets provide converging evidence that lower g in childhood predicts greater prejudice in adulthood and, furthermore, that socially conservative ideology mediates much of this effect.”
Truly, statistics can “prove” anything.
Thanks to reader Jonathan Woolley who suggested this study.
Update: On one website that linked to my criticism, I saw a criticism of my criticism (get it?): “The subjects in the test were given a fifty question questionnaire and only 13 questions are used, and this jackass is complaining about that?” I am the “jackass.”
This articulate person (language warning on the link) says that social scientists mix red-herring questions in with “real” ones so that interviewees can’t figure out what’s going on. This person also says that I was unaware of this. Not true. But even if I had been, it would have been irrelevant.
The point I made was that we do not know how the questions the authors did use were combined to create the “conservative” and “racist” indexes; it doesn’t matter how many others were rejected or why these were chosen. I have given examples of two questions which are at least ambiguous; there are more. “Conservative” and “racist” are defined as the authors see them, and not necessarily as civilians and other scientists would see them.
See also my comments below: the models fit by the authors result in very small effects. Most of these effects have small p-values, but as I said above, small p-values are practically guaranteed in large samples (> 1000). And remember, none of the uncertainty in creating the latent “g” and the other indexes is carried forward in their models: if it were, the effect sizes would decrease further (and p-values would increase).
And for the real kicker: if we then “integrated out” the parameters (the βs) and tried to predict whether a person with a low “g” would be “racist” (the reason given for the study), the effects would be lower still, probably negligible. The “direct effect” was already trivial; the “total effect”, barely marginal.
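A rough sketch of what “integrating out” a parameter means for prediction, under assumptions of my own: a single standardized predictor, a normal distribution describing the uncertainty in the slope, a known residual spread, and a hypothetical slope of -0.02 with standard error 0.01 loosely echoing the direct effects quoted above. None of these numbers come from the paper, and the threshold for calling someone “prejudiced” is likewise invented.

```python
import math


def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_exceeds(x, threshold, beta_hat, beta_se, sigma=1.0):
    """Predictive P(y > threshold | x) with beta integrated out.

    Assumed model: y = beta * x + e, e ~ N(0, sigma^2) with sigma known,
    and beta ~ N(beta_hat, beta_se^2). Mixing N(beta * x, sigma^2) over
    that normal gives y | x ~ N(beta_hat * x, sigma^2 + (x * beta_se)^2).
    """
    mean = beta_hat * x
    sd = math.sqrt(sigma ** 2 + (x * beta_se) ** 2)
    return normal_sf((threshold - mean) / sd)


# Hypothetical numbers: slope -0.02 (SE 0.01); "prejudiced" means scoring
# more than 1 SD above the mean on the outcome.
BETA_HAT, BETA_SE, THRESHOLD = -0.02, 0.01, 1.0

p_average_g = prob_exceeds(0.0, THRESHOLD, BETA_HAT, BETA_SE)
p_low_g = prob_exceeds(-1.0, THRESHOLD, BETA_HAT, BETA_SE)  # 1 SD below average g
print(p_average_g, p_low_g, p_low_g - p_average_g)
```

Under these made-up numbers, a child one standard deviation below average in g has a predicted chance of being “prejudiced” only about half a percentage point higher than an average child (roughly 16.4% versus 15.9%).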
Incidentally, if you don’t know, “latent” means unobservable (and uncheckable). Social scientists love these kinds of models (structural equation models, factor analysis, and so on) because they are so fertile. Sprinkle a little data on them and publishable p-values aplenty will sprout instantly.