A Case Of Bad Statistics In An IQ-Intelligence Study

We’re going to talk more soon about “IQ” and “intelligence”. I’m working on a more in-depth article on the subject to carefully explicate the vast over-certainties I see on “both sides” of the debate. Consider this small post nothing but a throat clearing. I do not answer all questions here.

Because of a recent Twitter battle, it’s clear there are two sides in the debate, which we may call the Sailerons, who follow Steve Sailer and who regularly commit the Deadly Sin of Reification of IQ with intelligence, and the Talebites, who follow Nassim Nicholas Taleb, a group which dismisses IQ scores as being of any real interest.

Briefly and incompletely, here is why I believe IQ is not intelligence. Those who tout IQ never bother to define intelligence except as ability to do well on certain kinds of tests. How do we know these tests measure intelligence? Because those who score high are smart. This is circular. Why this circularity exists and what extra-test definitions of intelligence are, I save for the other article. I do not mean this paragraph as a proof, so no histrionics, please.

The Talebites wave aside test results, saying, for instance, that they’re not especially useful in discerning differences in successful people. This misses the idea that the pool of people who are not a success probably didn’t score well on those tests. Even if IQ properly measures intelligence (which it does in some aspects, but as through a glass darkly), we would expect little or no correlation of IQ and success at the bottom of professions (among just those people) and likewise at the top: selecting on success restricts the range, which attenuates the correlation.
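The restriction-of-range effect is easy to demonstrate with a small simulation (the numbers are invented for illustration, not taken from any study): correlate a test-like score with an outcome in a whole simulated population, then recompute the correlation among only the top slice of that outcome.

```python
import random

random.seed(1)

# Invented, illustrative numbers: an IQ-like score and a correlated
# "success" measure (success = scaled score plus independent noise).
n = 100_000
iq = [random.gauss(100, 15) for _ in range(n)]
success = [0.5 * x + random.gauss(0, 10) for x in iq]

def corr(xs, ys):
    """Pearson correlation computed directly from its definition."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Correlation in the whole population...
r_all = corr(iq, success)

# ...versus among only the "successful": the top 10% of success scores.
cutoff = sorted(success)[int(0.9 * n)]
top = [(x, y) for x, y in zip(iq, success) if y >= cutoff]
r_top = corr([x for x, _ in top], [y for _, y in top])

print(f"whole population: r = {r_all:.2f}")
print(f"top 10% only:     r = {r_top:.2f}")
```

The correlation does not vanish among the successful; it merely shrinks, because selecting on the outcome throws away most of its variation.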

This next proposition should be uncontroversial: IQ tests are useful at predicting the ability to answer questions or to complete tasks that are like the questions used in IQ tests. This is almost tautological (leaving out that testing environments are different than real life). An equivalent proposition is: Baking tests are useful at predicting the ability of people to be successful bakers. It is obvious that it takes more than mere baking skills to be a successful baker; that one number is insufficient at describing baking ability. Yet that number will correlate well (in the statistical sense) with other standard measures of success, such as income, and even health.

Now you will hear from the Sailerons that, of course they don’t believe one number can capture all there is to a person’s intelligence. In practice, they violate that commitment routinely, reifying with wild abandon, especially when it comes to differences in populations. Yet I nowhere among the Talebites find the absence of reification, either. Everybody, with rare exception, says people “have” an IQ or that IQ is “real”.

No. Most people do not have an IQ. Only the people who have taken an IQ test have an IQ score: IQ is only a score on a test. All people have intelligence—of varying degree (I reject categorically any hint of Equality or blank-slatism). Intelligence, as I will prove later, and which anyway should have always been obvious, is much more than a test score.

Here, then, is just one small example from an army of examples of how things go wrong when people do not begin with solid foundations. What happened (a multi-cars-on-the-highway-in-the-blizzard wreck) would not have happened had the folks responsible begun with a multifaceted, metaphysically sound definition of intelligence. And if they had eschewed p-values.

The peer-reviewed paper, published in the International Journal of Epidemiology, by Arden et al. (“authors”), is “The association between intelligence and lifespan is mostly genetic.”

Wow! What a title!

Given my vast cognitive abilities, which are plain to all, but especially to me, I should live forever!

Just look at what is claimed, at least tacitly: that the authors have an unambiguous definition of intelligence, a first for this field; and that the same genes which cause big brains also cause long lives. Deep and important.

And silly. The Deadly Sin of Reification has struck. IQ test scores have become intelligence.

Well, so what. So change the title to “The association between IQ scores and lifespan is mostly genetic” and all is fixed. Alas, no. First, because the reification of IQ scores as intelligence is so ingrained, everybody will see “IQ” but believe “intelligence”. Second, and damning, we have wee p-values, outrageous data selection, and shades of the epidemiologist fallacy.

Jay S Kaufman and Carles Muntaner (“critics”) were so aghast at what the authors claimed that they submitted a rebuttal, which is must reading. They did such a marvelous job listing the authors’ many shortcomings and mistakes that there is no reason for me to repeat their words. I shall assume you (yes, even you) are among the minority who will read them.

A couple of highlights. The critics said the authors relied “on null hypothesis significance testing rather than reporting of effect estimates and their imprecision”. The authors’ rebuttal was to say they did not, at least in the abstract (conceding that they did in the paper).

The authors studied twins’ life spans, relying on a bizarre method of data inclusion.

If at least one twin died by the time of the study assessment, then the pair was included in the analysis, but the survival difference could not be calculated if the second twin was still alive, which was true in about half of the pairs analysed. In that case the authors imputed the death date for this surviving twin by using the national average for a person of the same birth year and sex.

Industrial-grade over-certainty there. But it’s even worse. “Moreover, in some instances, the authors found that the person had already reached their life table estimated survival, and so they assigned this person to die in the same year…” Dude. And what about measures of genes themselves and the causal pathways to actual lifetime and actual intelligence, or even just IQ scores? Didn’t happen. More assumptions were inserted for measurement.
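To see roughly what that imputation scheme does, here is a deliberately crude simulation (assumed numbers throughout, nothing from the authors’ data): twin lifespans share a common component; a pair enters the analysis only if at least one twin is dead by the study’s cutoff age; and a still-living twin has the population-average lifespan imputed as a death date.

```python
import random

random.seed(2)

# Assumed, illustrative numbers: twin lifespans built from a shared
# component plus individual noise; population mean lifespan is 80.
n = 50_000
pop_mean = 80
pairs = []
for _ in range(n):
    shared = random.gauss(0, 7)                 # shared (e.g. genetic) part
    a = pop_mean + shared + random.gauss(0, 7)
    b = pop_mean + shared + random.gauss(0, 7)
    pairs.append((a, b))

# Treat the study as happening when the cohort is 78: a twin is "dead"
# if their lifespan is under 78, "still alive" otherwise.
study_age = 78

true_diffs, imputed_diffs = [], []
for a, b in pairs:
    if a >= study_age and b >= study_age:
        continue                                # neither twin dead: excluded
    true_diffs.append(abs(a - b))
    # A surviving twin gets the population average imputed as a death
    # date, mimicking the life-table imputation the critics describe.
    a_obs = a if a < study_age else pop_mean
    b_obs = b if b < study_age else pop_mean
    imputed_diffs.append(abs(a_obs - b_obs))

mean_true = sum(true_diffs) / len(true_diffs)
mean_imp = sum(imputed_diffs) / len(imputed_diffs)
print(f"mean within-pair lifespan gap, true lifespans:  {mean_true:.1f} years")
print(f"mean within-pair lifespan gap, after imputation: {mean_imp:.1f} years")
```

In this sketch the imputed gaps run smaller than the true ones, since survivors who would have lived well past the average get pulled back to it; the analysis then treats those manufactured numbers as if they were data.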

Next we need this headline, from a different paper (in New Scientist, the ex-communist magazine, now suitably morphed): “Exclusive: A new test can predict IVF embryos’ risk of having a low IQ.”

Did they say risk of “low IQ”, as if IQ were intelligence and having low intelligence was like having a disease? Yes, sir, they did. Isn’t that sad and stupid? Yes, sir, it is. But it is representative of what happens when you never bother to define intelligence except circularly.

The authors (in the rebuttal) say “Cognitive epidemiology is the field of study that explores links between intelligence, health and mortality.” Health and mortality we can define with only some ambiguity (mortality is certain). But what is their definition of intelligence? IQ. With no idea that IQ is just as well seen as a measure of cultural achievement as of intelligence (I mean this only in a loose sense; don’t have palpitations). Cultural achievement is, after all, how the questions in IQ tests are picked. (This is not to claim IQ scores have nothing to say about intelligence, suitably defined. And if IQ is predictive, however loosely, of cultural achievement, it is still useful when considering Diversity.)

We can now see why the critics said “To consider IQ, as a marker of a disease or variation in test scores, as a condition to be prevented or treated is absurd.”

There is no more a cognitive epidemiology than there is a memory, perception or emotional epidemiology. These are basic psychological processes, not diseases, therefore not targets of prevention or treatment. A loss of functional cognition, such as occurs in dementia, can be studied as a disease outcome, but variation in intelligence test scores in the population is not such a quantity. Nor is IQ a well-defined exposure even if it were relevant to public health. The misconception of using IQ or other markers of cognitive performance as causes of disease or death, to be prevented or treated at the population level, has had devastating effects in the past century.

As apt and true as that warning is, the hunger for certainty and the mania for measurement assure that we will see many similar mistakes in the future.



  1. trigger warning

    The entire field of psychometrics – wherein IQ testing is the best and most technical exemplar, but which also includes the risible practice of predicting future human behavior based on expressed “attitudes” and “opinions” – should be properly called psychomancy; i.e., divination of future behavior or prospects on the basis of word patterns.

    Sadly, the very success of physical-sciences based engineering lured us into the bizarre belief that somehow measuring weightless, colorless, odorless, and evanescent thoughts, beliefs, and memories could produce similar engineering success in the realm of human behavior. And so we “evolved” from a culture that, aware of its own limitations, simply decreed some proportional punishment for punching your co-worker in the nose (knowing that nose-punching would still occur despite our best efforts), to a culture believing that somehow machining thoughts, beliefs, and memories through childhood indoctrination, psychotherapy, counseling, carefully designed “reinforcements”, “nudges”, token-economy tax structures, and mandatory “awareness” seminars might eliminate unwarranted nose-punching altogether.

    This is the ultimate conceit of the Progressive vision: in the words of T.S. Eliot, “escap[ing] from the darkness outside and within by dreaming of systems so perfect that no one will need to be good.”

  2. “Those who tout IQ never bother to define intelligence except as ability to do well on certain kinds of tests. How do we know these tests measure intelligence? Because those who score high are smart. This is circular. Why this circularity exists and what extra-test definitions of intelligence are, I save for the other article. I do not mean this paragraph as a proof, so no histrionics, please.”

    Without histrionics: I would regard this as an inaccurate (and unfair) characterisation of Steve Sailer personally and of the bloggers who are part of his sphere (e.g. James Thompson, also at unz.com).

    They are all perfectly aware of the historical fact that IQ arose from the observation that *all* tests of cognitive ability showed a positive correlation – each with all others – in group studies. The inference is that this implies a single underlying and varying ability – general intelligence, or ‘g’ – which can only be measured indirectly. IQ tests are various (incomplete and distorted) ways of ‘extracting’ a single number to serve as a proxy to that fact.

    Sailer et al are also perfectly aware of the century’s worth of honest studies that (all) show a positive correlation between the IQ average score of a group and many outcome measures such as overall educational attainment, salary, job status, most health measures and life expectancy.

    There is also a mass of research showing correlations between IQ scores and various objective measures such as brain size and simple reaction times (and many others).

    It is this predictive ability which is so impressive and important about IQ. This is why Sailer et al are interested in IQ. But they are also interested in measures of personality, motivation, creativity etc; which are pretty much independent of IQ – knowledge of which increases predictive precision.

    I hope you are not using a false imputation of ‘circularity’ as a straw-man type argument, setting up Sailer et al to be knocked down for this supposed error. It just isn’t true.

  3. Spetzer86

    This video by Jordan Peterson covers the general topic of IQ and pokes at some of the issues. It’s an interesting view of how intelligence (or at least measuring it) may affect outcomes, with a few caveats. https://www.youtube.com/watch?v=fjs2gPa5sD0

  4. Briggs


    Hey, thanks.

    Couple quickies: (1) have a go at defining intelligence non-circularly; (2) I don’t deny (Taleb mistakenly does) the usefulness of test scores in predicting certain other outcomes, more or less successfully, more successfully the closer the outcome is to questions asked on the tests, but in any case far over-hyped due to reliance on wee p-values, and parametric versus predictive statistical methods (explained here); (3) interesting that nobody ever says a person has SAT, or says “SAT is real”; (4) and given “g” is only a non-unique weighted linear combination of (sub-)test scores, it must be correlated to things like test scores.
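On point (4), the non-uniqueness is easy to exhibit with a toy calculation (a sketch with invented scores, nothing more): when sub-test scores are all positively correlated, almost any set of positive weights produces a composite that correlates very highly with any other composite, so no particular weighting has a privileged claim to being “the” g.

```python
import random

random.seed(3)

# Invented scores: five sub-tests sharing a common component, so all
# pairs of sub-tests are positively correlated.
n = 20_000
people = []
for _ in range(n):
    common = random.gauss(0, 1)
    people.append([common + random.gauss(0, 1) for _ in range(5)])

def composite(weights):
    """Weighted linear combination of each person's sub-test scores."""
    return [sum(w * s for w, s in zip(weights, tests)) for tests in people]

def corr(xs, ys):
    """Pearson correlation computed directly from its definition."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

g1 = composite([1, 1, 1, 1, 1])   # equal weights
g2 = composite([5, 1, 1, 1, 2])   # a quite different weighting
r = corr(g1, g2)
print(f"correlation between the two 'g' composites: {r:.2f}")
```

Two arbitrary weightings yield composites correlated well above 0.9, which is also why any such composite must correlate with the sub-test scores from which it was built.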

    The other comments about will (motivation etc.) and some biological measures (brain size, etc.; and why not height, important to me given my soaring achievement?) are better, and even necessary to understand part of intelligence, non-circularly defined, but the focus as you have it is backward. And wholly (it seems to me) materialistic.

    Well, much more to come! I have two conferences coming up these next two weeks and am way, way behind.

  5. Ken

    RE: “How do we know these tests measure intelligence? Because those who score high are smart. This is circular.”

    There’s some implicit assumptions in the words there being taken for granted.

    Are the tests measuring “intelligence” or measuring “knowledge” — knowledge of subjects to which the less educated tend not to be exposed? What is “smart”? — Knowledge of a lot of arcane and popular facts, or, the ability to use such factual information to analyze other information and reach other correct conclusions and/or reasonable hypotheses?

    The statement is not necessarily circular so much as not making clear where knowledge is distinguished from actual latent ability…and how tests of knowledge are used as a proxy for mental ability (aka “intelligence”). Or whatever.

    Defining the terms and contexts matters entirely. A savant (e.g., “Rain Man” as portrayed in the movie of that name by Dustin Hoffman) is clearly a genius in certain narrow respects, and an imbecile in others. And so it can go…depending on how one defines their terms. A, if not the, problem is that the terms used have a certain intuitive sense of meaning most agree upon in general terms but also lack needed specificity.

  6. Don buss

    Can’t wait for your follow up post Briggs! Thanks in advance for thinking through this.

    To be fair to Taleb, I read him as admitting that IQ correlates with certain outcomes and the closer the outcome is to taking an IQ test, the closer the correlation. Being Taleb though, I think he rejects that those are outcomes of “success”. Those outcomes are the realm of bureaucrats and paper-pushers (IYI’s). Entrepreneurs, risk-takers are “successful” in Taleb’s world.

    Hence his mocking of the “Birkenstock” Mensa crowd – feeling superior but not as valuable as Fat Tony.

    Anyway, I am looking forward to your discussion of the literature and wee p’s!!

  7. Bruce Charlton

    @William “(1) have a go at defining intelligence non-circularly”

    Well, I already did in my comment – but I shall re-do so.

    If you give a group of children a short test of cognitive ability, *any* cognitive ability (reading comprehension ability, vocabulary, maths or logic puzzles, analogies, pattern or colour discrimination, reaction times…) and put their scores into rank order; this rank order will correlate with the rank order of many future events in the same cohort – including academic and work outcomes, health outcomes and life expectancy.

    When the inference is made that a single underlying, not directly measurable, cognitive attribute called g underlies all cognitive tests and is the reason for their co-correlation, then IQ tests can be devised which are less sensitive to the child’s cultural and educational experience – and such IQ tests are more precisely predictive of the above outcomes than individual tests of cognitive ability.

    This stage was reached about 100 years ago, and the correlations above were known – except for the health/lifespan ones, which were recently nailed down by Ian Deary’s Edinburgh group, using tests done on all Scottish children several decades before; but which were known in outline from the cohort studies begun by Terman in California in the 1920s.

    The original use of IQ tests in the UK was (as in the above cited example from the 1920s) to find children from poor and rural backgrounds – who had had very little formal education and whose ability could not therefore be detected by subject-based exam results – who would probably benefit from a selective and academic school education, in order to offer them scholarships.

    I regard it as surprising – and manifestly not circular – that a short test done by a group of children in a classroom one morning when they were 9 years old; turns-out to be predictive (approximately, but still to a predictable degree) of what happens to those children, and their performance – in many different outcomes and in many different ways – over the rest of their lives.

  8. Briggs


    No, sir. These definitions are circular. Any use of correlations with success or with test scores is a circular definition. They all say these scores are high and so are other measures of success or health or whatever, and these measures of success and health require intelligence or are tied to the thing (such as the brain) which we think is responsible for intelligence, so that since the correlation is there, thus intelligence has been defined. Intelligence, in some of its aspects, may have been demonstrated, but it has not been defined. How do you know these test scores have demonstrated intelligence? You have to have some non-test or non-success definition in mind. You cannot say the test measures all of the aspects of intelligence, because you (third person pronoun) have not defined just what intelligence is in its essence, its nature.

    I do not anywhere say that test scores and other measures are not correlated or are not somewhat useful in making predictions. Far from it! Though I do say the predictive ability is exaggerated, because all such statistics relying on classical methods are necessarily exaggerated, but that is a different point. Most statistics are exaggerated; I mean the certainty in them is.

    Well, I realize I owe the definition of intelligence, and I (actually not I, but others better than I) have it. But I don’t want to give it in shorthand without the full explanation, to avoid unnecessary discussion. I beg your indulgence until I provide it. Anyway, I’m willing to bet, given the nature of your writing, you’ll say “Oh, I agree”.

  9. Nate

    I am greatly looking forward to your follow up article.

    Are you saying that a non-circular definition of intelligence requires only current-state knowledge about a person, not the fact that, after they get this score, it correlates in the future with more financial/relationship/life success?

    Like height? I can measure it in a 6th grader with a ruler now. I don’t define height by whether you will be on the varsity high school basketball team in 5 years or in the NBA.

  10. Bruce Charlton

    Well – I shall wait and see; but my understanding of the circularity accusation is that circular means some kind of tautology: saying the same thing in different ways. Since this is not the case for intelligence, far from it, the definition is non-circular.

    You seem to be saying that the consequences of IQ predictions are included in the definition of intelligence? This just is not true, but I can’t imagine how you could imagine it was, so maybe I am misunderstanding.

    Intelligence is no more circular than any other aspect of biology. But what does make it unusual is that it is (nearly always) measured using an ordinal – non-interval and non-ratio – scale, and lots of the statistics fail to take this into account. If you are criticising this, I wholly agree.


    My own preferred use of statistics is simply to summarise – e.g.


    I was a biomedical scientist, seven years in labs, an epidemiologist, and then an evolutionary/ system theorist, for a couple of decades before I got interested in intelligence – so I speak from an unusual breadth of scientific experience.

    But maybe you have some other, and extremely inclusive, definition of circular. However, my suspicion is that any such will also, necessarily, swallow up and reject a lot of other (real) science, as well as intelligence – this has been my experience in the past with people who make this accusation.

  11. Ye Olde Statistician

    “Intelligence” comes from Latin inter-legere, “to read between,” that is, “between the lines.” A person is more or less intelligent to the extent that he can “read” conclusions not actually in the presented facts. To “connect the dots,” as it were, to obtain the picture.

    This does not mean that the picture is correct.
