Over-Certainty Is One Of The Main Causes Of Bad Science: Brain Volume & IQ Example

Announcement Next week I am on vacation as I prepare for the Cultural Event of the year. There will be no new posts: there may be classic reposts.

Don’t miss JH’s and my comment below: I mixed up their label; rather, mistook it. My apologies to readers. The lesson still stands, though, with the proper label. I’ll leave the original so all can see my blunder; I’ll put a couple comments in brackets in the main text.

Scientific over-certainty is much easier to generate than you might have thought, because of a Truth everybody believes, but never of himself: that the easiest person to fool is yourself. (This began as a Twitter thread.)

Here’s an example of how easy it is to fool yourself into over-certainty, from a paper on predicting a score on a test (“IQ”) from observed volume of brain matter. The peer-reviewed paper is “Predicting intelligence from brain gray matter volume” by Hilger and others in Brain Structure and Function.

The plausible idea is that, on the whole or in regions, greater brain matter predicts higher test scores; which is to say, higher IQ. I have many quibbles (here, here) with the idea that IQ, a score on a test, is intelligence, that all aspects of intelligence can be compressed into one number and measured with little or no ambiguity.

But let’s ignore all that here and acknowledge IQ test scores do go part way toward measuring some important aspects of intelligence. Given our culture is saturated in the curious brain-as-computer metaphor, and given other experiences in biology, it is far from a crazy idea to suggest that the bigger the brain, the more intelligent the brain-bearer (how’s that for a woke term!).

Now, when we make predictions, we usually want to know things like this: Given X, what will Y be, plus or minus? Here that translates to something like this: Given cerebellum volume, what will a person’s test score be, +/-?

What’s done, usually, is that for many observed values of X and Y, we plot the old-fashioned scatter plot, with X on the x-axis and Y on the y-axis: predictor against predicted. You’ve seen this many times.
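As a minimal sketch of that setup, with made-up numbers standing in for brain volume and test score (nothing here is the paper's data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: x is the predictor (e.g. standardized brain volume),
# y is the thing predicted (e.g. test score), with plenty of noise.
n = 100
x = rng.normal(0, 1, n)
y = 100 + 5 * x + rng.normal(0, 15, n)   # weak signal, lots of scatter

# Ordinary least squares fit of Y on X
slope, intercept = np.polyfit(x, y, 1)
print(f"fitted line: y = {intercept:.1f} + {slope:.1f} * x")

# A scatter plot would put x on the x-axis, y on the y-axis,
# and overlay the line intercept + slope * x through the cloud.
```

The fitted line is a summary of the cloud, not a replacement for it; keep that in mind for what follows.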

But I’ve been noticing, lately, an inversion of this process.

Here is one of their models, using cerebellar volume and score:

Can you spot the mistakes?

Mistake #1, which is almost impossible to resist, is to assume that straight line is Reality. The eye is inexorably drawn to it. How easy it is to assume this is the Way Things Are!

That mistake is not the authors’. It is our mistake when we read the results. This error is ubiquitous. It is called—ladies, avert your eyes—the Deadly Sin of Reification.

Meaning that when this model is talked about, all that is remembered is that line. The dots, i.e. Reality, fade away; only the model is remembered. Many such cases.

Mistake #2: the gray shade around the line gives the uncertainty in some part of the guts of the model. It says nothing about Reality itself. It is not the predictive error; it is the parametric uncertainty in the model itself, which is of no interest to man or beast.

Because the band around the line is narrow, it means a wee p has been found, and “significance” is declared. This, too, is a ubiquitous error.
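The distinction between the two kinds of uncertainty can be made concrete. In simple regression the standard error of the fitted line (the gray band) differs from the standard error of a new observation by a whole extra variance term: as the sample grows, the band shrinks toward zero while the predictive uncertainty does not. A sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, n)
y = 100 + 3 * x + rng.normal(0, 15, n)

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))       # residual standard error
sxx = np.sum((x - x.mean())**2)

x0 = 0.0  # a new point at which to predict

# Standard error of the fitted LINE at x0 (what the gray band shows)
se_line = s * np.sqrt(1/n + (x0 - x.mean())**2 / sxx)

# Standard error of an actual NEW OBSERVATION at x0 (what we care about)
se_pred = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / sxx)

print(f"band half-width ~ {se_line:.2f}, predictive half-width ~ {se_pred:.2f}")
```

With n = 200 the band is roughly an order of magnitude narrower than the predictive uncertainty: a razor-thin gray shade is perfectly compatible with wildly uncertain predictions for individuals.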

Those errors are, as I said, everywhere, and would not warrant this article. The next mistake does.

Mistake #3: the orientation of the graph. X is on the y-axis, and vice versa. The line is drawn as if Y is predicting X, which is not what the paper’s title promised us.

Because of this slip, the model seems impressive. Sure, there’s scatter around the line, but nothing terrible. We are led to believe that the model is pretty good.

Here’s the same thing, but flipped in the proper orientation, and with a guess of the predictive model’s uncertainty.

What an astounding change, just from flipping the axes. The model no longer looks impressive, or even that interesting. (If you’re really paying attention, this is the same problem found in medicine and its insistence on looking at the sensitivity of tests and not predictive value, or on conflating the two. We’ll save that for another day.)
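Part of what drives the change is that the regression of Y on X and the regression of X on Y are different lines: each slope is the correlation times a ratio of standard deviations, so the product of the two slopes is r squared, not 1. When the correlation is weak, swapping which variable is "doing the predicting" changes the picture dramatically. A sketch, again with made-up data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x = rng.normal(0, 1, n)
y = 0.3 * x + rng.normal(0, 1, n)    # weak relationship, r around 0.3

slope_yx, _ = np.polyfit(x, y, 1)    # regress Y on X
slope_xy, _ = np.polyfit(y, x, 1)    # regress X on Y
r = np.corrcoef(x, y)[0, 1]

# The two regression lines are NOT inverses of each other:
# slope_yx * slope_xy equals r**2, well below 1 for weak correlation.
print(slope_yx * slope_xy, r**2)
```

If the two fits were inverses, the product of the slopes would be 1; the shortfall from 1 is exactly how much the "flipped" picture flatters the model.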

The cerebellum volume [this should be predicted IQ] goes from about 80 to 120, predicting observed scores from about 70 to 140. The ranges don’t match, which is bad for the model.

Still, there is a slight increase in the chance of greater scores given higher volumes. There is some predictive value, but not much.

The hypothesis has been confirmed, to a small, but certainly not great, degree. At least for cerebellar volume. (They have many other plots for other brain regions; they all look like this.)

Nothing to write home about, but still worth publishing, because the question was a natural one.

Have you been paying attention, Anon? Then here is your homework. The authors ran several models (like PCA, which they rejected). Here’s one using the limbic network to predict test score.

What can you say about this picture? This is from another of the authors’ models which, apparently, looks much better.

Hand in your results by Monday, 7 August for full credit.

Subscribe or donate to support this site and its wholly independent host using credit card click here. Or use the paid subscription at Substack. Cash App: $WilliamMBriggs. For Zelle, use my email:, and please include yours so I know who to thank.

Categories: Statistics

13 replies »

  1. The graph caption in the paper indicates that the y-axis (with the same scale as the x-axis) represents the predicted IQ score.

  2. “The cerebellum volume goes from about 80 to 120, predicting observed scores from about 70 to 140. The ranges don’t match, which is bad for the model.”

    This is simply wrong. As JH pointed out, that plot does not report the cerebellum volume. I quote from the figure 5 caption:
    “Observed (x-axis) versus predicted (y-axis) FSIQ scores”

    It is not, as you suggest, a plot in which a variable is used to predict another one. It is a plot in which the predicted and actual values are compared. So what you wrote (again!) does not make any sense.
    This is the second post about “bad science” that I read on your blog, and for the second time a quick look reveals that you are misrepresenting what the given study says.
    There are a lot of useless studies being published, but you seem to do a worse job than the people that you criticize.

  3. What’s the point of this?

    How to graph something badly? And choose data to analyze that is probably inappropriate.

    It would be hard to imagine that there would be any significant association within this data unless the n was very large. The line is almost straight, looks like it goes from 90 to 105. The highest IQ value is about 119 and the average looks to be lower than 100.

    Hardly proof of anything. One needs another data set or study to make your point (which is what?)

  4. All,

    JH is right! I missed the label! My apologies to all. That’s what you get for writing in a hurry. I got the label on their y-axis wrong.

    But this is only a small error on my part. I assumed they’d show the actual data and not just the results. This is another growing trend with models: not showing the data.

    The lesson still stands, even with my mixing up the labels. We still want X by Y: predicted IQ as X, actual IQ as Y. The modified picture I made is still the right lesson. But change the X axis to predicted IQ.

    That means the homework shows their last model has no predictive power. Everybody has a predicted IQ of about 90. That model is useless then.


    You mean this Laden? Bad Astronomer Does Bad Statistics: That Wall Street Journal Editorial, Government Funding Is A Conflict Of Interest: Cowardly Calls For Climate Scientist’s Firing.

  5. I did not agree with the attempt to get Dr. Soon fired. But because, unlike Dr. Briggs, I do not fall prey at every turn to the genetic fallacy, I understand that my opinion about that has nothing to do with Laden’s demonstration of the errors conveniently referenced in the first link after “You mean this Laden?”.

  6. Lee,

    Yes, Laden is wrong. See the big global warming post at the top banner, and seek out the BEST links.

  7. This attempt at cleverness is as weak as your attempt to understand the paper you’re criticizing. To demonstrate an error is not to commit an error.

  8. @Briggs
    “The lesson still stands, even with my mixing up the labels. We still want X by Y: predicted IQ as X, actual IQ as Y. The modified picture I made is still the right lesson. But change the X axis to predicted IQ.”

    Not really. In that kind of plot, a perfect prediction would result in all the points lying on a 45 degree line. Whether the line they fitted is slightly upward-leaning (as in the original paper) or very upward-leaning (as in your post after you flipped it) does not change the interpretation one bit.
    Moreover, the scattering around the line itself is not what matters. If all the points were to lie on a flat line, it would mean that the model sucks, even if the points perfectly matched the line. This is close to the situation shown in the last plot you reported in your post (which refers to an atlas-based model): all the predicted values are close to the sample mean, which means the model has virtually no predictive power, and the fact that the points lie close to the fitted line does not mean anything.
    But incredibly you interpret this in the opposite way, as if this were a sign of good prediction performance. And to think, it is even written in the paper abstract:
    “in the atlas-based models, the predicted IQ scores varied closely around the sample mean. This renders the practical value even of statistically significant prediction results questionable”

  9. Frazen,

    Assuming that the vertical axis represents the brain volume, Briggs explained the effect of switching response and explanatory variables in this case.


    JH is right!

    It is the most agreeable statement in the world. Hence boldface worthy.

  10. @JH
    Yes, IF the axis represented brain volume, it would make sense. But:
    1) it doesn’t, so I don’t really see the point
    2) he explicitly says: “We still want X by Y: predicted IQ as X, actual IQ as Y. The modified picture I made is still the right lesson. But change the X axis to predicted IQ.” So it really looks like he is talking about a graph with predicted IQ values on one axis and actual IQ on the other.

  11. What would cerebellum volume have to do with an IQ test anyway? If I am remembering correctly, the cerebellum is involved with motor activity and coordination, at a level below consciousness.

  12. As an engineer who spends a lot of time on root cause and corrective action tasks, the line added to that scattergram made me laugh out loud. If I presented a chart like that to a manager, I would expect a response like “Milton, you think you might be going down a bit of a rabbit-hole here? Show me your Pareto chart, let’s get you back on track.”

    As a colleague of mine is fond of saying, “It all pays the same.”
