Over-Certainty of Polygenic Scores

This is really a Part I of a series of articles about the latest trend (fad?) in statistical biology, but I didn’t want to label it that way, which would discourage some readers.

Polygenic scores (PS), sometimes called polygenic risk scores (PRS), are not difficult to understand. Statistically, anyway.

Let’s start with this picture. This is from The Atlantic article “An Enormous Study of the Genes Related to Staying in School”.

This “plots years of schooling against subjects’ polygenic scores”, but “controlled” for Sex and other things. “Control” means dumped into a regression and is not the English word control. It will be almost impossible, but at first try to ignore the blue line, which does not exist, and which is not the data. We don’t need to know yet what polygenic scores are except to understand they are measures taken on each person.

The polygenic scores were normalized, meaning subtracting the means and dividing by the standard deviation, which accounts for the note “in standard deviations”. This is just a simple transform and not of interest. The polygenic score is used to predict years of education (sort of; that “control” screws things around, but we’ll ignore it).

Pick a PS of -2: years of education for that level run about 4-5 to 18-19. Now pick +2: years of education run from about 7 to about the same top, maybe 20. In other words, given a PS of -2 you’d predict, conditional on this data, there’s a 99% (or whatever) chance of years of education in the interval 4-19. Given +2 you’d predict the same chance for years 7-20 or so. A slightly, and only slightly, higher chance for greater years of education by moving from -2 to 2, or 4 standard deviations. Not enough of a difference to make a difference for most decisions.

What about PS scores of -3 versus +3? At +3 there are far fewer dots, and they run from about 11 to 20 years. Now there could be few dots because the sample size is small, or because not that many people really have PS scores this high. Similarly for the -3. There’s no way to tell the story of the missing dots until we take this model and try it to make actual predictions in real life. Just like any statistical model.

Look at the blue line now: it’s a regression with PS and years. Regular readers know better than to rely on R^2 which exaggerates evidence. “11% of the variation” is statistical arcana which produces over-certainty. In any case, you can plainly see this is not a wonderful model. It might—just possibly might—have some utility at very high and very low PSs. But since predictive error always swells at the extremes and where the sample is small, the uncertainty in the prediction would be large.

I mean predictive and not parametric uncertainty here! It would be an obvious error to use that line as a substitute for the prediction without its plus-or-minus. Or to say the plus-or-minus is in the line and not the prediction. Or to say the line is real: that is the Deadly Sin of Reification. Or to use the plus-or-minus of the parameter of the regression and not of the prediction of the observable (the most common generator of massive over-certainty).
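Here is a minimal sketch of the gap between the two kinds of plus-or-minus. Everything in it is an assumption for illustration: the data are synthetic, with an intercept, slope and noise level invented so that R^2 lands near 0.11, and the intervals are the textbook frequentist approximations for a least-squares line, not the method used in the study.

```python
import math
import random
import statistics

random.seed(1)

# Synthetic data only: intercept, slope and noise level are invented so that
# R^2 comes out near 0.11. None of these numbers are taken from the study.
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]               # standardized score
y = [14.0 + 0.9 * xi + random.gauss(0.0, 2.6) for xi in x]   # "years of education"

# Ordinary least-squares fit of y on x.
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1.0 - sse / sst
s = math.sqrt(sse / (n - 2))  # residual standard error


def half_widths(x0, z=1.96):
    """Approximate 95% half-widths at x0 (normal quantile, reasonable for large n)."""
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    line = z * s * math.sqrt(leverage)           # parametric: where the line is
    new_obs = z * s * math.sqrt(1.0 + leverage)  # predictive: where a new person is
    return line, new_obs


print(f"R^2 = {r_squared:.3f}")
for x0 in (-2.0, 0.0, 2.0):
    line, new_obs = half_widths(x0)
    print(f"x0={x0:+.0f}  fit={intercept + slope * x0:5.2f}  "
          f"line +/- {line:.2f}  prediction +/- {new_obs:.2f}")
```

With a large sample the plus-or-minus on the line shrinks toward nothing, while the plus-or-minus on a new person's outcome stays about as wide as the scatter of the dots. Quoting the first in place of the second is the over-certainty described above.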

This is not an unusual outcome, this model, for polygenic scores against some external measure. Some are better, many are worse. If we were only to go by the predictive performance of models like this, which are underwhelming to say the least, there wouldn’t be much interest in polygenic scores. But there is huge interest.

The reason is that polygenic scores are statistical encapsulations of genetic measures, and most people think genes cause things like “IQ” or years of education. Or, more tangibly, they cause things like body height or heart disease. And so on. If this is so, if genes are direct causes, then it is thought that polygenic scores express the amount of “cause” certain genes have on an outcome of interest, like score on an “intelligence test”.

Regular readers will not be surprised to learn that I doubt all this, and while I think polygenic scores have some use, the evidence related to them is, as with many statistical measures, hyped and over-certain. I do not say wrong: I say over-certain. The reasons for being skeptical will come later. For now, let’s look at what these polygenic scores are. I’ll skip all niceties, caveats and cautions and give only the rough statistical outline.


A single-nucleotide polymorphism (SNP) is a change at some specific position on the genome in which at least a certain fraction of the population have a different nucleotide than the others. Just one guy with a change out of 7 billion isn’t enough, they say. Most people have, for example, A in a certain position on the genome, but a bunch instead have C. There are other considerations about SNP types that aren’t of direct interest. We’re also ignoring measurement error. The SNP is in the end just a measure: a yes/no per person, a count/fraction per sample.

Enter the genome-wide association study, or GWAS, which looks at SNP variations between people. These go one of two ways. The variation in SNPs in one group with, say, a certain disease is compared against a control group without the disease. Or an enormous number of people are sampled, and the variation in SNPs is statistically related to some outcome, like scores on an “intelligence test” or height.

In the simplest disease-control case, a single SNP is used to produce an odds ratio (direct probabilities work, too). If a larger share of people in the disease group than in the control group have A rather than C at the SNP, then this SNP gives some evidence of “association” with the disease. Some call (at least implicitly) this association a cause. But if it were a cause: then every person with A would have the disease (unless something exterior blocks it) and every person with C would not, unless there were other causes of the disease beside this gene.

This kind of analysis is no different whatsoever from noticing, say, that more people in the disease group ate more bananas than people in the control group. Same statistics, same vague notions of cause.
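To make the single-SNP odds ratio concrete, here is a small sketch. The counts are hypothetical, and the function name and argument names are mine, chosen only for illustration. The same function is fed a SNP table and a banana table to underline that the arithmetic cannot tell them apart.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 table of counts (cases vs. controls)."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts: people with A ("exposed") versus C at one SNP.
snp_or = odds_ratio(exposed_cases=60, unexposed_cases=40,
                    exposed_controls=50, unexposed_controls=50)

# Hypothetical counts: banana eaters versus non-eaters, same shape of table.
banana_or = odds_ratio(exposed_cases=60, unexposed_cases=40,
                       exposed_controls=50, unexposed_controls=50)

print(snp_or, banana_or)
```

Identical tables give identical odds ratios. Nothing in the number says whether the thing being counted is a nucleotide or a piece of fruit, let alone whether it is a cause.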

Distressingly, the SNPs said to be “important” are identified by wee Ps. Wee p-values, that is, with all the over-certainty and mistakes typical of them. Why not use predictive probabilities instead? Why not indeed? P-values cannot discern cause and they certainly generate massive over-certainty, and this is true even with genetic measures.

Now, instead of looking at SNPs one by one, they can be combined in a regression-like fashion to produce a polygenic score or polygenic risk score, in the following way.

We first start with an outcome Y, such as disease presence, score on “intelligence test”, or height. Anything that can be measured on people is a candidate Y. A weight relating each SNP from the GWAS to the outcome Y is then produced via some kind of regression.

The weights are then combined:

S = SUM_i (X_i * Beta_i),

where Beta_i is the weight and X_i the presence of the marker genotype SNP. The S is usually normalized, as above. Now there are usually too many Betas and not enough data, so the regression is often some form of LASSO or Ridge regression. These are nice because they shrink the Betas toward 0, and LASSO sets many of them to exactly 0. All that is of technical interest. All we need remember is that S is a statistical measure of the state of comparative biology of a person. Blood pressure is such a measure, too, so there is nothing strange about biological measures.
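The sum is simple enough to sketch in a few lines. The weights and genotypes below are made up, the names are hypothetical, and X_i is coded as 0/1 presence as in the description above. The Betas are taken as given rather than fitted.

```python
import statistics


def polygenic_score(genotype, weights):
    """S = SUM_i (X_i * Beta_i) for one person."""
    if len(genotype) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(x * beta for x, beta in zip(genotype, weights))


# Hypothetical weights; the zero mimics a SNP whose Beta was set to 0.
betas = [0.10, 0.0, -0.05, 0.02]

# Hypothetical people: 1 if the marker genotype is present at that SNP, else 0.
people = {
    "person_a": [1, 0, 1, 1],
    "person_b": [0, 1, 0, 0],
    "person_c": [1, 1, 0, 1],
}

raw = {name: polygenic_score(g, betas) for name, g in people.items()}

# Normalize across the sample: subtract the mean, divide by the standard deviation.
values = list(raw.values())
mean, sd = statistics.mean(values), statistics.stdev(values)
standardized = {name: (s - mean) / sd for name, s in raw.items()}

print(raw)
print(standardized)
```

Note that the standardized score depends on who else is in the sample: change the sample and the same person's S changes, even though their genome has not.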

An excellent graphic of this process is at this site, about which more in a moment. In the end, we have for each individual an S and a Y. The S are used to predict the Y. Above, the S was given and Y was years of education (sort of).

There are, of course, lots of SNPs in human DNA. Too many to use all at once, even with LASSO; usually only some are analyzed. Which to choose?

The strategy is to find the fewest X_i that give the best association, via some measure, between S and Y. Many use R^2, and not predictive skill. Again we have over-certainty. Anyway, after these SNPs are gathered, people stare at the X_i and say “Oho! This gene X_72 has a high Beta and is therefore responsible for partly causing IQ!”

Everybody recognizes more than one gene is “associated” with or causes complex things like test scores or body height. But, except for a handful of people, everybody also believes these associations are causes (take Charles Murray, who says it outright). Well, they have to be, right? We’ve heard forever of selfish genes and heritability and evolutionary psychology with selection pressure (a cause) on genes. It is said genes are the causes of phenotypes. And brains are computers and we’re nothing but machines designed to promulgate our genes. Well, we’ll get to all that another time.

Can You See Me Up Here?

That site with the clever polygenic-score infographic has the article “New Turmoil Over Predicting the Effects of Genes”.

A key breakthrough was the recent development of genome-wide association studies (GWAS, commonly pronounced “gee-wahs”). The genetics of simple traits can often be deduced from pedigrees, and people have been using that approach for millennia to selectively breed vegetables that taste better and cows that produce more milk. But many traits are not the result of a handful of genes that have clear, strong effects; rather, they are the product of tens of thousands of weaker genetic signals, often found in noncoding DNA. When it comes to those kinds of features — the ones that scientists are most interested in, from height, to blood pressure, to predispositions for schizophrenia — a problem arises. Although environmental factors can be controlled in agricultural settings so as not to confound the search for genetic influences, it’s not so straightforward to extricate the two in humans.

Note that “the product of” is causal language.

What had also emerged from that research as an “obvious, beguiling offshoot,” according to Nick Barton, an evolutionary biologist at the Institute of Science and Technology Austria, was a specific prediction known as a “polygenic score.” Beyond the associations themselves, GWAS could provide estimates of how individual variants in the genome corresponded to measurable changes in a trait; polygenic scores constituted the sum of all those tiny effects. For instance, with height, having a guanine base instead of a cytosine one in a particular DNA region might correlate with being 0.1 millimeter taller than average. The polygenic score would take all those approximations, add them up and spit out a prediction for some individual’s actual height.

This was done to “explain” (cause again) the differences in heights between northern and southern Europeans. Or so everybody thought. Recall that p-values and other traditional statistical measures not based on prediction produce a lot of false signals.

Then came larger databases and recalculations and the signal for height differences disappeared.

“The new studies are really quite disconcerting,” Barton said, because they demonstrated that scientists had been mistaking biases in the polygenic score calculations for something biologically interesting. Their statistical methods of accounting for population structure were not so adequate after all…

Barton agreed. “The whole thing is tricky, because the origins of genetic variation in any population are really complicated,” he said. “Now you really can’t take at face value any of these methods over the last four or five years that use polygenic scores.”

“Maybe the Dutch just drink more milk, and this is why they’re taller,” Sunyaev added. “We can’t say otherwise with this analysis.”

The paper is here: “Reduced signal for polygenic adaptation of height in UK Biobank”. Nice title. Not for the first time, and surely not the last, I’m reminded of early work in parapsychology. Early results showed big effects and had everybody juiced, but the closer people looked, the more the results faded into the distance.

What about predictive skill?

Given that some experts want to roll out polygenic scores in the clinic, it’s already clear that this flaw could deepen the disparity in health care. In a study published last month, researchers found that trying to translate insights gleaned from European data to make health predictions in people of African descent led to as much as a 4.5-fold drop in accuracy. Others have tried using polygenic scores to make poorly supported claims about differences in behavioral and social traits between populations (such as IQ and education attainment, which are far more difficult to define and unpack than height is, yet are being used to potentially inform future policymaking decisions). “It’s kind of scary,” said Sarah Tishkoff, a geneticist at the University of Pennsylvania who emphasized how critical it is to collect more underrepresented genomic information.

And cause (ellipsis original)?

“The methods developed so far really think about genetics and environment as separate and orthogonal, as independent factors. When in truth, they’re not independent. The environment has had a strong impact on the genetics, and it probably interacts with the genetics,” said Gil McVean, a statistical geneticist at the University of Oxford. “We don’t really do a good job of … understanding [that] interaction.”

We’re not nearly done. We have to look at that interaction and at such things as “IQ.” Height can at least be unambiguously measured (or near enough). How about something as difficult as intelligence? More to come!



  1. Sheri

    “But if it were a cause: then every person with A would have the disease (unless something exterior blocks it) and every person with C would not, unless there were other causes of the disease beside this gene.” Yes, which is why the thieving lawyers that sue for cancer “causes” (read: source of income for immoral, horrid creatures) are so very, very evil. They KNOW damn well one thing does not cause cancer but they care nothing about anything but THEIR money. In fact, the more people who die, the more they make. I actually consider the use of “is associated with” as meaning the way to measure how vile and contemptible a human being is and how much cash they will steal based on the lie. It’s a throwback to the medicine men, temple chiefs and other crap we supposedly “outgrew”. I swear there are idiots I could convince that eating toast causes pregnancy because it’s associated with women who get pregnant. Humans are flaming idiots. This will be the death of science and it is well-deserved.

    All this genetics stuff harkens back to Mengele, and most likely will lead to another eugenics period where the “inferior” are murdered en masse.

  2. Please keep this up! There must be a single gene associated with Critical Thinking and you have it.

  3. “Distressingly, the SNPs said to be “important” are identified by wee Ps. Wee p-values, that is, with all the over-certainty and mistakes typical of them. Why not use predictive probabilities instead? Why not indeed? P-values cannot discern cause and they certainly generate massive over-certainty, and this is true even with genetic measures.”

    Yes, why not? So are there any examples of “predictive probabilities” being used by scientists and researchers that we can look at? (in any subject)

    In the area I work at, we make statements about two populations (with any necessary assumptions and possible errors) based on two samples. Can this be done with “predictive probabilities”?


  4. Bill_R

    if you want to see what the big kids actually say:

    Rothman KJ. Causes. Am J Epidemiology 1976;104:587–92.
    Pearl, Judea. “Causal inference in statistics: An overview.” Statistics surveys 3 (2009): 96-146.
    are good places to start. Somewhat more nuanced than lab rats and regression

  5. bob sykes

    Without doing any calculations, it is obvious there is no relation between the plotting variables.

    If you ran the regression on a different machine or using another statistics package, would you get the same result? Is the determinant of the data matrix nonzero?

  6. Uncle Mike

    I’m going to be brutally honest about this subject. Both my parents had college degrees and my father had a doctorate. I have two post-grad degrees. Both my children have post-grad degrees. My grandchildren have been identified as talented and gifted (except for one who is only 3, and we’re not sure about him).

    So without any false modesty I can say that my genes are better than your genes. Too bad; it sucks to be you. Dummies.

  7. DAV

    Bill_R, February 6, 2020 at 3:01 pm

    if you want to see what the big kids actually say:

    Pearl, Judea. “Causal inference in statistics: An overview.” Statistics surveys 3 (2009): 96-146.
    are good places to start.

    Well, I read it. Did you? Sheri’s right and Pearl doesn’t say otherwise.

    causal and associational concepts do not mix.

    Examples of associational concepts are: correlation, regression, dependence, conditional independence, likelihood, collapsibility, propensity score, risk ratio, odds ratio, marginalization, conditionalization, “controlling for,” and so on.

    Examples of causal concepts are: randomization, influence, effect, confounding, “holding constant,” disturbance, spurious correlation, faithfulness/stability, instrumental variables, intervention, explanation, attribution, and so on.

    Many researchers, for example, are still convinced that confounding is solidly founded in standard, frequentist statistics, and that it can be given an associational definition … this definition and all its many variants must fail.

  8. Bill_R


    Perhaps I was too terse. My point was that a simple additive glm model is insufficient for causation and general prediction. If you’ve read both Rothman and Pearl you’d see that they cover that. (and that Rothman predates Pearl by more than a decade)

    And yes, I’ve been reading Pearl, since his 1988 book. Along with Spirtes, Glymour, Scheines, Cox, Dempster, Shafer, etc., etc. Been doing it, too, for almost as long for “reals.”
