Epidemiologist Fallacy Strikes Again: Miscarriages And Air Pollution Edition

I love air pollution. Smells like progress. Good for you. Nothing healthier. Except smoking.

There, now that that’s out of the way, let’s get to the paper “Air pollution-induced missed abortion risk for pregnancies” by Liqiang Zhang and a slew of others in the peer-reviewed journal Nature Sustainability.

Now I know what you’re going to say. But it’s true. There really is a journal with that asinine name. It has papers like “Domino effect of climate change over two millennia in ancient China’s Hexi Corridor”, “Citizen science and the United Nations Sustainable Development Goals”, and of course “Realizing resilience for decision-making”, which opens with the words “Researchers and decision-makers lack a shared understanding of resilience.” University libraries must be cutting back on their dictionaries.

What? Huh? “Get going already”? Oh, very well. Let’s try to take this thing seriously.

The New York Times did. Took it seriously, I mean. It makes for amusing reading as they try to (a) paint pollution as a killer, while (b) denying a person is being killed. Skip it.

See this video on the Epidemiologist Fallacy or read about it here.

So Zhang got “clinical records of 255,668 pregnant women in Beijing from 2009 to 2017.” Which is fine. Now comes part one of the Epidemiologist Fallacy: they never measured actual “air pollution” exposure on any of these women.

Instead, they “computed the air pollutant exposure level of each pregnant woman on the basis of measurements at the nearest air monitoring stations from her residential and working places.” Latitude and longitude.

Can you say over-certainty?

Well, it might be all right, as long as they took the uncertainty of actual exposure into account in all their models. This would widen all their confidence intervals by, oh, two or three times. Did they? No, sir, they did not.
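
To make the point concrete, here is a minimal sketch in Python. Every number in it is hypothetical: the intercept, slope, station reading and the assumed spread of true exposure around that reading are invented for illustration, and none come from the paper. It assumes a simple logistic model and shows only that treating the station reading as the exposure gives one risk number, while admitting the exposure is uncertain gives a range of them.

```python
import math
import random

# All values below are hypothetical, chosen only for illustration.
INTERCEPT = -3.0         # assumed logistic intercept
SLOPE = 0.02             # assumed change in log-odds per unit of pollutant
STATION_READING = 80.0   # reading at the nearest monitoring station
PROXY_SD = 20.0          # assumed spread of true exposure around the reading


def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def risk(exposure):
    """Probability of the outcome at a given exposure under the assumed model."""
    return logistic(INTERCEPT + SLOPE * exposure)


def risk_range(n_draws=10000, seed=1):
    """Central 95% range of risk when true exposure is uncertain.

    True exposure is drawn from a normal distribution centred on the
    station reading; this distribution is an assumption, not a measurement.
    """
    rng = random.Random(seed)
    risks = sorted(
        risk(rng.gauss(STATION_READING, PROXY_SD)) for _ in range(n_draws)
    )
    return risks[int(0.025 * n_draws)], risks[int(0.975 * n_draws)]


# Treating the proxy as exact gives a single number.
point_risk = risk(STATION_READING)

# Propagating the proxy's uncertainty gives a spread around that number.
low_risk, high_risk = risk_range()

print(f"proxy treated as exact: {point_risk:.3f}")
print(f"proxy uncertainty kept: {low_risk:.3f} to {high_risk:.3f}")
```

How wide the range gets depends entirely on how poor the proxy is, which here is the assumed PROXY_SD.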

But did they imply that “air pollution” caused miscarriages on the basis of the too-narrow confidence intervals? Yes, sir, they did.

Previous studies have also indicated that maternal long-term exposure to air pollution may mean a higher likelihood of abortion/miscarriage, stillbirth and birth defects.

We investigated several possible causal mechanisms to explain this linkage…

We thus have the second part required for the Epidemiologist Fallacy.

Now for some fun technical niceties. Monsieur (I’m guessing) Zhang et alia reported their results in terms of what they call “odds ratios”. They were not. It’s a very common mistake to make, which many statisticians do make, to call the parameter inside a model an “odds ratio”.

No. What we want is

Pr(miscarriage | level of pollution, data, model) / Pr(no miscarriage | level of pollution, data, model)

That’s an odds (ratio). The confidence interval—which is equivalent to a p-value—about some parameter in a model associated with pollution says nothing directly about the uncertainty in the observable, here a miscarriage. It is a too-common blunder to confuse the uncertainty in the parameter with the uncertainty in the observable.
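
For the arithmetic, a small sketch. The two probabilities below are made up for illustration and are not from the paper. An odds is a probability divided by its complement; an odds ratio is one such odds divided by another, for example at two pollution levels.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Ratio of the odds at condition A to the odds at condition B."""
    return odds(p_a) / odds(p_b)


# Hypothetical probabilities of the outcome at two pollution levels.
P_HIGH = 0.20
P_LOW = 0.10

odds_high = odds(P_HIGH)           # 0.20 / 0.80
odds_low = odds(P_LOW)             # 0.10 / 0.90
ratio = odds_ratio(P_HIGH, P_LOW)  # odds_high / odds_low

print(f"odds at high level: {odds_high:.3f}")
print(f"odds at low level:  {odds_low:.3f}")
print(f"odds ratio:         {ratio:.2f}")
```

With these invented inputs the probabilities differ by 0.10 while the odds ratio is 2.25, so the same comparison looks larger on the odds ratio scale.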

What this means is that the stated intervals in their “findings” are not only too narrow because of the Epidemiologist Fallacy, they’re extra-too-narrow because of the confusion of parametric with predictive analysis.
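
The gap between parametric and predictive uncertainty can be sketched with a deliberately simple stand-in: a normal model with known standard deviation, not the paper's logistic model. The function names and the value sd = 1 are assumptions for illustration. The interval for the parameter (the mean) shrinks as the sample grows; the interval for a new observable does not.

```python
import math


def parameter_halfwidth(sd, n, z=1.96):
    """Half-width of the ~95% interval for the mean (the parameter)."""
    return z * sd / math.sqrt(n)


def predictive_halfwidth(sd, n, z=1.96):
    """Half-width of the ~95% interval for one new observation.

    Combines the spread of the observable itself with the (small)
    remaining uncertainty in the mean.
    """
    return z * sd * math.sqrt(1.0 + 1.0 / n)


SD = 1.0  # hypothetical standard deviation of the observable

for n in (10, 1000, 255668):
    print(
        f"n={n:>6}: parameter ±{parameter_halfwidth(SD, n):.4f}, "
        f"predictive ±{predictive_halfwidth(SD, n):.4f}"
    )
```

At n = 255,668 the parameter half-width is about 0.004 while the predictive half-width stays near 1.96. Reporting the first as if it were the second is the confusion described above.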

“So what you’re trying to say, Briggs, is that pollution is a good thing, right?”

Yes, you caught me. That’s exactly what I meant.

“Har har. You think you’re so smart. At least authors are trying to do something.”

That’s the Don’t Just Do Something, Stand There Fallacy. Doing something harmful can be worse than doing nothing.

“But it’s obvious air pollution causes miscarriages.”

Is it?

“Yes, it is. It certainly isn’t doing these women any good.”

If it’s obvious, then we didn’t need this study.

“Come on. This study at least put some numbers on the problem.”

Lousy ones; numbers you can’t believe.

“Don’t be such an ass. Everybody does studies like this.”

Too true.


  1. Sheri

    First, who the heck cares about miscarriages in the “we will abort any kid we said you could not have” China? That alone is insanity.

    Second, “Nature Sustainability” translates to “we believe in unicorns and fairies”, so why are we discussing fiction as if it had any meaning? Yes, the NYT thinks it does, but that only bolsters my statement.

    As I say, “any solution is not a solution if it’s not the RIGHT solution”. Throwing crap, fake solutions is how the Aztecs and Incas fell. You know, when some brilliant guy said “that bronze is too rare and expensive for water troughs, let’s use lead”.

  2. trigger warning

    Interesting comment, Mr Ozanne. I was struck by the notion of “premature death”. According to the US National Cancer Institute’s website, a “premature death” is defined as a “death that occurs before the average age of death in a certain population”.

    Zounds! Not sure precisely what they mean by “average”, but by any calculation, lots of people are dying younger than the lifespan they are entitled to. Obviously, eliminating all deaths at younger ages than the average, via well-intentioned regulation and well-designed tax policy of course, is vital (no pun intended). Indeed, perfect work for the profound deliberations of Western regulatory bodies.

  3. Bill_R

    Your enemies have struck! You might want to look at that odds ratio. The estimate you printed is the odds (Odds(A|X) =Pr(A|X)/Pr(not-A|X)). An odds ratio is, not surprisingly, the ratio of two odds ( OR = Odds(A|X)/Odds(A|Y)) and is used to compare the odds of outcome A at X with the odds of A at Y.

  4. Bill_R

    For public health or planning purposes, the important number would be something along the lines of the number of excess deaths or an attributable effect, which presumes a causal mechanism or another form of cost.

    Odds ratios are useful for comparing effects and, very important, are easy to estimate (in a logistic regression). The ratio cancels out the common (multiplicative) intercept and you are left with a sort-of dimensionless number, which can be waved around like a Vorpal sword. But they really confuse almost everyone (a ratio of ratios?). Try explaining them to engineers or business people….

  5. Ken

    “Everybody does studies like this.
    “Too true.”

    That’s a good example of the Sweeping Generalization fallacy.

    Not “Everybody” does it like that.

    A “lot” do (whatever a “lot” is).

    That’s a distinction with a difference.

  6. DAV

    @trigger warning,

    Zounds! Not sure precisely what they mean by “average”, but by any calculation, lots of people are dying younger than the lifespan they are entitled to. Obviously, eliminating all deaths at younger ages than the average, via well-intentioned regulation and well-designed tax policy of course …


    It’s similar to poverty defined by a lower quantile. Then poverty is guaranteed forever and gov’t jobs are retained. If 10% is the defining line, even those in the lower 10% averaging $1M/yr are impoverished and will need government assistance — the Millennial’s Dream.

  7. Ray

    Does somebody who dies after the average age of death in a certain population die post maturely?

  8. Yonason


    It’s very simple. It’s just a case of Schrodinger’s Citizen. Some live longer than expected. Some live shorter. But you don’t know until you ring their doorbell, or look in their window.

    Here Roger Penrose explains it in terms of a famous cat.

    So, if the problem doesn’t occur until we measure the system, maybe we could resolve it if we just stop taking actuarial data? Ignorance is bliss, after all, which is where millennials are determined to take us anyway.

  9. Kalif

    Odds ratios are effect sizes in themselves. No need for p-values whatsoever. The CI built around an odds ratio is based on the binomial distribution, and we may have an issue with that, but the actual odds ratio anyone obtains is what the data show.

    Odds are the proportion of people with adverse outcomes relative to those without. The odds ratio is the ratio of two odds. I don’t see the issue there. Why talk about the parameter at all? Odds ratios are effect sizes themselves. No need for p-values. CIs for ORs are built on the binomial distribution and I may have a problem with that, but not with the actual counts. They are based on the observables. They got what they got.
    Now, who was exposed and how you measure the outcome is another issue, as miscarriages were very common in the Stone Age too, regardless of pollution levels. However, it is more of a methodological and research design issue. In epidemiology, risk ratios are usually reported instead of ORs.

  10. Mariner

    The idea that air pollution causes “marriages” is probably the most out-of-character ploy of your enemies ever.

    Perhaps some of them are being won over to your side.

  11. Briggs


    Golly, thanks, who knew. Now you can take a look at the Books & Class page, and go to the class on predictive statistics. I should have linked it, but all regular readers know about it.

    Also see this!

    Odds ratios exaggerate evidence in all the many ways we have discussed.

  12. “Now comes part one of the Epidemiologist Fallacy: they never measured actual “air pollution” exposure on any of these women.”

    That’s just semantics though. One can certainly take pollution exposure to be a measured level of exposure at areas you frequent (work and home).
    Personal exposures for pollutants are obviously difficult to obtain, as they tell us, so their ambient approach, which they also tell us they are doing, measuring air pollution around where they live and work, seems a logical approach, as the women would spend most of their time in those locations.

    If measured ambient pollutants have little to do with increase risk of MAFT, we’d expect to see that in the data.


  13. Briggs

    No, Justin, it’s not just semantics. If you say you measured X and did not measure X, then you did not measure X. If you use a proxy for X, you MUST include the uncertainty in the proxy’s aptness. Which nobody ever does.

    And then you must not state things in terms of odds ratios, which are statements about unobservable parameters. You MUST state things in terms of the observable, as I demonstrated.

    The point is the fallacy leads to massive OVER-certainty. As proved (in the sense of proved) in Uncertainty, and in the classes on the Books & Class page. Try it yourself!

  14. @Sheri,
    In my mind, lead metal pipes are/were not as much of a problem as lead acetate is/was (and maybe some other things like lead citrate). Dose and route make the medicine or the poison; and to know what the dose is, you have to know what the chemical is.

    [How many people know that the mammalian body makes nitric oxide and what the body uses it for?]

  15. Unless you live to 120, you will die prematurely ;p (And the span of a man’s years shall be 120). Oh wait, what about 3 score and ten? That’s closer to the mean life expectancy (which is the cumulative death rate) at birth (which leaves out those who were conceived but didn’t make it to birth, whether ‘naturally’ or by induced abortion).

  16. It keeps getting harder to tell when people are being sarcastic.

    I am with Sheri. One of the eye-opening moments for me was realizing that spontaneous abortions happen way more often than we think. Then there is the strange duality that we need specialists in fertility, but pregnancies seem to happen at oh so many inopportune times. Hmm, maybe there is something to the fertility window discussions I see on other lines of discussion. We all have to be careful discussing fertility. Don’t want to shame women who choose not to have children OR to shame women who accidentally chose to have children. We especially don’t want to get into other variants of “accident” on the subject.

  17. “No, Justin, it’s not just semantics. If you say you measured X and did not measure X, then you did not measure X. If you use a proxy for X, you MUST include the uncertainty in the proxy’s aptness. Which nobody ever does. ”

    I’m not convinced, especially since the field of epidemiology started out with Snow making a map of disease cases that showed them clustered around a city’s well. They very clearly define ‘air pollution exposure’ as ambient air pollution (which they have measuring stations for) portioned out by where the person spends their time on average (.33 work, .66 home), since actual pollutant levels in the person are difficult to measure. They show a map of measured pollutants, and overlay that with where these people live and work. I don’t believe that if you live and work under a column of air that you won’t be breathing in that air. They also observe more MAFT in locations where pollutants are higher. Their study is also consistent with findings from the referenced studies on pollutants and their effects.

    The uncertainty is reflected in confidence intervals, as well as somewhat in adjustments for confounding factors and spatial autocorrelation. It isn’t perfect, but where is the evidence that introducing ‘uncertainty in the proxy’s aptness’ (exactly how is not mentioned) would, say, make these confidence intervals overlap, or change the story or its consistency with other research? Again, they already directly observe more MAFT in areas with more pollutants, so the proxy already seems apt.

    You have it somewhat backwards on certainty/uncertainty. If you only base things on what you observe, you are actually the one being over-certain, because you are basing things only on descriptive statistics from the single or few samples that you took, compared to many samples you could have taken from a larger population.

    The paper links to their supplementary information with the maps, which my browser isn’t letting me type out.


  18. Briggs

    Posting this for Bill Raynor.

    @Briggs, @Justin,

    I’m in @justin’s camp here. There is a matter of context. If I’m practicing statistical epidemiology/public health, I’m interested in measurable effects on entire populations, not some pie-in-the-sky individual causation. Epidemiology is wholesale. What Briggs is talking about is clinical, or, retail. The sample sizes should be a hint. At those sizes, one doesn’t particularly care about particular individuals. This is covered in just about every intro epidemiology course.

    One can monitor pollution levels (or whatever) at air monitoring stations, with very little error. One can suggest policy changes based on the relationships discovered using a pre-existing causal model. Just as I can do the same sort of monitoring on a manufacturing process. If you want to get technical, google “Berkson Error Model.” Ray Carroll at TAMU has written a bit about this.

    Briggs is using arguments similar to those used by Fisher and by the tobacco companies to avoid liability for cancer deaths. And he is correct, for any particular individual. We don’t have all the information for a completely nailed down causal chain, and, frankly, we probably never will for free-living humans. The experiment is too long.

    BTW, I didn’t read the paper. But, back in the day, I did write them. Clinicians always had the same criticisms. It’s an old one.
