Dust is thrilling researchers these days. Dust, or the more scientific sounding PM2.5, is the next greatest struggle after global warming. We’ve already seen attempts to prove dust, as measured by the proxy of living next to roads, causes heart disease. The attempt failed, for all the usual statistical reasons: wee p-values, no causal link, the epidemiologist fallacy, and so forth.
Same thing with our latest paper, “Living near major roads and the incidence of dementia, Parkinson’s disease, and multiple sclerosis: a population-based cohort study” by Hong Chen and a slew of others in the Lancet.
Because I don’t want the point to get lost, I want to emphasize that PM2.5 papers (or rather their proxies; it’s almost always proxies) are assumed to do nothing except wreak havoc, just like global warming papers. Researchers cannot fathom that living near a “major” roadway could do anything except cause harm, that it could do no good. So they never check for the good, just as with global warming, which is everywhere malevolent.
Skip that. It’s on to the paper!
Emerging evidence suggests that living near major roads might adversely affect cognition. However, little is known about its relationship with the incidence of dementia, Parkinson’s disease, and multiple sclerosis. We aimed to investigate the association between residential proximity to major roadways and the incidence of these three neurological diseases in Ontario, Canada.
So what about the data for this statistical scavenger hunt? People who had medical records and who were diagnosed with certain comorbidities had them in the data; people without records, or with undiagnosed illnesses, did not. Obvious points, but they cut against the idea that disease states were “controlled” in the statistical models.
Income was not measured; they used an error-prone proxy instead, assigning people the income buckets of their neighborhoods.
Then came the weird measures. “To control for regional differences in the incidence of dementia, Parkinson’s disease, and multiple sclerosis, we created a variable for urban residence (yes/no), density of neurologists using the ICES Physician Database to represent accessibility to neurological care, and the latitude of residence given the reported latitude gradient with multiple sclerosis.”
Density of neurologists? Yes, sir. Density of neurologists.
We now enter the realm of the epidemiologist fallacy. “Briefly,” our authors say, “estimates of ground-level concentrations of PM2.5 were derived from satellite observations of aerosol optical depth in combination with outputs from a global atmospheric chemistry transport model (GEOS-Chem CTM). The PM2.5 estimates were further adjusted using information on urban land cover, elevation, and aerosol composition using a geographically weighted regression….Similarly, we derived long-term exposure to NO2 from a national land-use regression (LUR) model…”
Actual PM2.5 and NO2 exposure was not measured. The proxies were assumed to be the real exposures. This is the epidemiologist fallacy. But wait! That isn’t always fatal. There’s a possibility of saving the day: if (a) the uncertainty in the proxies as measures was carried into the statistical models at all stages, and (b) the uncertainty in the chemistry transport and other models was carried into the statistical models at all stages.
And were these crucial uncertainties—large ones, too: “The final LUR model explained 73% of the variation in annual 2006 measurements of NO2”—used in the statistical models at all stages?
Alas, dear reader, they were not (at least, I did not find any evidence they were).
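To see why carrying the proxy uncertainty matters, here is a minimal sketch, not from the paper, of classical measurement error: the outcome depends on a true exposure, but the model only ever sees a noisy stand-in, the way a satellite-and-regression PM2.5 estimate stands in for what a person actually breathed. All the numbers (effect size, noise level, sample size) are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's model): classical measurement error.
# True exposure x drives outcome y; the analyst only sees a proxy w = x + noise.
import random

random.seed(1)

n = 50_000
beta_true = 0.5      # assumed true effect of exposure on outcome
proxy_sd = 1.0       # assumed error in the proxy measure

x = [random.gauss(0, 1) for _ in range(n)]                 # real exposure (unseen)
w = [xi + random.gauss(0, proxy_sd) for xi in x]           # proxy (what gets modeled)
y = [beta_true * xi + random.gauss(0, 1) for xi in x]      # outcome

def ols_slope(pred, resp):
    """Least-squares slope of resp on pred."""
    mp = sum(pred) / len(pred)
    mr = sum(resp) / len(resp)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    var = sum((p - mp) ** 2 for p in pred)
    return cov / var

print(ols_slope(x, y))   # near 0.5: regression on the true exposure
print(ols_slope(w, y))   # near 0.25: proxy noise attenuates the estimate
```

In this toy setup the proxy error biases the estimate toward zero (by the factor var(x)/(var(x)+var(noise)) = 1/2 here); in messier setups it can bias either way. Either way, the point stands: pretending the proxy is the real exposure gives you the wrong answer and the wrong uncertainty about it.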
There were more uncertainties, mostly involving complex measures of distances of this and that, which are important but would distract us (you can read the paper). I’m more interested in the uncertainties in the outcomes themselves, and the comorbidities. I’ve already mentioned the difficulty that only diagnosed maladies were in the databases, and that undiagnosed ones weren’t.
But that doesn’t mean that what appeared in the databases was error-free. As the authors say: “These databases have been validated previously using chart review, with sensitivity of 78–84% and specificity of 99–100%.”
Dude. That doesn’t translate into an unimpeachable accuracy rate. Meaning, as should be obvious, the outcomes and some measures had uncertainty, too. Which is not damning. Measurement-error models exist to handle these kinds of things.
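A back-of-envelope sketch of what those validation numbers imply. The sensitivity and specificity come from the quote; the cohort size and disease prevalence below are my assumed round numbers, not the paper’s.

```python
# Back-of-envelope: with sensitivity ~0.80 and specificity ~0.99, how many
# database "cases" are wrong? Cohort size and prevalence are assumed.
def classification_counts(n, prevalence, sensitivity, specificity):
    true_pos = n * prevalence * sensitivity               # sick and flagged
    false_neg = n * prevalence * (1 - sensitivity)        # sick but missed
    false_pos = n * (1 - prevalence) * (1 - specificity)  # healthy but flagged
    ppv = true_pos / (true_pos + false_pos)               # share of flags that are real
    return false_neg, false_pos, ppv

fn, fp, ppv = classification_counts(
    n=1_000_000, prevalence=0.05, sensitivity=0.80, specificity=0.99
)
print(f"missed cases: {fn:,.0f}")   # 10,000 true cases never counted
print(f"false cases:  {fp:,.0f}")   # 9,500 healthy people counted as cases
print(f"PPV: {ppv:.2f}")            # ~0.81: roughly 1 in 5 database cases is wrong
```

So even a 99% specificity, applied to a mostly healthy million-person cohort, manufactures thousands of phantom cases, which is exactly the kind of outcome uncertainty a measurement-error model is for.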
So. Was the uncertainty in the outcomes and measures incorporated into the exquisitely complex statistical models?
Alas again, my dear ones, it does not appear to be so.
It gets worse still, because after all of that came the wee p-values (or confidence intervals, which amount to the same thing).
They found that the hazard rate for dementia, but for none of the other maladies, was highest for those with addresses nearest to “major” roadways, after “controlling” for that other stuff (in an incomplete way).
Curiously, for multiple sclerosis, having a database address nearest “major” roads was as dangerous as living farthest away, and addresses in the middle range fared best. Which is a signal something screwy is going on. But since none of the p-values for MS were wee, this oddity was dismissed.
Why the wee ps? Well, the datasets were huge. A major (huge!) failing of p-values is that ps are always wee for large enough sample sizes, even in the complete absence of cause. Here, the effects weren’t so big, either.
The confidence intervals were parametric, not predictive. Huge sample sizes make for short CIs, just as they make for small ps. What’s wanted are actual predictive intervals, but we don’t see them.
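The difference is easy to show for the simplest case, a normal mean with known spread (a toy stand-in, not the paper’s survival model): the parametric interval for the *parameter* shrinks like 1/sqrt(n), while the predictive interval for a *new individual* never shrinks below about plus-or-minus two standard deviations.

```python
# Sketch: parametric CI for the mean vs 95% predictive interval for a new
# observation, normal model with known sigma. Half-widths only.
from math import sqrt

sigma = 1.0
for n in (100, 10_000, 1_000_000):
    ci_half = 1.96 * sigma / sqrt(n)            # parametric CI: shrinks with n
    pi_half = 1.96 * sigma * sqrt(1 + 1.0 / n)  # predictive interval: does not
    print(f"n={n:>9,}  CI +/- {ci_half:.4f}   predictive +/- {pi_half:.4f}")
```

At n = 1,000,000 the parametric interval is a sliver, which is what makes everything look “significant”; the predictive interval, the one that answers what happens to an actual person, is essentially as wide as it was at n = 100.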
Their conclusion: “In this large population-based cohort, living near major roadways was associated with increased dementia incidence. The associations seemed stronger among urban residents, especially those living in major urban centres and those who never moved.” Who lives near “major” roadways in major urban centers and doesn’t move? People who usually can’t afford to, and who might not be in as good health as their more mobile not-so-near neighbors.
“We observed that exposures to NO2 and PM2.5 were related to dementia and that adjusting for these two pollutants attenuated its association with roadway proximity, suggesting that the effect of traffic exposure might, at least in part, operate through this mechanism.”
At best this translates to “Breathing pollution isn’t good for you,” which isn’t a major breakthrough. On the other hand, exposure to NO2 and PM2.5 was not measured. Proxies were.
My bet, or really my challenge to the authors, is to redo the whole shebang, only this time incorporating all the uncertainties I mentioned (and some too boring to bring up) and recasting the results in predictive, not parametric, terms. I’ve got fifty bucks that says all the associations disappear or become trivial.