Let’s add in a layer of uncertainty and see what happens. But first hike up your shorts and plant yourself somewhere quiet because we’re in the thick of it.
The size of the relative risks (1.06) touted by authors like Jerrett gets the juices flowing in bureaucrats and activists, who see any number north of 1 as reason for intervention. Yet in their zeal for purity they ignore evidence which admits things aren’t as bad as they appear. Here’s proof.
Relative risks are produced by statistical models, usually frequentist. That means p-values less than the magic number signal “significance”, an unfortunate word which doesn’t mean what civilians think. It doesn’t imply “useful” or “important” or even “significant” in its plain English sense. Instead, a p-value is the probability of seeing a test statistic larger (in absolute value) than the one produced by the model and observed data, if the “experiment” which gave the observations were indefinitely repeated and if certain parameters of the quite arbitrary model are set to 0.¹ What a tongue twister!
Every time you see a p-value, you must recall that definition. Or fall prey to the “significance” fallacy.
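To make the tongue twister concrete, here is a minimal sketch of that definition done by brute force. Everything in it is an assumption for illustration: the function names, the one-sample t-type statistic, the normal model, and the made-up data. It repeats the "experiment" many times with the parameter set to 0 and counts how often the simulated statistic is at least as large, in absolute value, as the observed one.

```python
import math
import random


def t_statistic(xs, null_mean=0.0):
    """One-sample t-type statistic: (mean - null value) / standard error."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return (mean - null_mean) / math.sqrt(var / n)


def simulated_p_value(observed, n_repeats=5000, seed=1):
    """Fraction of repeated 'experiments' (normal model, mean set to 0)
    whose |statistic| is at least as large as the observed |statistic|."""
    rng = random.Random(seed)
    n = len(observed)
    t_obs = abs(t_statistic(observed))
    exceed = 0
    for _ in range(n_repeats):
        fake = [rng.gauss(0.0, 1.0) for _ in range(n)]  # parameter set to 0
        if abs(t_statistic(fake)) >= t_obs:
            exceed += 1
    return exceed / n_repeats


# Hypothetical data, invented only to exercise the function.
near_zero = [0.1, -0.2, 0.05, 0.3, -0.15, 0.0, -0.1, 0.1]
far_from_zero = [2.1, 1.8, 2.5, 1.9, 2.2, 2.0, 2.3, 1.7]
print("p-value, data near 0:     ", simulated_p_value(near_zero))
print("p-value, data far from 0: ", simulated_p_value(far_from_zero))
```

Swap in a different statistic or a different model for the fake data and the number changes, which is the point of footnote 1.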
Now (usually arbitrarily chosen and not deduced) statistical models of relative risk have a parameter or parameters associated with that measure.² Classical procedure “estimates” the values of these parameters; in essence, makes a guess of them. The guesses are heavily (as in heavily) model and data dependent. Change the model, make new observations, and the guesses change.
There are two main sources of uncertainty (and many subsidiary ones). This is key. The first is the guess itself. Classical procedure forms confidence or credible “95%” intervals around the guess.³ If these do not touch a set number, “significance” is declared. But afterwards the guess alone is used to make decisions. This is the significance fallacy: to neglect uncertainty of the second and more important kind, the uncertainty in the future observations themselves.
Last time we assumed there was no uncertainty of the first kind. We knew the values of the parameters, of the probabilities and risk. Thus the picture drawn was the effect of uncertainty of the second kind, though at the time we didn’t know it.
We saw that even though there was zero uncertainty of the first kind, there was still tremendous uncertainty in the future. Even with “actionable” or “unacceptable” risk, the future was at best fuzzy. Absolute knowledge of risk did not give absolute knowledge of cancer.
This next picture shows how introducing uncertainty of the first kind—present in every real statistical model—increases uncertainty of the second.
The narrow reddish lines are repeated from before: the probabilities of new cancer cases among exposed and not-exposed LA residents, assuming perfect knowledge of the risk. The wider lines are the same, except that they add in parameter uncertainty (for parameters which were statistically “significant”).
Several things to notice. The most likely number of cancer cases stopped by completely eliminating coriandrum sativum is still about 20, but the spread in cancers stopped doubles. We now believe there could be more cancer cases, but there also could be many fewer.
There is also more overlap between the two curves. Before, we were 78% sure there would be more cancer cases in the exposed group. Now there is only a 64% chance: a substantial reduction. Pause and reflect.
Parameter uncertainty increases the chance to 36% (from 22%) that any program to eliminate coriandrum sativum does nothing. Either way, the number of affected citizens remains low. Affected by cancer, that is. Everybody would be affected by whatever regulations are enacted. And don’t forget: any real program cannot completely eliminate exposure; the practical effect on disease must always be less than ideal. But the calculations focus on the ideal.
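The direction of that change can be checked with a rough sketch. This is not the calculation behind the pictures: the group size, the spread put on the log relative risk, the normal approximation to the case counts, and all the names are assumptions chosen for illustration. Only the relative risk of 1.06 and the exposed probability of 2e-4 come from the text. The sketch computes the chance that the exposed group has more cases than the not-exposed group, first with the risk known exactly, then averaged over uncertainty in the relative risk.

```python
import math
import random
from statistics import NormalDist


def prob_more_cases_exposed(n, p_unexposed, rr):
    """Approximate P(cases in exposed > cases in not-exposed) for two
    groups of size n, using a normal approximation to the binomial counts."""
    p_exposed = p_unexposed * rr
    mean_diff = n * (p_exposed - p_unexposed)
    var_diff = n * p_exposed * (1 - p_exposed) + n * p_unexposed * (1 - p_unexposed)
    return NormalDist().cdf(mean_diff / math.sqrt(var_diff))


def prob_with_parameter_uncertainty(n, p_unexposed, rr, log_rr_sd,
                                    draws=20000, seed=1):
    """Same probability, averaged over draws of the relative risk.
    The normal spread on log(rr) is an assumed form of parameter uncertainty."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        rr_draw = math.exp(rng.gauss(math.log(rr), log_rr_sd))
        total += prob_more_cases_exposed(n, p_unexposed, rr_draw)
    return total / draws


N = 2_000_000            # hypothetical group size
RR = 1.06                # relative risk from the text
P_EXPOSED = 2e-4         # exposed probability from footnote 2
P_UNEXPOSED = P_EXPOSED / RR
LOG_RR_SD = 0.05         # assumed spread of the parameter "guess"

known = prob_more_cases_exposed(N, P_UNEXPOSED, RR)
uncertain = prob_with_parameter_uncertainty(N, P_UNEXPOSED, RR, LOG_RR_SD)
print("risk known exactly:         ", round(known, 3))
print("with parameter uncertainty: ", round(uncertain, 3))
```

Widen `LOG_RR_SD` and the second number slides further toward 0.5. The exact values depend entirely on the assumed inputs; the direction does not.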
We’re not done. We still have to add the uncertainty in measuring exposure, which typically is not minor. For example, Jerrett (2013) assumes air pollution measurements from 2002 affect the health of people in the years 1982-2000. Is time travel possible? Even then, his “exposure” is a guess from a land-use model. Meaning he used the epidemiologist fallacy to supply exposure measurements.
Adding exposure uncertainty pushes the lines above outward, and increases their overlap. We started with a 78% chance any regulations might be useful (even though the usefulness affected only about 20 people); we went to 64% with parameter uncertainty; and adding in measurement error will move that number closer to 50%, the bottom of the barrel of uncertainties. At 50%, the probability lines for exposed and not-exposed would exactly overlap.
I stress I did not use Jerrett’s model—because I don’t have it. He didn’t publish it. The example here is only an educated guess of what the results would be under typical kinds of parameter uncertainty and given risks. The direction of uncertainty is certainly correct, however, no matter what his model was.
Plus—you knew this was coming: my favorite phrase—it’s worse than we thought! There are still sources of uncertainty we didn’t incorporate. How good is the model? Classical procedure assumes perfection (or blanket usefulness). But other models are possible. What about “controls”? Age, sex, etc. Could be important. But controls can fool just as easily as help: see footnote 2.
All along we have assumed we could eliminate exposure completely. We cannot. Thus the effect of regulation is always less than touted. How much less depends on the situation and our ability to predict future behavior and costs. Not so easy!
I could go on and on, adding in other, albeit smaller, layers of uncertainty. All of which push that effectiveness probability closer and closer to 50%. But enough is enough. You get the idea.
¹ Other settings are possible, but 0 is the most common. Different models on the same data give different p-values. Which one is right? All. Different test statistics used on the same model and data give different p-values. Which one is right? All. How many p-values does that make all together? Don’t bother counting. You haven’t enough fingers.
² Highly technical alley: A common model is logistic regression. Read all about it in chapters 12 and 13 of this free book (PDF). It says the “log odds of getting it” are linearly related to predictors, each associated with a “parameter.” The simplest such model is (r.h.s.) b0 + b1 * I(exposed), where I(exposed) equals 1 when exposed, else 0. With a relative risk of 1.06 and exposed probability of 2e-4, you cannot, with any sample size short of billions, find a wee p-value for b1. But you can if you add other “controls”. Thus the act of controlling (for even unrelated data) can cause what isn’t “significant” to become that way. This is another, and quite major, flaw of p-value thinking.
³ “Confidence” intervals mean, quite literally, nothing. This always surprises. But everybody interprets them as Bayesian credible intervals anyway. These are the plus or minus intervals around a parameter, giving its most likely values.
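For those who wandered down the technical alley of footnote 2, the simplest logistic model can be written out in a few lines. This is a sketch only: the helper names are mine, and the not-exposed probability is assumed to be the exposed probability divided by the relative risk. It backs out b0 and b1 from the two numbers in the footnote and then confirms the model returns them.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Probability corresponding to a log odds value."""
    return 1 / (1 + math.exp(-x))


p_exposed = 2e-4                 # exposed probability from footnote 2
rr = 1.06                        # relative risk from the text
p_unexposed = p_exposed / rr     # assumed: risk ratio of the two probabilities

b0 = logit(p_unexposed)          # log odds when I(exposed) = 0
b1 = logit(p_exposed) - b0       # change in log odds when I(exposed) = 1


def prob_of_disease(exposed):
    """Probability from the model b0 + b1 * I(exposed)."""
    indicator = 1 if exposed else 0
    return inv_logit(b0 + b1 * indicator)


print("b0 =", b0)
print("b1 =", b1, " versus log(1.06) =", math.log(rr))
print("implied relative risk =", prob_of_disease(True) / prob_of_disease(False))
```

Because the probabilities are so small, odds and probabilities nearly coincide, so b1 sits very close to the log of the relative risk.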