A standard epidemiological study goes like this: people who have been “exposed” to some thing, say, cell phone radiation, are examined to discover whether or not they have some malady. Some will, some won’t.
A second group of folks who have not been “exposed” to cell phone radiation are also examined to discover whether they have the same malady. Again, some will, some won’t.
In the end, we have two proportions (fractions, or percentages) of sufferers: one for the exposed group and one for the non-exposed. If the proportion of those who have the malady in the exposed group is larger than the proportion in the non-exposed group, then the exposure is claimed, at least tacitly by scientists, and absolutely by all lawyers, to have caused the malady.
Actually, the exposure will be said to have caused the difference in proportions of the malady. But the idea is clear: if it weren’t for the exposure, the rate of sufferers in the exposed group would be “the same” as in the non-exposed group.
The statisticians or epidemiologists who publish these results, and the press-releasing officials at the institutions where the epidemiologists work, are careful, however, to never use the word “cause”.
They instead point to the exposure and say, “What else could have caused the results?” thus leaving the reader to answer the rhetorical question.
This allows wiggle room to doubt the exposure truly caused the malady, in much the same way as television newsreaders use the word “allegedly.” As in, “Smith was allegedly caught beating his victim over the head with a stick.” If, upon other evidence, Smith turns out to be not guilty, reporters can truthfully, and oh-so-innocently, claim, “We never said he did it.”
The humble p-value is the epidemiologist’s accomplice in casting aspersions on the exposure as the guilty (causative) party. Here’s how.
Classical statistics allows you to pose a hypothesis, “The proportions of sufferers in the two groups, exposed and not-exposed, are identical.” The data for this hypothesis—here, just the numerators and denominators of the two proportions—are fed into software and out pops a p-value.
If that p-value is less than the mystical value of 0.05—this value has always been arbitrary, but you simply cannot talk people out of its metaphysical significance—then you are allowed to “reject” the hypothesis of proportional equality.
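If you want to see the machinery, here is a minimal sketch in Python. The counts are invented purely for illustration, and statsmodels’ proportions_ztest stands in for whatever the software actually is:

```python
# A minimal sketch of the standard two-proportion test the software runs.
# The counts below are made up to show the mechanics; they are not data
# from any real study.
from statsmodels.stats.proportion import proportions_ztest

exposed_cases, exposed_total = 30, 1000      # sufferers / total, exposed group
unexposed_cases, unexposed_total = 20, 1000  # sufferers / total, non-exposed group

# Null hypothesis: the two underlying proportions are identical.
z_stat, p_value = proportions_ztest(
    count=[exposed_cases, unexposed_cases],
    nobs=[exposed_total, unexposed_total],
)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")  # with these counts: z ≈ 1.43, p ≈ 0.15
# Convention: if p < 0.05, "reject" the hypothesis of equal proportions.
```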
Meaning, in the tangled language common to statistics, you conclude that the proportions are different. But this is a fact you already knew (either the proportions were different or they weren’t; to check, just use your eyes).
But the actual interpretation when a small p-value is found, irresistibly cherished by all even though it is contrary to theory, is that the exposure did the deed. That is, that the exposure caused the differences in proportions.
Here’s where we are so far: two groups are checked for a malady; if the proportions of sufferers differ, and if the software pops out a publishable p-value (less than 0.05), we say the difference between the groups was caused by some agent.
However, a curious, and fun, fact about classical statistics is that, for any fixed difference in the observed proportions, no matter how trivial, the larger the sample of data you have, the smaller your p-value will be. Thus, so long as the proportions differ at all, just by collecting more data you will always be assured of a successful study. Here’s an example.
Our two groups are cell phone users (exposed) and non-cell phone users (non-exposed), and our malady is brain cancer. I trot to the hospital’s brain cancer ward and tally the number of people who have and haven’t used a cell phone. These folks are the numerators in my sample.
I then need a good chunk of non-brain cancer sufferers. I am free to collect these anywhere, but since I’m already at the hospital, I’ll head to the maternity ward and count the number of people who have and haven’t used cell phones (including, of course, the newly born). None of these people have brain cancer. These folks, added to the brain cancer sufferers, form the denominators of my sample.
This will make for a fairly small sample of people, so it’s unlikely I’ll see a p-value small enough to issue a press release. But if I repeat my sample at, say, all the hospitals in New York City, then I’ll almost certainly have the very tiny p-value required for me to become a consultant to the lawyers suing cell phone manufacturers for causing brain cancer.
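The arithmetic behind this trick is easy to exhibit. Here is a sketch, again with invented counts, in which the observed proportions are held fixed and the counts are simply multiplied, as if identical results were pooled from ever more hospitals:

```python
# A sketch of the sample-size effect: hold the observed proportions fixed
# (3% vs 2%, hypothetical numbers) and multiply the counts, as if pooling
# identical results from more and more hospitals.
from statsmodels.stats.proportion import proportions_ztest

base_cases = [30, 20]       # sufferers: exposed, non-exposed
base_totals = [1000, 1000]  # group sizes

for k in (1, 5, 25):        # k = number of hospitals pooled
    counts = [c * k for c in base_cases]
    totals = [n * k for n in base_totals]
    _, p = proportions_ztest(count=counts, nobs=totals)
    print(f"{k:>2} hospital(s): p = {p:.5f}")

# The observed proportions never change, yet the p-value falls from about
# 0.15 to well below 0.05 once enough data are pooled.
```

Nothing about the evidence improves as k grows; only the p-value does.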
In Part II: What Could Possibly Go Wrong?