Be sure to read yesterday’s post first.
One of the screwy consequences of classical statistics is that my odd sample (mixing babies and the patients from the brain cancer ward) is perfectly acceptable. Nobody would ever use such a sample, of course, but that is a tacit admission of the incompleteness of the classical theory (about which, more another day).
For a moment, ignore the oddness of my sample and assume that it is instead a “good” one, where I’ll leave “good” unexplained. Think of it as comparing cell phone users and non-users in the “right” way (whatever that means).
Even assuming the sample is “good”, we still have a problem. We cannot say why only some people in the exposed group developed cancer, nor can we say why some people in the non-exposed group developed it anyway. Obviously, something, or some things, caused some of those people to develop cancer, or caused others not to develop it. What?
Why didn’t everybody in the exposed group develop cancer, and why didn’t everybody in the non-exposed group remain free of the disease? Obviously, something or some things that we did not measure are part of the causal brain cancer chain.
Through our sample, we know only two things with certainty: (1) cell phone exposure does not always cause cancer, and (2) the absence of cell phone exposure does not always prevent cancer. Further, these are direct statements of causality.
Any other statement about causality we can make with our sample can only be true with a probability somewhere strictly greater than 0 and less than 1. What are these uncertain statements like?
It is our surmise that, through some biological mechanism said to be plausible conditional on information external to our sample, cell phone radiation twiddles with certain cellular (ha!) processes, turning normal cells into rogue ones. But we must also believe that these mechanisms work only sometimes, or only on people who meet other criteria, or both.
We might guess what these other criteria are—say, smoking—but these are just guesses: we cannot know with certainty that the other criteria are causally responsible, just as we cannot say that cell phone radiation certainly is.
Further, we might guess incorrectly (this is what it means to guess, right?), and other processes, completely unknown to us, might be what cause the cancer. For example, it could be that ingesting an unknown suite of chemicals in just the right order causes the cancer, but only when the person is also hit by radiation from the cell phone.
Suppose, then, that we know of no other criteria: that is, we are not considering any other measured or unmeasured characteristic (of which there are a number approaching infinity). We are not prepared to formally specify, or model, how these characteristics affect cancer. Understand: this includes the sample itself, or the way in which the sample was taken.
If we believe our mixed maternity ward/brain cancer ward sample is somehow “biased”, we must be prepared to model that bias. It is we who must suppose the bias takes a certain form. That, after all, is what it means to create a model. (Classical theory has a difficult time formalizing just how bad this sample is; people certainly claim that it is bad, but not on the basis of formal probability.)
OK: no other criteria considered, including the way the sample arose, except exposure. Suppose we see a low p-value. Are we entitled to say that the cell phone radiation caused the cancer? No, as already explained. But can we make statements such as, “There is a 3% chance that if you use a cell phone, you will develop cancer”? Yes, but not using classical theory; you can say this using Bayesian statistics. And even then, it is not a statement of causality, merely one of correlation.
Classical theory only lets you say something about the p-value and that hypothesis mentioned yesterday. Don’t forget, though, that that hypothesis is actually only a statement about the parameters of a formal probability model of exposure and cancer. P-values do not say anything directly about the chances of actual things happening.
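To make the contrast between a p-value and a predictive probability concrete, here is a minimal Python sketch. The counts (30 cancers among 1,000 exposed people, 10 among 1,000 unexposed) are purely hypothetical, and the function names are my own. The first function computes a two-sided Fisher exact p-value, which speaks only to the hypothesis that both groups share the same cancer-probability parameter. The second computes a Bayesian posterior predictive probability, assuming a binomial model with a Beta prior (uniform by default, which gives Laplace's rule of succession): the chance that the next exposed person develops cancer, given the data and that model.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table

                  cancer   no cancer
      exposed       a          b
      unexposed     c          d

    The null hypothesis is a statement about parameters: both groups
    share the same (unobservable) probability of cancer."""
    row1, row2 = a + b, c + d      # group sizes
    col1 = a + c                   # total cancers
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x cancers among the exposed, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Sum every table no more probable than the observed one (small tolerance for rounding).
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))


def predictive_prob(cancers, total, alpha=1.0, beta=1.0):
    """Posterior predictive probability that the next person in the group
    develops cancer, assuming a binomial model and a Beta(alpha, beta) prior."""
    return (cancers + alpha) / (total + alpha + beta)


# Hypothetical counts, for illustration only.
exposed_cancer, exposed_n = 30, 1000
unexposed_cancer, unexposed_n = 10, 1000

p_value = fisher_exact_two_sided(exposed_cancer, exposed_n - exposed_cancer,
                                 unexposed_cancer, unexposed_n - unexposed_cancer)
p_next_exposed = predictive_prob(exposed_cancer, exposed_n)

print(f"p-value (about a parameter hypothesis): {p_value:.4f}")
print(f"P(next exposed person gets cancer | data, model): {p_next_exposed:.4f}")
```

With the uniform prior the predictive probability is 31/1002, roughly the “3%” above. Neither number is a statement of causality, and the second is only as good as the model and information it is conditioned on.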
So does going with Bayesian statistics solve all our problems? In other words, is that “3% chance” correct? Answers: no, and probably not.
All probability statements are conditional on specified information. That “3% chance” is a correct probability assessment given our information. If our information is faulty or biased, then so is the “3% chance.” Since we cannot know our information is true, then we cannot know whether the “3% chance” is true, either.
That is, it could still be the case that something utterly unconnected with what we measured or with our biological theory caused the cancer.
And with observational/epidemiological data, experience has shown that the chance of something else causing the malady is pretty high.
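Conditioning on different information changes the number. As a hedged illustration, suppose (hypothetically; these counts are invented) that the same 30 cancers among 1,000 exposed people split by smoking status into 25 of 300 smokers and 5 of 700 non-smokers. The same predictive formula, assuming a binomial model with a uniform Beta(1, 1) prior, gives a different answer for each state of information. Each is correct given its own premises, and none is guaranteed to match the world.

```python
from fractions import Fraction


def predictive_prob(cancers, total, alpha=1, beta=1):
    """Posterior predictive P(next person in the group develops cancer),
    assuming a binomial model with a Beta(alpha, beta) prior; uniform by default.
    Returns an exact Fraction so the dependence on the counts is easy to see."""
    return Fraction(cancers + alpha, total + alpha + beta)


# Hypothetical counts among the exposed; the smoking split is invented for illustration.
exposed_all = predictive_prob(30, 1000)        # information: exposure only
exposed_smokers = predictive_prob(25, 300)     # information: exposure and smoking
exposed_nonsmokers = predictive_prob(5, 700)   # information: exposure, no smoking

for label, p in [("exposed, ignoring smoking", exposed_all),
                 ("exposed smokers", exposed_smokers),
                 ("exposed non-smokers", exposed_nonsmokers)]:
    print(f"{label}: {p} = {float(p):.4f}")
```

The three answers are about 3.1%, 8.6% and 0.9%. The “3%” belongs to the first state of information only; add a characteristic (measured or not) and the probability moves.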
Obviously, this is just a sketch. My vacation ends today, but this was written the day before, hurriedly, in a coffee shop on my way to the lake for one last swim.