This came up yesterday (again, as it does often), so I figure one more stab is in order. Because the answer isn’t simple, I had to write a lot, which means it won’t get read, which means I’ll have to write about it again in the future.
Trust your eyes
You’re a doctor (your mother is proud) and have invented a new pill, profitizol, said to cure the screaming willies. You give this pill to 100 volunteer sufferers, and to another 100 you give an identical-looking placebo.
Here are the facts, doc: 71 folks in the profitizol group got better, whereas only 60 in the placebo group did.
Now here is what I swear is not a trick question. If you can answer it, you’ll have grasped the true essence of statistical modeling. In which group was there a greater proportion of recoverers?
This is the same question that was asked yesterday, but with respect to the global temperature values. Once we decided what was meant by a “trend”—itself no easy task—the question was: Was there a trend?
May I have a drum roll, please! The answer to today’s question is—isn’t the tension unbearable?—more people in the profitizol group got better. The answer to yesterday’s question was (accepting the definition of trend therein): no.
These answers cause tremendous angst, because people figure it can’t be that easy. It doesn’t sound sciency enough. Well, it is that easy. You can go on national television and trumpet to the world the indisputable inarguable obvious absolute truths that more people in the drug group got better, and that (given our definition of trend) there hasn’t been a trend these twenty years.
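For the skeptical, here is the whole "analysis" as a minimal Python sketch. The counts are the ones given above; the variable names are mine and purely illustrative. Notice there is no model anywhere in it, only arithmetic on what was observed.

```python
# Observed counts from the trial described above.
recovered = {"profitizol": 71, "placebo": 60}
group_size = 100  # volunteers in each group

# Observed proportion of recoverers per group: plain arithmetic, no model.
proportions = {group: count / group_size for group, count in recovered.items()}

# The group with the greater observed proportion.
better_group = max(proportions, key=proportions.get)

print(proportions)
print(better_group)
```

That is all the first question asks for. Anything more belongs to the second question, which is about causes.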
Question two: what caused the difference in observed recovery rates? And what caused the temperature to do what it did?
My answer for both: I don’t know. But I do know that some thing or things caused each person in each group to get better or not. And I know that some thing or things caused temperature to take the values it did. I also know that “chance” or “randomness” weren’t the causes. They can’t be, because they are measures of ignorance and not physical objects. Lack of an allele of a certain gene can cause non-recovery, and the sun can cause the temperature to increase, but “chance” is without any power whatsoever.
Results are never due to chance; they are due to real causes, which we may or may not know.
The IPCC claims to know why temperature did what it did. We know the IPCC is wrong, because their model predicted things which did not happen. That means the causes it identified are wrong in some way, either by omission or commission. That’s for them to figure out.
Clever readers will have noticed that, thus far, there was no need for statistical models. If our goal was only to state which group got better at the greater rate, or whether there was a trend, no model was needed. Why substitute a model for perfectly good reality? That is to commit the Deadly Sin of Reification (alas, an all too common failing).
Enter the models
The classically trained (Bayesian or frequentist) statistician will still want to model, because that is what statisticians do. In the drug trial they will invent for themselves a “null hypothesis”, which is the proposition, “Profitizol and the placebo cause the exact same biological effects”, which they ask us to “accept” or “reject”.
That means, in each patient, profitizol or a placebo would do the same exact thing, i.e. interact with the relevant biological pathways associated with the screaming willies such that no measurement on any system would reveal any difference. But given you are a doctor, aware of biochemistry, genetics, and the various biological manifestations of the screaming willies, it is highly unlikely this “null” proposition holds. Indeed, to insist it does is to abandon or willfully ignore all this knowledge and cast all your attention on only that which can be quantified (the Sin of Scientism).
Of course, you might have made a mistake and created a substance which was (relative to the SW) identical with the placebo. Mistakes happen. How do we tell? Do we have any evidence that profitizol works? That’s the real question, the question everybody wants answered. Well, what does “works” mean?
Uh oh. Now we’re into causality. If by “works” we mean, “Every patient that eats profitizol is cured of the SW” then profitizol does not work, because why? Because not every patient got better. If by “works” we mean, “Some patients that eat profitizol are cured of the SW” then profitizol works, and so does the placebo, because, of course, some patients in each group got better. Defining properly what “works” is not an easy job, as this series of essays on a famous statistical experiment proves. Here we’re stuck with the mixed evidence that patients in both groups got better. Clearly, something other than just interacting with a drug or placebo is going on.
What to do?
Remember the old saw about how the sale of ice cream cones was “correlated” with drownings? Everybody loves to cite—and to scoff at—this example because it is obviously missing any direct causal connection. But it’s a perfectly valid statistical model. Why?
Because a statistical model is only interested in quantifying the uncertainty in some observable, given clearly stated evidence. Thus if we know that ice cream sales are up, it’s a good bet that drownings will rise. We haven’t said why, but this model makes good predictions! (I’m hand-waving, but you (had better) get the idea.)
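To make the hand-waving a little more concrete, here is a rough Python simulation. Everything in it is an assumption of mine for illustration: the numbers are made up, and I have supposed a hidden common cause (daily temperature) that drives both sales and drownings. The fitted model never sees temperature; it only relates drownings to sales, and it still predicts in the right direction.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION (not from the text): temperature drives both series.
# All coefficients and noise levels are invented for illustration.
temps = [random.uniform(10, 35) for _ in range(200)]
sales = [20 * t + random.gauss(0, 30) for t in temps]
drownings = [0.3 * t + random.gauss(0, 1) for t in temps]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


# The statistical model: drownings given sales. Temperature is never used.
intercept, slope = fit_line(sales, drownings)
r = correlation(sales, drownings)

# Predictions for a slow sales day and a busy one.
low = intercept + slope * 300
high = intercept + slope * 600

print(r, low, high)
```

The model says more sales, more drownings, and under these assumptions it would be right about that. It says nothing about why, and banning ice cream would save nobody.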
Statistical models do not say anything about causality. We’re not learning why people are drowning, or why people are getting better on profitizol, or why the temperature is doing what it’s doing. We are instead quantifying our uncertainty given changes in certain conditions—and that is it.
If we knew all about the causes of a thing, we would not need statistics. We would feed the initial and observed conditions into our causal model, and out would pop what would happen. If we don’t know the causes, or can’t learn them, but still want to quantify uncertainty, we can use a statistical model. But it’s always a mistake to infer (without error; logically infer) causality because some statistical model passes some arbitrary test about the already observed data. The ice cream-drowning model (we assume) would pass the standard tests. But there is no causality.
Penultimate fact: To any given set of data, any number of statistical or causal or combination models can be fit, any number of which fit that observed data arbitrarily well. I can have a model and you can have a rival one, both of which “fit” the data. How do we tell which model is better?
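A toy illustration of rival models, in Python. The data points and both models are invented for the purpose and come from no real experiment. Model B is model A plus a term built to vanish at every observed point, so the two fit the observed data equally (and perfectly) yet disagree as soon as they are asked about a new point.

```python
# Made-up observations, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def model_a(x):
    """A straight line through the observed points."""
    return 2.0 * x + 1.0


def model_b(x):
    """The same line plus a term that is zero at every observed x."""
    bump = 1.0
    for xi in xs:
        bump *= x - xi
    return 2.0 * x + 1.0 + bump


# Largest absolute error of each model on the observed data.
fit_a = max(abs(model_a(x) - y) for x, y in zip(xs, ys))
fit_b = max(abs(model_b(x) - y) for x, y in zip(xs, ys))

# What each model says about a point not yet observed.
new_x = 4.0
prediction_a = model_a(new_x)
prediction_b = model_b(new_x)

print(fit_a, fit_b)
print(prediction_a, prediction_b)
```

Measures of fit cannot separate these two. Only the observation at the new point can.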
Last fact: Every model (causal or statistical or combination) implies (logically implies) a prediction. Since models say what values, or with what probability what values, some observable will take given some conditions, all we do is supply those conditions which indicate new circumstances (usually the future)—voilà! A prediction!
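Here is a small Python sketch of a model being made to cough up its prediction. The model is my own choice, not anything claimed above: it assumes each new patient is like the observed ones and starts from a flat prior on the recovery proportion, which gives the rule of succession. The function name is hypothetical. Supply the new circumstance (one more patient in each group) and out comes a probability that can later be checked against what actually happens.

```python
from fractions import Fraction


def predictive_probability(recovered: int, total: int) -> Fraction:
    """Probability the next patient recovers, given the counts seen so far.

    ASSUMPTION: flat prior on the recovery proportion (rule of succession).
    A different model would imply a different prediction.
    """
    return Fraction(recovered + 1, total + 2)


# Counts from the trial: 71 of 100 on profitizol, 60 of 100 on placebo.
p_drug = predictive_probability(71, 100)
p_placebo = predictive_probability(60, 100)

print(float(p_drug))
print(float(p_placebo))
```

These numbers are statements about new patients, not about the 200 already seen. We need no model for those; we know what happened to them.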
It’s true most people who use statistical models have no idea of this implication (they were likely not taught it). Still, it is true, and even obvious once you give it some thought.
Not knowing this implication is why so many statistical models are meager, petty things. At least the IPCC stuck around and waited to see whether the model they proposed worked. Most users of statistics are content to fit their model to data, announce measures of that fit (and since any number of models will fit as well, this is dull information), and then they run away winking and nudging about the causality which is “obvious.”
Not recognizing this is why we are going through our “reproducibility crisis”, which, you will notice, hits just those fields which rely primarily on statistics.