I know, I know: we’re sick to death of regression, but we have to cover the biggest error, which is how the Deadly Sin of Reification happens.
You have a “y”, an outcome, some number the uncertainty in which you want to quantify with a normal distribution. We do not say “y” is “normally distributed.” That’s the beginning of the sin. It is our uncertainty in the “y”, and not the “y” itself, that the normal represents.
The normal has two parameters, a central m and spread s. The central parameter m is modeled as a function of “x”s, like this:
m = b0 + b1x1 + … + bpxp.
Now most textbooks get this equation right. And then immediately forget it. Almost directly after the equation is introduced, the author, and perforce the students, substitute this:
y = b0 + b1x1 + … + bpxp.
The central parameter is forgotten and the equation is said to represent the observable itself. The model has become the reality. The Sin has been committed, because why? Because if this second equation holds, it is causative. It says, “When I change xp by such-and-such amount, y itself is caused to change by bp times that amount.”
This is false. This is not so. This is wrong. This is foolish. Plus, it’s not right.
Probability models are not causal. They only represent uncertainty. They do not say what happens to the “y”, only what happens to our uncertainty in the “y”.
The second equation is often given in the form you see it; I mean, it is written out like that. But just as often it is given only in words. Suppose “y” is “white blood count” and I want to quantify my uncertainty in its value using a normal distribution. I should use the first equation. One of the “x”s might be a person’s “body temperature.”
If it turns out the “b” associated with “body temperature” is positive, it means that, given the other “x”s, the central parameter of “white blood count” increases by “b” for each unit increase in “body temperature.” That holds only if it is 100% certain “b” is positive, and of that we are typically not certain. But that is a problem for another day. For now, assume we do know “b” with certainty. In this case, we now say that, at a higher “body temperature,” the probability for higher “white blood counts” has increased.
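Here is a minimal sketch of what the first equation does and does not say. Every number in it is an assumption made up for illustration: the values of b0, b1, and s, the threshold, and the units (think Celsius for temperature and thousands of cells per microliter for the count). None of it comes from real data. The point is only that the model returns a central parameter and a probability, never a “white blood count.”

```python
from statistics import NormalDist

# Hypothetical values, assumed known with certainty for illustration only.
B0 = -30.0  # intercept b0 (made up)
B1 = 1.0    # coefficient b1 on body temperature, positive (made up)
S = 2.0     # spread parameter s (made up)


def central_parameter(temp: float) -> float:
    """m = b0 + b1 * x. This is the central parameter, not the count itself."""
    return B0 + B1 * temp


def prob_count_exceeds(threshold: float, temp: float) -> float:
    """Pr(y > threshold | x, b0, b1, s) under the normal model."""
    model = NormalDist(mu=central_parameter(temp), sigma=S)
    return 1.0 - model.cdf(threshold)


# Same threshold, two different body temperatures.
p_low = prob_count_exceeds(9.0, 37.0)
p_high = prob_count_exceeds(9.0, 39.0)

print(f"temp 37.0: m = {central_parameter(37.0):.1f}, Pr(y > 9) = {p_low:.3f}")
print(f"temp 39.0: m = {central_parameter(39.0):.1f}, Pr(y > 9) = {p_high:.3f}")
```

The probability of a count above 9 is larger at the higher temperature. That is a statement about our uncertainty in the count given the temperature. Nothing in the code raises anybody’s white blood count.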
Yet if I were to follow standard procedure, I would immediately say that higher “body temperatures” cause higher “white blood counts.” I would say (wrongly), “Increasing body temperature drives up white blood count.” I might add the hopeful escape clause “on average”, in an attempt to alleviate the Sin. This does not fix the mistake. The reification has happened, and cannot be removed.
Our sample data includes many individuals, some with high “body temperatures,” some with low, some with high “white blood counts,” some with low. What caused each individual’s actual “body temperature” and “white blood count”? Too many things to count. A body temperature is the result of a complex configuration of bones, muscles, blood, energy use, and on and on. Same thing for “white blood count.” There may be some disease or diseases or malfunctions in some of the people which, through various mechanisms, cause higher “white blood counts.” But “body temperature” is not likely to be one of these causes.
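A toy simulation makes the same point. This is a sketch under loud assumptions: the world below is invented, the causes are known only because I wrote them myself, and the illness rate, effect sizes, and noise levels are all made up. In this pretend world a hidden illness raises both temperature and count, and temperature never enters the formula for the count.

```python
import random


def fitted_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (the fitted 'b')."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n, seed, forced_temp=None):
    """Invented world: illness drives both measurements; temperature drives nothing."""
    rng = random.Random(seed)
    temps, counts = [], []
    for _ in range(n):
        ill = rng.random() < 0.3  # hidden cause (made-up rate)
        temp = 37.0 + (1.5 if ill else 0.0) + rng.gauss(0.0, 0.3)
        if forced_temp is not None:
            temp = forced_temp  # set the temperature by hand
        # The count depends on illness and noise only, never on temp.
        count = 7.0 + (4.0 if ill else 0.0) + rng.gauss(0.0, 1.0)
        temps.append(temp)
        counts.append(count)
    return temps, counts


# Observed data: the fitted "b" for temperature comes out clearly positive.
temps, counts = simulate(5000, seed=1)
b = fitted_slope(temps, counts)

# Now force everybody's temperature to 37, then to 40, same people each time.
_, counts_cool = simulate(5000, seed=1, forced_temp=37.0)
_, counts_hot = simulate(5000, seed=1, forced_temp=40.0)
mean_cool = sum(counts_cool) / len(counts_cool)
mean_hot = sum(counts_hot) / len(counts_hot)

print(f"fitted b for temperature: {b:.2f}")
print(f"mean count, everyone forced to 37: {mean_cool:.2f}")
print(f"mean count, everyone forced to 40: {mean_hot:.2f}")
```

The fitted “b” is positive, yet forcing the temperature up changes no one’s count. Real data never comes with the generating code attached, which is why reading the second equation off a fitted model is a guess about causes and not a discovery of them.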
The Reification continues in ascribing a theory which “explains” the positive “b”. This theory may be true, or likely true, and sometimes even is, especially in situations where a highly controlled experiment has been run, where every possible (known) causal factor has been accounted for.
But usually the theory is a raw guess.
Take this all-too-typical headline: “Restaurant rage: Living in an area with lots of fast food stores can make you impatient and unable to savour things, researchers warn.”
Researchers recruited people online, asked them in which zip code they lived, and looked up in a book how many fast food restaurants were in that zip code, ignoring that the land area of zip codes varies dramatically. They then asked the participants some questions and ran a model which showed that the answers to the questions and the number of restaurants (actually, the ratio of fast food to “normal” restaurants) were “correlated.”
The researchers immediately committed the Deadly Sin of Reification. They said their work shows “that as pervasive symbols of impatience, fast food can inhibit savoring, producing negative consequences for how we experience pleasurable events.”
Now this theory might be true, but in no way has it been proven, or even close to proven. It is also absurd on its face. All other possible explanations of the data were denied, as felt natural after the Deadly Sin had been committed.
Typos are a two-for-one special today only.