The ideal experiment is one in which control is exerted over all aspects of an environment. We desire to measure an outcome; for ease suppose this is a simple number. Any measuring system we can devise will make our measurement of this number finite and discrete, even if the outcome itself is continuous and infinitely graduated. How we know this number could be continuous and infinitely graduated when we can only ever measure it at finite, discrete markers is a matter I won’t here consider.
Before us is the interocitor, which has many levers, all of which by supposition might exert influence over our number X. The “by supposition” is key. We assume or suppose or have otherwise proved that these levers and only these levers have a bearing on X. Now, because we are good at imagining, we can always imagine after measuring X that something besides the levers took control over X at this instant. It is possible, in the sense of we can imagine it, that Martians aimed psychic vibrations at our interocitor through a wormhole and affected X. Since this kind of imagining can go on ad infinitum, if we allow it, any of an infinite number of causes other than the levers could have controlled X.
But we do not allow it. That is, it is we who assume that it is these and only these levers that control X. Then, when lever A is moved through its paces and all the other levers are held fixed, we can watch the change in X. And then we can say that the lever in its various positions caused X to take the values it did. That is, the lever caused X given all the other known aspects of the environment. If the environment changed, the lever might no longer be causative.
Incidentally, if the interocitor breaks, the situation becomes similar to vibrating Martians. That the machine is broken but unsupposed to be broken does not change how we ascribe cause to the lever. The broken nature of the machine is part of the fixed environment. All that results is that the cause of the lever could be different if the machine were fixed. Now we might suspect the interocitor is broken, but only by assuming outside knowledge, using suppositions like, “I would have guessed because of my experience with other machines, that X would have moved more as the lever was pushed.” But that is a different question. It stands that the fixed environment, even with a broken machine, still allows us to say what the lever causes. It only means the environment was not what we thought.
That’s a long-winded explanation of experimental control which everybody already knows. The point of dragging it out in gory detail was to prove that understanding cause is an epistemological concern that relies at base on the assumptions we bring to the problem. It’s clear enough that if we move the lever, assuming all else remains fixed, and X changes, that the lever is the first cause of the state of X.
Enter “statistical control,” which is not like actual control. Since the most common usage of the term is in regression modeling, that’s the example I’ll use. Regression assumes that the uncertainty in values of X, which we can only measure at discrete and finite levels, can be characterized by a normal distribution, which has two parameters, a central and a spread. Since measurement is finite and discrete, and normal distributions assert probability over the continuum, regression is always an approximation.
Anyway, the central parameter of this normal distribution is said to be a function of various measurable observables; as these observables change, the central parameter for the normal changes, and thus our uncertainty in X changes (the spread parameter is thought to be fixed).
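A minimal sketch of that setup may help, assuming the simplest case where the central parameter is a linear function of the observables. The coefficients, the spread, and the function names below are made up purely for illustration; nothing here comes from real data.

```python
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
COEFS = [30.0, 5.0, -2.0]  # intercept, then one slope per observable
SPREAD = 10.0              # the spread parameter, held fixed


def central_parameter(coefs, observables):
    """Central parameter as a linear function of the observables."""
    intercept, *slopes = coefs
    return intercept + sum(b * x for b, x in zip(slopes, observables))


def uncertainty_in_x(observables):
    """Normal distribution characterizing our uncertainty in X."""
    return NormalDist(mu=central_parameter(COEFS, observables), sigma=SPREAD)


a = uncertainty_in_x([1, 0])
b = uncertainty_in_x([0, 1])

# Changing the observables shifts the central parameter and nothing else.
print(a.mean, a.stdev)
print(b.mean, b.stdev)

# The model speaks only of uncertainty in X, e.g. the chance X exceeds 40.
print(1 - a.cdf(40), 1 - b.cdf(40))
```

Note that nothing in the sketch says why X takes the values it does. The observables move the central parameter, and that is all.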
It is obvious that we would not use regression in cases of actual physical control: there is no need of it. In actual control, the causes are known (or assumed) and we can make direct measurements. Regression is used only when the causes are unknown. All probability models are used when causes are unknown. We have already seen that probability models, of which regression models are a subset, cannot ascertain cause.
We can still use an interocitor as an example if X is buried in quantum mechanical effects, where the range of X (all else held as constant as possible with our lever in a set position) must be characterized with probability, but quantum mechanical situations are those where it is acknowledged causes are unknown.
The interocitor being useless, suppose X measures the income of individuals. Somebody is interested in whether incomes differ between races, so race is an observable measured. Now to see if income did differ, all one has to do is look: it did or it didn’t, depending on the definition of differ. It really is as simple as this: define differ and just look.
There are many definitions of differ. One might be (and this is in no way an endorsement: the best definition depends on the decisions one wants to make) that the means of the people measured are unequal. Suppose this definition is met: very well, there is a difference between races.
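To show how little is involved in "define differ and just look," here is a small sketch using the unequal-means definition. The group labels and incomes are invented for the example, and `differ` is a hypothetical helper, not anybody's official test.

```python
from statistics import mean

# Made-up incomes (in thousands) for two hypothetical groups.
incomes = {
    "group_a": [40, 52, 61],
    "group_b": [45, 50, 49],
}


def differ(data):
    """One possible definition of differ: the observed means are unequal."""
    means = {group: mean(values) for group, values in data.items()}
    return len(set(means.values())) > 1, means


did_differ, observed_means = differ(incomes)
print(did_differ, observed_means)
```

No model, parameter, or test is needed to answer the question as posed. The answer says that the measured means were unequal, and it says nothing about why.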
What caused this difference? There is no way to tell using just this data (the link above proves this, as do extensive discussions in Uncertainty: The Soul of Modeling, Probability & Statistics). A voice will suggest “Racism caused the difference”, but this will be mere guesswork. Any number of potential causes can be suggested, and each is equally supported by the data; which is to say, they are not supported at all.
This point is emphasized because of the way regression (or any probability model) is often used: the cause thought of by the modeler is asserted to be true, even though there is no evidence this purported cause is the true cause. But there is a suspicion by the modeler that others might not buy his purported cause, so the parameter associated with race in the regression is put forth as proof the cause is genuine. But this is no different than defining differ as the regression parameter. Cause cannot be asserted, nor even inferred, not on the evidence of the data alone.
Now even if this race parameter is “significant”, the modeler still suspects there will be doubt by others about the cause. And so—finally!—“statistical control” enters the scene. The modeler will enter another observable in the model and say that by doing so he is “controlling” for this observable. Of course, all that has happened is that the change in the central parameter for X as the new observable is changed can be measured (assuming race is still in the model). Since we could not tell cause with just race in the model, we cannot tell it with any new observable in the model.
It should be obvious that “statistical control” is nothing like actual physical control. All that has happened with statistical control is that a subset of the data has been looked at; say males and females of the different races. The definition of differ can, of course, be modified to incorporate sex, so that if one wants to see whether there were differences in sex and race, one need only look.
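The subset-looking described here can be sketched in a few lines. The records are invented, and `subgroup_means` is a hypothetical helper that carries the unequal-means definition of differ down to each combination of race and sex.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: (race, sex, income in thousands).
records = [
    ("a", "m", 50), ("a", "m", 54),
    ("a", "f", 40), ("a", "f", 44),
    ("b", "m", 48), ("b", "m", 50),
    ("b", "f", 46), ("b", "f", 44),
]


def subgroup_means(rows):
    """Mean income within each (race, sex) subset of the data."""
    groups = defaultdict(list)
    for race, sex, income in rows:
        groups[(race, sex)].append(income)
    return {key: mean(values) for key, values in groups.items()}


by_race_and_sex = subgroup_means(records)
for key, value in sorted(by_race_and_sex.items()):
    print(key, value)
```

Nothing was held fixed in the way a lever is held fixed. The same data were partitioned more finely and looked at again.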
The model is only useful in making predictions of new observations of X, given assumed values of the observables race and sex (and whatever else is stuck inside). But in no case has cause been shown.
Update: Randomness and chance are never causes; they are only states of mind.