Class 70: Most Claims Of “Controlled For” Are Wrong: The Right Way

Claims of control are almost always false or misleading. “Controlling for” sex or age or anything in probability models is not true control. It’s just adding things to models. Vast over-certainty is generated here.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture. Do the code!

Lecture

My use of the example is now hackneyed, but it is still apt. A chemist wants to measure the rate of heat change due to a reaction. He measures out his substances to great precision, he isolates his test apparatus from all those things he can think of that might add or subtract heat, and which have nothing to do with the reaction itself. Then he performs his experiment, perhaps more than once, reasoning that even he might have missed some things, or that the substances with which he is working might not be as perfect or pure as claimed, or that there are only imperfect, in the strictest sense, methods of mixing or stirring or whatever.

The chemist can rightly claim to have “controlled for” whatever list of things he thought were important. He did his best to control the conditions and bring out the powers of the substances under consideration. On the assumption—does this sound like probability yet?—that he has controlled for all causes in changes of conditions, and for the causes between the substances, then he is entitled to say that the effects he witnessed were caused by whatever he claimed.

That leaves open the possibility he missed something. His claim is conditional, just as all arguments are conditional, hence why all probability is conditional. This is no weakness: it is plain fact. The chemist—you have heard this before, and will hear it again and again—has brought cause to his model, which is to say, his argument. There is no difference (to us) between models and arguments. He did not extract cause from his model. For all models only say what they are told to say.

That kind of thing is what is meant by the word control. When you hear of controlled experiments, that is what you ought to have in mind.

When you hear instead a claim that “such and such were controlled for in our probability (or statistical) model”, your reaction should be extreme doubt and skepticism. For nearly all the time this does not represent the kind of control the chemist exercised, actual manipulation of all suspected or assumed causes and conditions. What is usually meant is dumping junk into a regression or other similar model.

This is not control. It is called control. It is a false boast. This is not our first instance of lousy harm-inducing naming in science. Probability and statistics is rife with wholly misleading names. Like “significance”. Which only proves that indulging in a bad habit leads to worse excesses.

Anyway, let’s look at how to add things in probability models the Right and Wrong way. We do the Right way today, and leave the Wrong to next week.

The Right way acknowledges at the start that adding anything to a model is just assuming another premise, which is not control, and which is neither good nor bad in itself. It is only good or bad depending on what we do with the information.

We’ll continue using the GPA example. We at last need some math. Let $\theta$ be the central parameter of the normal which quantifies our uncertainty in as-yet-unseen GPAs. There is a spread parameter, too, but nearly everybody ignores it. We’ll do that, too, for today.

This is linear regression:

$$\theta = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p.$$

Linear means in that central parameter. Those “x” are other measures we choose to premise on, but which people wrongly falsely in error mistakenly say are “controls”. Of course, we can add anything to that right hand side, functions of an “x” (squaring is popular), or two or more “x” multiplied together, or whatever. It’s just more stuff, more premises.
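As a minimal sketch of what this looks like in R’s formula notation (the extra terms below are illustrations only, not recommendations), adding a squared measure or an interaction is nothing more than typing another premise onto the right hand side:

# More premises, not "controls": a squared measure and an interaction
# added to the same right hand side. R's I() here only means "compute
# as written"; it is not the indicator function discussed later.
GPA ~ White + Male + HSGPA + I(HSGPA^2) + White:HSGPA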

None of this is causal, as we learned earlier. Though many, many, many claim cause—through “significance” or other means, seen next time. Those “$\beta$” in particular are not causes. They’re only a way to relate the “x” to the $\theta$; only that, and nothing more. And that’s really all there is to regression, or really any probability model. Making functions, containing measures “x”, for whatever (usually) parameterized probability we choose. Normal is only one of (how many?) an infinite number of possibilities. Almost always the choice is ad hoc and made because of custom. Whether that’s good or bad in any individual case has to be checked each time.
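The choice of the normal itself is one more premise, and it can be swapped for another. As a hedged illustration only, using the same rstanarm software as in the code at the end of the lecture (and assuming all GPAs in the data are strictly positive; whether this is a better premise would have to be checked), you could premise, say, a Gamma with a log link instead:

require(rstanarm)
require(Stat2Data)
data(FirstYearGPA)

# Same measures, different distributional premise. Illustration only.
fit_gamma = stan_glm(GPA ~ White + Male + HSGPA, data = FirstYearGPA,
                     family = Gamma(link = "log"), iter = 1e5)
summary(fit_gamma)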

Since we don’t know what values to put for each $\beta$, we have to assume something about them. Just like we assumed the model and assumed its parameterized distribution and form, which is a vastly more important and consequential step. But for some reason, the small matter of making guesses or assumptions about the measly $\beta$ consumes the attention of most.

Some of this is because people enjoy bickering; often in academia the smaller the matter the greater the heat. But a lot is because it is said the $\theta$ and $\beta$ have “true” values, which we have seen they most certainly do not. The claim is always false. The parameters are fictions, useful for approximations. We prove this again below.

There are many tips and techniques proven to give better and worse approximations, of course. And there are other arguments that this or that way of guessing is better in some sense. We’ll cover those another day. For now, we’ll let the “default” of the software be our premise. Which is, also of course, what most do.

When we added “White or not” as our “x”, it was in the form of a step or indicator function (it has many names), which was “I(White)”, which gives 1 if the person is White, or 0 if not. If that were the only other measure premised in the model, then we have:

$$\theta = \beta_0 + \beta_1I(White).$$

So now you can see that $\beta_1$ takes on the natural interpretation of the difference in central parameter $\theta$ for Whites and non-Whites. And that the “Intercept” $\beta_0$ is the value of $\theta$ when a person is not White.
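You can check this interpretation with a quick sketch. Ordinary least squares makes the equality exact in sample; the stan_glm fit used below, with its default priors, will give similar but not identical numbers:

require(Stat2Data)
data(FirstYearGPA)

# With only the indicator in the model, the OLS intercept is the sample
# mean GPA of non-Whites, and the White coefficient is the difference
# between the two group means.
coef(lm(GPA ~ White, data = FirstYearGPA))
tapply(FirstYearGPA$GPA, FirstYearGPA$White, mean)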

Simple as that. It becomes a chore to keep track of what the “Intercept” means when you have more than one “x” in the model, though. But it will be highly useful for us today. Suppose our second premise is a person’s High School GPA. Then our model is (using the obvious shorthand):

$$\theta = \beta_0 + \beta_1I(White) + \beta_2HGPA.$$

This makes the “Intercept”, i.e. $\beta_0$, equal the central parameter for the normal when a person is non-White and has a HGPA = 0. Which can happen, I suppose, and probably does in places like Baltimore. But instead of HGPA suppose we added age (which many do, only we don’t have it in our data), then $\beta_0$ would be the central parameter for the normal representing uncertainty in non-Whites who are 0 years old.

Now that right there ought to tell you, ought to tell everybody, that parameters aren’t real and don’t have “true” values, and only have values based on whatever premises we supply about them, but it never does. I find this very strange. The matter is right there, in the open, but somehow it can’t be seen.

(If you have trouble seeing this, add age with HGPA, then $\beta_0$ becomes the parameter for non-White 0 year olds with HGPAs = 0. Which also makes no sense whatsoever if parameters have true values.)

All right, let’s add HGPA and sex with race. The data is coded so males are 1 (another indicator function, but the software does it automatically here). There are several more things we can add, like math SAT score and the like (listed here). But this is enough for illustration. I’ll leave the rest as homework.

Our interest is in “new” GPA (we know all about the old), and we have said, in our premises, that race, sex, and HGPA were probative. That means we have to specify values of race, sex, and HGPA. We’ll get different probabilities for different GPA for each change of premise. That’s a feature, not a bug. It is not only a feature, it is a downright requirement. It is the only way to make sense of the model we constructed. The code is below.

This is the Right way. Which is a way that requires quite a lot more work than the Wrong way, as we’ll see next time.

We can draw a picture, suitably discretized (because, as always, normals live on the continuum, and all observables on a finite discrete set) for each combination and value of HGPA, race, and sex that interest us. Which is doable enough here, but (again) can become a major chore when one adds, say, a dozen “x”. But that’s tough cookies, because it was you who said you wanted these things in your model, so you are obliged to show how probative they are!

The HGPA has a median of 3.5, likely due to grade inflation. So I picked that and contrasted it with a value of 4, along with male and female, White and non-White. There are eight possibilities (2 levels of each of three measures), but I picked only 4 that interested me. What interests you might be entirely different. You can see which combinations I picked on this picture, which is the (discretized) probabilities of new GPAs for each new group, all conditional on the model, its parameterization, certain premises about those parameters (conjugate priors), the integration sample, and the data sample. Change any and you change the probability!

The legend is tight, but you can see I did non-Male non-White low-HGPA first (orangeish), then the others, ending with White Male high-HGPA (light purple).

Probability leakage is now ridiculous for White Males with 4 HGPA. Something like 5.3% of probability leaked out of the model (meaning there’s that much chance of a college GPA greater than 4.15, where 4.15 is the max possible). That’s likely to be important for most applications, but that’s just a guess. Whether it’s important is up to you. It is always up to you.

The chance a White Male with high HGPA has higher college GPA than a non-White non-Male with low HGPA is 86%.

And so on for any questions you might care to ask. Which questions are important is not up to me. I’m just the guy doing the calculations. That’s why the Right way is so hard. I can make no universal judgements like I can with the Wrong way. There is no final answer here. Everything is acknowledged to be only as good as the premises we conditioned on. And we know those have errors. Important ones? That’s up to those who would use the model, not me.

All I can do is guess what some people might think is important. Of course, if you were the decision maker you’d ask specific questions of me, and I’d answer them. If instead I were writing a peer-reviewed paper on this subject, all I can do is guess about best questions. Are other values of HGPA important? Maybe. I didn’t do them, they are homework for you.

But you already see that if I were to add in more “x”s, my job grows increasingly complex. Just like life itself. Of course, I could leave out any “x” I wanted, and recompute. The answer is still just as good as the premises, only now they are different premises.

Your homework is to add in the math and verbal SAT scores (you’ll see them in the data).
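As a starting point for that homework (a sketch only; I am assuming the columns are named SATM and SATV, so check names(FirstYearGPA) first), the model simply gains two more premises:

require(rstanarm)
require(Stat2Data)
data(FirstYearGPA)

names(FirstYearGPA)   # confirm the SAT column names first
fit_sat = stan_glm(GPA ~ White + Male + HSGPA + SATM + SATV,
                   data = FirstYearGPA, iter = 1e5)
summary(fit_sat)
# Then repeat the posterior_predict() scenarios below with chosen SAT values.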


require(Stat2Data)
require(rstanarm)
require(ggplot2)
require(scales)

data(FirstYearGPA)

# Fit the regression: race, sex, and high-school GPA as premises.
# rstanarm's default priors are the "default of the software" premise.
fit = stan_glm(GPA ~ White + Male + HSGPA, data = FirstYearGPA, iter = 1e5)
summary(fit)

# The four scenarios of interest: White male at HSGPA 3.5 and 4,
# and non-White non-male at HSGPA 3.5 and 4.
z = data.frame(White = c(1, 1, 0, 0),
               Male  = c(1, 1, 0, 0),
               HSGPA = c(3.5, 4, 3.5, 4))

# Posterior predictive draws of new GPAs, one column per scenario.
p = posterior_predict(fit, z)
n = dim(p)[1]

# Stack the draws into one long data frame, labeled by scenario.
w = data.frame(GPA   = as.vector(p),
               White = factor(c(rep(1, n), rep(1, n), rep(0, n), rep(0, n))),
               Male  = factor(c(rep(1, n), rep(1, n), rep(0, n), rep(0, n))),
               HSGPA = factor(c(rep(3.5, n), rep(4, n), rep(3.5, n), rep(4, n))))


# Discretized (histogram) predictive probabilities of new GPA for each scenario.
g = ggplot(aes(x = GPA, fill = interaction(White, Male, HSGPA)), data = w) +
    geom_histogram(alpha = 0.6, position = 'identity', bins = 200) +
    theme_bw() +
    labs(fill = 'White.Male.HSGPA')
g

png('h3.png',width=1024,height=800)
 g
dev.off()

# Probability leakage: Pr(new GPA > 4.15 | data, model, assumptions, x)
# for White males with HSGPA = 4; 4.15 is the maximum possible GPA.
i = which(w$White == '1' & w$Male == '1' & w$HSGPA == '4')
sum(w[i, 1] > 4.15) / n

# Pr(new GPA of White male with high HSGPA > new GPA of non-White
# non-male with low HSGPA | data, model, assumptions, x)
j = which(w$White == '0' & w$Male == '0' & w$HSGPA == '3.5')
sum(w[i, 1] > w[j, 1]) / n

# Remember: only 4 of the 8 possible combinations were computed above.
tapply(w$GPA, list(w$White,w$Male,w$HSGPA), mean)
