Be sure to review the RIGHT way from last week. Today, of the infinite number of ways to go sour, we look at one common way modeling goes awry.
Video
https://youtu.be/Q_oS9WQMqTs
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Given below; see end of lecture.
Lecture
The RIGHT way to model anything is to propose your proposition of interest, whatever it may be, call it “Y”, then compile ALL evidence you can think of probative of Y, call it “X”. Then you form:
Pr(Y|X).
And you are done. Ta da. Congratulations. You have discovered that modeling is nothing more than ordinary probability. And that ordinary probability is a matter of logic, of epistemology, of uncertainty, of things that exist in the mind and not the world. How easy is that? Turns out all modeling is nothing more than fleshing out that Pr(Y|X).
X might be in the world, and, if you’re doing science, so might Y. But neither must be, not if all you are interested in is the logical-uncertainty relations between propositions (math is not in the world, for instance). No matter what, though, Pr(Y|X) does not exist in the world. It is in the mind only. (No, your mind is not in the world.)
But suppose most or all of X is in the world, and so is Y. We’ll stick to science, of measurable things. Last week we did a particular model, a correlational model, of predicting GPA given some observations and knowing whether a person was White. It, too, was just Pr(Y|X), where Y was the GPA, and X was the observations, race, and some details we added to make math of the thing.
We learned, again, the valuable lesson that if you change the X, you change Pr(Y|X). Of course. It doesn’t seem like a hard lesson to learn. Let us call our new evidence, which is formed from any change in X, W. Then we learned
Pr(Y|X) does not equal Pr(Y|W), except accidentally (coincidences do happen; or the changes to X are not probative of Y, given X).
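A toy example (mine, not from the class data) shows the force of this. Let Y = “a 6 shows” on a single roll:

```r
# Toy illustration, not from the class data.
# X: "a six-sided die, exactly one side labeled 6, one side must show"
pr_Y_given_X <- 1 / 6

# W: "a six-sided die, two sides labeled 6, one side must show"
pr_Y_given_W <- 2 / 6

c(given_X = pr_Y_given_X, given_W = pr_Y_given_W)  # same Y, different evidence, different probability
```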
We were also reminded that if X described the full cause (final, material, efficient and formal) and the conditions under which the cause can happen, then Pr(Y|X) = 1, or Pr(Y|X) = 0, depending on the circumstances.
All of which we learned before we did models. Again, and in brief, modeling is nothing more than probability. And we are done.
When viewed the RIGHT way. Now let’s do the WRONG way.
Well, a wrong way. Since the number of ways to go wrong is limitless, we’ll have to pick one or we’ll go nuts. We choose one of the most common ways of going wrong.
Recall we had a couple hundred GPAs and a race marker. And we decided to do “regression”, because everybody else does. We saw that got us in trouble with probability leakage. But we didn’t really learn how else to judge the model, which we’ll do another day. Still, the best test is always whether the model is useful to you. Not necessarily to me. To you.
We learned the very important, really the most important, lesson: that we could not judge cause from the model. Indeed, that we brought cause to the model.
Let’s start our errors with hypothesis testing. That is, we start with the same model, regression, but only examine strange things inside the model, and forget that what we wanted was to learn about the uncertainty of GPA. Wherever you fit this model, you’ll get output resembling this (the wrong way in action; this is from R):
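Something like the following produces that kind of table. This is only a sketch: the data frame `dat` and the columns `gpa` and `white` (1 for White, 0 for non-White) are names I am assuming, and whether `lm` or `glm` was used in class is a guess; `glm` with its default normal family is ordinary regression, and squares with the “profiling” note further down.

```r
# Hypothetical sketch of the regression discussed in the lecture.
# dat, gpa, and white are assumed names; substitute your own data.
fit <- glm(gpa ~ white, data = dat)  # default gaussian family = ordinary regression

# The coefficient table, with its "Estimate", "t value", P-values, and asterisks:
summary(fit)
```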
Recall we decided that we’d use normal distributions to represent uncertainty in new GPAs, allowing one central parameter for Whites and another for non-Whites, and a third shared parameter representing the spread. Recall these parameters do not exist, and it is wrong to seek their “true” value because things that do not exist do not have true values.
We see here “estimates” for these parameters, or rather a version of them. The central parameter for non-Whites is called (because of coding reasons) “(Intercept)”. And we see an “estimate” for the difference in central parameters, White minus non-White. We don’t get the Whites parameter, only the difference. We do not see, and we never see, the spread parameter.
So while we might make a stab at characterizing uncertainty in GPA by using a normal with the central parameter for non-Whites as 2.84, and the central parameter for Whites as 2.84+0.32 = 3.16, we have zero idea of how much spread to give these normals.
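Here is a sketch of what we would do if we could, using the reported centers of 2.84 and 3.16. The `sigma` below is a pure invention of mine, which is exactly the problem; change it and every answer changes. It also exposes the leakage: the normal cheerfully puts probability on impossible GPAs below 0 and above 4.

```r
# Centers reported in the lecture; sigma is a made-up stand-in, because the
# summary output never tells us what spread to use.
mu_nonwhite <- 2.84
mu_white    <- 2.84 + 0.32  # = 3.16
sigma       <- 0.5          # HYPOTHETICAL: pick another value and watch everything change

# Pr(new GPA > 3 | evidence, model) for each group:
pnorm(3, mean = mu_nonwhite, sd = sigma, lower.tail = FALSE)
pnorm(3, mean = mu_white,    sd = sigma, lower.tail = FALSE)

# Probability leakage: the probability the model gives to impossible GPAs (< 0 or > 4):
pnorm(0, mean = mu_white, sd = sigma) +
  pnorm(4, mean = mu_white, sd = sigma, lower.tail = FALSE)
```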
Indeed, the WRONG way focuses entirely on these central parameters, ignoring everything else. Ignoring, even, the point of the model in the first place.
Our next WRONG item is to do a hypothesis test. Our “null” hypothesis is that the White difference parameter equals exactly 0. Recall this parameter does not exist, so it’s not clear what this hypothesis means in real life. But probability is flexible, and can answer questions of any stripe. So maybe it isn’t so bad yet.
The hypothesis test commences. It uses that “t value”, for our hypothesis test is a “t test” (and not the good kind of tea; what a nice pun). Which gives us the P-value of 0.000023. Notice the extremely helpful asterisks, which announce the terrific news that this P-value is not only less than the magic number, it’s a lot less. Which means we are permitted to reject the null hypothesis. Which is to say, declare it is false. An act of will, as you recall from the nauseatingly long lectures on the dismality (you heard me) of P-values.
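For the record, that P-value is nothing more mysterious than the area in the tails of a t distribution. The numbers below are placeholders of mine, not the class’s actual output:

```r
# Placeholder numbers, for illustration only.
t_value <- 4.3   # whatever the summary table printed
df      <- 198   # about n - 2 for a couple hundred GPAs and two coefficients
2 * pt(abs(t_value), df = df, lower.tail = FALSE)  # the two-sided P-value
```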
Since we have declared “this parameter is zero” is false, we deduce, and must deduce, “this parameter is not zero.” There is no probability this deduction is correct. It just is. It is an act of will because declaring (the proposition) “this parameter is zero” is false is an act of will. Which also has no probability that it is correct.
The decision has been made. And we now declare, WRONGLY, that Whites are different from non-Whites. It’s not that this judgement is wrong. We deduced last time it was correct, or we’d never be able to tell the difference between Whites and non-Whites. It’s that it is wrongly decided, a fallacy. Fallacies can have correct conclusions given other premises.
Worse, since our P was wee we can now intimate, hint, wink, or even outright declare that the difference in parameters (or the difference parameter being non-zero) is due to whatever cause we had in mind. If we suggested Whites were “racist”, we declare we have evidence in favor of that view. If we wanted to declare (which nobody in academia would) Whites were more intelligent on average, we would insist the wee P proved this. The long “Discussion” we add at the end of our peer-reviewed paper (wee Ps grant publication) would be all about this cause.
As we showed last week, we have no proof at all of these or any causes in just this data. Not for any hypothesis of cause. Indeed, we showed it was we who brought causal hypotheses to models, and not that we extracted any from them. We did this when we first learned probability, when it was we who decided what evidence was probative of our Y.
Now the average (mean) GPA for non-Whites in the data was 2.84 and for Whites 3.16, and it is no coincidence the parameter “estimates” match these. This is a quirk of the normal model we used. We didn’t need a model to compute these means; they just are the means. But if you want to say they have anything to do with new data we might see, then we need a model, because we are uncertain about new data. We can ask, as we learned in the RIGHT way, what chance new data will have values like 3.16 for Whites, or whatever else we want to ask.
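You can check the quirk yourself: the means need nothing fancier than this (again with my assumed `dat`, `gpa`, and `white`), and they line up with the coefficients from the fit above.

```r
# Group means straight from the data; no model needed for these.
tapply(dat$gpa, dat$white, mean)

# The same numbers fall out of the regression:
coef(fit)[1]                 # non-White center: the "(Intercept)"
coef(fit)[1] + coef(fit)[2]  # White center: intercept plus the difference
```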
Step two in the WRONG way is to say something more about the nonexistent parameters. We compute confidence intervals for them. These, as we learned before, in no way whatsoever give any confidence in the parameters’ values, and no probability can be attached to their uncertainty. We learned that only one interpretation exists. Either the “true” value of the parameter is in the given interval, or it is not. A tautology. Under no circumstances may you attach any uncertainty to the interval or its width.
But of course everybody violates this proscription, with no exceptions. Confidence intervals are thus incoherent. Unless one interprets them in a Bayesian fashion, which the theory forbids, and which everybody does anyway. Here they are:
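The intervals come from a call like this one (still a sketch; `fit` is the assumed `glm` fit from above, which is also what produces R’s “Waiting for profiling to be done…” message mentioned next).

```r
# 95% confidence intervals for the nonexistent parameters.
confint(fit, level = 0.95)
```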
Ignore the “profiling” business, an R quirk. You see the limits, which make these 95% confidence intervals, where the 95% speaks only and solely about infinite repetitions of the “trial” or “experiment” that gave rise to the data. Which is another impossibility. There will be no infinite repetitions.
But even supposing there could be, what could the interpretation of these intervals and parameters possibly be? They are taken to be real, as somehow causal in themselves, or measures of causes. We saw earlier (on origins of parameters) this cannot be. But it is believed.
This is a holdover from believing Chance is real, a force, mysterious but guiding, controlled by unseen powers. Nobody ever quite defines it, but they know it’s there, lurking. Even if misty thoughts about Chance are put aside, there is still the conviction the parameters are real parts of Nature, alive somehow. This is never stated, understand. But Probability is there, operating, pushing things, not wholly, but subtly. Those parameters are thus as real as anything in the world is, though always unseen.
There is much more to this, including the false idea of “control”. But this is enough for one day.