Thanks to whoever sent the Dice (there was no note or name attached). They work!
Some are recognizing testing is flawed, but they seek refuge in “effect sizes”, which is faulty causal language, and yet another version of parameter-based analysis, meaning more over-certainty.
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Given below; see end of lecture. Do the code!
Lecture
We know from half a dozen (or was it more?) proofs that every use of P-values, and hence hypothesis testing, is fallacious, for the uses to which they are put. We learned parameters inside probability models do not have true values, except in the incidental and uninteresting sense of simulations. Nature does not “draw” effects from distributions.
We know that limits are approximations to finite, discrete Reality, and that finite, discrete measures are not approximations to limits. We proved that you cannot extract cause from a model, and that we bring cause to models.
The conclusion is that any parameter-centric analysis fails, or at least produces over-certainty, if not outright error. This has not yet been fully grasped, I think, so today an emphasis on why this is so.
First a reminder that conclusions can be true, even in fallacious arguments. Take the premises “Socrates loved to argue” and “Socrates had a wife who he enjoyed spending time away from.” Both of the premises are true. Now the conclusion: “2+2=4”.
That conclusion does not follow. To say that it does is a fallacy. An error in thought has been made. It does not matter if you repeat those same two premises for every other mathematical conclusion you can think of, every use is a fallacy. Your justification for the conclusion in each case is always false. It is never right. It is each time wrong. And this is so even if all the best people say it isn’t.
P-values are something like that. With the added spice of selective memory.
Hypothesis testing sometimes seems to produce the right answer, which is approving a causal conclusion, because researchers sometimes (but far from always) set up reasonable experiments or take acceptable observations. That is, they are sometimes right about their causal hypotheses, which they brought to the data. In these cases, sometimes the P will be wee. And so the wee P is credited with the confirmation of the cause. But, of course, it has done no such thing. It is still a fallacy.
The selective memory comes from remembering only those instances in which the P-values seem to have worked, like “2+2”, and forgetting the much, much, oh so much larger set where they did not.
Many in science are beginning to recognize the weaknesses of hypothesis testing, and seek to move away from the dichotomous rulings of the method and concentrate on something they call effect size.
I mentioned this in the Classes on regression, but I think rather too hurriedly. So let’s review our regression result for GPA, done the Wrong way first.
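For those doing the homework in code, here is a minimal sketch of that Wrong-way fit, in Python. The column names (cgpa, hgpa, sat, sex, race) and the synthetic stand-in data are my assumptions, not the class dataset; point the fit at the real data to get numbers like the 0.516 quoted below.

```python
# A minimal sketch of the "Wrong way": a classical, parameter-centric
# regression of college GPA on high-school GPA, SAT, sex, and race.
# The column names and the synthetic stand-in data are placeholders,
# NOT the class dataset; swap in the real file to reproduce the output.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "hgpa": rng.uniform(2, 4, n).round(1),   # high-school GPA
    "sat":  rng.integers(800, 1600, n),      # SAT score
    "sex":  rng.choice(["F", "M"], n),
    "race": rng.choice(["A", "B", "C"], n),
})
# Fake college GPA so the script runs; replace with the real measurements.
df["cgpa"] = np.clip(0.5 + 0.5 * df["hgpa"] + 0.001 * df["sat"]
                     + rng.normal(0, 0.3, n), 0, 4)

fit = smf.ols("cgpa ~ hgpa + sat + sex + race", data=df).fit()
print(fit.summary())   # betas, standard errors, P-values: the Wrong way
```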

What goes here goes for all non-physical parameters. Not just this paltry GPA example.
Recall what our model said: we characterize the uncertainty in GPA with a normal distribution, because, hey, why not. This has two parameters, a central and a spread. The spread we ignore, because, hey, why not. The central is a linear function (a regression) of all those other measures listed (like HSGPA), all indexed by parameters (the "betas"). Changes to these betas, for any values of the measures, change the uncertainty in GPA.
We can call this an effect if we like, and make no error if we are careful, which few to none are. It is true that if we accept the value of 0.516 for the parameter indexing HSGPA (high school GPA), then for, say, an HSGPA = 1, this causes the effect of changing the central parameter of the normal distribution characterizing our uncertainty in GPA by increasing it by 0.516 college GPA points. This is surely cause and effect, but it is of propositions. Or within propositions, if you like.
It in no way says anything about how much an HSGPA = 1 causes changes in college GPA.
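In code, the whole "effect" is nothing but this arithmetic on the model's central parameter (the 0.516 is the quoted fitted value; the shift is to a parameter, not to anybody's actual GPA):

```python
# The "effect" is on the model's central parameter, not on any student's GPA.
beta_hgpa = 0.516           # the quoted fitted value indexing HSGPA
hsgpa = 1.0                 # the supposed high-school GPA
shift = beta_hgpa * hsgpa   # change in the normal's central parameter
print(f"Central parameter shifts by {shift:.3f} college-GPA points")
```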
This beta, or any non-physical beta, is not a parameter measuring the “effect size” of cause in actual things. But that, of course, is how people take it. Effect size is causal language. Cause and effect. Read any paper which speaks of “effect size” and you will see this mistake made.
I am careful (today, anyway, a rarity) to speak of non-physical betas, because there are in some physical models parameters that represent forces or actual causes. We deal with these another day.
That is not the case here, nor in most probability models.
The objection will be "But I can think of lots of causal reasons and conditions why changing HSGPA causes, or leads to causes of, changes in college GPA." So can I. But notice we bring these causes to the model, and we don't extract them from the model. We don't verify them using "testing"; we assume them because of our long experience with people and their study habits and intelligence. It is these causes we have in mind when we associate changing HSGPA with changing college GPA.
But that’s not what this model is. That’s not what this equation says. It is a strict error to say “effect” when one means in a real-world causal sense.
Recall two important things. We don't need a model to say what happened. We can simply look at the data and see what happened. The model is only of use to characterize uncertainty in GPAs we haven't yet seen or haven't yet acknowledged in the model.
And though you're sick of hearing it, think how we can make the same causal ascriptions to all the examples at the spurious correlations site. If our methods (testing and effect sizes) work, then they must work for all things. If not, then the theory has to specify in what precise conditions they don't work. No such specifications are made for the spurious correlation examples. We know they are absurd because we bring in our outside-the-model evidence and use that to dismiss the method's answers.
Pause and reflect.
As it turns out, nobody in the dataset had an HSGPA of 1. (If this were Harvard, maybe nobody would have anything less than 4.) That means there is no directly assessable causal effect on college GPA, except for the inference that those with an HSGPA of 1 (of the type of people measured) never went to college.
This does not stop us from making counterfactual inferences. We can suppose what happens to the uncertainty in college GPAs given an HSGPA = 1. Or even HSGPA = 0. Why not? How something like this is verified, or even believed, is for another day.
As it is, it turns out there are some with HSGPA of 3 and 4. Here are summaries of their college GPAs:
Given this, what is the probability the mean college GPA for those with HSGPA = 3 is less than that of those with HSGPA = 4?
If you had any hesitation answering that, it was because you must have had training in the kind of methods mentioned above. The answer is 1. It is certain, given the observations. A value of 2.9 vs. 3.6 is a certain judgement.
You can call this a “test” if you like, but that’s mighty bold of you. We don’t need to test. We just look.
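In code, "just looking" is nothing more than comparing the observed means (the 2.9 and 3.6 quoted above); there is nothing to estimate and nothing to test:

```python
# Given the observed means, the probability is 0 or 1; here it is 1.
mean_hsgpa_3 = 2.9   # observed mean college GPA for HSGPA = 3
mean_hsgpa_4 = 3.6   # observed mean college GPA for HSGPA = 4
prob = 1.0 if mean_hsgpa_3 < mean_hsgpa_4 else 0.0
print(prob)          # 1.0: certain, given the observations
```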
Here's another question: Given this and the model, what is the probability the college GPA for new students with HSGPA = 3 will be less than that of new students with HSGPA = 4?
If you answered this, it means I have taught you badly. There is no answer. We don’t know, because our model also conditions on sex, race, and the SAT score. We need to specify values for them. We must! After all, it was we who said these things were important, which is why we crammed them into the model. They are there and must be considered.
Of course, we can take these out of the model, too, and leave only HSGPA. Why not? Then we can answer the question. Taking them out, and leaving only HSGPA, gives 0.82.
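Here is a sketch of the kind of predictive comparison meant, with HSGPA as the only predictor, re-using the stand-in data frame from the first sketch. It is a Monte Carlo over approximate predictive distributions for new students; the 0.82 comes from the real data and the class model, not from this toy.

```python
# Predictive question with HSGPA as the only predictor, re-using `df`,
# `rng`, and the imports from the first sketch. Monte Carlo over the
# approximate predictive distributions for NEW students.
fit1 = smf.ols("cgpa ~ hgpa", data=df).fit()     # HSGPA-only model
sigma = np.sqrt(fit1.scale)                      # residual spread estimate
mu3 = fit1.params["Intercept"] + fit1.params["hgpa"] * 3   # central, HSGPA = 3
mu4 = fit1.params["Intercept"] + fit1.params["hgpa"] * 4   # central, HSGPA = 4
draws3 = rng.normal(mu3, sigma, 50_000)          # new student, HSGPA = 3
draws4 = rng.normal(mu4, sigma, 50_000)          # new student, HSGPA = 4
print((draws3 < draws4).mean())                  # Pr(new GPA_3 < new GPA_4)
```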
The point is this. This is the way to speak of results: not of "effect sizes", but of changes in probabilities of quantities we deem important. Is this particular quantity important? Maybe, maybe not. That depends on who is asking. There is no universal answer. That is the answer. We have been asking far too much of our statistical methods. We have been fooling ourselves in a grand manner.
Homework: See if you get the same answer.
Here are the various ways to support this work:
- Subscribe at Substack (paid or free)
- Cash App: $WilliamMBriggs
- Zelle: use email: matt@wmbriggs.com
- Buy me a coffee
- Paypal
- Other credit card subscription or single donations
- Hire me
- Subscribe at YouTube
- PASS POSTS ON TO OTHERS