Class 58: Hypothesis Testing Is All Wrong

We want Pr(Hypothesis true|Evidence) but get Pr(Data we didn’t see | Hypothesis false). This makes no sense, but I promise that is what is done. My friends, it gets real difficult for the next several Classes. I beg you will stick with it.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture.

Lecture

This is an excerpt from Chapter 9 of Uncertainty.

Hypothesis Testing

Classical hypothesis testing is founded on the fallacy of the false dichotomy. The false dichotomy says of two hypotheses that if one is false, the other must be true. Thus a sociologist will say, “I’ve decided my null is false, therefore the contrary of the null must be true.” This statement is close, but it isn’t quite the fallacy, because classical theory supports his pronouncement, but only because so-called nulls are stated in such impossible terms that the nulls for nearly all problems are necessarily false, thus the contraries of nulls are necessarily true. The sociologist is stating something like a tautology, which adds nothing to anybody’s stock of knowledge. It would be a tautology were it not for his decision that the null is false, a decision which is not based upon probability.

To achieve the fallacy, and achieve it effortlessly, we have to employ (what we can call) the fallacy of misplaced causation. Our sociologist will form a null which says, “Men and women are no different with respect to this measurement.” After he rejects this impossibility, as he should, he will say, “Men and women are different,” with the implication being that this difference is caused by whatever mechanism he has in mind, perhaps “sexism” or something trendy. In other words, to him, the null means the cause is not operative and the alternate means that it is. This is clearly a false dichotomy. And one which is embraced, as I said, by entire fields, and by most civilians who consume statistical results.

Now most statistical models involve continuity in their objects of interest and parameters. As before, a parameterized model is $\mbox{Y} \sim D(\mbox{X},\theta)$ where the $\theta$ in particular is continuous (and usually a vector). The “null” will be something like $\theta_j = 0$, where one of the constituents $j$ of $\theta$ is set equal to a constant, usually 0, which is said to be “no effect” and which everybody interprets as “no cause” of Y. Given continuity (and whatever other premises go into $D$) the probability $\theta_j = 0$ is 0, which means nulls are always false. Technicalities in measure theory are added about “sets of measure 0” which make no difference here. The point is, on the evidence accepted by the modeler, the nulls can’t be true, thus the alternates, $\theta_j \ne 0$, are always true. Meaning the alternative of “the cause I thought of did this” is embraced.
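
To make the continuity point concrete, here is a minimal numerical sketch of my own (not from the book): give $\theta_j$ any continuous distribution you like and the probability it equals exactly 0 is zero, so the “null” is false on the model’s own terms.

```python
# A minimal sketch (my illustration): under any continuous distribution for a
# parameter theta_j, the probability it equals exactly 0 is zero.
from scipy import stats

theta_j = stats.norm(loc=0.0, scale=1.0)  # hypothetical continuous uncertainty in theta_j

# The point "null" theta_j == 0 has the measure of a single point: zero.
print(theta_j.cdf(0.0) - theta_j.cdf(0.0))     # 0.0

# A tiny interval around 0 has a small but nonzero probability...
print(theta_j.cdf(1e-6) - theta_j.cdf(-1e-6))  # ~8e-7

# ...but the point itself has probability 0, whatever continuous distribution is used.
```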

If the alternates are always true, why aren’t they always acknowledged? Because, again, decision has been conflated with probability. P-values, which have nothing to do with any question anybody in real life ever asks, enter the picture. A wee p-value allows the modeler to decide the alternate is true, while an unpublishable one makes him decide the null is true. Of course, classical theory strictly forbids “accepting”, which is to say deciding, a null is true. The tortured Popperian language is “fail to reject”. But the theory is like those old “SPEED LIMIT 55 MPH” signs on freeways. Everybody ignores them. Classical theory forbids stating the probability a hypothesis is true or false, a bizarre restriction. That restriction is the cause of the troubles.
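
As an aside in code (the data and numbers are invented by me, not taken from the lecture), this is all a p-value computes: the probability of data as extreme or more extreme than what was observed, assuming the null; it is not Pr(hypothesis true | evidence), the question actually asked.

```python
# Sketch (invented data): a p-value is Pr(data this extreme or more | null true),
# not Pr(null true | data), which is the question people actually want answered.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
men = rng.normal(5.0, 2.0, size=40)    # hypothetical happiness scores for men
women = rng.normal(5.4, 2.0, size=40)  # hypothetical happiness scores for women

t_stat, p_value = stats.ttest_ind(men, women)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
# A "wee" p licenses the decision "alternate is true"; a large one, "fail to reject".
# Neither number is the probability the hypothesis itself is true or false.
```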

Invariably, hunger for certainty of causes drives most statistical error. The false dichotomy used by researchers is an awful mistake to commit in the sense that it is easily avoided. But it isn’t avoided. It is welcomed. And the reason it is welcomed is that this fallacy is a guaranteed generator of research, papers, grants, and so on. Two examples, one brief and one in nauseating detail, will prove this.

Suppose a standard, out-of-the-box regression model is used to “explain” a “happiness score”, with explanatory premise sex. There will be a parameter in this model tied to sex, with a null that the parameter equals 0. Let this be believed. It will then be announced, quite falsely, “that there is no difference between men and women related to this happiness score”, or, worse, “men and women are equally happy.” The latter error compounds the statistical mistake with the preposterous belief that some score can perfectly measure happiness—when all that happened was that a group of people filled out some arbitrary survey. And unless the survey, for instance, were of only one man and one woman, and the possible faux-quantified scores few in number so that a tie is likely, it is extremely unlikely that men and women in the sample scored equally.
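
For concreteness, here is what such an out-of-the-box regression looks like in code, on made-up data (the variable names and numbers are mine, purely illustrative):

```python
# Sketch of the regression described above, on invented data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
sex = rng.choice(["man", "woman"], size=n)
# Hypothetical "happiness score": a noisy survey number, nearly identical for both sexes.
happiness = rng.normal(6.0, 1.5, size=n) + np.where(sex == "woman", 0.1, 0.0)

df = pd.DataFrame({"happiness": happiness, "sex": sex})
fit = smf.ols("happiness ~ sex", data=df).fit()

# The "null" is that the coefficient tied to sex equals exactly 0.
print(fit.summary().tables[1])  # shows the sex coefficient and its p-value
```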

Again, statistics can say nothing about why men and women would score differently or the same. Yet hypothesis testing always loosely implies causes were discovered or dismissed. We should be limited to statements like, “Given the natures of the survey and of the folks questioned, the probability another man scores higher than another woman is 55%” (or whatever number). That 55% may be ignorable, or again it may be of great interest. It depends on the uses to which the model is put. Further, statements like these do not as strongly imply that it was some fundamental difference between the sexes that caused the answer. Though given our past experience with statistics, it is likely many will still fixate on the causal possibility. Why isn’t sex a cause here? Well, it may have been that some difference besides sex in the two groups was the cause or causes. Say the men were all surveyed coming out of a bar and the women coming out of a mall. Who knows? We don’t. Not if all we are told are the results.
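
A statement like that 55% can be had directly by simulation from whatever predictive distributions the model gives for a new man and a new woman. A sketch, with numbers I made up:

```python
# Sketch (numbers mine): Pr(a new man scores higher than a new woman), by simulation.
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical predictive distributions for a new man and a new woman,
# e.g. as given by a fitted model: nearly equal means, equal spread.
new_men = rng.normal(6.00, 1.5, size=100_000)
new_women = rng.normal(5.73, 1.5, size=100_000)

pr_man_higher = np.mean(new_men > new_women)
print(f"Pr(new man scores higher than new woman) ~ {pr_man_higher:.2f}")  # about 0.55 here
```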

It is the same story if the null is “rejected”. No cause is certain or implied. Yet everyone takes the rejection as proof positive that causation has been demonstrated. And this is true, in its way. Some thing or things still *caused* the observed scores. It’s only that the cause might not have been related to sex.

If the null were accepted we might still say, “Given the natures of the survey and of the folks questioned, the probability another man scores higher than another woman is 55%”. And it could be that, after gathering a larger sample, we reject the null but find the difference probability is now 51%. The hypothesis moves from lesser to greater certainty, while the realistic probability moves from greater to lesser. This often occurs, particularly in regressions. Variables which were statistically “significant” barely cause the realistic-probability needle to nudge, whereas “non-significant” variables can make it swing wildly. That is because hypothesis testing often misleads. This is also well known, for instance in medicine under the name “clinical” versus statistical “significance.”
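
The divergence between a wee p and a nearly useless realistic probability is easy to manufacture. A sketch with invented numbers (a huge sample and a trivial true difference):

```python
# Sketch (invented numbers): with a large sample a trivial difference yields a
# "wee" p-value, yet the predictive probability barely budges from 50%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 200_000
men = rng.normal(6.00, 1.5, size=n)
women = rng.normal(5.95, 1.5, size=n)  # tiny true difference

t_stat, p_value = stats.ttest_ind(men, women)

# Probability a new man scores higher than a new woman, assuming normal predictive
# distributions with the observed means and variances.
pr_man_higher = stats.norm.cdf((men.mean() - women.mean()) / np.sqrt(men.var() + women.var()))

print(f"p-value = {p_value:.1e}")                        # "significant" thanks to the huge n
print(f"Pr(new man > new woman) ~ {pr_man_higher:.3f}")  # yet only about 0.51
```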

It may be—and this is a situation not in the least unusual—that the series of “happiness” questions are ad hoc and subject to much dispute, and that the people filling out the survey are a bunch of bored college kids hoping to boost their grades (see HenHei2010 on WEIRD people, who form the backbone of many studies, where WEIRD = “Western, Educated, Industrialized, Rich, and Democratic”). Then if the result is “Given the natures of the survey and of the folks questioned, the probability another man scores higher than another woman is 50.03%”, the researcher would have to say, “I couldn’t tell much about the difference between men and women in this situation.” This is an admission of failure. The researcher was hoping to find a difference. He did, but it is almost surely trivial. How much better for his career would it be if instead he could say, “Men and women were different, p<0.001”? A wee p then provides the freedom to speculate about what caused this difference.
