Class 62: Die P-value, Die! Die! Die!

We want Pr(H|Evidence); we get Pr(Data we didn’t see | H false). And then everybody pretends they are the same. Why, why, why.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture.

Lecture

This is a modified excerpt from Chapter 9 of Uncertainty.

So much has been written on the dismal subject of p-values, including some above, that it seems like piling on to say more. But I say more only to note three things, two of which are commonplace, and which lead to a third which is not.

First, nobody ever remembers the definition of a p-value. Everybody translates it into the probability, or its complement, of the hypothesis at hand. For this reason alone p-values should be abandoned. Second, even some self-labeled Bayesians want to keep p-values, but in a Bayesian sense. This is to give an old error a new name; it will still be an error. Third is something more interesting: the arguments commonly used to justify p-values are fallacies. Here is the proof.

It turns out that frequentist theory implies that the distribution of the measure of difference, like in the race-income problem, actually called the “p-value of the test statistic”, is “uniformly distributed” when the null is true. What that means is discussed in the next section, but what the theory implies is something like this: “If the null is true, the p-value can be any number between 0 and 1, and is equally likely to be any of them.” The argument people employ, however, progresses like this: “The null entails that we see a p-value between 0 and 1. We see a p-value that is less than the magic number. Therefore, the null is false, or rather it is rejected as if it were false.”
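The uniformity claim is easy to check by simulation. Here is a minimal sketch of my own, not from the lecture (Python, assuming numpy and scipy are installed): both “groups” are drawn from the same distribution, so the null is true by construction, and the resulting p-values spread roughly evenly over (0, 1).

```python
# Minimal sketch: when the null is true, p-values are uniform on (0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n = 10_000, 30
pvals = np.empty(n_sims)

for i in range(n_sims):
    # Both groups come from the SAME distribution: the null is true.
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.0, scale=1.0, size=n)
    pvals[i] = stats.ttest_ind(a, b).pvalue

# Under the null, each decile of (0, 1) should hold about 10% of p-values.
hist, _ = np.histogram(pvals, bins=10, range=(0, 1))
print("Fraction in each decile:", np.round(hist / n_sims, 3))
print("Fraction below 0.05:", np.round(np.mean(pvals < 0.05), 3))
```

Note that about 5% of these p-values fall below the magic number even though the null is true by construction, which is exactly what uniformity demands.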

This argument is not valid, because the first premise says we can see any p-value whatsoever, and since we do (see some value in that range), it is actually evidence for the null and not against it. There is no p-value we could see that would be the logical negation of “0 < p-value < 1”, other than 1 or 0, which may of course happen in practice. (The simplest example is a test for differences in proportion from two groups, where $n_1=n_2=1$ and where $x_1=1, x_2=0$, or $x_1=0, x_2=1$. Small samples frequently bust frequentist methods.) And when it does happen in practice, then regardless of whether the p-value is 0 or 1, either of those values legitimately falsifies the null, not just 0. That is, an observed p-value of 1 is evidence against the null, according to the argument.
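To make the degenerate small-sample case concrete, here is a sketch of my own (the lecture does not name a particular test; I use Fisher’s exact test from scipy, which I assume is available) for the two-group proportion problem with $n_1=n_2=1$, $x_1=1$, $x_2=0$:

```python
# Degenerate case: two groups of one subject each, one "success" and
# one "failure".  (Fisher's exact test is my choice for illustration.)
from scipy.stats import fisher_exact

# Rows are the groups; columns are successes and failures:
# x1 = 1 success of n1 = 1; x2 = 0 successes of n2 = 1.
table = [[1, 0], [0, 1]]
_, p = fisher_exact(table, alternative="two-sided")
print(p)  # 1.0 -- the p-value lands exactly on the boundary
```

Here the p-value is exactly 1, a boundary value which, by the argument above, would have to count against the null just as a 0 would.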

Importantly, the first premise of that argument is not “If the null is true, then we expect a ‘large’ p-value,” because we clearly do not. The argument would be valid, and the null truly falsified, if the first premise were “If the null were true, we would see a large p-value”; but nowhere in the theory of statistics is this kind of statement asserted. Though something like it often is. R.A. Fisher, the inventor of p-values, was fond of saying this, and something like this is quoted in nearly every introductory textbook:

Belief in the null hypothesis as an accurate representation of the population sampled is confronted by a logical disjunction: Either the null is false, or the p-value has attained by chance an exceptionally low value.

This is the same argument as before; but Fisher’s “logical disjunction” is evidently goofy, as the first part of the sentence makes a statement about the unobservable null hypothesis, and the second part makes a statement about the observable p-value. And neither says anything at all about the hypothesis itself! But it is clear that there are implied missing pieces, and his quote can be fixed easily, like this: “Either the null is false and we see a small p-value, or the null is true and we see a small p-value.” Or just: “Either the null is true or it is false, and we see a small p-value.”

Since “Either the null is true or it is false” is a tautology, and is therefore necessarily true and thus can be removed, we are left with, “We see a small p-value.” Which is of no help at all. The p-value casts no direct light on the truth or falsity of the null. This result should not be surprising, because remember that Fisher argued that the p-value could not deduce whether the null was true; but if it cannot deduce whether the null is true, it cannot, logically, deduce whether it is false; that is, it cannot falsify the null.
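The reduction can be put in symbols (my notation, not the book’s). Write $N$ for “the null is true” and $S$ for “we see a small p-value”. The repaired disjunction is then $(N \lor \lnot N) \land S$; and since $N \lor \lnot N$ is a tautology, $(N \lor \lnot N) \land S \equiv S$. The disjunction contributes nothing: all that survives is the bare report of a small p-value.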

Current practice is that a small p-value is taken by everybody to mean “This is evidence the null is false or likely false.” That is because people are arguing like this: they take “For most small p-values I have seen in the past, the null has been false; I now see a new small p-value” as evidence for the proposition “The null hypothesis in this new problem is false.” But this doesn’t work, because the major premise is false, or at least unknown.

Given all this, and of the myriad other criticisms no doubt well known to the reader, plus the ineradicable Cult of Point-Oh-Five, it is far past the time for p-values to go.




5 Comments

  1. McChuck

    “Faster, pussycat! Kill! Kill!”
    You can’t repeat the mantra enough.

  2. To be fair to R.A. Fisher, he warned that people were over-enthusiastic about applying the tools he provided. He wasn’t as impressed with the smoking-and-cancer story as was fashionable at the time, for instance. If you read Fisher’s works, he was all about being very careful in constructing the hypothesis in the first place if the statistical analysis was to mean anything. It was more about adding confirmation to what you already know, since it was blindingly obvious to anyone, especially Fisher, from the get-go that you can find acausal correlations between independent variables in random samples of random things.

    Having looked into the smoking-and-cancer thing myself, I think it has been extremely overblown from the get-go, presumably for some kind of nefarious purpose.

  3. …and another thing: in Fisher’s works, first understanding the distribution of what you’re measuring in the population you are sampling from is crucial. If you imagine your population has a normal distribution for the attributes you’re measuring, well, you first need to establish that that is indeed highly likely to be the case: you need to understand your population thoroughly to know what statistical methodology can be applied and what sampling method and number of samples are appropriate. All these considerations disappeared when computers arrived and people could simply enter numbers into the computer and press buttons until the results they wanted appeared. Again, it seems to me unfair to blame Fisher for what computers have done to destroy people’s ability to think. Of course, now AI has “arrived” and nobody will ever think again while, ironically, AI thinks by statistical predictions of token probabilities.

  4. Rudolph Harrier

    P-values enable a lot of shenanigans, and they are dangerous because most of their “honest” users trust them too much: the values are both widely accepted and produced by a calculation those users don’t understand. That is, an oracular computer hands them a number which they then interpret via ritual means.

    But I think a larger problem is that once you get out of applications like testing production lines, most people just want to use statistics to lie. If not to the public, at least to themselves. Let’s take something more basic: the fact that correlation is not causation. Everyone will agree to this. It is easy to understand that even when there is a cause, correlation doesn’t tell you its direction, that there could be a third cause, and that since correlation amounts to “both go up” or “both go down”, some completely meaningless correlations are inevitable, especially in the short term.

    Yet people who agree to that a million times will still put up an x-y graph with a very vague upwards pattern and say “Look at the graph! This proves x causes y!” And then, when presented with correlations that they agree are spurious, like the number of UFO sightings in Tucson and the price of milk in Anchorage or whatever, they will say “well, everyone knows that correlation doesn’t prove causation.” Even though they could be looking at practically identical graphs in both cases (see the sketch after the comments).

    The point being that they aren’t using the graph for any sort of actual honest analysis. They are just looking for a tool to further their rhetoric. If they do this with basic graphs, of course they are going to do it with any tool that’s sufficiently non-trivial for them to claim that anyone who disagrees with them doesn’t understand the math. So yeah, p-values are bad, but I doubt that getting rid of them will do much to improve how statistics is done. People will just start lying with other measures. Though I suppose you might help out a few people who are honest but just use p-values because “that’s what you’re supposed to do.”
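As a sketch of the commenter’s point about inevitable meaningless correlations (my addition, in Python, assuming numpy is installed): two completely independent random walks, the kind of “both go up” series in the Tucson–Anchorage example, routinely show large correlations.

```python
# Minimal sketch: independent random walks routinely produce large,
# entirely meaningless correlations.
import numpy as np

rng = np.random.default_rng(7)
n_pairs, length = 1_000, 50
big = 0

for _ in range(n_pairs):
    x = np.cumsum(rng.normal(size=length))  # independent series #1
    y = np.cumsum(rng.normal(size=length))  # independent series #2
    r = np.corrcoef(x, y)[0, 1]
    big += abs(r) > 0.5

# A large fraction of pairs -- built to be independent -- show |r| > 0.5:
# "practically identical graphs" with no causation anywhere in sight.
print(f"{big / n_pairs:.0%} of independent pairs have |r| > 0.5")
```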
