I’ve lost count of the lesson numbers.
The definition of a p-value, here phrased in terms of the incorrectly named “test of difference in means”, is:
Given the truth of a probability model used to represent the uncertainty in the observables in group A and in group B, and given that one or more of the parameters of those probability models are equal, and given that the “experiment” from which the data in A and B were collected were to be repeated a number of times approaching the limit, the p-value is the probability of seeing a statistic calculated from each of these repetitions larger (in absolute value) than the one we actually found.
Incidentally, there is no need to ever “test” for the difference in means, because means can be computed; that is, observed. You can tell at a glance whether they are different. The actual hypothesis test says something indirect about parameters. Anyway, if the parameters are equal, the models for the two groups are the same. In this case, the shared model is called the “null” model (statisticians are rarely clever in naming their creations).
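To see the distinction concretely, here is a minimal sketch of the mechanics, using made-up normal samples and SciPy’s two-sample t-test. The numbers and group sizes are arbitrary, chosen only for illustration:

```python
# Sketch: observed means vs. the parameter "test" -- illustrative data only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=30)  # group A observables
b = rng.normal(loc=10.0, scale=2.0, size=30)  # group B observables

# The means are simply computed, i.e. observed; no test is needed to
# see whether they differ:
print(a.mean(), b.mean())

# The t-test's p-value instead says something indirect about parameters
# of an assumed normal model, under the premise those parameters are
# equal (the "null" model):
t_stat, p_value = stats.ttest_ind(a, b)
print(t_stat, p_value)
```

The printed means will differ at a glance; the p-value speaks only of the unobservable parameters.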
There are a number of premises in the p-value definition, some more controversial than others. Begin with the truth of the model.
It is rare to have deduced the model that represents the uncertainty in some observable. To deduce a model requires outside evidence or premises. Usually, this evidence is such that we can only infer the truth of a model. And in even more cases, we know, based on stated evidence, that a model is false.
Now, if this is so—if the model is known to be false—then the p-value cannot be computed. Oh, the mechanics of the calculation can still be performed. But the truth of the output, taken in conjunction with the truth of the model, is false. If the model, again based on stated evidence, is only probably true, then the p-value can be calculated, but its truth and the truth of the model must be accounted for, and almost never are.
Equating the parameters is relatively uncontroversial, but small p-values are taken to mean the parameters are not equal; yet the p-value offers no help about how different they are. In any case, this is one of the exceedingly well-known standard objections to p-values, which I won’t rehearse here. Next is a better argument unknown to most.
Fisher often said something like this (you can find a modified version of this statement in any introductory book):
Belief in the null model as an accurate representation of the population sampled is confronted by a logical disjunction: Either the null model is false, or the p-value has attained by chance an exceptionally low value.
Many have found this argument convincing. They should not have. First, this “logical disjunction” is evidently not one, because the first part of the sentence makes a statement about the unobservable null model, and the second part makes a statement about the observable p-value. But it is clear that there are implied missing pieces, and his quote can be fixed easily like this:
Either the null model is false and we see a small p-value, or the null model is true and we see a small p-value.
And this is just
Either the null model is true or it is false, and we see a small p-value.
But since “Either the null model is true or it is false” is a tautology (it is always true), and since any tautology can be conjoined with any statement and not change its truth value, what we have left is
We see a small p-value.
Which is of no help at all. The p-value casts no direct light on the truth or falsity of the null model. This result should not be surprising, because recall Fisher argued that the p-value could not be used to deduce whether the null was true; but if it cannot deduce whether the null is true, it cannot, logically, deduce whether the null is false; that is, the p-value cannot falsify the null model (which was Fisher’s main hope in creating them).
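The reduction above is a purely logical point, and it can be checked mechanically. Here is a tiny truth-table sketch: conjoining the tautology “the null is true or it is false” with any statement S leaves just S:

```python
# Truth-table check: "(N or not N) and S" reduces to S for every
# assignment of truth values, since "N or not N" is a tautology.
from itertools import product

for n, s in product([True, False], repeat=2):
    tautology = n or (not n)       # always True
    combined = tautology and s     # "(N or not N) and S"
    assert combined == s           # ...has the same truth value as S alone

print("(N or not N) and S == S for all truth values")
```

The conjunction carries no information beyond S itself, which here is only “we see a small p-value.”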
Recall that making probabilistic statements about the truth value of parameters or models is forbidden in classical statistics. An argument pulled from classical theory illustrates this.
If the null model is true, then the p-value will be between 0 and 1.
We see a small p-value.
The null model is false.
Under the null, the p-value is uniformly distributed—a sharper statement of the first premise—which is another way of saying, “If the null is true, we will see any p-value whatsoever.” That we see any particular value thus gives no evidence for the conclusion.
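A simulation sketch of that premise, with made-up data: draw both groups from the very same normal model, so that the null is true by construction, and watch the p-values spread out over (0, 1):

```python
# Simulation sketch: when the null is true by construction (both groups
# drawn from the same normal model), p-values of any size appear.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_values = []
for _ in range(2000):
    a = rng.normal(size=25)   # same model for both groups:
    b = rng.normal(size=25)   # the null is true here by construction
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
# Roughly 5% of p-values land below 0.05 even though the null is true:
print((p_values < 0.05).mean())
```

Any p-value whatsoever can be seen when the null is true, so seeing a small one is not, by itself, evidence against it.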
Importantly, the first premise is not that “If the null model is true, then we expect a ‘large’ p-value,” because we clearly do not.
Since p-values—by design!—give no evidence about the truth or the falsity of the null model, it’s a wonder that their use ever caught on. But there is a good reason why they did. That’s for next time.