Class 65: A Last Plea Never To Use Or Trust Statistical Evidence Like This

My last attempt to convince you to think logically about evidence and reject P-values.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture.

Lecture

I’d say my career has largely been a failure. My goal has been to convince people of the following proposition: Probability is only in the mind. And then to show all the consequences (many) flowing from that proposition.

One of those is the proof that P-values, when used to say anything except a minor statement about a thing nobody cares about or should care about, are fallacious. They should therefore never be used to make any decision, and if they are, it will only be a coincidence that you made a good one. There is irony in that.

We did several weeks of me trying to show you in every way I could think of why P-values ought to be tossed on the Great & Growing Intellectual Scrap Heap. I have failed in this, judging by the comments I have received. The fault is mine.

I can make a good summary of these comments in the following fictional, but based-on-real-events, dialog. See if it sounds familiar to you. But first, recall what a P-value is. Data are collected in reference to some Theory. A model of that data is produced. Then some function of the observations is calculated, like a “t statistic”, or even just a mean. Then it is asked what the probability is that more extreme values of that function of the observations would arise if we ran the experiment that “generated” the data an infinite number of times, all assuming the Theory that “generated” the observations is false.

A shorthand way of saying this is Pr(Data never seen | The cause I posit is false). This is all completely incoherent. But somehow generations of statisticians have been raised to think “Of course I want the probability of data I never saw, assuming my hypothesis is false! That’s how I know if my hypothesis might be true.” We will meet all this again when we do models, when, I promise you, you will be seduced by the magic of the wee P.
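To fix ideas before the dialog, here is a minimal sketch, with invented numbers and an assumed flat-prior normal model (illustration only, not a prescription), computing both quantities side by side: the P-value, Pr(a statistic more extreme than the one observed | the null is true), and the thing you actually want, Pr(What I Want To Know | Evidence), here taken to be the probability that the mean effect is positive.

```python
# Minimal illustration with invented observations; not real data.
import numpy as np
from scipy import stats

y = np.array([0.8, 1.3, -0.2, 0.9, 1.7, 0.4, 1.1, 0.6])  # made-up observations

# The P-value: Pr(t statistic more extreme than observed | the null "mean = 0" is true).
res = stats.ttest_1samp(y, popmean=0.0)
print(f"t = {res.statistic:.2f}, two-sided P-value = {res.pvalue:.4f}")

# Pr(What I Want To Know | Evidence): the probability the mean effect is positive,
# under the standard flat-prior normal model (the posterior of the mean is a scaled t).
n = len(y)
se = y.std(ddof=1) / np.sqrt(n)
print(f"Pr(mean > 0 | y) = {stats.t.cdf(y.mean() / se, df=n - 1):.4f}")
```

Under this particular flat-prior model the two numbers are arithmetically related (the posterior probability equals one minus the one-sided P-value), yet they answer entirely different questions, which is rather the point.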

The Wee P Dialog

Briggs: Here are several weeks of proof, including rigorous logical arguments, of why P-values must never be used.

Student: I get where you’re coming from, Briggs. But P-values have some uses.

Briggs: No they don’t.

Student: They can give good results if used correctly and wisely.

Briggs: No they can’t. Anyway, you say you want to know about What I Want To Know, a suspected cause of some observations you have made. Isn’t that so?

Student: It is.

Briggs: And you have collected, besides those observations, other evidence which you think is relevant?

Student: Indeed. I used that evidence to build my model.

Briggs: Then why not calculate Pr(What I Want To Know | Evidence)?

Student: I see you’re trying to evade P-values. I used P-values to tell me that the null isn’t likely true.

Briggs: So you don’t even want to consider Pr(What I Want To Know | Evidence)?

Student: Why should I? I have the answer I need from the P.

Briggs: What exactly does the P tell you about What I Want To Know?

Student: That the null is false.

Briggs: What is the ‘null’?

Student: That the cause isn’t working, that it isn’t here.

Briggs: And the P is this probability, yes? Pr(Observations I did not make | What I Want To Know Is False)?

Student: That’s not how I say it. I say it’s the probability of more extreme statistics given the null is true.

Briggs: But the null is the logical contrary of What I Want To Know, isn’t it?

Student: I suppose maybe it is, but I put it in terms of these here parameters in a model. Specifically, I set the parameter in my model which I say is linked to the cause equal to 0, which is the null hypothesis. Then I calculate a statistic, which is a function of the data. Then I can calculate the probability that I would get, upon infinite repetitions of the experiment, a statistic more extreme than the one I did get, assuming the null is true.
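(What the student’s recipe amounts to, in a minimal sketch with invented numbers: simulation stands in for the “infinite repetitions”, and the share of fake repetitions with a more extreme statistic is the P-value.)

```python
# Invented data; 100,000 simulated "repetitions" stand in for the infinite ones.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = np.array([0.8, 1.3, -0.2, 0.9, 1.7, 0.4, 1.1, 0.6])  # the one data set we actually have
n = len(y)
t_obs = y.mean() / (y.std(ddof=1) / np.sqrt(n))           # the observed t statistic

# "Assume the null is true" and generate fake experiments with mean 0; the t statistic's
# null distribution does not depend on the true scale, so scale = 1 suffices.
fake = rng.normal(loc=0.0, scale=1.0, size=(100_000, n))
t_fake = fake.mean(axis=1) / (fake.std(ddof=1, axis=1) / np.sqrt(n))

p_simulated = np.mean(np.abs(t_fake) >= abs(t_obs))   # share of repetitions "more extreme"
p_analytic = 2 * stats.t.sf(abs(t_obs), df=n - 1)     # the textbook two-sided P-value
print(f"simulated P = {p_simulated:.4f}, analytic P = {p_analytic:.4f}")
```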

Briggs: And this calculation applied to your data resulted in a wee P?

Student: It did.

Briggs: And so you “rejected” that null, isn’t that true?

Student: I did.

Briggs: And since you rejected a proposition, which means you decided it is false, then logically you must have accepted the logical contrary of that proposition you say is false.

Student: No, we never “accept” anything. We can only reject hypotheses.

Briggs: If you reject one thing, by which you mean you act like it is false, or conclude it is false, then it is inescapable that you have accepted its logical contrary. Just as in math, saying “I claim x does not equal 7” means you logically must accept that “x equals some value other than 7”, which is its logical contrary.

Student: That’s not what P-values do. I only reject the null.

Briggs: And what does rejecting the null do for you with regard to What I Want To Know?

Student: It means I can say the cause I suspect is linked to the outcomes I measured.

Briggs: Linked to?

Student: That means I am right to suspect that the cause might be operative.

Briggs: ‘Might’ and ‘suspect’ are probabilistic language. If you are doing that, why not calculate Pr(What I Want To Know | Evidence)? Do you assign probabilities to the logical contraries of nulls when you reject them?

Student: Don’t be silly. You can’t put probabilities on propositions. Only real data can have probabilities.

Briggs: So how do you conclude the cause of the observations you made is genuine?

Student: I never accept a proposition. Like I said, I can only ever reject them.

Briggs: Meaning claim they are false?

Student: Exactly.

Briggs: So if H0 is your so-called null proposition (put mathematically), and you reject it, you are saying Pr(H0 is true | Some kind of evidence) = 0, which logically implies Pr(The logical contrary of H0 | Some kind of evidence) = 1.

Student: No, you cannot put probabilities on hypotheses.

Briggs: So you’re not sure your H0 is false? And therefore you aren’t sure the logical contrary of it is true?

Student: Of course I’m not sure. I might be wrong about the null.

Briggs: Again you are sneaking in probability language with that ‘might’. Why not just calculate Pr(What I Want To Know | Evidence)?

Student: Then I misspoke. I only act like I believe the null is false when the P is wee.

Briggs: Yet you acknowledge your act might be wrong?

Student: Sure, even good scientists like myself make mistakes.

Briggs: What are the chances you’re wrong?

Student: I see what you’re trying to do. I can’t say. What I can say is that if the null is true and I repeat the experiment an infinite number of times, I will see statistics more extreme than the one I saw only P% of the time.

Briggs: That’s why I say P-values are Pr(Observations that we never made | null is true), which is another way of saying Pr(Data we didn’t see | null is true).

Student: I didn’t say we didn’t see anything. I said data more extreme, in the form of these statistics.

Briggs: Data “more extreme” just is data you did not see. And anyway, you only have the experiment before you. This is it. You don’t have those imaginary repetitions: you have this. So why not calculate Pr(What I Want To Know | Evidence), where part of that Evidence is what you did see, not what you didn’t see in purely hypothetical experiments that were never made?

Student: Look. My P was very wee, which means I have good confidence the null is false.

Briggs: There you go trying to sneak in probability language again with that ‘confidence.’ If you’re doing that, why not just calculate Pr(What I Want To Know | Evidence)?

Student: I don’t think you understand P-values. They can also be taken as certain kinds of averages of data under the null.

Briggs: What in the world does any of that have to do with What I Want To Know?

Student: If those averages take certain values, then I can give more or less weight to What I Want To Know.

Briggs: Once more, you are sneaking in probability language with that ‘more or less weight.’ If you’re doing that, why not just calculate Pr(What I Want To Know | Evidence)?

Student: I told you. You cannot put probability on propositions.

Briggs: You’re doing it all the time. When you act like the null is false, and the depth of earnestness of that act depends on the size of the P, with smaller ones giving you greater strength of resolve, you have said something about Pr(What I Want To Know | Evidence), albeit not in numerical terms.

If you make any decision about What I Want To Know based on the P, you have committed a logical fallacy. You start by assuming the null is true, under which assumption a P-value of any value between 0 and 1 is equally likely. You then see a value. You cannot then logically move to acting like the null is false, or true, from that. It is a logical impossibility. You have confused and conflated decisions with probability. The whole idea is incoherent.
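(That claim about P-values under the null is easy to check for yourself; here is a minimal sketch, with invented settings, showing that when the null really is true every P-value between 0 and 1 turns up equally often.)

```python
# Invented settings: 20,000 experiments in which the null ("mean = 0") really is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p_values = np.empty(20_000)
for i in range(p_values.size):
    y = rng.normal(loc=0.0, scale=1.0, size=10)          # the null is true by construction
    p_values[i] = stats.ttest_1samp(y, popmean=0.0).pvalue

# If P-values are uniform under the null, about 10% should land in each decile.
counts, _ = np.histogram(p_values, bins=10, range=(0, 1))
print(counts / p_values.size)
```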

Student: You don’t get it. P-values work all the time. Take the gentleman whose pen name I adopted. He was a brewer at Guinness and used tests he invented, t-tests, to show things like water at this-and-such temperature made better beer than water at that-and-the-other temperature.

Briggs: You mean he accepted causal hypotheses? I thought that was forbidden.

Student: No, he only rejected hypotheses.

Briggs: Which either means he accepted others or put probabilities on those others. Anyway, old William Sealy Gosset got lucky; or rather, he was a good scientist. He made careful controlled—real control, as a chemist would—experiments, where only the manipulated causes were in operation.

Look, if you assume the only possible cause is the change in temperature, then you can—no, must—logically conclude the change in result was caused by—or “linked to”—the change in temperature! It doesn’t make any difference, at all, what P-value you get. Indeed, even Pr(What I Want To Know | Evidence) is irrelevant, or here equals 1, because in that Evidence is the assumption that only the change in temperature was operative.

The only use Pr(What I Want To Know | Evidence) has here is if you change What I Want To Know from what caused the difference, because you assumed that, to what the size of the effect of that change in cause will be. For instance, something like Pr(Beer is twice as good | Evidence, including only temperature was changed).
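For instance, with invented quality scores for two water temperatures (these are not Gosset’s data, and this is only one simple way to do it, under an assumed flat-prior, equal-variance normal model), such a probability could be put on an effect size like so:

```python
# Invented quality scores for beer brewed with cooler and warmer water; not Gosset's data.
import numpy as np
from scipy import stats

cool = np.array([6.1, 5.8, 6.4, 6.0, 5.9, 6.3])
warm = np.array([7.2, 6.8, 7.5, 6.9, 7.1, 7.4])

n1, n2 = len(cool), len(warm)
diff = warm.mean() - cool.mean()
pooled_var = ((n1 - 1) * cool.var(ddof=1) + (n2 - 1) * warm.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Flat-prior, equal-variance normal model: the posterior of the mean difference is a t
# distribution centred at `diff` with scale `se` and n1 + n2 - 2 degrees of freedom.
threshold = 1.0   # "the warmer water improves the score by at least one point"
pr_improvement = stats.t.sf((threshold - diff) / se, df=n1 + n2 - 2)
print(f"Pr(improvement > {threshold} | evidence) = {pr_improvement:.3f}")
```

The output is a plain probability of the thing you care about, given the evidence and the model, with nothing said about data never seen.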

So again, and I promise for the last time, why not just calculate Pr(What I Want To Know | Evidence)? Why continue to use P-values when you can do everything you say you want directly, with no obfuscation, no impenetrable interpretations?

Student: Many authorities say P-values have some uses.

Briggs: Sigh.

3 Comments

  1. McChuck

    Modern scientific statistics is nothing more than a roleplaying game based on a 20-sided die. The game master (study author) makes a proposition and picks an arbitrary number for significance. Let’s say it is 15. He then rolls the die. If the result is 15 or greater, the proposition is likely true. If he rolls less than 15, the proposition is likely false. If he rolls a 20 (5% chance), the proposition is absolutely true. If he rolls a 1 (5% chance), the proposition is absolutely false.

    If he doesn’t like the outcome, nothing prevents the researcher from slightly modifying the proposition and significance and rolling the die again. And again. And again. Until it finally agrees with the intent of those funding his research.

    ¡Science!

  2. Uncle Mike

    I disagree with your hypothesis that “my career has largely been a failure”. You shouldn’t expect the world to immediately beat a path to your door, but instead that the snowball you launched will become an avalanche in time. I think it will. And besides, the work we do in this life is judged by God, not man.

  3. Ed

    As a teacher who’s seen, in the eyes of many a student (pun intended), the inglorious mist of complete misunderstanding, I can feel your pain. I must also say that it took me a little while to understand your points, but I do think I learned what you wanted to teach, and this Socratic dialogue worked well as a great synthesis.
