Let Go Your Wee P! — Reader Help Requested

I flatter myself that this is the best general thing I have written to explain why you must never, not ever, use a p-value, hypothesis test, “significance”, Bayes factor, or any of that.

It comes in a review of the book Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science by Aubrey Clayton.

Bernoulli’s fallacy happens when you give the probability of the data and assume it is the probability of the hypothesis. (Or functions of the same.) That’s what p-values, hypothesis testing, and so forth are. Mistakes. Always mistakes.

Let me quote myself, and point you to the Academic Questions site. I know clicking an extra button is a pain in the keister, but I am begging you to do so. JUST CLICK HERE.

After some time passes, on some quiet day I’ll republish the whole here so that I myself don’t lose track of it. I added the bold below to catch your eye. Because I really want you to read this.

Ask a psychologist what the chances are that a person will walk slower after reading from a list of words having to do with old age than reading from a neutral list. He won’t tell you. He can’t tell you. What he can tell you is that in his model of walking time, after “controlling” for a number of items including the word list, the “parameter” representing something to do with walking time was highly “statistically significant,” with something called a “p value” that was boastfully small.

You see a drug commercial on TV and are impressed by the cavorting of the actors. You want to cavort. So you go to the doctor and ask him if Profitol is right for you. You ask him the chance the pill will let you cavort. He won’t tell you. He can’t tell you. What he can tell you is that he read about an experiment using the pill, and that if the “null hypothesis” comparing that pill to another pill was true, the probability of seeing data that was not seen in the experiment was pretty low.

This answer being incomprehensible, you seek a second opinion. The next doctor gives you a test for cavortitis, the malady which causes an inability to cavort. The test is positive. So you ask the doctor, “Does that mean I got it?” He says, “Well, in those patients with cavortitis, the test comes back positive ninety-five percent of the time. And it’s even better for those without the disease: the test comes back negative ninety-nine percent of the time.” He writes you an exorbitantly expensive prescription for Profitol. Suddenly you don’t feel so good.

And you shouldn’t. Because the second doctor didn’t answer your question. Neither did the first. Neither did the psychologist. Neither can anybody who uses classical statistical procedures. Because those are designed not to answer questions put to them in plain language.

Take the second doctor. You asked him (implicitly) what the probability is that you have the disease after testing positive. Let’s call your having the disease your “hypothesis.” He instead tells you the probabilities having to do with the test. The test is “data,” so he gives you probabilities of the data instead of the probability of the hypothesis. Worse, he acted as if the probability of the data were the probability of the hypothesis. So did the first doctor and the [psychologist].

So does everybody who uses classical statistical procedures.

Treating the probability of the data as if it were the probability of the hypothesis is called, as Aubrey Clayton tells us in the book of the same name, Bernoulli’s Fallacy.
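
To see how far apart the probability of the data and the probability of the hypothesis can be, here is a minimal sketch in Python of Bayes’s theorem applied to the second doctor’s numbers. The 95% sensitivity and 99% specificity are from the example above; the 1% prevalence of cavortitis is an assumption made purely for illustration.

```python
# Bayes's theorem: turn the test's data probabilities into the
# probability of the hypothesis, given an assumed base rate.

sensitivity = 0.95   # P(test + | disease), from the example
specificity = 0.99   # P(test - | no disease), from the example
prevalence = 0.01    # P(disease): ASSUMED, for illustration only

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive test) = {p_disease_given_positive:.2f}")  # ~0.49
```

Under that assumed 1% base rate, a positive result from this excellent-sounding test leaves you at roughly a coin flip of actually having cavortitis. The probabilities of the data do not, by themselves, give you the probability of the hypothesis.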

Once again, I am asking, as a big favor, that you JUST CLICK HERE.

We—you and I, together, dear reader—would improve Science inestimably if we could only convince others of this lesson. I beg your help. Please pass on this page or the Academic Questions page to any who quote any science paper that uses “significance” etc.

Oh, why the “[psychologist]” in brackets? Because my original example used a climatologist. But then I decided that might trigger some weak-minded critics, so I changed it to a real-life psychologist example. And then my enemies, wee P worshipers all, put back “climate scientist” in place of “psychologist”.


17 Comments

  1. SiG

    Pardon my slowness here, but what are we supposed to do at the NAS site? Is this to get readers at that site? I see a place to leave comments at the bottom with no comments.

  2. Cary Cotterman

    As a contrary old coot, whenever I read idiotic words and phrases often associated with old age, like “spry”, “feisty”, and “sharp as a tack”, I don’t walk slower. My distaste for those insufferable terms stimulates me to cavort, caper, and gambol like a stripling. Works better than Profitol or any other nostrum.

  3. > You asked him (implicitly) what the probability is that you have the disease after testing positive.

    Years ago, when we were learning about diagnostic tests, I also asked the same question. It took some failing, but eventually I realized that what I wanted was “positive predictive value” and “negative predictive value”, the two things that got mentioned only once in the class I attended, and never again. We were always talking about sensitivity and specificity, never about the other two metrics. And it’s the other two metrics that deal with specific individual cases.
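
    For concreteness, here is a small Python sketch of all four metrics computed from a hypothetical screening of 10,000 people, assuming the 95%/99% test from the post and an invented 1% prevalence:

    ```python
    def test_metrics(tp, fp, tn, fn):
        """Four standard diagnostic metrics from a 2x2 confusion matrix."""
        return {
            "sensitivity": tp / (tp + fn),  # P(test + | disease)
            "specificity": tn / (tn + fp),  # P(test - | no disease)
            "ppv": tp / (tp + fp),          # P(disease | test +)
            "npv": tn / (tn + fn),          # P(no disease | test -)
        }

    # Hypothetical counts: 10,000 screened, 100 truly diseased.
    print(test_metrics(tp=95, fp=99, tn=9801, fn=5))
    # ppv ~ 0.49, npv ~ 0.9995: these answer the individual's question.
    ```

    Sensitivity and specificity are fixed properties of the test; PPV and NPV also depend on how common the disease is, which is why they are the ones that speak to an individual case.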

  4. Matt, I usually agree with you, but I don’t understand this.

    Let’s say that my hypothesis is that an unweighted die will give equal (within uncertainty) results for each of the six faces, but a weighted die won’t.

    I take a hundred dice that I have measured to make sure they are not loaded (casinos regularly do such tests). I throw them each 1200 times, and get about 200 results for each face.

    So I do a binomial test on the results, complete with p-values, and it says yes, those are within the expected values given your hypothesis.

    How does this not support my hypothesis?

    Then I throw a given die 1200 times, and I get the result “1” 600 times. Again, I do a binomial test complete with p-values, and it says the die is weighted with a minuscule p-value (less than 2.2e-16, to be exact).

    Again, how does this not support my hypothesis?

    I’ve chosen this example because, inter alia, statistics (including p-values) was developed by gamblers to guide their bets, and it has been HUGELY successful in that regard … and it is used constantly by casinos to make sure that nobody is cheating. If p-values are worthless as you claim … why do they do that, and more to the point, why do they continue doing it?

    In friendship,

    w.
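
    A minimal sketch of the two binomial tests described in the comment above, using scipy’s binomtest (the 1200 throws and the 200-vs-600 counts are Willis’s; the choice of library is an assumption):

    ```python
    from scipy.stats import binomtest

    # Fair-looking die: face "1" came up about 200 times in 1200 throws.
    fair = binomtest(k=200, n=1200, p=1/6)
    print(f"fair die:   p-value = {fair.pvalue:.3f}")   # large, ~1

    # Suspect die: face "1" came up 600 times in 1200 throws.
    loaded = binomtest(k=600, n=1200, p=1/6)
    print(f"loaded die: p-value = {loaded.pvalue:.3g}") # < 2.2e-16
    ```

    Note that reproducing the numbers does not settle the dispute: in both cases the p-value is a probability of the data given the null, not the probability that the die is (or is not) loaded.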

  5. Briggs

    Willis,

    You are mistaking the correlation between the wee p and your artificial example, which was selected so that the probability of a bent die equals 1, for proof that the p-value is a valid tool of inference.

    As I have shown, it is not. But it can seem to be when researchers pick examples where the cause is known to a good degree.

    The example you picked is far from simple. Take the well-known Gambler’s Fallacy. A gambler in Vegas notices red has come up 10 times in a row on a roulette wheel. What is the probability of red on the 11th spin?

    There isn’t one. Depends on what evidence you assume. If you assume the casino is on top of its machines, then the probability is just under 1/2 (recalling the green slots). If you assume the casino is negligent, then the probability might be closer to 1.

    There is no correct math. It all depends on what we bring to the problem.
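
    A small sketch of how the answer moves with the assumed evidence. The “casino on top of its machines” case is just the American wheel’s geometry; for the “negligent casino” case, a uniform Beta(1,1) prior over the wheel’s red-probability is assumed here, a modeling choice not specified in the reply:

    ```python
    # Evidence 1: well-maintained American wheel, 18 red of 38 slots.
    p_red_fair = 18 / 38
    print(f"fair wheel:      P(red) = {p_red_fair:.3f}")    # just under 1/2

    # Evidence 2: negligent casino, bias unknown.  Update a uniform
    # Beta(1, 1) prior on P(red) with 10 reds in 10 spins.
    alpha, beta = 1 + 10, 1 + 0
    p_red_biased = alpha / (alpha + beta)                   # posterior mean
    print(f"negligent wheel: P(red) = {p_red_biased:.3f}")  # ~0.917
    ```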

  6. Uncle Mike

    What a wonderful death sentence for the weepy! Of the many many you have frequently offered, it might be the best, P=0.95.

    However, if you want the job done right, and p-values to be forever banned, I suggest you contact D.J. Trump. There’s a guy who delivers. It may require beheading Academia, but oh well, they’ve been asking for it, easy come easy go, and good riddance to rubbish.

  7. Johnno

    SiG, you use the clickety-click to go to the fuller article, to read more about the cavorting of these, our scientists.

    “Let go your Wee P,” should be appended to the plaque on the Statue of Liberty.

  8. Rudolph Harrier

    Willis,

    You say that you trust p-values because gamblers use them. But do they? I can’t find any examples, and I’m not sure what an example would look like.

    Let’s say that you look at a roulette table, set up some arbitrary test statistic (say, the number of times that black comes up) and calculate the probability of seeing at least as extreme a value of that statistic as you did in reality, assuming an equal probability for every spot. You get p = .04. Okay… what do you do now? Walk away because you believe that not every result is equally likely? It’s in fact to your advantage if some spaces are more likely (as long as they are consistently more likely), because you’ll bet on those!

    The more sensible thing is to carefully observe which results come up the most often, verify that this happens over an extended period of time, and then bet on those. There have been gamblers who made money by doing just this (generally the more likely spaces result from physical defects in the roulette tables). This means that probability is an extremely important gambling tool… but p-values play no part in this analysis.

    I can’t think of a single example where a p-value would serve a gambler better than a frequency analysis of the possible results.
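
    A minimal sketch of the frequency analysis described above, assuming an American wheel and the standard 35-to-1 straight-up payout (neither figure appears in the comment):

    ```python
    from collections import Counter

    def profitable_pockets(spins):
        """Tally observed spins; a straight-up bet pays 35:1, so a pocket
        is worth betting on only if its observed frequency exceeds the
        break-even rate of 1/36."""
        n = len(spins)
        counts = Counter(spins)
        return {pocket: c / n for pocket, c in counts.items() if c / n > 1 / 36}

    # e.g. profitable_pockets(observed_spins) -> {17: 0.031} for a wheel
    # whose physical defect favors pocket 17 (hypothetical data).
    ```

    The decision rule uses estimated frequencies and the payout directly; no p-value enters the calculation.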

  9. cdquarles

    The catch with positive predictive values and negative ones is that, while the axioms are true, many of the assumptions taken as true are not axioms. Those values are conditional. When the necessary and sufficient conditions don’t exist for the predictive values to be true, they won’t be.
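
    The conditionality is easy to exhibit with the 95%/99% test from the post: hold the test fixed, vary the assumed prevalence, and the positive predictive value swings from near zero to near one:

    ```python
    sens, spec = 0.95, 0.99  # the test from the post

    for prev in (0.001, 0.01, 0.1, 0.5):  # assumed base rates
        ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        print(f"prevalence {prev:5.3f}: PPV = {ppv:.3f}")
    # prevalence 0.001: PPV = 0.087
    # prevalence 0.010: PPV = 0.490
    # prevalence 0.100: PPV = 0.913
    # prevalence 0.500: PPV = 0.990
    ```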

  10. Ignacio García Blanco

    Hey Dr. Briggs, thanks for sharing that paper.
    I’ve shared it with some of my colleagues from college.

    I’ve sent you an e-mail a few days back. Whenever you have the chance, I’d really appreciate it if you could get back to me.

    Thanks in advance.

    Kind Regards.

  11. Briggs

    Ignacio,

    Could you please send again. I get so many emails that I cannot keep up. Thanks.

  12. Maypo

    Hey Dr. Briggs, I think I generally follow your arguments and agree with them. What I am struggling with is, given the fallacies of current methodology, how would a reasonable study then be constructed to ‘prove’, say, that some compound causes cancer or autism or heart attacks or cavortitis? Is your argument that it simply is not possible and asking the question is unreasonable?

  13. Briggs

    Maypo.

    Excellent question, and one we cover starting next week, if you don’t mind waiting.

    Quick answer: it ain’t easy.

  14. Maypo

    I’ve never understood the concept of “bated breath”, but count me as waiting eagerly.

  15. Briggs

    Maypo

    Horses used to be “bated”, i.e. fed. So I guess it means “breath which stinks of eating.”

  16. Ignacio García Blanco

    Hey Dr. Briggs.

    I just re-sent the e-mail. The subject line begins with “Inquiry regarding…”

    Thanks in advance.
