David Stove Exposes Karl Popper’s Wee P!

We have discussed before (and in detail here) how Fisher, inventor of the wee P of which scientists boast (“Look how small my P is!” shouted the excited scientist), was deeply influenced by the ideas of logical positivism. The best known proponent was Karl Popper. His big idea was that no proposition (presumably except his own) could ever be verified, though they could always be falsified.

This explains the curious language Fisher used. A “null” “hypothesis”, which is a proposition by another name, could be “rejected”, i.e. “falsified” in some odd sense, but never “accepted”. One always “failed to reject” the “null”. That is, failed to “falsify.” Because, to Fisher, falsification was the only move in logic allowed.

This is, of course, silly. Because falsifying any proposition is always to truthify (if you will) its logical contrary. If you say “The null has been falsified”, and that is true, then “The contrary of the null is true” is itself true. You have violated Popper’s rule. It turns out propositions can be verified after all. Popper’s position, and therefore also Fisher’s, is irrational.

This sort of silliness was pounced on by David Stove in his book Anything Goes: Origins of the Cult of Scientific Irrationalism (1998; in the States; playing off Feyerabend’s title) also titled Scientific Irrationalism: Origins of a Postmodern Cult (2001; elsewhere); both reprints of Popper and After: Four Modern Irrationalists (1982; and now online).

Here is Stove dissecting a point in Popper’s The Logic of Scientific Discovery (1959). Stove shows how Popper “sabotaged” certain logical propositions (starting on page 65 in Anything Goes). Comments of mine are between curly brackets {like this}. Original edits by Stove are in square brackets [like this]. I have added paragraphifications to aid screen readability.

We couldn’t have had p-values foisted upon us without this sabotage, which this excerpt will show.

The propositions {sabotaged by Popper} were unrestricted statements of factual probability: that is, contingent unrestricted propositions of the form “The probability of F being G is = r”, where 0 < r < 1 {where the bounds are strict; i.e. 0 and 1 are barred}. For example, H: “The probability of a human birth being male = 0.9”. {Stove considered “factual” probability to be statements about observables, contrasted with “logical” probability, which are statements about non-observable propositions. We, however, say there is no difference. And, even more importantly, that all probability is conditional, just as all logic is.}

Concerning such propositions Popper had fairly painted himself into a corner. For he had maintained (1) that some such propositions are scientific; (2) that none of them were falsifiable (i.e. inconsistent with some observation-statement); while he had also maintained (3) that only falsifiable propositions are scientific. (The reason why (2) is true is, of course, that H is consistent even with, for example, the observation statement E: “The observed relative frequency of males among births in human history so far is = 0.51” {and where failure to grasp this is the cause of great and long-lasting probabilistic grief}).

Popper draws attention with admirable explicitness [Popper p. 191] to this—to put it mildly—contretemps. He puts it almost equally mildly himself, however. For he insists on calling the conjunction of (1), (2) and (3) a “problem” (“the problem of the decidability” [p. 196] of propositions like H); when in fact of course it is a contradiction. The reader can hardly fail to be reminded of Hume’s complaint about the absurdity of the “custom of calling a difficulty what pretends to be a demonstration and endeavoring by that means to elude its force and evidence”. But Popper’s ‘solution’ to his problem was far more remarkable than even his description of it, and indeed was of breathtaking originality.

It consists…in making frequent references to what it is that scientists do when they find by experience that s, the observed relative frequency of G among F’s, is very different from r, the hypothesized value of the probability of an F being G. What scientists do in such circumstances, Popper says, is to act on a methodological convention to neglect extreme probabilities (such as the joint truth of E and H); on a “methodological rule or a decision to regard […] [a high] negative degree of corroboration as falsification” [p. 2020], that is, to regard E as falsifying H. {You see the relationship to p-values.}

Well, no doubt they do {and they do; oh, they do}. But obviously, as a solution to Popper’s problem, this is of that kind for which old-fashioned boys’ weeklies were once famous: “With one bound Jack was free!”. What will it profit a man, if he has caught himself in a flat contradiction, to tell us about something that scientists do, or about something non-scientists don’t do, or anything of that sort? To a logical problem such as the inconsistency of (1), (2) and (3) there is of course—can it really be necessary to say this?—no solution, except solutions which begin with an admission that at least one of the three is false. But least of all can there be any sociological solution.

For our purposes, however, what is important about the episode is the following. The pairs of propositions we are talking about are pairs such as E and H. As (2) implies, and as is in many cases obvious, E is consistent with H {again, if you can’t see this, you are confusing probability with decision, an exceedingly common and inveterate mistake}. But the logical word ‘falsifies’ or its cognates, applied to a pair of propositions, implies that their logical relation is that of inconsistency. So to say that E falsifies H would be to make a logical statement which is false, necessarily false, and obviously false.

So Popper will not say that. What he says instead are things which, however irrelevant to his problem, are at least true (even if only contingently true). Such as the following. That “a physicist is usually quite well able to decide” when to consider a hypothesis such as H “‘practically falsified’” [p. 191] (namely, when he finds by experience, for example, that E). That “the physicist knows well enough when to regard a probability assumption as falsified” [p. 204] (for example he will regard H as falsified by E). That propositions such as H “in empirical science […] are used as falsifiable statements” [p. 204]. That given such an observation-statement as E, “we shall no doubt abandon our estimate [of probability, that is, H] in practice and regard it as falsified” [p. 190].

Now I have pointed out countless times that, logically, “practically falsified” is equivalent to “practically a virgin”. Which is to say, not falsified and not a virgin. P-values when used to “practically falsify” a hypothesis, then, are not logical. They are decisions.

Scientists are allowed to make decisions, of course, but they should not be allowed to strip away uncertainty when conveying their results. Which is what p-value use does. (They also confuse uncertainty, implying, quite falsely, though in a vague sense which nobody can articulate, that the p-value is the probability the “null” is true.)
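
A minimal sketch in Python may make this concrete. The sample of 100 births, the function names, and the thresholds are assumptions for illustration only; nothing here comes from Stove or Popper. It computes how probable a count as low as 51 males is if H (probability 0.9) holds, and then shows that the “reject” step depends on a threshold somebody chose.

```python
from math import comb


def binom_tail_le(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), by exact summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical sample (an assumption): 51 males in 100 births, with H: p = 0.9.
n, k, p_h = 100, 51, 0.9
tail = binom_tail_le(k, n, p_h)

# Logical status: E is consistent with H as long as E is possible under H,
# that is, as long as its probability given H is greater than zero.
consistent = tail > 0.0


def verdict(tail_prob: float, alpha: float) -> str:
    """A convention, not a deduction: the answer changes with the chosen alpha."""
    return "reject" if tail_prob < alpha else "fail to reject"


print(consistent)            # E does not contradict H
print(verdict(tail, 0.05))   # one convention
print(verdict(tail, 1e-30))  # a stricter convention, same data, same H
```

The tail probability is tiny but strictly positive, so `consistent` is true. With a threshold of 0.05 the verdict is “reject”, and with a threshold of 1e-30 it is “fail to reject”; neither the data nor H has changed between the two calls, only the decision rule.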

These are the very models of how to sabotage a logical expression by epistemic embedding, or of ghost-logical statements {made by embedding logical statements in empirical ones; such as “Statisticians regard small p as falsifying the null”; it sounds logical, but it is nothing more than a sociological statement}.

They use a logical expression, one implying inconsistency, but they do not imply the inconsistency of any propositions at all. They are simply contingent truths about scientists. Yet at the same time there is a suggestion that not only is a logical statement, implying inconsistency, being made, but that one is being made with which no rational person would disagree. This suggestion is in fact so strong as to be nearly irresistible, and it comes from several sources.

First, Popper’s references to a rule, decision, or convention, imply that when scientists regard E as falsifying H, they cannot be wrong: and they therefore serve to suggest that they are right. Second, there is the fact that scientists regard E as falsifying H, and that they are unanimous in doing so. How can a reader suppose that scientists, all scientists, are mistaken in regarding E as inconsistent with H? He might almost as easily suppose all philosophers mistaken in regarding a Barbara syllogism as valid. {Barbara: All X are Y; and all W are X; therefore all W are Y.}

Third, and most important of all: the reader’s own common sense—and it is his logical common-sense—emphatically seconds the statement of logic which here appears, by suggestio falsi, to be being made. He knows, as everyone (near enough) knows, that given E, it is rational to infer that H is false. And since scientists, as these statements report them, seem to be saying only very much the same thing, the reader is disposed to think that the scientists are right. And if they are right, it is clearly a point of logic on which they are right.

The suggestion, coming from all these sources, that a logical statement, and a true one, is being made, is so strong, in fact, that to many people it will appear perverse, or at least pedantic, to resist it. What is there, then, to object to, in the statement that scientists regard E as falsifying H?

Or: What is there, then, to object to, in the statement that scientists regard “p < 0.05” as falsifying the “null”?

Simply that its suggestion, that a statement of logic is being made, is false; and that suggestio falsi is not better, but worse, the stronger the suggestion is. The statement is only a ghost-logical statement. It implies nothing whatever about the logical relation between E and H. A logical word, “falsifying”, is used indeed, but its implication of inconsistency is sabotaged by the epistemic context about scientists. This is cold-blooded murder of a perfectly good logical expression, in exchange for a handful of sociological silver about scientists.

What makes the case more unforgivable is that the logical expression here sabotaged is not only a strong or deductive-logical expression, but the one which is, of all deductive-logical words, Popper’s own particular favorite; and that he had just a few pages before undertaken that, however others might succumb to non-deductive logic, he never would, but that in his philosophy all relations between propositions of science would be “fully analyzed in terms of the classical logical relations of deducibility and contradiction” [p. 192].

Stove goes on, but we won’t. This is enough for us to see that what happens with p-values is nothing more than custom, and a custom having no logical weight.


14 replies »

  1. Popper was not a logical positivist. If anything, he was a logical negativist. It was his critique of positivism that brought down the whole house of cards. He asserted, with Hume, that no finite amount of evidence could ever positively establish a scientific hypothesis; however, a contrary piece of evidence might prove it false; i.e. modus tollens. Stove was trying to resurrect positivism.

    In English Common Law, the accused is presumed innocent until proven guilty. However, an acquittal does not prove him innocent. The verdict is in effect, “not proven.”

    The Null Hypothesis in scientific testing serves the same role as the Presumption of Innocence. We presume that the treatment is ineffective [that the experimental group does not differ from the control, that the effect is zero, et al.] and require proof that it does.

    The actual weakness of logical negativism (as we may term ‘Popperism’) was pointed out by Pierre Duhem, I think before Popper himself came out. And that is that no Hypothesis walks alone: like an aircraft carrier, which is always accompanied by a flotilla of other ships, it is always accompanied by a flotilla of other Hypotheses. When the falsifying evidence is produced, it may not be clear which of these hypotheses has been falsified. For example, heliocentrism predicted visible parallax in the fixed stars. That none could be seen did not falsify heliocentrism (though it seemed to do so to contemporaries); it falsified the allied hypothesis that the stars lay only a little farther off than Saturn. Similar considerations falsified the falsification of Maxwell’s theory of electromagnetism and Darwin’s theory of natural selection, which were saved respectively by the electron and by Mendelian genetics.

  2. Stove really stove in Popper. Popped him in the stove and fired it up until Popper popped. Briggs used the remains to make popovers then fished out Wee-pee Fisher and tossed him in the brig.

    That’s some fine truthifying.

  3. The problem with all of this is the data fiddling that has taken place in the collection and methodology used in the collection of fair samples. When we see whole data sets of, say, temperatures adjusted and homogenized to increase past cooling to support a politically fraudulent hypothesis, and the original data destroyed, a dark cloud of doubt engulfs the entire field of statistical inference. Wee P values are much easier to spot and their inherent dishonesty has propelled the field into nothing more than a form of mental masturbation employed to deceive the public and politicians. Truth tellers like Briggs are our only salvation who may some day replace the generation of mythology with the illusion of certainty we crave.

  4. My favoritest example of the use of p-values is their use in “Table 1”: the table in which you simply list the demographics of your study: the number of your subjects, etc.

    There exist reviewers who insist, e.g., that “Subjects in the trial = 126 (p<0.05)” not only makes complete sense, but also that “Subjects in the trial = 126” will not make sense without the p-value.

    It doesn’t even matter that no hypothesis at all attaches to “Subjects in the trial = 126”; you counted them, they totaled 126, that’s that.

    You MUST sprinkle that magic p-value dust over every thing and every number in your study, or those reviewers will reject your paper until you do.

  5. Because falsifying any proposition is always to truthify (if you will) it’s logical contrary.

    Shouldn’t it be its logical contradictory, instead of contrary?

    ‘Every Swan is white’ and ‘Some Swan is not white’ are contradictories.
    ‘Every Swan is white’ and ‘No Swan is white’ are contraries.

  6. Jh,

    I saw three white and no black swans Monday. A true statement. Therefore never teach p values anymore.

  7. Hagfish: OT: Interesting article.

    I’ve been following the situation in Ukraine. It’s clear to me that, for Russia, this is just preparation for something larger.

    Think about it. They are fighting a land, sea and air battle. A 1300 km front across rivers, lakes, mountains and plains. Against a military that was about 230,000 strong (not anymore), a military that was one of the best equipped in the world (not anymore).

    Russia is operating their front reportedly with 130,000 to 190,000 troops at any one time. They rotate these troops in and out regularly; with time this is producing a battle-experienced Russian force of around 1 million with a reserve force of about 10 million (reportedly China also has a force of 50 million).

    Russia is also operating in Georgia and Syria. They are supported closely in Syria by Syrian and Iranian forces under what appears to be a unified command. In parallel with Ukraine, there are daily battles also occurring in Syria but no one is reporting this. This includes actions against Turkey and Israel. I can’t figure Turkey out; they appear to be neutral but if a global conflict emerged I think they would either remain neutral or side with Russia, but not NATO in either case. Turkey could be the one influence that prevents WW3.

    Watching the activity of the front lines in Ukraine, it looks like a chess match. Different areas relentlessly push forward on different days in order to exert maximum pressure on Ukraine’s capabilities and morale, grinding them down. Ukraine is approaching a point of political and military collapse.

    I haven’t mentioned the resources war that is already global. The world is soon to be deprived of computer chips, as neon, a key component of chip production, has been cut by 50% globally, and Russia controls 60% of the remaining production capacity. Combine this with gas, oil, copper, lithium etc etc.

    Then there is the financial war. The Ruble has been the strongest currency in the world against the dollar this year. Are sanctions working? No, this has been a complete disaster.

    The potential for an economic collapse of the west leading to creeping global conflict is real. It’s very worrying to me.

  8. The krakenstician is beyond hope of learning, but for all 5 of krakenstician’s readers:

    We assume a fair coin model, do 100 flips for each of 3 experiments (and we assume we designed the experiments well), and observe 92 heads, 89 heads, and 79 heads. So we reject the fair coin model.
    We would estimate that the probability of this coin landing heads is greater than 50%.

    Do we really “know” the coin is not fair? No.

    But we do know, that if the coin were fair, it would be quite unlikely to observe 92, 89, and 79 heads, because we’d expect around 50 heads each time under a fair coin model under good experimental setup.

    The distance between what you observe from experiment and what you expect under a model is essentially the p-value. The larger the distance (or equivalently the smaller the p-value), the more we tend to reject the model.

    Because some would be convinced by 60 heads vs 50, and some others would only be convinced by 85 heads vs 50, and there are different costs associated with making errors, we can naturally use different numbers for alpha.

    If we instead observed 45, 53, and 55 heads, we would conclude the fair coin model holds.

    Do we really know it is “true”? No.

    But, again, if it were not a fair coin, it would then be more likely to observe, say, 32, 20, 98, or 89 heads than a number of heads around 50.

    This logic works in science, which is why it is used instead of krakenstats.


  9. Dear Mr. Briggs,

    You are welcome. Ha.

    We played Rummy 500 this Father’s Day weekend. My son-in-law got a 6-card royal flush and finished the round by one play. A true statement. Therefore, never teach probability values anymore. (if this sounds silly, please note I am just using your reasoning.)

    So, Popper believes that your experience can falsify/disprove the theory, e.g., that ‘all swans are black’ but can never verify or prove that ‘all swans are white.’

    I guess repeating ‘wee p’ sounds like the perfect propaganda phrase for some of the Lilliputians, e.g., Johnno, who read your blog. (Yes, I know how to insult people in English too.)

    Perhaps, you can teach them what a probability value (calculable or not) is, what it indicates by the way it’s defined, what probability value would be more appropriate, and how to make proper inferences about a hypothesis.

    No more commenting on this blog for this summer. Have a great summer.
