Why (Scott Alexander’s) Bayesian Rationality Fails


Scott Alexander, late of Slate Star Codex, and New York Times doxee, is the subject of Curtis Yarvin nee Moldbug’s latest.

Rather, Alexander’s (and Moldbug’s? see postscript) Bayesian rationality is. The rationality is there in the link buried amidst 20,000-plus words.

Alexander, a wordy bugger himself, apparently used to have this slogan at his old blog:

     “P(A|B) = [P(A)*P(B|A)]/P(B), all the rest is commentary.”

This is Bayes’s formula, written, as most write it, badly, and in a way that leads to the Rationality Fallacy, which is the belief that rational thought is the highest form of thought. The commentary-quip is just plain false.

First, Bayes is a dumb formula, where by dumb I mean the old-fashioned sense of the word. It does not speak; it contains no intelligence. It takes input, churns it, and spits out a number, just like a calculator. Calculators aren’t responsible for what you put into them, and neither is this formula. The highest form of thought is not calculation.

Second, Bayes isn’t even needed. It’s a nice tool, but nothing exciting. I don’t say jettison it, but for the sake of sanity, don’t worship it.

All probability fits this schema:

     Pr(Y|X)

The uncertainty in Y is given by what you say in X. Everything you believe or assume, at this moment, everything that is probative of Y goes into X. If you know X_1 now and later learn X_2, then X = X_1 + X_2.

You can calculate both Pr(Y|X = X_1) and Pr(Y|X = X_1 + X_2). You can use Bayes to help in the calculation, or just jump to Pr(Y|X) (as in the link above), or you could employ one of dozens of other formulas depending on the problem’s complexity. Bayes is not about “updating” per se; it’s just a probability calculation.
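To see that Bayes is only one route to the same number, here is a minimal sketch in Python. The die, the premises X_1 and X_2, and the helper pr are all made up for illustration; they come from neither Yarvin nor Alexander. Note too that weighting the six sides equally is itself part of X, not something any formula supplies.

```python
from fractions import Fraction

# X_1 (assumed): a six-sided object is tossed and exactly one side, 1..6, shows.
outcomes = [1, 2, 3, 4, 5, 6]

def pr(event, given):
    """Pr(event | given), by counting the outcomes consistent with the premises."""
    allowed = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in allowed if event(o)), len(allowed))

y = lambda o: o == 6           # Y: "a 6 shows"
x1 = lambda o: True            # X_1 adds nothing beyond the list of outcomes
x2 = lambda o: o % 2 == 0      # X_2 (learned later): "the side showing is even"

# Route 1: jump straight to Pr(Y|X) for each X.
direct_x1 = pr(y, x1)                                # Pr(Y | X_1)
direct_both = pr(y, lambda o: x1(o) and x2(o))       # Pr(Y | X_1 + X_2)

# Route 2: Bayes, Pr(Y|X_1 X_2) = Pr(X_2|Y X_1) * Pr(Y|X_1) / Pr(X_2|X_1).
via_bayes = pr(x2, lambda o: x1(o) and y(o)) * pr(y, x1) / pr(x2, x1)

print(direct_x1, direct_both, via_bayes)
```

Both routes land on the same Pr(Y | X_1 + X_2). The formula did the churning; the choice of Y, X_1, and X_2 was made before any formula was touched.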

Or you could use no formula and wing it, since almost all uncertainties are not quantifiable.

All these formulas are just that: formulas. The formulas are rational, yes. But all the magic comes in specifying the Y and X. And that’s mostly non-rational; in some cases wholly unrational. Yarvin uses this word in distinction to irrational, but more as a pejorative than a solution, which it is.

The Y is easy enough: it’s the thing you want to know about, suitably, or not, defined. The X is more mysterious. Hold that for the moment.

Here’s why Alexander’s way of writing probability is wrong, which helps explain why rationality is a false god.

You can’t ever have a Pr(A) (or Pr(Y)). There is always at least an implicit X lurking, so that the real notation is Pr(A|X). This isn’t just being pedantic. It really does show why rationalism is a failed philosophy.

Yarvin picks as one of his examples Y = “Obama’s birth certificate is fake”. He then gathers two or three boatloads of words to cram into his X, and derives, using Bayes, his Pr(Y|X). Then he says (among much hemming and hawing) this represents the rational approach to evidence.

I’ve never brought myself to care about Obama’s origins. My take was always that if he was shown to be foreign born, Congress would have “discovered” an obscure codicil in some forgotten law that explained how Kenya in 1961 was “really” part of the USA. Or whatever.

That’s part of my X. Not Yarvin’s. Bayes is silent on both of our Xs. Yet we can each rationally quantify our Y by accepting our Xs, then following the formulas. Following (the proper, of course) formulas is rational thinking, which is a good thing.

Rationality is a real thing. It says, “Use these formulas, from math and logic, in this way without error.” That’s fine and sure advice. But—here’s the big but—rationality is silent on what goes into the formulas. What goes in is what it’s all about.

There is no way to bootstrap rationality. All thought has to begin in inspiration, sometimes called intuition, of which there are many kinds, none of them rational.

Here’s a prime example of non-rational thinking with which everybody agrees. The law of noncontradiction. You can’t have Y and ~Y (using Yarvin’s notation for not-Y, which is popular) true at the same instant. In our extended notation, it cannot be that Pr(A|B) = Pr(~A|B) = 1 (but, for geeks, we can have Pr(A|B) = Pr(~A|C) = 1 as long as B ≠ C).

There’s no rational way to prove noncontradiction. There’s no formula to get to it. You either believe it, or not. If we let noncontradiction be our Y, the only possible X is, in effect, “Y is obviously true”. The formula Pr(Y|Y is obviously true) = 1 works fine, and is rational. Yet there’s a lot more to thought than the blind following of formulas.

There’s also blind faith.

No universal—and we all believe scads of them, and need to—is capable of rational proof. No formula gets you to one. Except maybe in math, which does have limit formulas, though all of these ultimately rely on unrational or non-rational propositions.

Bayes as philosophy fails. It isn’t so much it fails because it’s dumb, but because it’s a way to disguise necessary unrational thinking and falsely label it rational. In every situation we must answer, which X for this Y and why? Rationalists pretend the X they have picked is rational, the only “obvious” choice. Unless you agree, you’re “being irrational”.

Then there is the profound difference between probability and decision. They are not the same. No Pr(Y|X) compels any action or decision. Your decision depends on what matters to you, not to me, even if we agree on Y and X. (I have more on this, coincidentally, later this week.)

There are some decision formulas, but they are just like Bayes. They are dumb and only work on what you supply; all require the unrational belief that this decision formula is the “best” or “proper” one. All morals and ethics are unrationally based.
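A minimal sketch of one such decision formula, expected-loss minimization, shows the point. Everything in it is hypothetical: the rain example, the loss numbers, and the function name best_action are assumptions for illustration, and picking expected loss as the rule is itself one of those unrational choices.

```python
from fractions import Fraction

def best_action(p_y, loss):
    """Pick the action with the smallest expected loss.

    loss[action] = (loss if Y is true, loss if Y is false).
    """
    expected = {
        action: p_y * if_true + (1 - p_y) * if_false
        for action, (if_true, if_false) in loss.items()
    }
    return min(expected, key=expected.get), expected

# Y = "it rains today". Suppose we agree on X, hence on Pr(Y|X).
p_rain = Fraction(3, 10)

# Hypothetical losses: what matters to me is not what matters to you.
loss_me = {"umbrella": (1, 1), "no umbrella": (10, 0)}   # I hate getting wet
loss_you = {"umbrella": (2, 2), "no umbrella": (3, 0)}   # you hate carrying things

my_choice, _ = best_action(p_rain, loss_me)
your_choice, _ = best_action(p_rain, loss_you)
print(my_choice, "|", your_choice)
```

Same Y, same X, same probability, opposite decisions. The formula only churned the losses each of us handed it.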

Rationality fails in decision making because a lot of what goes into these decisions is imponderable, like universals were in probability, and variable, like the Xs themselves.

Rationalists say their loss function is the rational choice. You hear this every time somebody wants to force an action because that’s what “the Science says.” If you disagree, you’re a “denier”.

Moldbug and Alexander even give a good example here, but fail to connect it all up. Government officials adopt silly and harmful coronadoom policies, they say, whereas bloggers (ahem) beat them in predictions. This is because officials aren’t trying to just get the doom right, like bloggers: they’re trying to do power right, too. The two decisions aren’t the same.

Postscript: Moldbug seems to be on my side here, at times, and also Alexander’s. He never commits. Yarvin thus reminds me of the rabbi in Hail, Caesar!, in the scene where priests and a rabbi are asked to examine a movie studio’s new Christ-centered epic. The rabbi cavils, disputes, interjects, argues. Yet when asked his decision, he says, “Eh, I haven’t an opinion.”



  1. Leo

    Question for clarification: “In our extended notation, it cannot be that Pr(A|B) = Pr(~A|B)”

    Would that be better rendered as Pr(A and ~A | B) = 0?

    For if perchance Pr(A|B) = 1/2 then Pr(~A|B) is also 1/2 and Pr(A|B) = Pr(~A|B).

  2. Briggs


    Good catch, sloppy notation on my part. Your way works, and so does the fix I put in.

  3. Vetrani Sui Sunt Circuli

    Moldbug wants to keep us talking.
    It’s against his interests that we stop talking, that might and probably would go somewhere unpleasant.
    And he’s very, very against unpleasantness.

  4. Jan Van Betsuni

    One glaring rationality problem with our Political Class Trust-Science-Crowd is that their X is a Trust-Us-X. If we all do X then probability of Y resulting is known (they claim with confidence). Because of course: “Pr(Y|X), (and) all the rest is commentary (or irresponsible science denial/authority scepticism)”. But this Trust-Us-X is not an Observational-X, or an Empirical-X. It is a still untested Hypothesis-X, this mighty “Trust-Us-X”. To which in time, other Trust-Us-Again (X(n)) may be added (governmentally) along Bayesian Highway.

    Postscript: A young lawyer receives two Hassidic men into the reception room of the firm. He asks: “Gentlemen, why have you come here today?” One replies: “Well, my business partner and I, we want to argue about something, and we want the senior lawyer to listen.” “Yes that’s right” says the other visitor “and we’ll pay.”

  5. George Crews

    Instead of the absolute probability that a proposition is true, what about updating its odds relative to another proposition?

    Pr(Y1|X) = [Pr(Y1)*Pr(X|Y1)]/Pr(X)
    Pr(Y2|X) = [Pr(Y2)*Pr(X|Y2)]/Pr(X)

    Combining and rearranging:

    [Pr(Y1)/Pr(Y2)] = [Pr(Y1|X)/Pr(Y2|X)] / [Pr(X|Y1)/Pr(X|Y2)]

    Thus, for any new evidence X within the domain of theories Y1 and Y2, we can update Y1’s odds of being true relative to Y2. There is no need to know the absolute probability of either theory nor of understanding the underlying reality.

    I believe they call this process of updating theories the scientific method.

  6. I agree to your statement that there is no blind faith.
    Faith should always be rational. Therefore I had done some investigations, which I have shared on the following site: http://www.ist-jesus-gott.de

  7. Dear Professor Briggs

    I am afraid I might be a frequentist. When faced with a math puzzle, knowing I am not good at math, I tend to do this. For example, looking at Yudkowsky’s mammogram, https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/ I do this:

    OK suppose we tested 100K women, of whom 1K have breast cancer therefore 99K do not have it.
    Of the 1K who do have it, 800 test positive. Of the 99K who do not have it, 9504 test positive. Therefore a positive test means 800/10304 = 7.76% chance of having breast cancer.

    I always do this with math puzzles. It always works. Yet, it might be frequentist?

    Another example from the PISA tests. We drove from A to B at 20 km/h, then we drove back at 30 km/h, what was our average speed for the whole roundtrip? So what I do is let’s say the distance is 30 km. So we drove there for 1.5 hours and back for 1 hour. That means our average km/h is 60km/2.5 hours = 24 km/h

    This unknown method makes hard math puzzles easy. But it might be frequentist?

  8. Briggs


    You are not a frequentist. Nobody is. There’s nothing wrong with using past frequencies, as models or in them, to predict future ones. That makes you a logical probabilist.

    In a room are three Martians, A, B, and C. One will walk out the door. Given this, what is the probability B walks out?

    This is only one example of many kinds of problems, like counterfactuals, that can have probabilities but no frequencies.

    See the book and free stats page. About Uncertainty.
