Fun quiz time!
This is some data: x = c(0, 1, 0, 1, 1, 0, 1, 1, 0, 1). Something caused that data to be. I know what it is, but you don’t. But you should. Indeed, I want you to tell me. Go.
Trouble starting? Let’s call the cause E, for explanation. Sometimes explanations are causes, sometimes only vaguer descriptions. We want cause, but I’ll settle for an explanation. Go.
Maybe some light notation makes it easier. Let’s write:
Pr(E | x)
for the probability the Explanation is true given our data. Notation makes the whole thing more scientific anyway. And it impresses outsiders.
Tough situation. How about this: E_1 = “A machine spits out only 7s”. The subscript hints there may be other explanations to come, which is also the answer, if you’re quick enough to catch it.
Well, we have nothing from Pr(E_1 | x), because there isn’t anything in x about logic, no way to tie the x to the E, which we need. X are just some dumb—can’t speak—numbers; E_1 is just some words. Call that logic L, which also contains the meaning of all the words, grammar and symbols used. We can then do this:
Pr(E_1 | xL) = 0,
with the obvious reason (logic L) that you can’t get 0s and 1s from a machine that says only 7. We can infer from L that we’re looking for some cause or explanation that makes the digits we saw. And possibly other digits, too, if we allow that in L. If we don’t, then we don’t.
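To make the logic concrete, here is a minimal sketch (in Python, my choice, not the quiz's) of why Pr(E_1 | xL) = 0: a machine that can only emit 7s gives zero probability to any sequence containing a 0 or 1, and no prior, however generous, can rescue a zero likelihood.

```python
# Sketch: a zero likelihood forces a zero posterior, whatever the prior.
x = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # the quiz data

e1_support = {7}  # E_1: "A machine spits out only 7s"

# Under E_1, any observation outside {7} has probability 0.
likelihood = 0.0 if any(v not in e1_support for v in x) else 1.0

prior = 0.999  # even an absurdly generous prior for E_1...
posterior_weight = prior * likelihood  # ...is annihilated by the data

print(posterior_weight)  # 0.0: E_1 is falsified by x, given L
```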
My dear friends, again, this is the answer.
Think: we’ve only seen a handful of points. Who’s to say whether we won’t see a 2 or 42 next after the 0s and 1s?
I’ll tell you who: you are.
We have learned one solid thing (if you haven’t already grasped the answer). With a little logic, we can at least infer the falsity of some explanations. But that’s all we can do with logic.
At this point, regular readers should be put in mind of the discussion on why rationality is a false philosophy. Rationality only works with given symbols, manipulating them by set, proper rules. It is a necessary skill. But rationality does not say, at any time, what symbols, what data, what evidence, what causes are right and proper to consider. It is mute, too, on the decisions to make from the manipulations.
That, again, is the answer.
If you recall, those who consider themselves rationalists believe they have discovered great secrets about how irrationalists—those who do not agree on all the assumptions, symbols, etc. with them—make errors in thinking.
It’s rare for people to make mistakes in logic itself, in manipulating probability rules in situations common and important to them. Errors galore are found in artificial situations, of course, such as in classrooms, and in stilted problems given to demonstrate how irrational people “really” are.
Learning the difficult rules of high-order symbol manipulation is not easy, and not all can do it. Mistakes are common. However, most don’t need these tools, as elegant as they are. Grandma doesn’t know how chemistry works but can still bake a mean pie. Knowing the rules of symbol manipulation, and modeling, in chemistry, if grandma did know them, won’t subtract from the pie’s goodness, but it would be a classic blunder to say the pie isn’t good because grandma can’t balance a reaction equation.
Sorry to repeat myself, but this again is the answer.
We’re after the explanation of x: we’d like the cause, because knowing the cause is to know all, but we’ll settle for a probability explanation. What is this explanation?
The answer is this: there is no answer.
Not until you—yes, you—decide on the possibilities. You must, absolutely necessarily must, bring in outside information, facts or assumptions not provided in the quiz to solve the quiz. The quiz doesn’t give any hint, so the quiz has no answer.
I brought in E_1 to tease you. It is obviously absurd, but we notice the absurdity only because most of us share much of the same rules of logic, and knowledge of symbols, rules, and grammars. Likely we won’t all agree precisely on these. But we could, simply by specifying them in complete rigor. Pack all that rigor into L.
This L will not be a small proposition; indeed, it will be very large. Spelling out in detail what “x = c(0, 1, 0, 1, 1, 0, 1, 1, 0, 1)” means is no small task. It only grows as we add the grammar and definitions of possible explanations.
And it is we who bring in possible explanations. None were specified as considerations in the quiz. They can therefore only come from us.
Now we can draw on our own Experience, which I’ll shorten to N (we used all the other good letters!), and say things like this:
Pr(E_2 or E_3 or … or E_j| xLN) = 1,
which is to say, given our experience N of data that “looked like” x and explanations associated with it, we limit our possible explanations to this set. It is we who say it is this set, and none other.
We limit is the key phrase. We say it is one of these. If we say the only explanation we’re considering is E_2, whatever that is, then E_2 is it. In other words, we start with
Pr(E_2| LN) = 1,
then adding x to the information on the right does nothing, and can do nothing, to change this probability, unless via L we know E_2 is impossible. It is only if we consider more than one explanation that adding evidence x can change the “prior” on the explanations; e.g.:
Pr(E_2| LN) = Pr(E_3| LN) = 1/2.
These need not be 1/2 each: the proportions are specified by your LN.
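As an illustration (the explanations here are my own invention, nothing the quiz supplies), take E_2 = “fair coin, heads coded 1” and E_3 = “coin biased with Pr(1) = 0.7”, each given prior 1/2. A short Python sketch of how x updates those priors via Bayes:

```python
# Sketch: updating a prior of 1/2 on each of two invented explanations.
x = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 6 ones, 4 zeros

def likelihood(data, p):
    """Probability of this exact sequence, given Pr(1) = p per digit."""
    out = 1.0
    for v in data:
        out *= p if v == 1 else (1 - p)
    return out

prior_e2, prior_e3 = 0.5, 0.5        # Pr(E_2|LN) = Pr(E_3|LN) = 1/2
lik_e2 = likelihood(x, 0.5)          # E_2: fair coin
lik_e3 = likelihood(x, 0.7)          # E_3: biased coin (my assumption)

total = prior_e2 * lik_e2 + prior_e3 * lik_e3
post_e2 = prior_e2 * lik_e2 / total  # Pr(E_2|xLN)
post_e3 = prior_e3 * lik_e3 / total  # Pr(E_3|xLN)

print(round(post_e2, 3), round(post_e3, 3))  # roughly 0.506 and 0.494
```

Note the posteriors barely move from 1/2: ten digits carry little information to separate these two explanations, which is part of the point about the set being limited by you, not by the data.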
This is it! If we don’t add and limit explanations, we can’t even start. We can’t say, “Well, let’s just list all possible explanations and let the data decide which is best.” That itself is a limitation, albeit a loose one. Worse, the possible explanations, even for this wee data set, are infinite. Infinity is not just a big number: it goes on forever. (I’ll leave it as homework to prove this easy inference.)
Practical, finite limitations are thus always needed. And always by what you bring to the problem in your N and L.
For example, N might contain words to the effect that you’ve seen coin flips assigned numerical values like this string. Or (your N goes on to say) it might be a snatch of computer code. Or it might have been produced by the fevered brain of Yours Truly. Or on and on, stopping when you say so.
Or the infamous Q might have given it to us as key to Trump’s pathway to victory, which might still happen. Any day now. It’s coming.
Now, my N_B (for Briggs) says that that explanation is nuts, something close to an impossibility. My N_B, I stress (I don’t say impossible, because my N_B allows for bizarre things to happen). If I therefore see somebody announce
Pr(Q + Trump code = Victory| xLN_Q) = 0.999,
with a fraction of probability left over for what this Q fan considers some more mundane explanation, I might say to the Q fan “Your theory is nuts”. But I do so always with the understanding that my N_B (and likely L, too) is different from the N_Q of the Q fan:
Pr(Q + Trump code = Victory| xLN_B) = very small.
I would make an enormous mistake, worse than the Q fan’s trusting in Trump, if I said something has gone wrong with the thinking of the Q fan in any formal way.
Because I do not know how the Q fan arrived at her N_Q and L (the Ls might have subscripts, too, which I’ll leave off). And anyway, given her N_Q and L, accepting them as true, as we do all things on the right hand side of the bar, she has almost certainly not made any rational errors. She surely obeyed all the probability and logic rules.
It’s still worse than it sounds. For the quiz said “x = c(0, 1, 0, 1, 1, 0, 1, 1, 0, 1)”, but that does not stop anybody from augmenting that x with other data, sticking the flourishes into N, or calling them y (or whatever). That is, the quiz said x only, sans any other information, but many work with:
Pr(E_2 or E_3 or … or E_j| xyLN) = 1,
with the relevant “priors” Pr(E_i|yLN), where y can be anything we like. Augmentation can be negative in effect, too. For instance, y = “Remove that last 1 from x, because I don’t like it.” That kind of augmentation may or may not be a mistake, depending on what is in N.
In a classroom situation, such as in giving descriptions about woke versus unwoke female bank tellers named Linda (a famous problem), these additions are annoying, and strictly speaking wrong. People do badly on artificial problems like this. What happens usually is that, if untrained, people change the problem, such as by adding that y. Even if we tell them not to. This insouciance surprises many, so much so that some even build theories around it.
But no. Our debate is almost never on the rules of probability and logic. It is instead, and should be, on the N, L, and x (or y) each of us has in any situation. I write these propositions as if they are fixed, and they are, too, but usually only for an instant; they are usually updated from moment to moment with new thoughts. Real life thinking is not done like in a computer or classroom, and thank God for that.
Because computers have no intuition (of which there are several kinds), and intuition is the highest form of thinking. It’s those intuitions that supply us with the N, y, and E_i. Rationality gives us L. The explanations and so on have to come from somewhere, and they do not come from any algorithm. Yes, algorithms can be written to cycle through pre-programmed explanations, but this is not thinking.
IT’S A CONSPIRACY
We are often lectured about “conspiracy theories” by our betters. I say the Q fan’s thinking has gone wrong because she has selected unlikely (to me) possible explanations. But an elite academic critic will say “dangerously” wrong, a conspiracy theory, an example of irrational thinking. This critic is doing what the rationalist is doing. And the same thing the student Linda bank teller mistake maker is doing. They’re all sneaking in hidden unacknowledged premises as if these were the undisputed correct ones.
The critic judges N_Q wrong because he uses N_A (for academic), which means something like this is happening in the mind of the academic:
Pr(N_Q| M_AL_A) = 0; Pr(N_A| M_AL_A) = 1,
where M_A is still further background knowledge, assumptions, or premises thought to be true, and from which N_Q and N_A are derived (it could be M = N, with proper subscripts). And we allow that the rules of logic used by the academic might be different.
Well, I do not mean to wholly disparage the academic here, because I share the effect of his N, M, and L, at least roughly. But this merely highlights that our disagreement with the Q fan is not over rational thinking or logic, but over the evidence we consider relevant. We would have to agree with the Q fan if we accepted her evidence, and she’d have to agree with us if she accepted ours.
Up to this point, we more or less assumed near certainty in the explanations given the data, but it need not work out that way. If we had only vague notions, we might be left with uncertainty, and come to probabilities not too different from uniform across the number of explanations we’re considering. Nothing in the world wrong with that.
But if some lunatic blogger asks you to pick one and only one, and if you decide to play along, you must have a decision function, which is not the same as the probability. Not the same thing at all.
That is, you must first posit some N and L, maybe even a y, and some E_i. Then you must, using Bayes if you like (but it’s not strictly necessary; Bayes is just a tool), form your posteriors. You will come to some order (these all may be equal, too):
Pr(E_1| xyLN) ≥ Pr(E_2| xyLN) ≥ … ≥ Pr(E_p| xyLN)
where the indexes are swapped to indicate the order.
It does not follow that you should pick Pr(E_1| xyLN) as the best. Because best does not necessarily mean most likely. It can mean that, if you say so. But you don’t have to say so. The best decision depends on what happens if you’re right and what happens if you’re wrong. And how you weigh these potentialities.
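A toy sketch (the posteriors and losses are numbers I made up for illustration) of how a loss function can pick the less probable explanation:

```python
# Sketch: expected loss, not probability alone, drives the decision.
posteriors = {"E_1": 0.6, "E_2": 0.4}  # Pr(E_i|xyLN), invented for illustration

# loss[acted][truth]: cost of acting on one explanation when another is true.
loss = {
    "E_1": {"E_1": 0.0, "E_2": 10.0},  # acting on E_1 is very costly if E_2 is true
    "E_2": {"E_1": 1.0, "E_2": 0.0},   # acting on E_2 is mildly costly if E_1 is true
}

expected_loss = {
    act: sum(posteriors[truth] * loss[act][truth] for truth in posteriors)
    for act in loss
}
best = min(expected_loss, key=expected_loss.get)

print(expected_loss)  # E_1: 0.4 * 10 = 4.0; E_2: 0.6 * 1 = 0.6
print(best)           # E_2: the less likely explanation is the better decision
```

Swap in your own losses and the decision flips; that is exactly the point about weighing the potentialities.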
This is the other great hidden truth rationalist critics miss. They assume their “loss function” is your loss function. They assume that the best decisions they make for themselves are the best for you, too. Worst of all, they say that their choices are the rational ones, and yours the irrational ones. Because Bayes and obscure math.
The error a Q fan makes is not in deciding her explanation is best, because her loss function makes sense in her context, nor is it in proposing the explanation, because it isn’t impossible on any L. And indeed on many N, given the corruption of our politics, the explanation isn’t even that crazy.
No, the real error is ignoring all those x (the evidence) that are against her theory, and in limiting the possible explanations based on an expanded N (the expansion containing rejected evidence). Confirmation bias is real, a constant danger. Especially in those, like academics, who fall in love with their explanations.
But it’s only an error if my N is right, and hers wrong. If she turns out right, then it is I that picked the wrong N and auxiliary evidence. However, if she turns out wrong and again rejects the disconfirmatory evidence, she has made another mistake.
Because we don’t know for sure, at this point in time, who has the exact correct cause or explanation over many questions, we have disagreement. This is why, as I said above, the argument is, or should be, over which evidence is relevant. And over which loss function to use.
We see, time and again, that forgetting this is what leads people to scream “Denier!” or to insist on lockdowns and mask mandates while calling skeptics “Murderers!” These folks reject evidence that has already proven their explanations—also called theories or models—wrong.
***Incidentally, to get x I typed “(runif(10)>.5)*1” into R. The x I got was conditional on many other hidden things, too, like the “seed” and my version of R, the computer platform, and so forth. I hope by now you see that I gave you an explanation, but not the complete cause.
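A Python analogue of that R line (an analogue only; R’s runif and its generator are not reproduced here) makes those hidden conditions visible: fix the seed and you fix the x; change it, and the very same “explanation” may hand you different digits.

```python
import random

def make_x(seed, n=10):
    """Mimic (runif(n) > .5) * 1: n uniform draws, thresholded at 0.5."""
    rng = random.Random(seed)  # the hidden "seed" made explicit
    return [1 if rng.random() > 0.5 else 0 for _ in range(n)]

# Same seed, same x: the full "cause" includes the seed, the generator,
# the platform, and so forth.
assert make_x(42) == make_x(42)

print(make_x(42))  # ten 0s and 1s; which ones depends on the seed
```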
All this and more in Uncertainty.