Curious Notions Of Cause In Science & Statistics

You will forgive me if this post is more technical than usual. I try to leave out math, knowing how confusing it can be. But here it is necessary.

The Review

I was asked to review the paper “Causal emergence is widespread across measures of causation” by Renzo Comolatti and Erik Hoel.

It begins like this: “While causation has historically been a subject of philosophical debate, work over the last few decades has shown that metaphysical speculations can be put aside in favor of mathematical formalisms.”

No they can’t. How could they be? You must first have a philosophy of cause before you can represent it mathematically. You can write down any number of mathematical equations, but in the act of mapping them onto (what you at least perceive as) Reality, philosophy enters necessarily. It is a philosophy to say equations represent Reality, in whatever way they do.

Skipping the intro, jump to the meat. As usual (in the field), they posit \Omega, the “set of all possible occurrences.” This is conditional on whatever assumptions are made to limit this set (examples below). Then comes their philosophy, masked as mathematics: “we can consider causes c \in \Omega and effects e \in \Omega, where we assume causes c to precede effects e“. That “precede” can be dicey, for some causes and effects can be simultaneous (hit your scroll bar for an example). They appear to have in mind time series-like causes and effects only.

Anyway, that is more philosophy; enough that I think it well demonstrated metaphysics are a necessity. It’s then, and always, a question of which metaphysics.

Now it’s very strange to have the causes inside the set of all that can happen, though it’s natural the “events” live there. Skip that for a moment. (We did “events” not too long ago.)

Here’s the setup (they let c \in C for all assumed causes, and same for events):

As we will see, in order to gauge causation, we will have to evaluate counterfactuals of c, and consider the probability of obtaining the effect e given that c didn’t occur. We will write this probability P(e | C\setminus c), where C\setminus c stand for the complement of c, by which we mean the probability of e given that any cause in C could have produced e except for c. Note that although conventionally written P(e) we will write P(e | C) to underscore the following notion: namely, that to meaningfully talk about P(e | C) (and P(e | C\setminus c)), a further distribution over C must be specified. That is:

P(e | C) = \sum_{c\in C} P(c)P(e | c)

where there is some assumption of a distribution P(C).

This is confused. First, there is no such thing, in any context, as an unconditional probability. So we cannot write P(c) and have it mean anything. It appears, however, that our authors mean P(c) to represent a cause picked from among a set of causes. But some cause has to do the picking. Which cause is this? It isn’t c and it isn’t in C. It lies outside in some mysterious way. Further, P(C) suffers the same missing condition, and adds a second layer of the unknown. Not only is c caused to be the cause of e, but something is causing C.

Like many, they have swept a philosophical assumption under the math and said “No philosophy here! But if something has to pick c, we’ll call that cause randomness.” Which is impossible.
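A minimal sketch of the point (the cause names and numbers are invented for illustration): the value of P(e | C) depends entirely on which distribution over C one assumes, so the supposedly unconditional P(c) is doing hidden work.

```python
# Two different assumed distributions over the same set of causes C
# give two different values of P(e | C): the "unconditional" P(c)
# must itself be assumed, i.e. conditioned on something.
causes = ["c1", "c2"]
p_e_given_c = {"c1": 1.0, "c2": 0.0}  # assume c1 always produces e, c2 never

def total_prob_e(p_c):
    """P(e | C) = sum over c in C of P(c) * P(e | c)."""
    return sum(p_c[c] * p_e_given_c[c] for c in causes)

print(total_prob_e({"c1": 0.5, "c2": 0.5}))  # 0.5
print(total_prob_e({"c1": 0.9, "c2": 0.1}))  # 0.9
```

Same causes, same e, different answer, purely because a different P(C) was assumed.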

Second, P(e | c) must equal 1 when c is the cause of e—for c has caused e to happen—and 0 when c is not the cause of e. The math above still works, though, because if C is exhaustive of all causes under consideration, then indeed P(e | C) = 1, because exactly one P(e | c_i)=1 and all other P(e | c_j)=0, j\ne i.

We can see the trouble. They are confusing knowledge of cause with cause itself. There is nothing wrong with having a supposed set of efficient causes, only one of which we assume has operated, all of which are capable of the effect, but we don’t know which worked, and then using probability to assist in ascertaining the most likely of these.

For instance, somebody murdered Mr Body, and it can be one of Colonel Mustard, Professor Plum, Mrs White (before she was canceled for being white: yes) and so on. The murder is the event, the possible causes c are the individuals (we’re still being loose with cause). All six are C. If we begin only—the word is strict—with the assumption it has to be one of these six, a true assumption inside the game, then P(c_i|g) = 1/6, \forall i and P(e | c_ig)= 1, \forall i. Thus P(e | Cg) = 1 as required.
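The arithmetic of the Clue example can be checked in a few lines (filling out the six suspects with the usual Clue cast, which the post only partly names):

```python
# Clue example: six suspects, one of whom (by assumption g) did it,
# all equally likely, all capable of the murder.
suspects = ["Mustard", "Plum", "White", "Green", "Scarlett", "Peacock"]

# P(c_i | g): given the game's assumption g, each suspect is equally likely.
p_c_given_g = {s: 1 / len(suspects) for s in suspects}

# P(e | c_i g): whoever did it certainly produced the event.
p_e_given_cg = {s: 1.0 for s in suspects}

# Total probability: P(e | C g) = sum_i P(c_i | g) * P(e | c_i g)
p_e_given_Cg = sum(p_c_given_g[s] * p_e_given_cg[s] for s in suspects)
print(p_e_given_Cg)  # 1.0
```

The total comes to 1, as required, but only because the assumption g fixed both the list of suspects and the uniform spread over them.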

In Reality, if an event—a definable observation in the world—happens, there must have been some reason it was so. It must have been caused. There must be a formal, material, efficient, and “final” cause. This in no way implies we know any or all of these elements in an observation.

Take mysterious lights in the sky. You see them, they happened, but you have no idea why, and don’t even care to guess, or aren’t experienced enough to guess. But you do notice they seem to happen at somewhat semi-regular intervals. You can model the occurrence using probabilities, where the information about intervals goes in c (again being loose with cause).

It is not as simple as all this, because we cannot consider efficient cause alone. Our authors seem, sort of, to understand this.

For any cause c, we can always ask, on one hand, how sufficient c is for the production of an effect e. A sufficient relation means that whenever c occurs, e also follows…Separably, we can also ask how necessary c is to bring about e, that is, whether there are different ways then [sic] through c to produce e…Yet these properties are orthogonal: a cause c may be sufficient to produce e, and yet there may be other ways to produce e. Similarly, c may only sometimes produce e, but is the only way to do so.

You can see how confused this is. And this is because the four aspects of cause are not laid out with care. Suppose you have to shoot a guy in the head. You do. He dies. You caused the death. But the bullet also caused it—if we’re being loose about which part of cause we’re discussing: formal, material, efficient, and goal/final. Next guy you have to shoot holds up a steel plate which stops the bullet. One form of cause has still operated, but another form has not. The event did not occur, even though one aspect of cause was present.

So let’s call a c that has all four aspects of cause a complete cause. Obviously, then, if c is a complete cause of e, then P(e|c) = 1.

They define sufficiency of a cause (not complete) as:

     suff(e,c) = P(e|c).

And necessity as:

     nec(e,c) =  1- P(e | C\setminus c).

To us, if c is a complete cause, then either P(e|c) = 1 or 0, and nothing else. Given c is the only complete cause, then P(e | C\setminus c) = 0, and so nec(e,c) = 1, because there are no other complete causes in C. Or if c is a complete cause among others (you may also shoot), then nec(e,c) = 0, because C\setminus c still has a complete cause in it. Either way, the measures take only extreme values.
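Under this complete-cause reading, the paper’s two measures collapse to extremes; a minimal sketch (the two cases mirror the shooting example above):

```python
def suff(p_e_given_c):
    """Paper's sufficiency: suff(e, c) = P(e | c)."""
    return p_e_given_c

def nec(p_e_given_C_minus_c):
    """Paper's necessity: nec(e, c) = 1 - P(e | C \\ c)."""
    return 1 - p_e_given_C_minus_c

# Case 1: c is the only complete cause in C.
# P(e | c) = 1; with c removed, nothing left in C can produce e.
assert suff(1.0) == 1.0 and nec(0.0) == 1.0

# Case 2: c is complete, but C \ c holds another complete cause
# (the second shooter): e happens anyway even without c.
assert suff(1.0) == 1.0 and nec(1.0) == 0.0
```

Nothing in between 0 and 1 ever appears once c is a complete cause, which is the sense in which the added concepts gain nothing.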

Nothing has been gained by adding these concepts. Much is gained in understanding the full or complete cause of any observations, such as me deciding to pull the trigger and the resulting execution.

Again, we must keep clear the difference between cause and knowledge of cause. They are not the same. Probability works for the second, but not the first.

Our authors spend time showing how these measures might fit in with other schemes or theories of causality—except the classical Aristotelian view, which we espouse. They create measures of “causal strength” and show how various philosophical theories of cause can be cast as functions of suff(e,c), nec(e,c), and base probabilities like P(e|c).

They begin with Hume’s constant conjunction. But Hume used that argument to say that cause could never be known. Hume was the arch skeptic, responsible for an endless amount of confusion.

I’ll skip the math, because it all has the same troubles as the original sufficiency and necessity measures. I’ll also again note that it is a fine thing to use probability to pick a most likely cause, or aspect of cause, from an assumed set. But the assumptions must be there. It doesn’t matter when, in an inference, the assumptions are made. Probability isn’t magic. You can make them after making the observation. But you have already made at least some assumptions when you make the observation. It is impossible not to. You have to demarcate this thing that happened somehow. There is no escaping philosophy.


The subject of counterfactuals is, however, important. Here’s their example:

[Y]ou go away and ask a friend to water your plant. They don’t, and the plant dies. Counterfactually, if your friend had intervened to water the plant, it’d still be alive, and therefore your friend not watering the plant caused its death. However, if the Queen of England had intervened to water the plant, it’d also still be alive, and therefore it appears your plant’s death was caused just as much by the Queen of England. This intuitively seems wrong. How do we appropriately evaluate the space of sensible counterfactuals or states over which we assess causation? As we will discuss, there are several options.

This seeming paradox is caused in the same way many paradoxes are: by forgetting what one has conditioned on; or forgetting that conditioning, i.e. the assumption made, is crucial.

If you begin by assuming the cause under consideration is “friend no water, plant die; friend water, plant live” then the friend not watering, by assumption, caused the plant to die. Nothing else could have caused the death—even if something else did! That is, you can have no knowledge of any other cause. Because you only allow that assumption.

If you instead broaden it to “friend or Queen no water, plant die; friend or Queen water, plant live”, then if the plant is dead, all we know is that either your friend or the Queen didn’t pass the bucket. If your friend swears he watered, and you believe him, then you have falsified your assumptions. You may then create new ones. Recall the timing does not matter.

So this counterfactual attack, or requirement, including Pearl-like “do(x)” operators, to understand cause does not work. As always, we must keep in mind the distinction between knowledge of cause and cause itself.


Quite a lot of people like the idea of “emergence”, using analogies of ant colonies and so on. Individual ants are tiny-brained and have limited behaviors. Jointly, though, the hive exhibits a high degree of complexity. The hive-level behavior is said to “emerge” from the simple behaviors below.

This is true in one sense, and useful, but false in the sense that somehow the hive is alive itself, that it thinks and behaves and directs the behaviors below it, as an entity itself. There are no causal powers of the hive that can accomplish this. It is, in the end, only individual ants behaving as the circumstances around them dictate.

Thus explanations of cause (or consciousness) “emerging” from below fail.

So I’m not real clear on what idea our authors are pushing. They have an equation for Causal Emergence:

CE = CS_{macro} - CS_{micro}

If CE is positive, there is causal emergence, i.e., the macroscale provides a better causal account of the system than the microscale. This can be interpreted as the macroscale doing more causal work, being more powerful, strong, or more informative, depending on how the chosen measure of causation is itself interpreted. A negative value indicates causal reduction, which is when the microscale gives the superior causal account. Note that the theory is agnostic as to whether emergence or reduction occurs.
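The criterion itself is just a difference of two scores. A toy numeric sketch (the causal-strength values are invented for illustration, not taken from the paper):

```python
def causal_emergence(cs_macro, cs_micro):
    """The paper's criterion: CE = CS_macro - CS_micro.
    Positive means "causal emergence", negative means "causal reduction"."""
    return cs_macro - cs_micro

# Hypothetical causal-strength scores, for illustration only:
assert causal_emergence(0.9, 0.6) > 0  # macroscale scores higher: "emergence"
assert causal_emergence(0.4, 0.7) < 0  # microscale scores higher: "reduction"
```

Note all the weight rests on how CS itself is computed at each scale; the subtraction adds nothing beyond comparing the two scores.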

If I understand them right, this makes their same mistake of mixing up knowledge of cause and cause itself. This would seem to be a measure of probability-model usefulness, with very little actual knowledge of cause about it.

Continue with the ants. A pure probability model at the hive level, for predicting whatever observables about the hive you care to name, might be more useful than semi-causal models of individual ant behavior at predicting hive-level observables. Recall probability can be silent on cause and still be useful. Ask casinos.

But a complete causal model of all the ants in the hive must be a superior model to the hive-level probability model. Obviously, if we know everything, we can predict everything. The former model will be huge and complex, and is likely impossible except to gross approximation. The latter probability model can beat the more complex model in cost, time to run, even results.

In the olden days of yore, I would critique papers and email the authors letting them know I had taken their names in vain, and offer space for a rebuttal. I was never taken up on it. I think (and I have a poor memory for this) I was only acknowledged once, out of I don’t know how many papers.

To be fair, since I have no position or authority, rebutting my criticisms does nothing for anybody’s career even when, Heaven forefend, I’m wrong. So I’ll let this lie.

Buy my new book and learn to argue against the regime: Everything You Believe Is Wrong.

Subscribe or donate to support this site and its wholly independent host using credit card or PayPal click here; Or go to PayPal directly. For Zelle, use my email.


  1. Hagfish Bagpipe

    Briggs: “…some causes and effects can be simultaneous (hit your scroll bar for an example).”


  2. Incitadus

    Briggs writes:
    “But a complete causal model of all the ants in the hive must be a superior model to the hive-level probability model. Obviously, if we know everything, we can predict everything. The former model will be huge and complex, and is likely impossible except to gross approximation. The latter probability model can beat the more complex model in cost, time to run, even results.”

    Yes and ants too can be deceived…but the level of human deception is what sets us apart.
    We literally can believe anything fact or fiction it is the primary disadvantage of language
    based consciousness. Misdirection and lying with statistics, a language unto itself, should
    come as no surprise. It’s the ghost in the machine that eludes them. Who besides Microsoft
    said “It all hangs on a word?” Ants have a language all their own we can’t understand it but
    they can smell it, feel it, see it, hear it, and act on it just like we do.

  3. The paper nicely illustrates two things:

    1 – the use of a buzzword interpolator to make the simple sound deep:

    “emergent macroscale models are more useful to intervene on and understand the system in question with causal emergence can reveal the intrinsic scales of function in opaque non-engineered systems where the scale of interest is unknown, like in gene regulatory networks”

    Gee.. you sure guys?

    2 – what they actually report is that most of the uses of stats to establish causality in science that they reviewed come down to the statement: “these observations are associated with this other one this often ” and if the implied ratio approaches one, the science approaches certainty about causality.

    Oh wow! tenure, now…

  4. Fascinating as always, my friend. I’m reminded of the following:

    Story The First:

    Moment in Time

    ‘What is Fate?’ Nasrudin was asked by a scholar.

    ‘An endless succession of intertwined events, each
    influencing the other.’

    ‘That is hardly a satisfactory answer. I believe in cause and effect.’

    ‘Very well,’ said the Mulla, ‘look at that.’ He pointed to a
    procession passing in the street.

    ‘That man is being taken to be hanged. Is that because
    someone gave him a silver piece and enabled him to buy
    the knife with which he committed the murder; or because
    someone saw him do it; or because nobody stopped him?

    And the sting in the tail of Story The Second:

    “And what happened to the jewels and treasures which the
    evil courtiers had usurped from the treasure-chest? That is
    another story. As the incomparable Nasrudin said: ‘Only
    children and the stupid seek cause-and-effect in the same

    A couple of ancient Sufi stories, as told in “The Exploits of the Incomparable Mulla Nasrudin” by Idries Shah. The web page is greatly worth reading, it has several dozen stories.

    My very best to you and yours,


  5. Paul Daly

    That’s right.

  6. Rudolph Harrier

    The paper gives the definition of P(e) = P(e|C) = the sum of P(c)P(e|c) for all causes c in C. You note that this has a problem, because the “absolute” probability of P(c) is not defined. (There is also the problem that the set of “all possible causes” is probably uncountably infinite in most applications, and so the sum listed may not be defined in any meaningful sense. But put that to the side.)

    But if we were to take the paper at its word then it would lead to a recursive definition. That is, since the paper purports to not only be a mathematical model but an actual description of causation in the real world, then c must itself be an event. For example if I asked my friend to water my plants and he didn’t, and thus him not watering my plants caused my plants to die, then there must have been some cause for my friend not watering my plants (ex. maybe he’s annoyed at me, perhaps he lost the key to my house, a random whim on his end, etc.) So if we were honest about the definitions to calculate P(c) we should sum up P(d)P(c|d) where d ranges over all possible causes for c. And thus to calculate P(d) we need to do the same thing with a new set of potential causes for d, etc.

    So they get themselves into a bit of a First Way situation. One of two things happens:

    1.) We end up with a recursive definition for probability and never get to any known “initial term” probabilities. Therefore no probability can ever be calculated.
    2.) We eventually do get to some probabilities which can be calculated. But how could this happen, given the definition? The only plausible way is to find causes c so that P(c) = 1 or P(c) = 0. Those would be metaphysically necessary or impossible events. So to establish any probability you essentially need to trace the chain of causation back to God’s existence and then determine the probability of an event given God’s existence. Not exactly easy!

    But if you say “Surely there are some events that we can know the exact probability of without using this definition, well before we talk about necessary or impossible events” then why are you even using the definition at all? Just calculate P(e) without worrying about the causes for e.

  7. Cause is a solution to the conditional “if and only if”.

  8. Joy

    It begins like this: “While causation has historically been a subject of philosophical debate, work over the last few decades has shown that metaphysical speculations can be put aside in favor of mathematical formalisms.”

    No they can’t.

    That’s what I was going to say!
    Except you used maths to explain?

    Dean Erricson,
    As for scroll bars and cause/effect, it’s a linear event, nothing strange or unexplainable about that.
    Spongecake is an emergent property of self raising flour, egg, butter (if it’s caused right)
    No two are ever the same, even mixed in a machine.
    Well maybe not but the experiments are fun

  9. Eric

    No jokes in comments yet about the “cause and probable effect” of submitting a word salad paper with meaningless equations that predict nothing? I’m disappointed.

    We even have a statement from the reviewer offering a high probability predicted response with wee pee bc datapoints!

    “ In the olden days of yore, I would critique papers and email the authors letting them know I had taken their names in vain, and offer space for a rebuttal. I was never taken up on“

    Yup, that’s a wee pee if I ever saw one!

    Predicted Rejection!! (we should formalize an equation…)

  10. Joy

    Cause is a solution to the questions
    What?, which? who?, how?, when? and where? but not to “why?”

    Seven questions, one of which is mystery, always will be for the ultimate question
    “no reason why” is just one potential answer.
    “there is no why” is not a sensible answer
    “why?” is not a sensible question to a scientist, it’s irrelevant, which is why they get so cross when people ask.
    “why” is the question to ask a living person or a living God
    People are mixing “why?” and “how?”; generally those who have faith in God.
    Why’s my favourite.
    Someone said
    “why?” comes before investigation and is the driver of science, but I’m paraphrasing.
    Dav said it, so it must be true
    I don’t want any salad cream though I prefer ranch dressing or Newman’s salad dressing
    or nothing at all, but I’m flexible about the salad cream in truth, we don’t get an option most of the time

  11. john b(s)

    And Cause never was the reason for the evening
    Or the tropic of Sir Galahad.
    So please believe in me
    When I say I’m spinning round, round, round, round
    Smoke glass stain bright color
    Image going down, down, down, down
    Soapsuds green like bubbles
    Oh, Oz never did give nothing to the Tin Man
    That he didn’t, didn’t already have

    America – Tin Man

  12. Joy

    Ah, Johnby
    There’s an American singer who sings about the Tin Man, either Tim McGraw, or the other one, Kenny Chesney. Alexa spells names! Love the surname McGraw, reminds me of Feathers McGraw
