Was Fisher Wrong? Whether Or Not Statistical Models Are Needed

The answer is yes. Or, if you prefer, no. It depends. Here’s a long query about the subject from friend-of-the-blog Kevin Gray.

Over the years many have disagreed about many things with the man some feel was the Founding Father of Modern Statistics.

Part of this has surely been a reaction to R.A. Fisher’s frequently caustic ways of interacting with colleagues, but the discipline itself has developed extraordinarily since he was active.

For instance, in an agricultural study say we want to test the main effects and interactions of soybean variety, amount of watering, amount of fertilizer, and soil compaction on yield. The design was structured (rotated) so that soil composition beneath and surrounding the plants would not bias the results.

In this example, there would essentially be no heterogeneity in any of the predictors. That is, the water and fertilizer would be identical across all cells as would soil compaction (which would be performed by a trained professional with a mechanical device). The seeds for the different varieties should also show essentially no variation within cell, either.

There is no true sampling error to account for. Do we really need ANOVA?

Rather than assuming that our experiment was “sampled” from a hypothetical population of identical experiments, wouldn’t it make more sense to set a criterion for effect size in advance, and if the experiment meets or surpasses this threshold, to repeat it?

If the results of the trials differ more than slightly, this means the methodology was not replicated exactly, a possibility ANOVA does not address.

We may wish to estimate the main effects and interactions of the predictors (variety, watering, fertilizer, etc.) via multiple regression, perhaps with a Bayesian hierarchical approach. I would focus on the point estimates and ignore the confidence/credible intervals.

Similarly, in observational studies when we perform causal modeling on population data, as is common in many econometric applications, point estimates are what matters. A t-test to see if a linear trend (for example) differed from zero would be meaningless.

So, my non-scholarly stance is that Fisher was wrong in this case.

The problem has always been, and still is in spite of recent attempts, a confusion about cause. The classical view is schizophrenic.

Gray is right: if we assume the only causes of yield are variety, watering, fertilizer, and soil compaction, then the only causes are variety, watering, fertilizer and compaction. Those are it. There are no others. As in none. Including imaginary ones. None. By assumption. That assumption is the key move.

There is therefore no such thing as “sampling variability.” And no need of ANOVA (basically regression), or for any other kind of statistical-probability model. All we have to do is set levels of all those causes, wait for the beans to do their thing, and measure them.

We still have to assume the measurement process itself has no error. This is often a very fine nice assumption. Go count the number of licensed working cars in your parking slot. I bet most of you, save those well into their third mimosa, this being breakfast time, come to a firm reliable error-free number.

Putting a ruler up to a speedy subatomic particle is a different question, and so is asking a person how fretful they are on a scale of sqrt(-32) to 18.7. The act of measurement is part of the system. We won’t worry about that problem here.

If we’re right about these four causes being the sole causes, then after we measure yield at a specific setting of those causes, we’re done. Finis. Too, there should be no deviation from the measured yields if we redo the growing with the same cause-settings.

Ah, but what if there is a deviation? Two or more runs of the same settings of the four assumed causes and two or more different yields? Assuming no measurement problems, then necessarily there must be at least one more cause than our four assumed causes.

And, even worse, it could mean that one, or even all, of the four assumed causes aren’t causes at all. Pause and reflect on this.

All this must be so because if there are only four causes, then necessarily we should get the same measurement each and every time. If the measurements differ, then our assumption is falsified.
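The falsification check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the run records, the setting labels, and the function name `falsifying_settings` are all hypothetical, and measurement is taken to be error-free, as assumed above.

```python
from collections import defaultdict


def falsifying_settings(runs):
    """Return the settings at which repeat runs gave more than one distinct yield.

    `runs` is a list of (setting, measured_yield) pairs, where `setting` is a
    tuple of the assumed causes (variety, water, fertilizer, compaction).
    """
    yields_by_setting = defaultdict(set)
    for setting, measured_yield in runs:
        yields_by_setting[setting].add(measured_yield)
    return {s: sorted(ys) for s, ys in yields_by_setting.items() if len(ys) > 1}


# Hypothetical repeat runs; the numbers are made up for illustration.
runs = [
    (("A", 1, 1, 1), 50.0),
    (("A", 1, 1, 1), 50.0),
    (("B", 1, 1, 1), 42.0),
    (("B", 1, 1, 1), 47.0),
]

falsified = falsifying_settings(runs)
print(falsified)
```

An empty result is consistent with the four-cause assumption; any entry at all means at least one unassumed cause is at work, or that one of the assumed causes is not a cause.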

Now it could be that when we repeat runs the measurements differ, but always by some trivial amount (we can discuss another day what to do if this amount is constant, predictable, or unpredictable). There are still other causes operating we haven’t assumed, or errors in the ones we did assume, but since the departures are not important, we don’t care. And again ANOVA isn’t needed.

ANOVA, which is to say some probability model, is only needed if, at some repeated combination of the causes, the measurements differ importantly. It’s easiest to think of this with just one assumed cause. Pick any. How about watering.

For instance, with water set to “level 1” (whatever this might be), we see “yield 1” one time and “yield 2” another time (where the two yields are importantly different). Clearly there is another cause besides watering, or the watering isn’t a cause at all. The latter is a logical deduction based on this information alone, and no outside information (such as what plants without water do).
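What counts as "importantly different" is a decision made outside the data. A minimal sketch, assuming the experimenter states a tolerance in advance (the name `tolerance`, the function names, and all the numbers below are hypothetical):

```python
def spread(yields):
    """Largest minus smallest yield among repeat runs at one setting."""
    return max(yields) - min(yields)


def probability_needed(yields, tolerance):
    """True when repeat runs at one setting differ by more than the tolerance."""
    return spread(yields) > tolerance


# Hypothetical repeat yields at water "level 1"; tolerance chosen by the experimenter.
tolerance = 1.0
trivial_runs = [50.0, 50.5, 49.8]   # departures we decide not to care about
important_runs = [50.0, 58.0]       # "yield 1" and "yield 2" differ importantly

print(probability_needed(trivial_runs, tolerance))
print(probability_needed(important_runs, tolerance))
```

Only the second case calls for a probability model; in the first, other causes are still operating, but the departures are too small to matter for the decision at hand.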

There is now uncertainty in the eventual measurement. We express uncertainty using probability, quantified or not. The uncertainty is because we do not understand the cause of the measurements. We think the cause is water, based on information outside this system, but we know, because of differing measurements, there must be at least one more cause at work. It could be many causes. We don’t and can’t know.

If we did know, we’d be back in the first situation, and probability isn’t needed.

Therefore, the only time we need probability is when we don’t understand cause.

This is why probability can’t be used to discover cause, in spite of the many burgeoning claims that say it can. Whenever a claim like this is made, it is always because somebody has snuck in an outside premise having nothing to do with the measurement system at hand.

We learn cause by induction. We know water is a cause in plant growth because of induction. In our experiment, we start the problem by assuming water is a cause. We don’t back it out: we begin with it. We deduce (a weaker form of knowledge, in the scale of things) that another cause or causes must exist because we get different measurements at the same setting of water.

We can’t induce what these causes are with just the information provided. And we don’t need to, as long as we’re comfortable using probability to deal with the uncertainty in the unknown cause or causes.

Which probability model to use, should one demand quantification of uncertainty, is an entirely separate matter. I’ll leave off here saying the model is usually picked by custom, and for scarcely any other reason.

But at no point, ever, does probability come alive, become some real thing, become a cause or some strange physical thing. Probability stays in the mind, as part of your understanding only.

That’s why it’s strange that the unknown or mistaken causes (when we have differing measures) are said to be probability, and called “error”. It’s not so wrong to use the word error, because it signals a mistake in our thinking, but to say, as they do say, that the error is “normal” (or some other probability) is to say that probability becomes a cause. Which is absurd.

So, yes, Fisher was right and Fisher was wrong. It depends on the precise question asked.

Buy Uncertainty, if you want to read more about this topic. Or buy my new book and own your enemies: Everything You Believe Is Wrong.



20 replies »

  1. Apart from beginning with a faulty model, for a given solar profile, such an experiment would depend more on the characteristics of the soil, such as its richness, consistency and soil-moisture characteristics, the depth to the water table, evapo-transpiration (all factors affecting the moisture content at the root level – there is an optimum for a given plant) and the genetic robustness? of the seeds/plants. Faulty (or highly variable) seed stock could throw the result of the entire experiment. It’s about consistency between elements and interactions between the plant, soil, water and nutrients etc.

    Very difficult if not impossible for an initial model to account for all these possible causal factors.

    In the past I’ve taken the view that we do the best we can to replicate precisely the same properties of each element of a trial, then measure everything we can possibly measure – and through a process of elimination find the ones that do not correlate – at least we know what wasn’t causal – but we still don’t know exactly what was.

  2. Statistical analyses and models for the ductile, tensile, and shear strength of steel that’s
    going into a bridge are invaluable, discrete, and repeatable. Measurements of various
    alloy compositions, forging temperatures, quenching etc. lead to tables that are precise
    and have an absolute and eloquent quality about them approaching certainty. Absent
    an honest methodology of inputs, measurement, and controls, statistics become the greatest
    tool of mass deception ever devised and you see it every day. The most powerful predictive
    tool for the performance of material properties lies under a cloud due to its misuse from
    areas of study that do not meet these strict criteria. It becomes like a magic eight ball,
    a propaganda tool of deception in service of political, commercial, esoteric, and more often
    than not nefarious ends. Once trust is displaced and faith in truth is lost people ignore
    and revile the oppressor and all their tools of deception most particularly statistical models.

  3. George Box famously said that ‘all models are wrong,’ adding that ‘some are useful.’ The question is how far wrong can they be before they cease to be useful?
    Ellis Ott proposed the Analysis of Means, in which the various levels were compared to one another.
    The reason why Box’s observation was true is because no model can include all the suspect causal factors, as the OP notes. It is the net effect of all these ‘other’ factors that introduces variation between otherwise identical replications of experimental conditions.
    Ott recommended a max of four experimental factors because, as he said, with more than four factors you are more likely to screw up the conduct of the experiment than to screw up the arithmetic.
    I agree with Mr Incitadus that experimental statistics are better suited to studies of inanimate systems than to the animate. I personally found in industry that physical systems — metal fabrication, stamping, and so on — gave best results. Chemical systems, a bit less so; and biological systems less than that. And the joker in the deck was, as suggested already, the variability in the measurement system(s) used. Repeated measurement of the same object with the same micrometer often [though not invariably] yielded the same results; but repeated chemical analysis of the same batch, often less so. Though even measuring the diameter of a steel rod is no simple matter: there are an infinite number of diameters around the circumference and along the length to be sampled from.
    In the pseudo-sciences of sociology, psychology, et al., “the very act of identifying one’s object of study is already an act of interpretation, contingent on a collection of purely arbitrary reductions, dubious categorizations, and biased observations.” [D B Hart]

  4. Two observations:

    I like how Matt today snuck in an example of “models [can/will] only say what you tell them to say.”

    I’m such an innocent; I’m always bemused by how difficult it seems to be to have actually read (as in absorbed-read, comprehended-read) an admittedly wide-ranging and non-simple book like Uncertainty. One more reason for me never to run a blog; one more reason to admire Matt’s ability to do so.

  5. The full impact of statistical modeling on society comes into its own in the medical
    biological arena, where treatment modalities are calculated based on the statistical
    analysis of outcomes with accompanying comorbidities to determine the cost
    effectiveness of various treatment options. Like when to apply a ventilator, dispense
    an expensive drug, or even remove life support; (which in some countries is becoming
    less of a patient/family centered option). All under the guise of scientism with the support
    of solid statistical data steeped in exhaustive detail, unalterable, other than exceptional
    circumstances, surprisingly dependent on nothing more than your ability to pay.

  6. The seeds for the different varieties should also show essentially no variation within cell, either.

    The selection of seeds is one source of experimental error, just for example.

  7. Most of what Kevin Gray said doesn’t make sense. For example,

    Rather than assuming that our experiment was “sampled” from a hypothetical population of identical experiments, wouldn’t it make more sense to set a criterion for effect size in advance, and if the experiment meets or surpasses this threshold, to repeat it?

    What does it mean to say that our experiment was “sampled” from a hypothetical population of identical experiments?

    An experiment is a process that produces (statistical, random) results, e.g., tossing a coin is an experiment.

    What’s random in the tossing-coin experiment?

    One key component in statistical modelling is the random error accounting for the unknowns and unpredictables, possibly due to unknown causes or measurement errors.

    I customize a model for a data set by examining the data structure, how the data were collected, etc., though I cannot tell you whether my customization would become a custom for practitioners.

  8. Was Fisher wrong? Is the Pope Catholic? Bayesian methods are needed for hypothesis testing under conditions of new information. Enough said. The Pope’s fidelity to the Church is a Bayesian problem, too.

  9. Well, considering the fact that we still don’t have a clue how electricity moves from
    point A to point B within and now we’re told around a wire, much less how plant
    photosynthesis creates matter, I’d say the dazzling flourishes of statistical quantification
    are as good as it gets for now. In man’s eternal quest for absolute certainty statistical
    science and scientism itself fill the void once occupied by religion. Just like the
    church it assumes the mantle of authority for which the ever patient and expectant
    crowd eternally waits. And like the church it will furnish all the rationalizations
    for a cull of excess labor should mass disenchantment arise. Now couched and
    cloaked in metaphysical hubris the god of mathematics saves a planet not some
    dead bound for heaven soldier’s soul. Prince Charles speaks with knowledge of
    culls and the health of ruminants, and crazy father Philip displayed an inordinate
    fondness of deadly viruses to address overpopulation. Did he come back as one?

    Actually they’re pretty up front about their plans and they’ve got their next war on
    the front burner. Putin and Biden are all in and Xi is onboard, they’re all on the same
    team their only fear is us. According to the NYT the pandemic is finished and that
    desperate situation in the Ukraine must be addressed, be afraid stay afraid it’s all
    they’ve got.

    Prince Phillip’s Desire…

  10. @Incitadus: “Well considering the fact that we still don’t have a clue how electricity moves from
    point A to point B within and now we’re told around a wire, much less how plant
    photosynthesis creates matter.”

    We’ve known for well over a century how electricity moves. Maxwell’s Equations explained it in 1862. (A much simpler version can be derived using geometric algebra.) Luminiferous ether theory was debunked back in 1887 (Michelson-Morley).

    Photosynthesis does not create matter. It converts solar energy (sunlight) into chemical energy. (This was taught in 6th grade science class back when I was a kid.) Plants mostly grow using water, nitrogen, and carbon dioxide. This is why higher atmospheric CO2 leads to faster plant growth, and why nitrogen fertilizers are used.

  11. We’ve known for well over a century how electricity moves. Maxwell’s Equations explained it in 1862.

    Do we? We have transmission line models that seem to work but that’s not the same as knowing reality. Imagine a model that is based on magic. How close to reality would it be?

  12. We’ve known for well over a century how electricity moves. Maxwell’s Equations explained it in 1862.

    Do we? We have transmission line models that seem to work but that’s not the same as knowing reality.

    Indeed. It seems that electrons move down the wire opposite to the flow of the “electrical fluid.” But then, Maxwell knew nothing about electrons, since they had not yet been invented.
    Newton, in the General Scholium of his Principia stated that he did not know what “gravity” was, but only how it functioned. Same thing here. Maxwell did not so much ‘explain’ electricity as ‘describe’ it. The physical thing need not correspond to the model. Remember, the Ptolemaic model of deferents, equants, and epicycles gave accurate predictions of planetary motions, but that did not make the model an actual depiction of the Ding an sich.

  13. electrons move down the wire opposite to the flow of the “electrical fluid”

    If the models have any connection to reality then the electrons barely move or move quite slowly. And the energy flow is on the outside of the conductors. There’s a Veritasium YouTube video that tries to explain how the model predicts flow but he kinda cheated and actually presented a transmission line. There’re even videos explaining his “explanation” because it’s counterintuitive. He also used physics instead of the lumped components most engineers use. A 300,000 km line will have almost instantaneous flow in both directions because it’s energy moving. Assuming correct models of course.

  14. Thanks, Matt!

    In the draft I sent you I’d forgotten to note that inferential statistics does not assess the probability that a theoretical or conceptual model underlying a statistical model is incorrect. In fact, it assumes both the theoretical/conceptual and the statistical models are correct. I see confusion about this in the scientific community. Likewise, inferential statistics does not address measurement error, errors in the experimental design or flaws in its execution.

  15. I’m very late with this comment but just to say that we don’t know how electricity moves.
    I asked someone who knew everything about it and he said we don’t know.
    Which is why ‘normal’ non initiated people don’t understand it, like me. It’s still in the theoretical stage of understanding.

    Grill an electronics engineer for long enough and find out that you just won’t understand in a way that’s ‘everyday’ and satisfactory for most of us.
