Statistics

The Common Cause In All Reproducibility Crises

Here is a cartoon, but a revealing one. Type “reproducibility crisis in” in your search engine and let it suggest the ending of the sentence, as we did here:

Science, medicine of all kinds, psychology, physics, finance, economics. And so on.

Now these are all academic crises. They happen inside universities and similar institutions. The crises are in formal academic disciplines and manifest in formal publishing. They consist of the growing horror that in published papers, especially prominent (i.e. popular) papers, that which was claimed true turns out to be false, or at least far from proven. Attempts to duplicate well-established results fail at an extraordinary rate.

Of course, not all admit to the crisis. Here’s a lady (in the Proceedings of the National Academy of Sciences) who says the failures seen across all these disciplines aren’t really failures. They are “epochal changes” and represent “empowerment”, and if the succession of shocking re-non-results were given these more pleasant names, then this would be “inspiring, and compelling.”

Well, for some, feelings are more important than dull metrics, like model accuracy and Theory-Reality concordance. For others, what counts most are those dull metrics.

The crises are well recognized, and to a certain extent not political. Even Vox admitted (two years ago) that “science” has been in a replication crisis “for a decade.”

Here’s a conference on the crisis in, if you can believe it, “ML-based science” (thanks to Stephen Shipman for the tip). Speaking for many fields, one of the goals of this conference is to “Identify root causes of the observed reproducibility failures and explain why they have occurred in dozens of fields that adopted ML methods.”

Nature did a survey of working scientists and found only 3% of respondents said there was no crisis. That does not inspire confidence.

Take a look again at the cartoon, which turns out to be pretty accurate. Did you notice what is missing from the list? Seeing what is not there—hearing (so to speak) the dog that didn’t bark—is often the hardest evidence to discover.

It’s engineering.

But why is engineering missing? Engineering also suffers from the crisis, but to a far lesser extent. There is one key reason. Better, it will turn out that this strange reason is why the other fields, if they wanted, could be absent from the list, too. But aren’t.

It’s not DIE. Academic engineering is, of course, as subject to the cancer of DIE as the other fields. Injecting Diversity for the sake of Diversity weakens engineering just as much as it degrades any subject. DIE means what it says. Standards will fall. Certainly DIE contributes to the reproducibility crises, but it is not, after all, the premier cause of them. That is something else.

The advantage is this: engineering, at least in its unwoke and least academic state, is in maximum contact with Reality. Theory exists, but is largely subservient to Reality. Not wholly: largely.

Theory also goes by the alias Models.

In engineering, there is much Theory, many Models, and must be. Theory is an explanation of the way the world works. The more accurate the Theory/Model is, the better the match to Reality, because the better it explains how the world works. Sure, there are Diversity bridges that fail, and even non-Diversity bridges that collapse because of false Theory and ugly Models. Yet it’s rare (though not impossible) in engineering for a Theory to be retained after it fails its confrontation with Reality.

It is not as rare in the other fields. There is variability in this, and some of these disciplines are more Reality-based than others, but all of the other areas are much more in love with Theory and Models.

The crisis arises when false or shaky, but beautiful, Models and Theories are believed because they are beautiful, appealing, complimentary, lucrative, political, or cultural, and not because they accurately explain the way the world works.

To the extent a field relies on statistics, the deeper it will be in the crisis. Statistics in its “testing” sense, whether Bayesian or frequentist, is nothing but a way to provide evidence for the Theories in the minds of scientists. It’s not only statistics, naturally. This happens whenever scientists search only or mainly for confirmatory evidence, and have lost the ability to accept criticism. They can’t criticize themselves, and they certainly will not accept criticism from non-Experts.

Not so coincidentally, this was pointed out to me early this morning:

This is in psychology, a field which is saturated in statistics and would be dead without it. Those p-values represent the “testing”, which “confirms” Theory. Having the true belief of the originator of the Theory is what helps it replicate.

Now I realize I haven’t proved my own claim here, in these 800 words. But I ask you to think back to the years of analysis of failed science you and I have examined to provide that proof. Or consider this article a “Part I”. We’ll certainly be doing more.

Accepting our contention, the way out of these crises is obvious. Restore Reality. Make Theories and Models meet the test of Reality. In statistics in particular, it means abandoning “testing” and Model-centric thinking. And moving, like in engineering, to making skillful predictions.
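One way to make “skillful predictions” concrete is a skill score: compare a model's errors on new observations against the errors of a naive reference forecast. The sketch below is only an illustration, assuming mean squared error as the yardstick. The numbers and the names `skill_score`, `observed`, `model_forecast`, and `naive_forecast` are hypothetical and come from no study discussed here.

```python
from statistics import mean


def mean_squared_error(predictions, observations):
    """Average squared gap between what was predicted and what happened."""
    return mean((p - o) ** 2 for p, o in zip(predictions, observations))


def skill_score(model_preds, reference_preds, observations):
    """Return 1 - MSE(model) / MSE(reference).

    Positive: the model beat the naive reference on these observations.
    Zero or negative: the model added nothing, however elegant it is.
    """
    mse_model = mean_squared_error(model_preds, observations)
    mse_reference = mean_squared_error(reference_preds, observations)
    if mse_reference == 0:
        raise ValueError("reference forecast is perfect; skill is undefined")
    return 1.0 - mse_model / mse_reference


# Hypothetical observations that were NOT used to build either forecast.
observed = [10.0, 12.0, 9.0, 11.0, 13.0]
model_forecast = [10.5, 11.5, 9.5, 11.0, 12.0]
naive_forecast = [11.0] * 5  # e.g. "always guess the long-run average"

print(f"skill = {skill_score(model_forecast, naive_forecast, observed):.3f}")
```

The point of the design is that the Model is judged only by how well it predicts observations it has never seen, relative to a simple guess, and not by a test statistic computed on the data used to fit it.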



13 replies »

  1. Briggs: “…the way out of these crises is obvious. Restore Reality.”

    “Restore Reality” — I’d like to order the T-shirt, bumper sticker, and lapel pin.

  2. The reproducibility crisis dates back to the late 1980s, at least; when I began in science – but not many people talked about it then because there was still enough real science around that many hoped the real would prevail over the fake.

    But by 2000 – when John Ziman published Real Science – it was clear that science had become… something else, with almost no resemblance to the original.

    The deep reason is not a matter of the system, nor of how close to ‘reality’ a subtype happens to be. After all, a lot of medical research is very close to reality. But the evaluation of medical research was itself captured (and by the managerial-bureaucratic elements – as with ‘Evidence-Based Medicine’) to the extent that (for example) even the grossest and most in-your-face harms of new treatments were dismissed as ‘anecdotes’.

    The same will happen, more likely has already happened, with engineering. Because feedback from reality only ‘counts’ when it has been directed via the bureaucracies and the mass media – therefore ‘reality’ has become… whatever They say.

    When a non-reality-based bridge collapses – it can be attributed to almost anything – climate change…sabotage by denialists (such as Emmanuel Goldstein) – or else, ultimately, simply denied outright: It never happened. There never was a bridge – prove it! (All possible proof being pre-dismissed as lies and deep fakes)

    Science is, or was, based on a genuine belief in, and dedication to, Truth; and when this is absent (as now, in nearly-all professional research) then there is no science – whatever methods and analyses are performed.

  3. The map is not the territory.
    Some people love maps so much, they never look up from the paper at the scenery until they walk off a cliff.

  4. Engineering has its share of “Theory-Reality concordance” failures.

    Take Finite Element analysis, for example. Widely used, but frequently fails to agree with reality (except in very narrow applications). A 1960’s engineer with a slide rule could get a more concordant result, in many cases.

    Practitioners who fail tend to get slapped with forever names. I know one who earned the forever name “Foot-Off”. He was in charge of setting levels as a bridge was constructed, end abutments towards the center. They met in the middle but with about a foot difference in elevation. Harsh but true. It happens.

    However, if Engineering ever does DIE, so will civilization. Could happen.

  5. Interesting. I had a suspicion where “evidence based medicine” would take us. Why? I could see the effects of politics in medicine in the ’80s. It has only gotten worse since then. Human nature hasn’t changed. Certain folk do not want to accept that. Reality is what it is; and that’s not necessarily what you want it to be. Seek that which is true and avoid that which is not true.

  6. Per bruce g charlton, a comment on substack neatly captured our problem: instead of evidence-based policy-making, we have policy-based evidence-making.

    If the policy is “more power for me over you”, then skillful people can be paid to manufacture the evidence I need, whether for anthropogenic climate change or pandemics or…

  7. Y’all need to chill out and understand that in the brave new world in order to attract diversity into the sciences it became necessary to take science in the direction of art school.

    Then let your creativity go wild!

    It doesn’t matter how stupid your abstract painting looks, nor does it need to be explained or be held up to scrutiny.

    All Dr. Picasso (PhD) needs to have is reproducibility of cash for his pieces sold to the highest bidder at the auction that serves the picture of the world they wish to convince you of for their own ends.

    It’s just good business. It doesn’t need laborious years of research and data and observation. Just say “fiat”, and fire the paintball gun at a canvas, and so it is done, and Klaus Schwab sends you a cheque for a million bucks for your avant-garde piece and you get to go talk on TV like a celebrity!

    Experts will LOVE and shower praise on your colourful insanity. Liberal hipster groupies too will jump on board as they go wherever the wind takes them. Normal people and children will scratch their heads and wonder just what are they looking at? And why are they being constantly told they have to love it, or else they are uncultured bigots?

    That is what’s going on here. The nude models come in, do what they are told to do, then calamity is painted in bright red around them. Everything resembles Hell. And now you all have to pay for all the blue and green paint to be added to put out the fire. Blacks and browns too, there is already white underneath on the canvas and it all needs to be covered up so that the work can progress towards completion.

  8. The 737MAX debacle was a possible engineering example of too much reliance on the models and not enough real world testing. Boeing owes a lot of profit to its ability to model nearly every aspect of flying a plane. But reality demands the plane actually fly in the real world with real human pilots in the cockpit.

  9. Well, yeah – human reality isn’t the same as engineering reality.

    While I do not think that most “social science” people do science or show much understanding of either stats or logic in their work, many of them are, I think, getting a classic “bum rap” with respect to the irreproducibility of many of their results.

    The reason for that is simple: to get past the cynical view of social theory as based on comparing two halves of a sample of 3, most “social science research” relies on samples of, usually, either 20 to 40 undergrads or large survey populations. In both forms the behavior studied is, for each subject, the outcome of a large number of both immediate and longer term factors, most of which the subject does not recognize and relatively few of which are shared between subjects. Thus the finer the behavior distinctions the researcher seeks to draw, the more dependent the outcome is on a combinatorial explosion of largely unknown factors affecting the subjects whose behavior is at issue; and, therefore, the more likely it is that any attempt to reproduce the result will fail.

    Observe your sample for a few days and you’ll find that people like to eat – and that’s a reproducible result because it’s easily categorized high level behavior. Put the same subjects in an MRI machine and see what brain twitches pictures of specific foods produce, however, and you’ll get completely non-reproducible results because the effects are extremely idiosyncratic.

  10. Reproduction is something the self-Darwinizing don’t do. Academia in general, including the semi and pseudo disciplines mentioned, is a bus careening over a cliff. They are not long for this world, and they will leave no progeny. On the plus side, a brighter, less cluttered future awaits the survivors.

    PS – Restore Reality is catchy. Trademark it posthaste.

  11. “To the extent a field relies on statistics, the deeper it will be in the crisis.”

    Many types of engineering rely heavily on signal processing, which makes heavy use of statistics. As an example, here’s an excerpt from Wikipedia:

    “In signal processing, the coherence is a statistic that can be used to examine the relation between two signals or data sets. It is commonly used to estimate the power transfer between input and output of a linear system. If the signals are ergodic, and the system function is linear, it can be used to estimate the causality between the input and output.”

    The Wikipedia article uses as an example ocean water levels (i.e., tides) and groundwater well level. Common sense perhaps makes this a poor example (too obvious), but it does come across, at least casually, as pretty darn convincing evidence of a causal relationship. However, this is deceptive, as the high coherence would still be present if the well were located thousands of miles from the tide gauge, pointing to a third unmeasured causal agent (the inertia of the earth/moon/sun and gravity).

    https://en.wikipedia.org/wiki/Coherence_(signal_processing)

    Engineering uses a build-test-revise approach, which provides the strong connection to reality. There is constant pressure to make this process faster and cheaper, so engineers increasingly rely on modelling, more commonly referred to as simulation (using models of sub-components). The models are tweaked until they closely match measurements. Even then, though, the conventional wisdom is “if your simulation fails, your device will definitely fail; if your simulation works, your device MIGHT work”. In practice, success is invariably preceded by many failures.

    It sounds like in some fields, the model is the goal, whereas in engineering, the model is more of a tool to improve efficiency. If an engineer were to skip the build-test-revise cycle and release a design upon the world based solely on a successful model, that engineer would soon be in search of other employment, or perhaps incarcerated.
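The tide-and-well point in comment 11 (a strong statistical relation between two signals that share a hidden driver) can be sketched in a few lines. This is a toy simulation under loudly stated assumptions: it uses plain Pearson correlation instead of spectral coherence, the "tide gauge" and "distant well" are synthetic series, and the amplitudes, noise levels, and the roughly 12.4-hour forcing period are illustrative choices, not measurements.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hours = range(2000)

# Hidden common cause: a periodic forcing (hypothetical, ~12.4 h period).
forcing = [math.sin(2 * math.pi * t / 12.4) for t in hours]

# Two synthetic "measurements" that never interact with each other.
# Each responds only to the shared forcing, plus its own independent noise.
tide_gauge = [2.0 * f + rng.gauss(0, 0.3) for f in forcing]
distant_well = [0.5 * f + rng.gauss(0, 0.3) for f in forcing]

# A control series with no link to the forcing at all.
unrelated = [rng.gauss(0, 0.3) for _ in hours]

r_shared = pearson(tide_gauge, distant_well)
r_control = pearson(tide_gauge, unrelated)

print(f"tide gauge vs distant well: r = {r_shared:.2f}")
print(f"tide gauge vs unrelated:    r = {r_control:.2f}")
```

The two driven series track each other closely even though neither affects the other, which is why a high statistic alone says nothing about which way (or whether) the causation runs.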
