Is Most Published Research Wrong? Yep. Here’s Why

One of my students—my heart soars like a hawk!—sent me the video linked above. I’ll assume you’ve watched it, as my comments make reference to it.

About the dismal effects of p-values, the video already does a bang-up job. Plus, we’re sick of talking about them, so we won’t. I’ll only mention that the same kinds of mistakes are made using Bayes factors; only BFs aren’t used by most researchers, so few see them.

Some of the solutions recommended to reduce (nobody can eliminate) the great flood of false positive results, like pre-registration of trials, are fantastic and should be implemented. Pre-registration only works for designed trials, though, so it can’t be used (or trusted) for ad hoc studies, of which there are many.

Reducing the drive for publication won’t happen, for university denizens must publish or they do, they really do, perish. Publishing negative results is also wise, but it will never be alluring. You won’t get your name in the paper for saying, “Nothing to see here. Move along.”

A partial solution (not the solution; there is no the), one I propose in Uncertainty: The Soul of Modeling, Probability & Statistics, is to treat probability models in the same way engineers treat their models.

Engineer proposes a new kind of support to hold up a bridge. He has a theory, or model, that says, “Do X and Y will happen”, which is to say, “Build the bridge support according to this theory (X), and the bridge won’t fall (Y)”.

How can we tell if the theory, i.e. the support, works? There is only one true test: build the bridge and see if it stands or falls.

Oh, sure, we can and should look at the theory X and see how it comports with other known facts about bridges, physics, and land forms, all the standard stuff that one normally examines when interested in Y, standing bridges. There may be some standard knowledge, Z, which proves, in the most rigorous and logical sense of the word “proves”, that, given X, Y cannot be.

But if we don’t have this Z, this refutatory knowledge, then the only way we can tell If X, Then Y, is to X and see if Y. In other words, absent some valid sound disproving argument, the only way we can tell if the bridge will stand is try it. This is even true if we have some Z which says it’s not impossible but unlikely Y will stand given X the theory.

Have it, yet? I mean, have you figured the way it works for statistics yet?

Idea is this: abandon the new ways of doing things (hypothesis testing, and parameter-based statements, which are equivalent here to saying only things about X and not Y given X) and return to the old way of building the models Pr( Y | X) and then testing these models on the real world.

Out with the new and in with the old!

Examples? Absolutely.

As per the video, Y = “Lose weight” and X = “Eating chocolate”. The claim that seems to have been made by many was that

    Pr (Y = “Lose weight” | X = “Eating chocolate”) = high.

But that’s not what the new ways were saying; they only seemed to be saying that. Instead, the new ways said things about p-values or parameter estimates or other things which have no bearing on the probability Pr(Y|X), which is all we care about. It’s all somebody wanting to know whether they should eat chocolate in order to lose weight cares about, anyway.
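To make the gap concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the 0.1 kg average difference, the 5 kg spread, the group size of 100,000, and the function names are assumptions, not numbers from the video or from any study. It also assumes normally distributed weight changes with known parameters, which ignores parameter uncertainty. All it shows is that a wee p-value can sit beside a predictive probability barely better than a coin flip.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_diff, sd, n_per_group):
    """p-value of a two-sample z-test (equal group sizes, known common sd)."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def prob_eater_loses_more(mean_diff, sd):
    """Pr(a random chocolate eater loses more weight than a random non-eater).

    The difference of two independent normal weight changes is itself
    normal with mean `mean_diff` and standard deviation sd * sqrt(2).
    """
    return NormalDist().cdf(mean_diff / (sd * sqrt(2.0)))


# Hypothetical numbers, invented purely for illustration.
mean_diff = 0.1        # extra kg lost, on average, by chocolate eaters
sd = 5.0               # spread of individual weight changes, in kg
n_per_group = 100_000  # people in each arm of the imaginary study

p_value = two_sided_p_value(mean_diff, sd, n_per_group)
pred_prob = prob_eater_loses_more(mean_diff, sd)

print(f"p-value: {p_value:.2g}")
print(f"Pr(eater loses more than non-eater): {pred_prob:.3f}")
```

Under these assumed numbers the p-value comes out far below the customary 0.05, yet the probability that a given chocolate eater out-loses a given abstainer stays within a hair of one half.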

Assume you’re the researcher who thinks eating chocolate makes one lose weight. Perform some experiment, collect data, form some theory or model, and release into the wild Pr (Y | X), where the X will have all the supposeds and conditions necessary to realize the model, such as (I’m making this up) “amount of chocolate must be this-many grams”. And that’s it.

Then the whole world can take the predictions made by Pr(Y|X) and check them against reality.
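One hedged sketch of what such a check could look like, in Python. The Brier score used here is one common way to score probability forecasts; it is chosen for illustration and is not necessarily the score Uncertainty recommends. The published probability of 0.8, the ten outcomes, and the coin-flip reference forecast are all invented.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means every prediction was certain and correct.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("need one prediction per outcome")
    squared_gaps = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_gaps) / len(outcomes)


# Hypothetical published model: Pr(Y = "Lose weight" | X) = 0.8.
published_prob = 0.8

# Made-up results from ten people who did X: 1 = lost weight, 0 = did not.
outcomes = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

model_score = brier_score([published_prob] * len(outcomes), outcomes)

# Reference forecast: a "know nothing" coin flip for everybody (an assumption).
reference_score = brier_score([0.5] * len(outcomes), outcomes)

# Skill above 0 means the model beat the reference; below 0 means it did worse.
skill = 1.0 - model_score / reference_score

print(f"model Brier score:     {model_score:.3f}")
print(f"reference Brier score: {reference_score:.3f}")
print(f"skill vs. reference:   {skill:.2f}")
```

With these invented outcomes (four of ten lost weight) the published 0.8 scores worse than the coin-flip reference, so the skill is negative: the model fails the check for this group of people, whatever its p-value once was.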

That model may be useful to some people, and it may be useless to others. Scarcely any model is one-size-fits-all, but the new ways of doing things (hypothesis tests, parameters) made this one-size-fits-all assumption. After all, when you “reject a null” you are saying the “alternate” is universally true. But we don’t care about that. We only care about making useful predictions, and usefulness is not universal.

The beauty of Uncertainty’s approach is simplicity and practicality. Of practicality we’ve already spoken. Simplicity? A model-builder, or theoretician, or researcher, doesn’t have to say anything about anything, doesn’t have to reveal his innermost secrets, he can even hide his data, but he does have to say Pr(Y|X), which anybody can check for themselves. Just make the world X (eat so much chocolate) and see if Y. And anybody can do that.


  1. I saw a comment yesterday about global warming that said we should take the models from the past and compare them to actual temperatures and CO2 values and see if any were accurate, rather than always predicting 50 years in the future. It won’t happen since it’s unlikely there would be a good match. Rather, scientists “tune” the model (read: rewrite to what actually happened) and then assume accuracy.

    The problem here is bridges collapsing make engineers look very, very bad. They have incentive to be right. Most research has no such outcome. One can be wrong over and over and there is no consequence. Comparing the study to reality doesn’t matter—no study collapsed and killed 50 commuters. It’s not exciting. The best hoped for flashy outcome is personal injury lawyers getting rich before the study is shown to be false.

    I can’t see the Pr(Y|X) catching on. People won’t understand that the idea is if chocolate makes you lose weight and they eat chocolate and lose weight that’s a good thing. If their neighbor gains 10 lbs doing the same thing, that’s bad, so the idea is wrong and bad. I don’t think people are going to understand the “doesn’t apply to all” idea, and again, personal injury lawyers will have you flogged for suggesting this idea. Their new yacht will be repossessed as people realize the “harm” does not apply to everyone and without that “p” value, there’s no “proof” of the harm to show a jury. Come on, we need science to tell us who to blame and collect damages from.

  2. Eric Slattery

    I’ve been at grips with this for a while. My work (Clinical Research Coordinator III) is primarily Research in Concussions, specifically tools to identify concussions and track them longitudinally. So much research (not just my area) seems to be only in the theoretical, which of course has its place, but there needs to be transference to the field/clinic/whatever setting it suits. Ok cool, we have this tool that could be useful in identifying concussions because of X Y & Z, let’s run some trials in the clinic and see if this can help us better identify said condition than the current means we’re using. Some other considerations are (at least in a clinical setting) to identify time involved versus other measures, expenses by the institution to use said method, and if you can/need to bill the subject for doing said procedure. But sadly, many things just remain in the theoretical; usually the microscopic type research, IMO.

    Another consideration is the poor Statistics education at Universities. When I got my Masters at Miami University (Oxford) in 2014, I had to take only 1 barebones Statistics class. Of course I did a thesis and did all the stats to the best of my knowledge then, but I didn’t know there were other methods, better methods, and if any of my methods were actually futile. I did some Multiple Stepwise regression at the time, yes I know flog me :P, but I know better now and of other methods thanks in part to ditching SPSS (University Licensed) and using R to actually understand what I am performing when doing Statistics and the great community at Stack Exchange to help out with both coding and offering better methods. Stumbled on this site when I read something about the problems with p-values and have been trying to expand my knowledge on other methods in Bayes Factors, Bayesian Inference, and other tools that fit situations better than the blanket p-values everyone relies on. Yes, they have their time and place.

    Summing up, I feel it’s a problem of the universities poorly training people, tricking them into thinking that what they teach is the be-all and end-all, when there is a whole slew of stuff to discover if they would just investigate on their own and weren’t so restricted.

  3. Ken

    Richard Feynman’s 1974 Caltech commencement address, “Cargo Cult Science,” on learning how not to fool yourself. Feynman describes flawed, wrong, and willfully misleading science as “cargo cult science” (he doesn’t simply overgeneralize and then dump the entire discipline under a pejorative, “scientism,” and dismiss an entire discipline). Feynman’s description of Young’s study on how to properly set up experiments involving rats is illuminating – one must understand and account for all the factors, and that involves understanding tangible realities (not philosophy). But a sizable proportion of researchers don’t/won’t do that.

    Feynman gets to the real fundamental issue, what “cargo cult science” was fundamentally about, he said, “It’s a kind of scientific INTEGRITY…” [EMPHASIS added]

    The factor at the core of the matter is INTEGRITY. Not knowledge of proper vs improper application of the analytical tools.


    – 2016 – American Statistical Assn (ASA) issues a formal statement on the p-value (link below)
    – 2005 – Ioannidis publishes paper inspiring the video presented, ‘Why Most Published … are False’ (link below)
    – 1974 – Feynman addressed the same fundamental issue (in broad terms, link below)
    – 1937 – Young did his study on proper rat maze design (still routinely ignored to this day)

    Does anyone really believe that, after some two generations and counting, researchers still haven’t figured out how to conduct proper analyses, or, they don’t really understand how much weight to assign to real vs apparent cause-effect and so forth?!?!

    No doubt some are ignorant and may be ‘cured’ by education, but that’s a minority; education & works such as Briggs’ recent stats book merely address a symptom of much deeper issues.

    Most researchers are, if not willfully unscrupulous, willfully self-delusional and more than willing to publish whatever they can (‘publish or perish!’ in academia, especially) using the moral logic that if ‘peer review accepts it, it must be ok’ (even when they know it ain’t) … deferring willfully to peer-review-as-authority (and thereby shifting moral culpability to others) along the same lines as Dr. Stanley Milgram observed the average person deferring to authority in his famous experiment.

    The issue is one of INTEGRITY (or lack thereof), not education.


    ASA Statement on p-values, etc:

    Feynman’s “Cargo Cult Science” speech:

    “Why Most Published Research Findings Are False,” by John P. A. Ioannidis,

  4. Ken, when I refer to “scientism”, I’m referring to the notion held by some very reputable scientists (but not all) that science can explain everything. That, of course, isn’t so. For an illustration of the scientism I abhor, please see (department of shameless self-promotion) “Where is the Catechism of Science?” and the quotes given by scientists to the first four questions.
    I believe that science, carried out properly, is a beautiful and noble enterprise. And I agree with Pope St. John Paul II’s quote:

    “It can be said, in fact, that research, by exploring the greatest and the smallest, contributes to the glory of God which is reflected in every part of the universe.” Pope St. John Paul II, Address on the Jubilee of Scientists, 2000

  5. berserker

    “build the bridge and see if it stands or falls.”
    – Make sure the engineer is standing under the bridge during testing.

  6. John J Murphy

    Good video; however, maybe you should look at his following video on
    “13 Misconceptions About Global Warming” where he pushes “The Merchants of Doubt” by Oreskes at the end.
    One of my colleagues suggested he should be asked for the “p-value” of the AGW hypothesis – and how he calculated it.

  7. Ye Olde Statistician

    he doesn’t simply overgeneralize and then dump the entire discipline under a pejorative, “scientism,” and dismiss an entire discipline

    Wittgenstein and the others who coined the term “scientism” weren’t talking about science. They were talking about the belief that the methods of natural science can and should be imported into all other fields of human endeavor, much as a man whose only tool is a hammer decides to use it to paint porcelain.

  8. berserker: Love it! Of course, an engineer really would stand under the bridge, drive over it and use it because he has developed the skills to know how to build a safe bridge. If he won’t stand under it, find another engineer!

    YOS: My doctor once told me, when I asked about why the specialist I was seeing was stuck on one idea and wouldn’t budge, “You give a carpenter a hammer and everything he sees is a nail.”

  9. Ken

    Bob Kurland/YOS – re ‘scientism’ comments:

    You two are presenting a measured, considered, application of the term here. That’s in contrast to how another(s) apply the same term — force-fitting (sometimes) and/or wildly extrapolating a bad example (or a good example of a bad application of science) to “scientism” and then overgeneralizing to suggest that’s representative of an entire discipline (“science”).

    Science (good science objectively conducted) has and continues to nibble away at theological precepts. Evolution was openly denied by many Catholic priests when I was a kid…now the Church has accepted it by making humans an exception of sorts. And so it will continue to go….

    There seems to be an unacknowledged argument, or fear, evidenced by recurring omissions of this key distinction: Much of the science passed off as “scientism” isn’t science nor is it claimed to be science by the scientists making the statements … which are not pronouncements of accepted fact so much as forecasts of what science will, someday, prove.

    It is the implications of what those proofs might turn out to be that cause some, to varying degrees, glom onto the “scientism” shtick — pre-emptive denial of proofs yet unproven. Proofs that will force some to amend, or reject, particular beliefs.

  10. “Is most published research wrong?”

    Of course it is. That is a silly question. At its best, new research brings new discoveries, new ways of looking at things, and often renders old research incorrect or just obsolete. That’s a good thing. Religion, for example, clings to dogma. It does not allow for very much change, and only slowly.


  11. imnobody00

    Ken’s definition of scientism is different from everyone else’s definition of scientism. If you change the meaning of the words, you can get to any conclusion you want (especially if you make claims that are obviously false).

    JMJ, the methods of science and religion are different because the subject is different. Science deals with the low-hanging fruit, the material world, because you can make experiments with it and you can rely on the laws of nature (which imply a lawgiver). Religion, philosophy and history cannot do that. This is why science progresses faster than other disciplines.

    That many people think all other disciplines are inferior because they cannot follow the methods and pace of science reveals that these people no longer put science in its proper place (as a useful tool to find some truths in a specific domain area) but view science as the only standard of truth. This is called “scientism”.

  12. JMJ: It is clear you fail to understand the point of religion. Only a fool believes religion should evolve. That’s for elitist progressives to do—impose their new ideas on society and claim moral superiority.

  13. Ye Olde Statistician

    Religion, for example, clings to dogma. It does not allow for very much change

    Sorta like mathematics?

  14. acricketchirps

    Religious people are so dumb they cling to things instead of just adhering to them as bright people do.

  15. acricketchirps: What is the difference between clinging and adhering? They seem pretty much the same. Plus, do you mean by “things” ideas or actual objects? How do you know adhering is brighter than clinging or are you just stating an object of faith to you?

  16. acricketchirps

    Ideas *and* objects–guns and religion f’rinstance. As for the difference, ask a bright person; I’m numbered among the dumb clingers.

  17. “As for the difference, ask a bright person; I’m numbered among the dumb clingers.”

