How Do You Know If An Experiment Works? Or, Yet Another Argument Against P-values

Listen to the podcast on YouTube, Bitchute, or Gab.

Anon asks questions about models (I added the bold question brackets and the link). Pay attention most closely to [Q2], which is our main point today.

Thank you for all your material regarding the topic of mathematical modeling.

I have read your paper called “Models Only Say What They’re Told to Say.” But unfortunately, I do not understand the theory of it. Would you like to elaborate a bit on it?

[Q1] If these definitions are correct: “Causal inference is focused on knowing what happens to Y when you change X. Prediction is focused on understanding the next Y given X (and whatever else you’ve got)”*, why do models only say what they are told to say?

[Q2] Let’s say I want to investigate the usefulness of wearing masks. So, I will measure the relevant factors concerning the virus. Then, I will implement the intervention, in this case, the wearing of masks. After that, I will do a measurement again. Would such research not provide the usefulness of the masks (assuming that the research design is correct)?

[Q3] Or, in another setting, let’s say I want to predict the revenue of my bakery. Hence, I will use weather data, sales data, calendar data, and the like. This setting will provide me with valuable predictions that are approximately in concordance with reality. Hence, the bakery can create a better staffing schedule. How does your theory apply to this case?

I would love to hear from you.

God bless.

[Q1] Because they can’t say anything else. Here’s a simple model: “If input is between 0 and 7, print success, else print failure.” That model can only say success or failure, and nothing else. Because that is what we told it to say.
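Here is that model as a minimal Python sketch. The name `simple_model` and the reading of "between 0 and 7" as inclusive of both ends are assumptions of this sketch, not something the model statement settles.

```python
def simple_model(x: float) -> str:
    """Return one of the only two things this model was told it may say."""
    # Assumption: "between 0 and 7" includes both endpoints.
    if 0 <= x <= 7:
        return "success"
    return "failure"


# Whatever we feed it, the output never leaves the set we wrote down.
outputs = {simple_model(x) for x in (-3, 0, 3.5, 7, 7.01, 1e9)}
print(outputs)
```

No input, however strange, makes it say anything other than what it was told to say.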

We also accept the premise that “Nothing breaks.” We could entertain a different premise, like “The model might break, because the input goes haywire, and then anything goes”, or any of a set of similar premises. Even then, the model only says what it is told to say, conditional on whatever implied premises are deduced from that “anything goes.”

But there is no warrant to suppose any of these infinite number of additional premises.

The goal we should always have is understanding cause. If we knew the full cause (formal, material, efficient, and final) of some observable Y, even conditional on external circumstance X, then we’d know all there is to know about Y. See also this post on the difference between essential and empirical models.

But the point is simple: models can only say what they’re told to say, because they have no power to do anything else.

[Q2] This is an excellent question, and the heart of all research.

If you run any experiment, or take any observation, of Y (say, coronadoom infection), such that you assume the only difference in conditions is X_1 (masks on) or X_2 (masks off), then given that assumption, the only difference in Y, if there is one, must be caused by X.

Consider this carefully and slowly.

We have only three possibilities. (1) Y is the same regardless of X, (2) Y always changes the same way between X_1 and X_2, or (3) Y changes in different ways, even with the change from X_1 to X_2.

Under (1), you have proven, contingent on believing the only difference is X, that X has no bearing on Y. X is not causative of Y. You may dismiss X as being of any interest to Y. This case is very rare, to the point of non-existent, for contingent Y. Y is constant here!

I cannot stress enough that this is contingent on believing or accepting the only difference is X. We can’t, like in [Q1], assume additional premises that there are other possible causes, because we assumed X is the only possible difference.

Under (2), we have proven that whatever happens to Y under X_1 always happens to Y under X_1, and whatever happens to Y under X_2 always happens to Y under X_2.

If Y = no infection always when X_1, and Y = yes infection always when X_2, then we must conclude masks work. Contingent—and here is that same stress!—on the premise that X is the only possible cause.

These onlys and alwayss are as strict as possible. No deviation in the least is allowed!

Under (3), we have proven the model that X is the only cause is false. We assumed X is the only cause, but Y varied such that sometimes Y was infection with mask, sometimes no infection with masks, and vice versa.

Masks might still be a cause against infection sometimes. But masks cannot be the only cause.
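The three possibilities can be written as a small check on observed data. This is only a sketch: the function name `classify`, the string labels, and the toy data are illustrative inventions, and the whole thing inherits the premise that X is the only difference in conditions.

```python
def classify(observations):
    """Classify (x, y) pairs into possibility 1, 2, or 3.

    observations: iterable of (x, y) with x in {"X_1", "X_2"}.
    Everything here is conditional on X being the only difference.
    """
    seen = {"X_1": set(), "X_2": set()}
    for x, y in observations:
        seen[x].add(y)
    if not seen["X_1"] or not seen["X_2"]:
        raise ValueError("need at least one observation under each condition")

    # Y varies within a single condition: X cannot be the only cause.
    if len(seen["X_1"]) > 1 or len(seen["X_2"]) > 1:
        return 3
    # Y is constant within each condition; compare across conditions.
    return 1 if seen["X_1"] == seen["X_2"] else 2


same = [("X_1", "infected"), ("X_2", "infected")]
strict = [("X_1", "clear"), ("X_1", "clear"), ("X_2", "infected")]
mixed = [("X_1", "clear"), ("X_1", "infected"), ("X_2", "infected")]
print(classify(same), classify(strict), classify(mixed))  # 1 2 3
```

A single masked person who gets infected is enough to move the data from (2) to (3). That is how strict the "always" is.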

Suppose your experiment showed 70% infections in the masks group, and 75% infections in the no-masks group. If masks were the only cause, and prevented infections, then we should have seen 0% infections in the masks group, and 100% infections in the no-masks group.

This wasn’t true, so it cannot be that masks are the only cause. There must be other causes. What are they?

We do not know!

We only made an assumption about masks being the only cause. We proved that that was false.

Again, there must be other causes at work. The results are consistent now with “masks sometimes work” and “masks never work”. The other causes, of which we are ignorant, must be explaining at least some of what we saw, and might explain all of what we saw.
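To see why both readings survive, here is a toy bookkeeping sketch. All the counts are hypothetical, chosen only to reproduce the 70% and 75% figures; `infection_rates` and its parameters are made-up names, and "exposed" stands in for whatever the unknown other causes are.

```python
def infection_rates(masked_exposed, unmasked_exposed, blocked_by_mask, n=100):
    """Percent infected in the (masks, no-masks) groups of n people each.

    masked_exposed / unmasked_exposed: people hit by the unknown causes.
    blocked_by_mask: exposed mask wearers whom the mask actually saved.
    """
    masked_infected = masked_exposed - blocked_by_mask
    return (100 * masked_infected / n, 100 * unmasked_exposed / n)


# World A: masks never work; the groups simply differed in the unknown causes.
never_work = infection_rates(70, 75, blocked_by_mask=0)

# World B: masks sometimes work; same exposure, masks saved 5 people.
sometimes_work = infection_rates(75, 75, blocked_by_mask=5)

# The falsified premise: masks are the only cause and always prevent infection.
sole_cause = infection_rates(100, 100, blocked_by_mask=100)

print(never_work, sometimes_work, sole_cause)
```

Worlds A and B give the same two percentages, so the observed table alone cannot tell them apart. Only the sole-cause premise is ruled out.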

If you can see this, you see far. Incidentally, this is yet another proof for the silliness of hypothesis testing and p-values.

Now let’s see why despair is not warranted with your third question.

[Q3] We answer this the same as your second question. Your bakery model is of some form, and uses weather input and so on, and predicts revenue. When the inputs are all some identical value (each), the model will spit out the same prediction for revenue. Because that is what you told it to do. Simple as that.
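A sketch of what "some form" might look like. The inputs, the linear form, and every coefficient below are invented for illustration; a real bakery model would be fit to the bakery's own sales data.

```python
def predict_revenue(temp_c: float, is_rainy: bool, is_weekend: bool) -> float:
    """Toy revenue model: a fixed rule applied to the inputs it is given."""
    # Hypothetical coefficients, not estimated from any data.
    return 500.0 + 4.0 * temp_c - 60.0 * is_rainy + 150.0 * is_weekend


monday = predict_revenue(18, True, False)
another_monday = predict_revenue(18, True, False)
print(monday == another_monday)  # True: identical inputs, identical prediction
```

The prediction can be useful for scheduling staff without the rule saying anything about why any particular customer walked in.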

For masks, we can’t be sure about cause, but we can build a model based on what we saw and predict infections based on masks and no masks. It’s purely a correlation model, like your bakery model. But that doesn’t mean it can’t be useful.

We might look outside the model and investigate individual cases: what caused the revenue on this day, or what caused infection in this person. Huge problem! Not impossible, just very difficult.

Most scientists want to bypass that difficulty, and desire to conclude their correlational models proved cause. This is why they are always “rejecting nulls” and the like.

But it’s wrong.



  1. “(assuming that the research design is correct)”

    That is to assume everything important; and there is no definition of correct research design, and no design for any research was ever correct – in the sense of inevitably being able to answer any question finally.

    There is no such thing as scientific method, and never has been – except as a wholly arbitrary assertion. In particular science is orthogonal to ‘statistics’ – has nothing necessarily to do with statistics. Therefore statistical analysis has nothing, as such, to do with science.

    (The main role of statistics should be to allow a scientist to get an overview of numerical data, when there is too much for him to hold in his memory at once, or when it is presented in a confusing way. Statistics’ main role in science is therefore summarization, including – valid forms of – clarification by summary.)

    The only guarantee of research being good lies in the *motivation* of scientists: who did the research to seek and speak the *truth* in that particular area of interest.

    This truth-motivation includes the scientists who did previous studies – i.e. the work of those on which the current investigation has been based must have been truthful; the piece of research under question must seek truth and be truthfully written; and truthful motivation is necessary in those who read, interpret and evaluate that research.

    This is a lot to ask, and has hardly ever been the case for many areas of science in history – but it has been so in particular times, places, and subjects over a period of a few hundred years up until recent decades.

    It is now not the case in any area of well-funded or high status professional research – and we are back to a few (mostly scattered) individuals who do honestly motivated science, using mostly very selective and older data, mostly communicating among themselves – or not communicating at all, because they have no genuinely scientific community with whom to communicate.

  2. Ye Olde Statistician

    cf. Nancy Cartwright, The Dappled World. All scientific laws are ceteris paribus (all else equal). Physicists will take enormous pains to block or screen the additional variables in an experiment, in which they can control most factors. She uses Newton’s 2nd Law [a=F/m] as a type example. It is patently false as a universal law. Try dropping a cannonball and a dollar bill from the tower! Now try dropping a wadded-up dollar bill and discover that Force and Mass are not the only factors affecting acceleration. The wadded-up bill has the same mass as the unwadded bill, yet will accelerate in a more satisfyingly Newtonian manner. The law remains valid in the absence of air resistance, hence in a vacuum. Or in the case of a charged body, an e/m field. Dropping objects in a swimming pool will introduce other problems, such as buoyancy. If this is true in science — and we have not even mentioned the very fast [relativity] or the very small [quantum] — how much more so in crypto-sciences like sociology or economics, where lab conditions are seldom as controllable.

  3. Steve

    “Under (2), we have proven that whatever happens to Y under X_1 always happens to Y under X_1, and whatever happens to Y under X_2 always happens to Y under X_1.”

    Curses! Your enemies have been at it again!

    [Q1] Absolutely right. An easier way to see it for those who don’t like all these Xs and Ys is using the number of heads in a room to predict the number of shoes. Generally, it will be roughly twice, but it really depends on the circumstances. That model probably will not be right at a swimming pool, or at a convention of quad amputees, for example.

    [Q2] Back when I learned experiment design as an engineer, it was common practice to lump all unknown causes into a single variable. I take it this is no longer done?

    [Q3] Assuming science has to do with understanding the causes of things in an assumed rational universe, I can’t think of any reason to believe that rain in some way causes the sale of 3 more crullers. It might change the behavior of actors, like they step into your bakery while waiting for the bus, and 3 of them opt to buy a cruller. A correlation can obviously be useful. However, as Bruce explains, science demands one appreciate the difference between causation and correlation.

    Ye, the crumpled bill is a great example of both confounding and the dangers of extrapolation. Strictly speaking, though, isn’t F=mA just a definition, similar to c=pi *D or flux=sigma*T^4? Just as one would not expect that pi*D would come up with the right c for a “circle” with 4 sides, the Second Law does not necessarily hold outside its assumptions.

  4. gareth

    If Y = no infection always when X_1, and Y = yes infection always when X_2

    Shirley, this applies not only in the case of “always” but also in the case of “n%”, whatever value of n (between 0 and 100) is assumed?

    And, presumably, that n% value has to be an input, either explicitly or as a result of some function also explicit in the model (however deeply buried) ?

    So, maybe models only say a percentage of what they are told to say 😉

  5. Milton Hathaway

    “Models only say what they are told to say” is one of those statements that sounds counterintuitive when you first hear it, then obviously true once you think about it. Perhaps the most interesting thing about the statement is that the obvious needs to be repeated constantly to keep us humans from drifting off again into magical thinking.

    I like the “only cause” explanation in the [Q2] response. We are constantly bombarded throughout our lives with stuff like “coffee causes [bad thing]”, supported by some recent study that showed a slight difference in percentages. So I’m drinking a cup of coffee at work, and someone will walk up to me and say “you know you are going to [bad thing] by drinking that crap”.

    So I tell myself that some percentage of the population drinks coffee and [bad thing] is a sure thing, whereas the rest of the population is unaffected by coffee in this way, or perhaps even derives some immunity from [bad thing]. Then right on cue, there is a PSA on the tube where a serious-sounding expert in a lab coat says “everyone thinks they are that one person that can drink coffee and never [bad thing]”, with a near-death [bad-thing] sufferer grasping at life in the background.

    (And yes, I chose coffee because its image has been largely rehabilitated, and it’s now a great anti-oxidant or something, and will scare away all sorts of bad things.)

    Eventually I decided to take Rush’s advice, ignore all the studies and just live my life in peace. (But I really, really, really wish Rush had gotten one of those whole-body cancer screening scans the local hospital is constantly pestering me to take. I’m not going to get one, though, so there’s that.)

    Early in the Covid pandemic, I told a very worried coworker, when discussing the shutdown, “Life is risky; a risk-free life isn’t possible, and wouldn’t be worth living if it was”. The look of utter horror on his face will be forever etched in my memory. It’s like he thought he was going to cure cancer someday or something.

  6. awildgoose


    A risk-free life is exactly the future the controllers have planned for us.

    It is most clearly seen in their lunatic, “15-minute City,” and Saudi, “The Line,” concepts.

  7. Steve

    @gareth, if it works that way 20% of the time, all that tells you is that you may have isolated a cause of greater, less than, or equal to 20% of incidence. That is, it tells you little more than you knew going into it. Nothing more in terms of science and causality. The causality could be zero, and there were confounders that your experiment design was not good enough to catch.

    Most of my older PE colleagues are OK with moving forward on a 90% for a system that has non-catastrophic worst-case potential. That is, there is at least one more cause out there that we could not find, but we are OK with a scrap rate of 10%. We would not be OK with a 10% chance of Bhopal. Bear in mind, though, that when doing that, we are functioning as statisticians, not scientists/engineers, the distinction @Bruce made above.

  8. Not an argument against p-values at all, since they are just a measure of how far what you observe in well-designed experiment(s) is from what you expect under a model for which you’ve tested as many assumptions as you can. You’re arguing for p-values, you are just unaware of that.

    Yes, obviously researchers try and think of and control for other factors in the experimental design, and try and replicate the experiment.

    The “just a mask or no mask, then see if there is infection” setup, run one time only, is obviously (?) not a well-designed experiment.

