Randomization Isn’t Needed — And Can Be Harmful

Had a noted statistician (Andrew Althouse) ask me about randomization, which he likes, and which I do not. “I want to compare outcomes for a specific patient group getting surgery A vs surgery B (assume clinical equipoise). If I’m not going to randomize, how should I allocate the patients in my study so I am confident that afterward I can make inference on the difference in outcomes?”

Excellent question. My response, though it was unsatisfying to the gentleman, was “I’d have independent experts allocate patients, ensuring balance of (what are thought to be) secondary causes, where the panel’s decisions are hidden from trial surgeons. Try to inject as much control as possible, while minimizing cheating etc.”

Too terse to be believed, perhaps. I expand the answer here.

Control in any experiment is what counts, not randomization. For one, there is no such thing as “randomization” in any mystical sense as required by frequentist theory. Probability does not exist. Randomness does not exist. This is proved elsewhere.

What we can do is to create some sort of device or artifice that removes control of allocating patients from a man and gives it to a machine. The machine then controls, by some mechanism, which patients get surgery A and which B.

A man could do it, too. But men are often interested in the outcome; therefore, the temptation to cheat, to shade, to manipulate, to cut corners, is often too strong to be resisted. I’ve said it a million times, and I say it again now: every scientist believes in confirmation bias, they just believes it happens to the other guy.

There is also the placebo effect to consider in medical trials. If a patient knows for sure he is getting a sham or older treatment, it affects him differently than if he were ignorant. The surgeons must know, of course, which surgeries they are performing; thus it is impossible to remove the potential for fooling oneself here. The surgeons doing the sham or older surgery (which we can imagine is A) might slack off; when switching to B they might cut with vigor and renewed enthusiasm.

Now suppose some sort of “randomization” (i.e. allocation-control) device spit out A and B, 100 of each (Althouse later gave this number). It could be that all 100 As were female and all 100 Bs male. It doesn’t matter that this is unlikely: it could happen. Imagine if it did. Would you be satisfied in analyzing the result?
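The all-female-A split is the extreme case, but milder imbalances are routine. Here is a quick sketch (my own hypothetical numbers: 100 males and 100 females, split evenly by a fair allocator) of how far a plain “random” device lets the sex counts drift:

```python
import random

random.seed(0)

# Hypothetical cohort: 100 males (M) and 100 females (F), matching
# the 100-per-arm trial imagined above.
patients = ["M"] * 100 + ["F"] * 100

worst = 0
for _ in range(10_000):
    random.shuffle(patients)
    arm_a = patients[:100]          # first 100 go to surgery A
    males_in_a = arm_a.count("M")
    # Imbalance: how far arm A's male count strays from the expected 50.
    worst = max(worst, abs(males_in_a - 50))

print(f"worst male-count deviation in arm A over 10,000 draws: {worst}")
```

Over ten thousand draws the worst split is nowhere near all-or-nothing, but a dozen-plus extra males in one arm turns up routinely: the device guarantees nothing.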

No, because we all believe—it is a tacit premise of our coming model—that sex is important in analyzing results. Why? Because sex, or the various systems biologically related to sex, tend to cause different outcomes, which include, we suppose, the surgical outcomes of interest here. We would be foolish not to control for sex.

Which is exactly why many trials “randomize” within sex by removing the control from the device and giving it back to some man, to ensure a good balance of males and females in the groups. This makes eminent sense: control is everything.
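Stratified allocation of this kind is simple to write down. The sketch below (a toy version, not any trial's actual procedure) takes control of the sex balance away from the device and enforces it exactly, shuffling only within each stratum:

```python
import random

def stratified_allocate(patients):
    """Allocate patients to arms A and B, balancing exactly within each sex.

    `patients` is a list of (patient_id, sex) pairs; sexes are "M" or "F".
    Within each sex the order is shuffled, then split half-and-half, so the
    sex balance is controlled rather than left to chance.
    """
    arms = {"A": [], "B": []}
    for sex in ("M", "F"):
        stratum = [pid for pid, s in patients if s == sex]
        random.shuffle(stratum)
        half = len(stratum) // 2
        arms["A"].extend(stratum[:half])
        arms["B"].extend(stratum[half:])
    return arms

# Toy cohort: ids 0-99 male, 100-199 female.
cohort = [(i, "M" if i < 100 else "F") for i in range(200)]
arms = stratified_allocate(cohort)
print(len(arms["A"]), len(arms["B"]))  # each arm gets 50 of each sex
```

The same loop extends to any other known cause you can measure before allocation (CHF, age bands, and so on), at the cost of ever-smaller strata.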

I don’t know what the surgery is, but it has to be something. Suppose it’s some kind of vascular surgery applied near or to the heart. We know there are lots of other causes, such as CHF, that might also play a causal role in the outcomes we’re tracking. If we’re sure of this, we would also “block” on CHF. That is, we would again remove control of the allocation device and give it to a man.

And so on for the other causes. We might not have the funds or time to explicitly control for all of these, in this physical allocation sense. But we might later include these in any model of uncertainty of the outcome. This is also called “controlling”, although there is no control about it. We’re just looking at things as they stood: we had no control over these other measures. (I wish we’d drop the misleading terminology. See this award-eligible book for a longer discussion of this.)

Enter Don Rumsfeld’s unknown unknowns. There may be many other causes, secondary or more removed (mitigators and so on), of the outcome of which we are ignorant. This must be so, or science would be at its end. How many such things are there in our surgery? We don’t know. They are unknown unknowns. There could be one, there could be ten thousand. The human body is a complicated organism: there are feedbacks upon feedbacks.

How will the machine allocator split these possible causes in the groups? We have no idea. It could be that the machine, like we imagined for sex, puts all or most of a dastardly cause in A and all or most of a beneficent cause in B. And this could go back and forth, and forth and back across all the other causes.

There is nothing we can do about this. They are, after all, unknown unknowns. But the mechanical allocator can’t somehow magically fix the situation such that an equal number of all causes are distributed in the groups. You don’t know what you’ll get. Worse, the same ignorance holds for causes we do know about but don’t explicitly control for. “Randomization” is the experimental procedure of tossing darts and hoping for the best.

Notice closely, though, that the desire for uniform distribution of causes is sound. It is often thought “randomization” gives this. It cannot, as we have seen. But if it is so important—and it is—why not then control explicitly for the causes we know? Why leave it to “chance”? (That’s a joke, son.)

Consider that this is precisely how physics experiments are done. Especially in sensitive experiments, like tracking heat, extreme care is taken to remove or control all possible known causes of heat. Except, of course, for the cause the physicist is manipulating. He wants to be able to say, “When I pulled this lever by so much, the heat changed this much, because of the lever.” If he is wrong about removing other causes, it might not be the lever doing the work. This is what got Fleischmann and Pons into such deep kimchi.

Return to my panel of independent experts. They know the surgeries and the goals of these surgeries. They are aware, as can be, of the secondary and other causes. They do their best to allocate patients to the two groups so that the desired balance of the known causes is achieved.

Perfection cannot be had. Panel members can be bought; or, more likely, they won’t be as independent as we’d like. Who on the panel wouldn’t, deep in his heart, like the new treatment to work? I’ll tell you who: the rival of the man who proposed the treatment. The panel might control sub-optimally. Besides all that, there is always the possibility of unknown unknowns. Yet this panel still has a good chance to supply the control we so rightly desire.

Randomization isn’t needed, does nothing, can cause harm, while blinding is often crucial and control is paramount.

Bonus: Althouse also asked this (ellipsis original): “Your ‘expert panel’ has assigned 100 patients to receive A and 100 patients to receive B. 14 of the patients that received A died, 9 of the patients that received B died. Your statistical analysis is…what, exactly?”

He wasn’t satisfied (again) with my “Predictive analysis complete with verification.” Too terse once more. As regular readers know, if we cannot deduce a model from accepted-by-all premises (as we sometimes but rarely can), we have to apply looser premises which often lead to ad hoc models. These are the most frequent kind of models in use.

I don’t know what ad hoc model I’d use in this instance; it would depend on knowing all the details of the trial. There are many choices of model, as all know.

“That’s a cop out. Which model is best here?”

Glad you asked, friend. We find that out by doing a predictive analysis (I pointed to this long paper for details on how this works) followed by a verification analysis—a form of analysis which is almost non-existent in the medical literature.

I can sum up the process briefly, though: make a model, make predictions, test the predictions against reality.
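As a sketch only, and using Althouse's numbers (14/100 deaths under A, 9/100 under B): one simple choice of ad hoc model is a beta-binomial with a uniform prior. That prior is my assumption for illustration, not a deduction from the trial. The model step and the prediction step then look like this; the verification step requires new patients and so can only be gestured at:

```python
import random

# Althouse's numbers: 14 of 100 died under A, 9 of 100 under B.
deaths_a, n_a = 14, 100
deaths_b, n_b = 9, 100

# Predictive probability the *next* patient dies, under a beta-binomial
# model with a uniform Beta(1, 1) prior (an illustrative assumption).
pred_a = (deaths_a + 1) / (n_a + 2)
pred_b = (deaths_b + 1) / (n_b + 2)
print(f"Pr(next A patient dies) = {pred_a:.3f}")
print(f"Pr(next B patient dies) = {pred_b:.3f}")

# Probability that A's death rate exceeds B's, by Monte Carlo draws
# from the two Beta posteriors.
random.seed(1)
draws = 100_000
p_a_worse = sum(
    random.betavariate(deaths_a + 1, n_a - deaths_a + 1)
    > random.betavariate(deaths_b + 1, n_b - deaths_b + 1)
    for _ in range(draws)
) / draws
print(f"Pr(A deadlier than B) is about {p_a_worse:.2f}")
```

Verification is the step the medical literature skips: record these predicted probabilities, then score them against the outcomes of fresh patients with a proper scoring rule, and revise or discard the model accordingly.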

Makes sense, yes?


  1. “every scientist believes in confirmation bias, they just believes it happens to the other guy.”

    Now that is beautiful. Lmao

  2. Stephen Senn

“I’d have independent experts allocate patients, ensuring balance of (what are thought to be) secondary causes, where the panel’s decisions are hidden from trial surgeons. Try to inject as much control as possible, while minimizing cheating etc.” Oh dear. Yet another commentator on clinical trials who has never been involved with them and adheres to the freeze-dried microwave theory of clinical trials: just pop the patients in the microwave to thaw them out, I’m ready to start my trial.
So here’s the dirty secret: you have to enroll patients when they present, to use the technical jargon. See myth 1 in https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.5713.

    Then we have

    “Notice closely, though, that the desire for uniform distribution of causes is sound. It is often thought “randomization” gives this. It cannot, as we have seen. But if it is so important—and it is—why not then control explicitly for the causes we know? Why leave it to “chance”? (That’s a joke, son.)”
    Now read this https://errorstatistics.com/2013/07/14/stephen-senn-indefinite-irrelevance-2/ and start again.

    People always pick this up from the wrong end. You are trying to produce an analysis and a design that will satisfy not just you but others. Write down your model first. This defines a class of designs that are approximately equally efficient. If you refuse to choose one at random, I want to know why.

    Quite frankly, this blog inspires me to write something about global warming.

  3. Bruce Charlton

    Good ideas But…

I spent quite a long time, maybe 15 years, clarifying the limitations of randomised megatrials (e.g. https://link.springer.com/article/10.1186/cvm-2-1-002 ) and suggesting improvements (e.g. https://academic.oup.com/qjmed/article/90/2/147/1612946 ); but I eventually realised that the people who fund and conduct the trials are not in the slightest degree interested in discovering the truth about treatment – their agendas are completely different, usually marketing and/or career progression.

    So I gave up! Sooner or later you will do the same, I predict.

  4. Brad Tittle

    ” every scientist believes in confirmation bias, they just believes it happens to the other guy.”

    I think you just inflected an accent in there.. Are you related to AOC or Hitlary?


  5. Brad Tittle

    Wait, you were using the they/them pronouns with he/she verbs. You were trying to be hip!

  6. Stephen Senn

The paper by Prof Charlton and others describing the PACE methodology cites a paper by Horowitz et al as showing that in some centres a beta-blocker given to patients suffering from MI was harmful. However, as Frank Harrell and I showed, the methodology of H et al was fallacious https://www.ncbi.nlm.nih.gov/m/pubmed/9253383/ : amongst other errors, the Gail-Simon test was used in a way that G and S specifically warned it should not be.
    When appropriately analysed, the BHAT trial that provided the data showed no evidence of centre to centre variation in effect.

  7. Name

The question is basically how to analyze a non-randomized trial. As you’d know, these days most trials by large pharmaceutical companies are randomized, but the question is still important for studies where the intervention or exposure cannot be randomized for ethical reasons, or where it’s impossible to do so. The best approach would be to gather an expert panel on the condition being studied to discuss potential confounders and effect modifiers for the intervention/exposure prior to the start of the trial. These should be stated explicitly in a protocol along with their rationale, either as a publication or in a database. During the trial, data relating to these potential confounders and effect modifiers would be collected. When the study is completed, the statistician would then adjust statistically for the confounders and, if necessary, check for statistical interactions or do subgroup analyses for the effect modifiers (such as male vs. female) which, as mentioned, were decided and published a priori. Adjusting for confounders is good but still possibly biased, because one cannot adjust for unknown confounders; that is why randomization will always be less biased and preferred wherever possible. It’s simple to verify how well a study population was randomized: it’s usually in Table 1 (study characteristics), where different potential confounders are listed for each study group and checked for a statistically significant difference.

    The study sample size needed to observe a statistically significant difference between groups would make it extremely unlikely that, for example, all males were allocated to group A and all females to group B, but block randomization by sex could prevent this from occurring. The best protection against uneven distribution of confounders including unknown confounders is a very large sample size. Failing this, one could pool data from individual trials which adjusted for similar confounders in a meta-analysis.

  8. Randomization has worked quite well in sampling of all sorts. Less overall error and less costs. If you want that, and your inferences to be valid, you randomize when appropriate (random sample from population and/or random assignment of treatment(s) to units). No mystical voodoo there, just utility. Randomization doesn’t always mean a free for all though, and the discussion of restricted randomization goes back 70 years – ie what do you do if you get scientifically unacceptable random arrangements. Can also randomly sample proportional to size, or a zillion other ways. Normal distribution theory works so well because it approximates the randomization distribution. I’d like to know how experimental design as a field can be so successful without randomizing.
