Last night was spent at the storied Chapter House, for our annual first-night ceremony of Deliberate Numbness Induction, much needed after eight straight hours of unmitigated statistical theory. The only weakness to this procedure is that we all have to get up next day and sit through another full eight hours, all the while knowing that no relief is on the way, with vivid awareness that no new ceremony awaits us at day’s end.

Thus, the lectures on Day 2 are somewhat more random. Which is appropriate, since randomness is the topic.

One often reads that the “best” experiments are those that are “randomized”; indeed, “randomization” is often required before certain government agencies will deign to examine your results. The key words are *randomized controlled trials*.

What is important is control, not “randomization”, which adds nothing and might even harm an experimental setup. “Randomization” does not provide what its users think it provides. And what is that? If any answer to this question is given, it is usually that “randomization” provides “balance” to an experiment. But this is false. It causes, or can cause, imbalance.

Suppose you want to split a group of patients in two, one set to be given a new drug, the other a placebo. *Controlling* the experiment so that you have (roughly) equal numbers of patients in each group is easy. But how do we make it so that all the biological, physical, and time- and location-based characteristics (or confounders) of each patient are balanced between the two groups? For example, we wouldn’t want to give all the men the drug and all the women the placebo.

Why wouldn’t we? Well, because we suppose, based on our knowledge of medicine and physiology, that men and women react to drugs differently, and thus we are apt to be misled by results from an experiment that does not divide the drug and placebo equally between the sexes.

So how about we flip a coin (perhaps a computerized coin) so that there is an equal chance each patient gets the drug or placebo? That won’t work, because it could happen that the coin flips push more men to the drug (or placebo) group. I mean, we could have “heads” a dozen times in a row. It’s unlikely, sure, but it could happen. And considering that we’re not just doing this experiment, but many thousands like it every year, then that “unlikely” event is almost sure to happen.
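To put rough numbers on the "unlikely, but almost sure to happen somewhere" point, here is a minimal sketch. The function names, the group of twelve men, and the figure of 5,000 experiments are all illustrative assumptions, not taken from any real trial, and the experiments are assumed to be independent of one another.

```python
from math import comb


def prob_lopsided(n_flips: int, min_same: int) -> float:
    """Chance that n fair coin flips put at least `min_same` patients in one arm.

    Either arm counts, so 12 heads and 12 tails are both "lopsided".
    """
    favourable = sum(
        comb(n_flips, heads)
        for heads in range(n_flips + 1)
        if max(heads, n_flips - heads) >= min_same
    )
    return favourable / 2 ** n_flips


def prob_at_least_once(p_single: float, n_experiments: int) -> float:
    """Chance the event occurs in at least one of n independent experiments."""
    return 1.0 - (1.0 - p_single) ** n_experiments


if __name__ == "__main__":
    # Hypothetical numbers: 12 men, all sent to the same arm by the coin.
    p = prob_lopsided(12, 12)
    print(f"one experiment: {p:.6f}")
    print(f"at least once in 5000 experiments: {prob_at_least_once(p, 5000):.3f}")
```

Lowering `min_same` (say, to 10 of 12) shows how quickly the merely "lopsided" splits become common even within a single experiment.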

Of course, we could *control* our patients so that (roughly) equal numbers of men and women get the drug and placebo. Then, just among the men, and separately among the women, we could “randomize” so that all those other possible confounders are, inside each sex, “balanced.”
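The control-then-"randomize" scheme just described can be sketched as follows. This is only an illustration: the patient records, the `"id"` and `"sex"` field names, and the function `stratified_assign` are hypothetical, and shuffling within each stratum is one simple way to do it, not the only one.

```python
import random
from collections import defaultdict


def stratified_assign(patients, stratum_key, seed=0):
    """Split each stratum (e.g. each sex) evenly between drug and placebo.

    `patients` is a list of dicts with an "id" field and a `stratum_key` field.
    Within a stratum the order is shuffled, then the first half gets the drug.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient in patients:
        strata[patient[stratum_key]].append(patient)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, patient in enumerate(members):
            assignment[patient["id"]] = "drug" if i < half else "placebo"
    return assignment


if __name__ == "__main__":
    # Hypothetical cohort: 6 men and 4 women.
    cohort = [{"id": i, "sex": "M" if i < 6 else "F"} for i in range(10)]
    print(stratified_assign(cohort, "sex"))
```

By construction the sexes are balanced across the arms; nothing in the shuffle guarantees the same for any characteristic that was not used as a stratum.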

But this doesn’t work. Each patient has an innumerable array of possible confounders. Hoping that any set of “random” coin flips will land such that all these confounders are balanced between the two groups will be in vain.
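A small simulation gives a feel for how fast this goes wrong. It is a sketch under loud assumptions: every confounder is a yes/no trait, each patient carries each trait independently with probability one half, and "imbalanced" means the two arms differ by at least `min_gap` carriers. Real confounders are neither binary nor independent, and the function name and parameters are made up for illustration.

```python
import random


def any_imbalance_rate(n_patients, n_traits, min_gap, trials=500, seed=1):
    """Fraction of simulated trials in which at least one trait is imbalanced.

    Patients are split into two equal arms at random (n_patients must be even).
    A trait is "imbalanced" when the arms differ by >= min_gap carriers.
    """
    if n_patients % 2:
        raise ValueError("n_patients must be even for equal arms")
    rng = random.Random(seed)
    half = n_patients // 2
    hits = 0
    for _ in range(trials):
        arms = [True] * half + [False] * half  # True = drug, False = placebo
        rng.shuffle(arms)
        for _ in range(n_traits):
            in_drug = in_placebo = 0
            for is_drug in arms:
                if rng.random() < 0.5:  # assumed: 50% prevalence, independent
                    if is_drug:
                        in_drug += 1
                    else:
                        in_placebo += 1
            if abs(in_drug - in_placebo) >= min_gap:
                hits += 1
                break  # one imbalanced trait is enough for this trial
    return hits / trials


if __name__ == "__main__":
    for m in (1, 5, 20, 50):
        print(m, any_imbalance_rate(40, m, min_gap=6))
```

Running it with more traits shows the rate of "at least one imbalance" climbing, which is the point of the paragraph above: the more potential confounders there are, the less plausible it is that the coin balanced them all.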

Of course, for those confounders we can identify (and measure), we could control the splitting of the groups along them. Thus, control is important, even crucial (as experiments in physics, for example, acknowledge).

“Randomization” thus does not confer legitimacy on an experiment, and it will likely lead to imbalances. What is important is control, the stricter the better.

This proof, for those who want to think more deeply about the subject, is old and well known and can be written up in a lovely, and fully convincing, mathematical form. The proof is acknowledged but resisted by those who hold with the relative-frequency view of probability, because it is axiomatic to them that only “randomized” results can be analyzed. It is “randomization” that turns ordinary measurements into probabilistic measurements, which can then be input into those cookbook formulas statistics students know and love.

To those (like your author) who hold with the objective, logical, or Bayesian form of probability, any evidence or observation can be analyzed, because to them *random* is merely a synonym for *unknown*. And probability is just quantification of the unknown.

Where does that leave exact tests, which are based on randomization (after appropriate physical controls, matching, stratification etc. are implemented)?

Would it be more correct (or at least as correct) to say that if the groups responded differently we wouldn’t know if it was due to the drug or due to the gender? Isn’t the ideal and unobtainable objective to have two identical groups? If, for example, the test subjects were all identical twins with one of each pair assigned to each group there would be maximum “control” and minimum “randomness” in the group creating process. Rats used in research aren’t found randomly in a bad part of town but are inbred and raised at great cost to make groups identical.

If we assume there is only one unobserved characteristic that could be confounded with our experimental treatments, then it’s easy to show that randomization greatly reduces the chance of said confounding–it’s just a clever use of the hypergeometric distribution. The obvious “catch” is that we’re doing this for UNOBSERVED variables, so by their very nature we don’t know how many of them exist, and I fear that randomization is too feeble to avoid confounding when there are several potential confounders lurking within the subjects (sounds like a great project for some undergrad stats majors!).
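For the single-characteristic case, the hypergeometric calculation can be sketched like this. The setup is an assumption made for illustration: `total` subjects, `carriers` of whom have the unobserved characteristic, and `treated` of whom are picked at random for the treatment arm. "Complete confounding" here means every carrier lands in the same arm.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, carriers: int, treated: int) -> float:
    """P(exactly k carriers end up in the treatment arm)."""
    if k < 0 or k > treated or k > carriers:
        return 0.0
    # comb returns 0 when (treated - k) exceeds the number of non-carriers.
    return comb(carriers, k) * comb(total - carriers, treated - k) / comb(total, treated)


def prob_complete_confounding(total: int, carriers: int, treated: int) -> float:
    """P(all carriers fall in one arm), for at least one carrier."""
    if carriers < 1:
        raise ValueError("need at least one carrier")
    all_in_treatment = hypergeom_pmf(carriers, total, carriers, treated)
    all_in_control = hypergeom_pmf(0, total, carriers, treated)
    return all_in_treatment + all_in_control


if __name__ == "__main__":
    # Hypothetical example: 20 subjects, 10 carriers, 10 assigned to treatment.
    print(prob_complete_confounding(20, 10, 10))
```

The same function also gives the chance of partial imbalance by summing `hypergeom_pmf` over the tail values of `k`, which is usually the more worrying case.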

So why is randomization such a staple of experimental design? I suspect–off the top of my head, without a literature review–that it’s a clever dodge to avoid selection bias on the part of the experimenter, who is, after all, the most prominent confounder/elephant in the lab. If so, it’s motivation isn’t really about confounding at all, it’s about blinding.

Grr! “…its motivation..”