I promised this article a long time ago. I hope these well-known calculations are enough to give the gist.
Randomized controlled trials are supposed to be the gold standard of experimental research. This is true. But it would be just as true if we said “controlled trials” without the “randomized.”
“Randomness” is not needed. “Random” trials are even the opposite of what we desire, which is information and evidence. Since “randomness” means “unknown”, adding randomness decreases information.
There is just one reason, which I’ll reveal later, why “randomness” is important.
Suppose you’re designing a marketing, finance, or drug trial: anything where you have “treatments” to compare. Since everybody understands drug trials, we’ll use that.
A controlled study splits a group of people in two; it gives one half drug A, the other half B. The results are measured, and if more people improve using A, then A is said to be better. How much better depends on what kind of probability model is used to measure the distance.
How to make the groups? If you need 100 folks, take the first 50 and give them A; the second 50 get B. Does that sound suspicious to you?
Ordinarily, the groups would be split using “randomness”, usually computerized coin flips. “Randomness” is supposed to distribute roughly equally the characteristics that might affect how the drug works. This being so, any difference between A and B is due to the drugs themselves and not to these unmeasured characteristics.
This view is false.
One measurable characteristic is sex. If men and women appear equally and uniformly (yet unpredictably), then the chance one group is filled with all men is 2^(-n), where n is the number of people in that group. If n = 50, the chance is just under 10^(-15): that’s a decimal point followed by a lot of zeros. If n = 100, the chance is just under 10^(-30). Pretty low.
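The arithmetic is easy to check. Here is a minimal Python sketch, assuming each person is independently a man with probability 1/2. The function name is mine, made up for illustration.

```python
def chance_all_men(n: int) -> float:
    """Chance that n independent 50/50 draws all come up 'man'."""
    return 0.5 ** n


for n in (50, 100):
    # Scientific notation makes the number of leading zeros easy to read.
    print(f"n = {n}: {chance_all_men(n):.2e}")
```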
All men in one group is an extreme imbalance. But if men were, say, 70% of one group, this would still be cause for concern. The chance of a discrepancy larger than this (men making up more than 70% of the group) is about 0.0013 for n = 50, and about 0.000016 for n = 100. Quite a difference! While a discrepancy is still unlikely, it no longer seems impossible.
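These tail chances come from the binomial distribution. A short sketch, assuming a group of n people, each independently a man with probability 1/2, and counting only groups where men exceed the stated share. The function name and the integer `percent` parameter are my own choices, not anything standard.

```python
from math import comb


def chance_over_share(n: int, percent: int = 70) -> float:
    """Chance that men make up MORE than `percent`% of a group of n,
    when each person is independently a man with probability 1/2."""
    cutoff = n * percent // 100  # largest count still at or below the share
    # Count the equally likely sex patterns with more than `cutoff` men.
    favorable = sum(comb(n, k) for k in range(cutoff + 1, n + 1))
    return favorable / 2 ** n


for n in (50, 100):
    print(f"n = {n}: {chance_over_share(n):.6f}")
```

Whether the cutoff is "70% or more" or "more than 70%" changes the answer noticeably in small groups, so it pays to be explicit about which is meant.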
Of course, men and women don’t always show up equally and uniformly. Any departure from these ideals only makes the chance for a discrepancy between the groups higher. How much higher depends on what is meant by “unequally” and “non-uniformly.” However, the effect will always be substantial.
Very well. If we just measure sex, then the chance of a discrepancy is low. But now consider fat and skinny (say, divided by some preset BMI). We now have two characteristics to balance with our coin flips. The chance that we have at least one discrepancy (in sex or weight) roughly doubles.
For n = 50 it is now 0.0026; and for n = 100 it is 0.000032.
OK, we measure sex and weight; add race (white or non-white). The chance of at least one discrepancy is higher still: roughly three times higher than with just one characteristic, if the characteristics are unrelated to one another.
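The way the chance grows with each added characteristic can be sketched in a few lines. This assumes the characteristics are independent of one another and that each has the same chance p of a discrepancy; real people are not so tidy. The value p = 0.0013 is the single-characteristic chance for n = 50, and the function name is hypothetical.

```python
def chance_any_discrepancy(p: float, k: int) -> float:
    """Chance that at least one of k independent characteristics
    shows a discrepancy, when each one does so with chance p."""
    return 1 - (1 - p) ** k


p = 0.0013  # single-characteristic chance for n = 50 (assumed)
for k in (1, 2, 3, 10):
    print(f"{k} characteristic(s): {chance_any_discrepancy(p, k):.4f}")
```

For small p the result is close to k times p, which is why the chance roughly doubles with two characteristics and roughly triples with three.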
How many measurable characteristics are there? Can we think of more? Height, blood pressure, various blood levels, ejection fraction, and on and on and on. At least three to four dozen—call it 50-100—characteristics might be important and are routinely measured in medical trials.
You can see where we’re going. Every person is, by definition, different from every other person. There are enormous differences between people at every level, from physiological to cellular to genetic.
Eventually, of course, it becomes all but certain that there will be a discrepancy between the two groups. Whether you measure that characteristic is irrelevant. It—actually many—will be there. And recall: not all these characteristics will present equally and uniformly; they’ll be all over the place. Discrepancies, imbalances between groups, are always there.
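To get a feel for how imbalances pile up, here is a rough simulation. Everything about it is an assumption of mine: 100 people, a set of independent yes/no characteristics each present with probability 1/2, and a "discrepancy" defined as a gap of 20 percentage points or more between the two groups. The function name is made up as well.

```python
import random


def count_imbalanced_traits(n_people=100, n_traits=50, gap=0.2, seed=0):
    """Randomly split people in two and count the traits whose share
    differs between the groups by at least `gap`."""
    rng = random.Random(seed)
    # Each person is a list of yes/no traits, each a fair coin flip.
    people = [[rng.random() < 0.5 for _ in range(n_traits)]
              for _ in range(n_people)]
    rng.shuffle(people)  # the "random" assignment to groups
    half = n_people // 2
    group_a, group_b = people[:half], people[half:]
    imbalanced = 0
    for t in range(n_traits):
        share_a = sum(person[t] for person in group_a) / len(group_a)
        share_b = sum(person[t] for person in group_b) / len(group_b)
        if abs(share_a - share_b) >= gap:
            imbalanced += 1
    return imbalanced


for n_traits in (1, 10, 50, 100, 1000):
    found = count_imbalanced_traits(n_traits=n_traits)
    print(f"{n_traits} traits: {found} imbalanced")
```

Changing `gap`, `seed`, or the number of traits changes the counts; the point of the sketch is only the trend as characteristics are added.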
“Randomly” splitting groups, therefore, does not, and cannot, “balance” groups.
But control can.
Control means taking those characteristics we think are important, and splitting the groups such that each receives an equal proportion. There will always be, even when we control, discrepancies between the groups we did not control, and one or some of them might be responsible for the results. But if we choose our groups carefully, then the chance of this is small.
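One way to picture control is as a split made within each combination of the chosen characteristics. The sketch below is only an illustration under my own assumptions: people are plain dictionaries, the field names `sex` and `heavy` are invented, and the order within each combination is shuffled by a seeded generator.

```python
import random
from collections import defaultdict


def controlled_split(people, keys, seed=0):
    """Split people into groups A and B so that every combination of the
    controlled characteristics is divided as evenly as possible."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[tuple(person[k] for k in keys)].append(person)

    groups = ([], [])  # groups[0] is A, groups[1] is B
    start = 0          # which group receives the first person of a stratum
    for members in strata.values():
        rng.shuffle(members)
        for i, person in enumerate(members):
            groups[(i + start) % 2].append(person)
        if len(members) % 2 == 1:
            start ^= 1  # alternate who gets the odd person out
    return groups


# Hypothetical data: 100 people with two controlled characteristics.
people = [{"id": i, "sex": "M" if i % 2 else "F", "heavy": i % 3 == 0}
          for i in range(100)]
a, b = controlled_split(people, keys=("sex", "heavy"))
print(len(a), len(b))
print(sum(p["sex"] == "M" for p in a), sum(p["sex"] == "M" for p in b))
```

The more characteristics are listed in `keys`, the smaller each combination becomes, so in practice only the characteristics thought to matter most can be controlled this way.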
While “randomness” cannot live up to its promise, there is one good reason to use it in medical trials. The human animal cannot be trusted. He will cheat, steal, trick, and lie, even to himself—especially to himself.
You can’t trust a man to pick who receives what treatment for fear he will game the system, maybe even unawares. You have to remove the decision to a disinterested authority. Like a non-human computer.
Or a statistician.