I was wondering if you could help me as a layman grasp a problem I’m wrestling with. Say I have a large tray and pour a gallon of white paint in it. Then I ‘toss’ a quart of black paint on top of it. The end result looks like a work of modern art. Assume the white paint represents patients who won’t have heart attacks and black those that will. Imagine now we are color blind to complete the analogy. What sampling logic can I use to ensure ‘randomization’ for a clinical trial using the spilled paint as an analogy?
First, randomization is of no interest, except in those cases where we worry about a person trying to fool himself, or others. This is why we have referees flip coins at sporting events: it is presumed the results cannot be controlled. But we could just as easily have the referee decide without a coin. After all, we presume the referee is fair.
We fear doctors aren’t so fair, particularly when it comes to investigating their hot new treatment ideas. Like I’ve said a million times: all scientists agree confirmation bias exists, they just think it always happens to the other guy. Hence randomization, which removes causal control of evidence selection from the person running the trial.
Randomization is asked for in classical statistics, especially frequentism, because it is believed probability is ontic, and thus the randomization is adding something to an observation. If you select an observation systematically, and not randomly, the observation has not been “blessed”, as it were, and somehow it counts less. I am in agreement with Judea Pearl on this subject: cause is primary, not randomness.
None of that is very systematically thought out, but then neither is much in the philosophical aspects of frequentism. We can blame this on the math, which is too easy and beautiful.
Second, how to sample your paint? Well, to what end? Since all the black paints will have heart attacks, presumably we can’t save them, but we might like to find them to sell them cushions they can carry around for when they eventually keel over.
By definition, blacks are all over the place with no pattern that we can predict, and we can’t predict because we can’t identify all the causes of their dispersion. We know one part of the cause, the tossing, but the other aspects are a mystery. So we begin by believing the blacks could be anywhere. We do have the idea that 1 in 5 people (if we consider the paint as composed of individual dots) are blacks: one quart of black to a gallon, or four quarts, of white.
If you’re going to now sample, you have to have a sampling mechanism, by which I mean you have to cause people to come into your net. On the assumption blacks could be anywhere, it does not matter what you do: just grab people as they walk by your door. One in five, on average, will be black. We could compute this distribution exactly; it changes dynamically as people are drawn, since we know the population size.
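To make “compute this distribution exactly” concrete, here is a minimal sketch. It assumes a made-up population of 1,000 dots, 200 of them black (a quart tossed onto a gallon is one part in five), and a sample of 20 taken without replacement, which gives the hypergeometric distribution. The population size, the sample size, and the function names are illustrative choices, not anything fixed by the analogy.

```python
from math import comb

# Hypothetical population: 1,000 paint "dots", 1 in 5 of them black.
TOTAL, BLACKS = 1000, 200

def prob_black_count(k, n, blacks=BLACKS, total=TOTAL):
    """P(exactly k blacks among n people grabbed without replacement)."""
    return comb(blacks, k) * comb(total - blacks, n - k) / comb(total, n)

def prob_next_black(seen_black, seen_total, blacks=BLACKS, total=TOTAL):
    """P(next person is black), given what has already been drawn.

    This is the "dynamic" part: each draw changes what is left in the tray.
    """
    return (blacks - seen_black) / (total - seen_total)

n = 20  # hypothetical sample size
dist = [prob_black_count(k, n) for k in range(n + 1)]
expected = sum(k * p for k, p in enumerate(dist))

print(f"expected blacks in {n} draws: {expected:.2f}")  # 4.00, i.e. one in five
print(f"first draw:  {prob_next_black(0, 0):.3f}")
print(f"after 10 blacks in 20 draws: {prob_next_black(10, 20):.3f}")
```

Notice that nothing in the calculation refers to where in the tray a dot sits. That is the assumption that blacks could be anywhere, written as arithmetic.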
But then there might be the idea that people are clustered together, in the sense that threads of blackness run through the connected blobs of white. Yet because we don’t know the structure, or the cause of the structure, of these isolated contiguous groups, it’s the same as believing the blacks could be anywhere.
Think of it this way (another example I use). If I tell you the evidence is “We have a 6-sided device that must be activated and upon each activation it can take one of 6 states only, labeled 1-6”, then the probability of “6” is 1/6. But change the evidence to “We have a 6-sided device that must be activated and upon each activation it can take one of 6 states only, with some sides possibly more frequent, labeled 1-6”, and the probability of “6” is still 1/6, because we don’t have any information in the new evidence that allows us to change our probability.
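One way to check the second case numerically is sketched below. It rests on a modeling assumption that is not part of the evidence as stated: the unknown side frequencies are drawn from a flat, symmetric Dirichlet distribution, so no side is favored before we look. The function names, the number of trials, and the seed are arbitrary.

```python
import random

def random_frequencies(sides=6, rng=random):
    """Draw one possible set of side frequencies from a flat Dirichlet.

    Assumption: every way of being biased is equally credible, and the
    evidence does not say which sides are the favored ones.
    """
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(sides)]
    norm = sum(weights)
    return [w / norm for w in weights]

def average_prob_of_side(side, trials=100_000, seed=0):
    """Average the chance of `side` over many possible biased devices."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += random_frequencies(rng=rng)[side - 1]
    return total / trials

estimate = average_prob_of_side(6)
print(f"average probability of a 6: {estimate:.3f} (1/6 is {1/6:.3f})")
```

Any single simulated device can be badly lopsided. But since the evidence never says which side is favored, the lopsidedness averages out and the probability of “6” stays near 1/6.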
We can’t push that analogy any farther, though. For the second set of evidence leads to a prediction question after the first point (first activation) is sampled. That’s not the same with the paint because we are not trying to predict geographically.
There is no reason to do any geographic sampling, because here we know only some of the initial cause of blackness (the tossing), which is not enough to say where the blacks ended up. By assumption, again, blacks could be anywhere. So we don’t need to be careful about setting up some kind of grid, or blocks, and sampling within these. Unless we do know where they are, like in the picture heading the post. But this implies we know something of the cause of the placement of the blacks. If we do have that, then certainly we can use it.
Gist: if you can’t tell black from white by looking, then just grab whoever comes by, until you’ve collected enough evidence to be confident the predictions you make with your probability model will have skill.