Let’s drag the statistician’s hoary threadbare ball-filled bag out of the cupboard to make a point. In it are 3 white balls and 2 black.
Using the statistical syllogism, we deduce the probability of drawing out a white ball as 3/5.
We could in a similar manner deduce the probability of any proposition conditional on this evidence. Such as the mean of draws of size n (with balls replaced in the bag after each draw). Or variance, or whatever. The math to do this isn’t hard.
If you were in a hurry, or hated math, or just weren’t that good at it (and most of us aren’t), you could program a “simulation” that draws imaginary balls from imaginary bags, and then use these simulated results to estimate the probability of any proposition. It’s easy: count the number of times the proposition is true in each simulation and divide by the number of simulations.
What’s a “simulation”? Nothing but a proxy recreation of the causes that bring about effects. Here, we have to have a cause of picking out a ball, and a way of simulating this drawing. It’s so easy that I leave it as a homework problem.
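For those who would rather peek at one possible answer to that homework, here is a minimal sketch in R. The bag is a vector of labels and `sample()` stands in for the hand reaching in. The seed and the number of simulations are arbitrary choices made for this illustration, not part of the argument.

```r
# A bag with 3 white balls and 2 black, as in the text.
bag <- c(rep("white", 3), rep("black", 2))

set.seed(42)      # assumption: any seed will do; fixed only for repeatability
n_sims <- 10000   # assumption: an arbitrary number of simulated draws

# Each simulation draws one ball from the full bag.
draws <- sample(bag, size = n_sims, replace = TRUE)

# Count the times the proposition "the ball is white" is true, then divide.
p_white_sim <- mean(draws == "white")
p_white_exact <- 3 / 5

print(c(simulated = p_white_sim, deduced = p_white_exact))
```

The simulated figure lands near the deduced 3/5, which is all the simulation can ever do: approximate what was already known.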
The simulation is only a crutch, an approximation to the analytic answer which is easily had. The “randomness” of the simulation does zero—zip, zilch, nada—toward proving the goodness of the simulation approximation. Read these two articles for why: Making Random Draws Is Nuts, and The Gremlins Of MCMC: Or, Computer Simulations Are Not What You Think.
That the “randomness”, which only means “unknownness”, is thought to confer correctness on bootstrapping is why I say in the title “seems to work”. The scrambled nature of the simulations is not why results are sometimes decent. That can be explained much more simply.
In the balls-in-bag situation, we have the entire population of possible events; or, that is, we can readily deduce them. Either a black or white ball is drawn! (This is a deduction based on the assumption only balls are and necessarily are drawn.) For any n draws, we know exactly what the population of observables would be, in a probabilistic sense (binomial).
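For the bag, that deduced population can be written down directly, with no simulation at all. A sketch, assuming draws with replacement and an arbitrary `n = 10` chosen only for illustration:

```r
n <- 10           # assumption: an arbitrary number of draws, with replacement
p_white <- 3 / 5  # deduced from 3 white balls and 2 black

# Probability of each possible count of white balls in n draws (binomial).
k <- 0:n
pr_k <- dbinom(k, size = n, prob = p_white)

# Mean and variance of the proportion of white balls in n draws, found by
# counting over the whole deduced population rather than simulating it.
mean_prop <- sum((k / n) * pr_k)
var_prop <- sum((k / n - mean_prop)^2 * pr_k)

# Any other proposition is answered the same way, e.g. "more than half white".
pr_majority_white <- sum(pr_k[k > n / 2])

print(round(c(mean = mean_prop, variance = var_prop,
              majority_white = pr_majority_white), 4))
```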
But we don’t always have the full population of events, or possible events, in hand. Consider the weight of each American citizen of at least 18 years old. There is a definite population, though if we wait too long some new people turn 18 and others die. So simplify this to those citizens alive on 4 July 2019.
If we could measure everybody, then we’d have the entire population again, in the probability sense. We could ask the probability of any proposition in reference to this population, and to get it we simply count. “What’s the probability a citizen weighs more than 200 lbs?” We count.
Getting populations can be costly, so we usually make do with a sample. Here we measure a fraction of citizen weights and ask the same question about probability over 200 lbs. We count again. This gives a correct probability conditional on our sample.
This works even if we measure only one person, such as Yours Truly. Given my weight, what’s the probability a citizen weighs more than 200 lbs? We count. It’s 1, and the sample size is 1, so the actual deduced correct probability is 1.
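Counting is as simple as it sounds. A sketch in R, where every weight is a number invented for illustration and not a measurement of anybody:

```r
# Hypothetical weights in lbs, invented purely for illustration.
sample_weights <- c(152, 171, 188, 203, 214, 239)

# Pr(Citizen > 200 lbs | this sample): count and divide.
p_sample <- mean(sample_weights > 200)

# A sample of one person who weighs over 200 lbs gives probability 1.
sample_of_one <- c(215)  # hypothetical figure, not the author's actual weight
p_one <- mean(sample_of_one > 200)

print(c(sample_of_six = p_sample, sample_of_one = p_one))
```

Both numbers are correct given their conditions. They differ because the conditions differ.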
It is obvious the following two probabilities can be different:
(1) Pr(Citizen > 200 lbs | Population) = p_1
(2) Pr(Citizen > 200 lbs | Sample of size 1) = p_2
What people hope is that
(3) Pr(Citizen > 200 lbs | Sample, which is Close to Population) ~ p_2
Yet since nobody knows what the weights are, we don’t know if the sample is close. Of course, we all have lots of prior information about weight, so that we know, conditional on that information, that this sample of 1 is not close. But it should also be obvious that
(4) Pr(Citizen > 200 lbs | Sample, My Information) != Pr(Citizen > 200 lbs | Sample).
By which I mean that if you judge a probability conditional on whatever information, this is a different probability than one not conditional on that information. Many look at probability statements and say “They’re wrong”, when in fact the statements are correct. What they’re doing is changing, in their mind, the conditioning information and coming to different answers. This is fine, as long as it is kept in mind that changing any condition changes the probability—and that no proposition has a probability in any unconditional sense.
What we are doing really when making judgments about this sample is this:
(5) Pr(Sample is Close to Population | My Prior Information) ~ 0.
We all know how to make the sample closer: increase its size. How much? Nobody knows. Not for sure, and this is because we don’t know what the population looks like. If we make up a guess of what the population looks like, we can use that guess to say how close a sample of a given size is to the population.
That is a different kind of bootstrap, in the plain-English use of that word. Statisticians call it the “sample size calculation”, and such calculations are always “cheats” like this. Think: if we knew the population, we wouldn’t need to sample. If we don’t know the population, and can’t get it, we can sample, but we’ll never know, not for certain, when to stop such that the sample is close to the population unless we guess what the population looks like.
What we can do instead, in a state of true ignorance, is to begin collecting samples and then, say, plot a histogram (here of weights) of the sample. This will have a certain shape (one is pictured above). If we collect a larger sample and the shape changes only a small amount, or not at all, then we can use this to guess that the sample is “close” to the population. It’s only a guess, though, conditional on the hope that our sample is “representative”, a circular way of saying “close.” Hence, we have a third kind of bootstrap.
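The shape-watching trick can be sketched in R. Be warned that the “collected” weights below come from `rnorm()` with made-up parameters, which is itself a guess about the population of the very kind discussed above. In real life the numbers would come from a scale. The bin count and sample sizes are likewise arbitrary.

```r
set.seed(7)  # assumption: fixed only for repeatability

# A made-up stand-in for the weights we would keep collecting. In real life we
# would be measuring people, not calling rnorm(); the shape here is a guess.
collected <- rnorm(2000, mean = 170, sd = 30)

# Common bins so that histograms at different sample sizes are comparable.
breaks <- seq(floor(min(collected)), ceiling(max(collected)), length.out = 16)

# Proportion of the first n observations falling in each bin.
bin_props <- function(n) {
  counts <- hist(collected[seq_len(n)], breaks = breaks, plot = FALSE)$counts
  counts / n
}

# Largest change in any bin's proportion each time the sample grows.
sizes <- c(50, 100, 200, 500, 1000, 2000)
change <- sapply(seq_along(sizes)[-1], function(i) {
  max(abs(bin_props(sizes[i]) - bin_props(sizes[i - 1])))
})
names(change) <- paste(sizes[-length(sizes)], "->", sizes[-1])

print(round(change, 3))
```

Small changes between successive sizes are what tempt us to call the sample “close”. Nothing in the output says how close.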
Suppose we use this closeness trick. Then, after we have our sample, we can use the sample as in equation (3) to answer probability questions (by just counting). We’ll never know, not for certain, how far off we are, though. We can say how far wrong we are if we make guesses, or assume the guesses are true, about the population. Sometimes the math of these guesses is so elegant, some statisticians forget the equations are all built on guesses.
But if the guess is correct, and experience shows it can be, then we have good estimates of probability questions of the population. We still have these estimates even if the guess is wrong! We just assume, as we must if we insist on getting a numerical answer, that our guess is good. Insisting on getting a numerical answer is what accounts for much over-certainty.
Now what? Suppose besides just weight, we also measured on each person the presence of at least one Y chromosome. Then we’d have two pictures of weights, one for those with Ys and one for those without. The pictures themselves are not necessary, of course; we could just order each set of weights. The pictures are for visual inspection only.
Then we might ask, “What is the probability the weights of Ys are different than non-Ys?” This probability will be 1 if they are different, and 0 if not. They are different if the two pictures don’t match up exactly. This is a true statement for the sample, and it’s conditionally true for the population assuming our guess the sample is close to the population is good. You can call this a “test” if you like, but it’s just counting.
We could instead ask “What is the probability the mean weight of Ys is different than that of non-Ys?” Same answer for the same reason. Just look. Any difference is a difference.
But that’s a different question than “What is the probability the mean weights of Ys are different than non-Ys in the population?”
We cannot answer that, not without resorting to our bootstrap of the second kind, which is to put some kind of number on what the population looks like. And if we knew that, we wouldn’t have to ask the question.
If we assume the sample is close to the population, then other samples will be close to this sample, and then we can compute, using the two pictures, the probability that statistics like “The mean of the Ys minus the mean of the non-Ys” are less than 0. Or whatever.
This is what is formally called “the bootstrap”. Procedures using it won’t just use the pictures as they stand, and just count. They’ll instead use that “random” idea and make simulations. The idea is that the pictures can be used to make new samples, in just the same way that drawing out new balls from our bag makes new samples. We pull off observations from the sample pictures, i.e. from the bags of Ys and not-Ys, of the same size as the original sample. (See the Homework below.)
There are all kinds of proofs that show that this “works”—in the limit. Which is to say, when all evidence at the end of time has been accumulated. But it only works in the small when it turns out our guess of the closeness of the sample to the population is correct.
Incidentally, every real-life situation is finite and thus has a population. No real series goes on actually to infinity; not that we can measure.
Well, if the samples really are close to the population, then we needn’t do any of this “randomization.” All we have to do is count. And if the sample isn’t close, then we wouldn’t know it. Not unless we knew what the population looked like. But if we knew that, then we wouldn’t have to sample.
In other words, there’s lots more uncertainty in these situations than is commonly heard of.
Here’s a data set to play with, which we’ll assume is our sample.
    library(ggplot2)
    library(car)
    data(Davis)
    x = Davis
    x$weight = x$weight*2.205 # change to non-barbarian unit of lbs
    ggplot(x) + geom_density(aes(x = weight, fill = sex), alpha = 0.9)
A density plot is one way of many to show the picture of values. If you do it, you’ll see the bump out to the right is a non-Y (here Y = M and non-Y = F). Anyway, the weights are obviously different. I leave for you to compute the means and their difference.
Now a simple bootstrap, without any of the nice frequentist properties about which we do not care, is to compute two new simulated samples by “drawing out” values with replacement from the two groups, such that you have two new sample groups of the same sizes as the originals. Compute the difference in means for this new sample. Then repeat, say, 1,000 times. You’ll have a sample of mean differences.
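The resampling step might look something like this in R. The function name `boot_mean_diff` and the seed are inventions for this sketch, and the Davis example runs only if the `carData` package (the one `car` loads its data from) is installed.

```r
# Draw new samples with replacement from each group, of the same sizes as the
# originals, and record the difference in means. Repeat n_boot times.
# boot_mean_diff is a hypothetical helper name, not a function from any package.
boot_mean_diff <- function(a, b, n_boot = 1000) {
  replicate(n_boot, {
    # Resample by index so that a group of any length is handled the same way.
    new_a <- a[sample.int(length(a), replace = TRUE)]
    new_b <- b[sample.int(length(b), replace = TRUE)]
    mean(new_a) - mean(new_b)
  })
}

set.seed(2019)  # assumption: fixed only for repeatability

# Uses the Davis data when the carData package is available.
if (requireNamespace("carData", quietly = TRUE)) {
  davis <- carData::Davis
  lbs <- davis$weight * 2.205  # kg to lbs, as in the text
  diffs <- boot_mean_diff(lbs[davis$sex == "M"], lbs[davis$sex == "F"])
  print(summary(diffs))
}
```

The result, `diffs`, is the sample of mean differences. What to do with it is still your homework.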
Explain just what you’d do with this creation.
Compute the raw-counting probability that a Y-citizen in the population is heavier than a non-Y citizen. Explain the difficulties of this.