Date: 2023-06-14
JJ Couey, who hosts a podcast well known to some of you and is a friend of the Broken Science Initiative, asked me about the so-called Law of Large Numbers. Jay wonders about sample sizes in experiments, or in observations, and says:
I really want to understand why bigger N doesn’t give you better results. If one just googles this idea, the law of large numbers is forced into your mind.
Short answer: the “law” applies to probability models, and not Reality, for nothing has a probability.
Long answer:
Let’s do the “law”, which I put in scare quotes one last time, not to emphasize its falsity, for it is true in its place, but to remind us that it applies to models, and has nothing to say, directly, about Reality.
The law has a strong and a weak form. Both suppose that there exists an infinite sequence of something called “random variables”. Since this sequence is infinite, we already know we are talking about math somehow, and not Reality per se.
The idea of “random variables” is often made complex, but they are quite simple. Since “random” means unknown, and “variable” means takes more than one value, a “random variable” is an unknown value. The amount of money I have in my pocket now is, to you, a “random variable”, because that amount can vary, and you don’t know what it is. To me it is neither random nor variable, since I know it. Simple, as promised.
We can use math to quantify some probabilities. (Not all: ask me about this another day.) So we might suppose the uncertainty we have in our infinite sequence of random variables can be quantified by some parameterized probability distribution, which, in the law, has a parameter to represent its “expected value”. Which is sort of a center point (and a value that is not always expected in the English sense).
If we sum a finite number of variables from this sequence and divide by the number of terms in the sum, that finite average converges in value to that parameter, in a mathematical sense, as the number of terms goes to infinity, which is a long way away.
That’s it! That’s the law.
Really, that’s it. There is a weaker version of the law, which uses that epsilon-delta business you may recall from your calculus days, to prove the same kind of thing. Mercifully, we’ll ignore this, since the differences are not important here.
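For readers who want the symbols, here is the usual textbook form of both versions. It assumes, as textbooks do, that the X_i are independent and identically distributed with finite expected value μ. Notice that both lines are statements about probabilities inside the model; neither mentions any real device.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i

\text{Strong law:}\quad \Pr\!\left(\lim_{n\to\infty} \bar{X}_n = \mu\right) = 1

\text{Weak law:}\quad \lim_{n\to\infty} \Pr\!\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0
```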
Example? Before us is a device that may, and must, take one of m states, where each state has a cause of which we are ignorant. What is the probability, based on this evidence, and only this evidence, that the device is in state k? Where k is any of 1, 2, …, m. Right: 1/m.
The “expected value” is calculated by summing the value of each state multiplied by its probability. For this example, it’s 1 x (1/m) + 2 x (1/m) + … + m x (1/m) = [m x (m + 1)] / [2 x m], which simplifies to (m + 1) / 2. If m = 6, then the expected value is 3.5. Which may look familiar to you.
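The arithmetic for the m-state device can be checked exactly with a short sketch. It uses Python's `fractions` module to avoid rounding; the function names are illustrative only. It sums each state times its probability 1/m and compares the result with the closed form [m x (m + 1)] / [2 x m].

```python
from fractions import Fraction

def expected_value(m: int) -> Fraction:
    """Expected value of a device with m states 1..m, each given probability 1/m.

    The only evidence is that the device must be in exactly one of m states,
    so each state k gets probability 1/m, and E = sum of k * (1/m).
    """
    p = Fraction(1, m)
    return sum(k * p for k in range(1, m + 1))

def closed_form(m: int) -> Fraction:
    # [m x (m + 1)] / [2 x m], which simplifies to (m + 1) / 2
    return Fraction(m * (m + 1), 2 * m)

# The direct sum and the closed form agree for any m.
for m in (2, 6, 10):
    assert expected_value(m) == closed_form(m)

print(expected_value(6))         # 7/2
print(float(expected_value(6)))  # 3.5
```

With m = 6 this is the familiar die: 21/6 = 7/2 = 3.5.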
So, in math, the average of an infinite string of values of our mystery state must converge to [m x (m + 1)] / [2 x m].
In math, I say. What about a real device? One in Reality? Who knows! It may do anything. It may converge, it may diverge, it may catch fire and blow up. It may get stuck in one state and remain there forever. It may oscillate in some pattern we did not anticipate. It may do anything. Except operate indefinitely long.
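A simulation cannot settle the question about Reality, because a simulation is itself a model. This minimal sketch makes that visible. The "model device" is Python's pseudo-random generator, which is built to behave like the probability model, so its running average heads toward 3.5. The "stuck device" (a hypothetical stand-in for a real device that jams in one state) averages 6 forever, and the law has nothing to say about it.

```python
import random

def running_average(values):
    """Yield the average of the first n values for n = 1, 2, 3, ..."""
    total = 0
    for n, v in enumerate(values, start=1):
        total += v
        yield total / n

def model_device(m, n, seed=0):
    # Every state 1..m equally likely, produced by a deterministic
    # pseudo-random algorithm: a model of a device, not a device.
    rng = random.Random(seed)
    return [rng.randint(1, m) for _ in range(n)]

def stuck_device(m, n):
    # A hypothetical device that gets stuck in state m and stays there.
    return [m] * n

m, n = 6, 100_000
model_avg = list(running_average(model_device(m, n)))[-1]
stuck_avg = list(running_average(stuck_device(m, n)))[-1]
print(round(model_avg, 2))  # close to 3.5, as the model requires
print(stuck_avg)            # 6.0
```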
The law is a model. Reality is on its own.
The law’s value to Reality is therefore the same as with any model. The closer that model represents Reality, the better the model will be. Reality, though, is not constrained by the model. Reality doesn’t know the model exists, because it doesn’t. The model is purely a product of our fevered imaginations. And nothing more. Which doesn’t make it bad—or good. It makes it a model.
Now some will say the law proves that the larger the sample grows, the closer the mean of the sample comes to the population mean. Pause a moment before reading further, for there is a flaw here.
The flaw is: who gets to choose the samples? Why are they samples? Why are these, and not other, observations collected together? Aha. Assumptions about cause are already being made in Reality! That’s how we pick samples in the first place. We pick things which we think have some of the same causes.
We’ve already seen coin flips don’t have a probability (nothing does) and can be caused to come up heads every time, if the causes are controlled.
That’s the first problem, that tacit or implicit knowledge of cause (in any of its four forms). Which people forget about. Causes may exist in the same way for each sample, or they may differ. It depends on how you take the sample. Which is why increasing a sample doesn’t necessarily bring you any closer to understanding what is going on. To do that, the better you control the causes, knowledge of which you (might) gain in taking the samples, the better you’ll do.
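To make the point about selection and cause concrete, here is a minimal sketch with invented numbers. A hypothetical population of parts comes from two machines whose causes differ (Gaussian errors around 10 and 12 are an assumption of the sketch, not a claim about any real machine). A sample drawn only from machine A settles ever more tightly around about 10 as n grows, never toward the population value of about 11: a bigger N only sharpens the wrong answer.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: parts from two machines with different causes.
machine_a = [rng.gauss(10.0, 0.5) for _ in range(50_000)]
machine_b = [rng.gauss(12.0, 0.5) for _ in range(50_000)]
population = machine_a + machine_b
population_mean = statistics.fmean(population)  # near 11.0

def sample_mean(source, n):
    # Mean of n items drawn without replacement from source.
    return statistics.fmean(rng.sample(source, n))

for n in (10, 1_000, 40_000):
    from_a_only = sample_mean(machine_a, n)  # selection tied to one cause
    from_both = sample_mean(population, n)   # selection spanning both causes
    print(n, round(from_a_only, 2), round(from_both, 2))
```

The "from both" column only does well because we built the population and know both causes are in it; that knowledge came from outside the law.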
The second problem is that the law says nothing about finite samples. But we don’t need the law to prove that the mean of samples from a finite population converges to the population mean in Reality. That is obviously true.
Suppose you want the average age of your friend’s sons. He has two. The first sample gives 5, which is the mean of a sample of one. The second sample brings the mean to 7, which is also the population mean! Convergence has been reached, as expected. No law of probability was needed.
People often complicate these things by thinking immediately of hard cases, and go really wrong by adding the idea that probability has existence. Which it doesn’t.
The final answer to why increasing N doesn’t improve results is that cause is what we’re after, and cause is difficult. And Reality isn’t a model.
That was a long explanation of the law of large numbers, and it may not have cleared things up for everybody. Let’s say we have a population of something, say hummingbirds in a particular location, and we want to know their average bill length (for any number of reasons). If we randomly sample 2, 5, 10, 20, 50 and ALL individuals of the population and calculate their averages, we’ll see that the average of the samples gets closer to the true average (ALL individuals) as the sample size gets larger, and, if the variable (such as this one) is normally distributed, 20–50 samples are usually sufficient to give a decent estimate of the true average.
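A minimal sketch of the commenter's procedure, with invented data: the "population" is 400 synthetic bill lengths drawn from a normal distribution with mean 20 mm and standard deviation 1.5 mm (hypothetical values, not field measurements). The error tends to shrink as n grows but can bounce around at small n; sampling ALL individuals reproduces the population mean exactly, which is the finite-population point from the post.

```python
import random
import statistics

rng = random.Random(42)

# Synthetic stand-in for one local population's bill lengths (mm).
# Invented for illustration; not real hummingbird data.
population = [rng.gauss(20.0, 1.5) for _ in range(400)]
true_mean = statistics.mean(population)

for n in (2, 5, 10, 20, 50, len(population)):
    sample = rng.sample(population, n)  # simple random sample, no replacement
    error = statistics.mean(sample) - true_mean
    print(f"n={n:>3}  error={error:+.3f} mm")
```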
Of course, the many caveats you stated are also correct. Samples need to be appropriately sampled, the assumptions of the question have to be clearly delineated and so on. And, we have to understand the implications of the “law.” For example, when I was a young graduate student, I learned about one of the flaws of the idea when I was doing a correlation analysis. First I did the analysis without plotting the data, and I was excited to see that I had a very significant correlation! I was so excited, P < 0.000001 (or some such nonsense), and so I plotted my results and learned my lesson. I had TOO MANY samples. My correlation was extremely weak, and so "explained" nothing (I put it in quotes because correlations seldom explain anything) and my figure was essentially a circle of points. In fact, I learned two lessons. A large sample size may be meaningless, which leads to the fact that a small P-value may also be meaningless.
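The commenter's "too many samples" lesson can be reproduced with the standard test of a correlation against zero, t = r·√((n − 2)/(1 − r²)). This sketch assumes, for simplicity, that n is large enough to replace the t distribution with a standard normal. A correlation of 0.02 explains 0.04% of the variance whatever n is, yet its p-value collapses toward zero as n grows.

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for the hypothesis rho = 0, given sample correlation r.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and, as an assumption valid only
    for large n, approximates the t distribution by a standard normal.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > t) for a standard normal Z

r = 0.02  # a very weak correlation: r**2 = 0.0004
for n in (100, 10_000, 1_000_000):
    print(n, f"p = {correlation_p_value(r, n):.2g}", f"r^2 = {r * r:.4f}")
```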
But in ecology it is good to understand the law of large numbers, because it allows us to do a power analysis and see whether we would need TOO large a sample for the study to be economically or logistically possible; if so, we know we don't need to, or shouldn't, do the study. In general, we find that a sample size of 20–30 (per treatment) is sufficient for relatively simple questions, but in bad science, no matter how large the sample is or how small the P-value is, GIGO remains true (GIGO: garbage in, garbage out).
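The power analysis mentioned here can be sketched with the common normal-approximation formula for comparing two group means, n = 2((z₁₋α/₂ + z_power)/d)² per group, where d is the difference to detect divided by the standard deviation. This is one textbook approximation, not necessarily the method the commenter used; t-based tools typically return a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison
    of means, using n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (difference / SD)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (1.0, 0.8, 0.5, 0.2):
    print(d, n_per_group(d))
```

Large effects need only a couple of dozen per group; small ones need hundreds. That is where "too large a sample to be possible" comes from.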
OK, now I’m confused. Say we are dealing with a manufacturing process with many elements combined to produce a product. Each element has its own error of production that can be accurately measured and characterized by some distribution.
These are combined, let’s say, some in series and some in parallel, and our objective is to ensure that the final product consistently meets a certain range of quality, whether or not an element has gone out of calibration, and to know the likely number of defectives. We produce 100 million of these objects per month.
Are you saying there is nothing random in this process, or that there is no such thing as random variables? This technique is the foundation of almost all manufacturing processes, and it is based upon a fluctuation of error that is random, and on the law of large numbers, as far as I can understand.
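For concreteness, here is a minimal sketch of the kind of calculation the commenter describes, restricted to elements stacked in series (their dimensions add). Every number is hypothetical. The model assumes each element's error is independent and normal, so the assembly is normal with summed means and summed variances. An element drifting out of calibration is a change in cause, which this model does not describe until its parameters are re-measured.

```python
import math
from statistics import NormalDist

# Hypothetical elements stacked in series: (mean, standard deviation), in mm.
elements = [(10.0, 0.02), (25.0, 0.03), (5.0, 0.01)]
lower, upper = 39.88, 40.12  # hypothetical spec limits on the assembly

# Model assumption: independent normal errors, so means add and variances add.
mu = sum(mean for mean, _ in elements)
sigma = math.sqrt(sum(sd * sd for _, sd in elements))
assembly = NormalDist(mu, sigma)

# Probability an assembly falls outside either spec limit.
fraction_defective = assembly.cdf(lower) + (1 - assembly.cdf(upper))
monthly_units = 100_000_000
print(f"assembly sigma = {sigma:.4f} mm")
print(f"expected defectives per month = {fraction_defective * monthly_units:.0f}")
```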
Larger samples do not get ‘closer’ to the mean. The very first measurement may have been dead nuts on. It is our confidence in the closeness that improves.
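The "confidence improves" point is the standard error at work. This sketch assumes a known, hypothetical process standard deviation and a normal-theory 95% interval for the mean. The half-width shrinks like 1/√n, so quadrupling the sample halves it, while saying nothing about whether any single measurement was on target.

```python
import math
from statistics import NormalDist

def ci_half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of a normal-theory confidence interval for a mean,
    assuming the process standard deviation sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / math.sqrt(n)

sigma = 1.0  # hypothetical process standard deviation
for n in (1, 4, 16, 100):
    print(n, round(ci_half_width(sigma, n), 3))
```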
However, as indicated, the drawback is that a larger sample may introduce additional variables. For example, one set of data a client asked me to look at fell into two distinct parts when I stratified it by inspector. In another case, a dispute with a vendor turned out to be due to differences in the design of the instruments used in our respective companies.
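Stratifying by inspector is just grouping before averaging. In this minimal sketch, the readings and inspector labels are invented. The pooled mean lands between the two groups and describes neither, which is the same trap as the "average human" in the next comment.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical measurements tagged with who took them (invented numbers).
readings = [
    ("Inspector 1", 10.02), ("Inspector 2", 10.31), ("Inspector 1", 9.98),
    ("Inspector 2", 10.29), ("Inspector 1", 10.01), ("Inspector 2", 10.33),
]

# Group the values by inspector.
by_inspector = defaultdict(list)
for inspector, value in readings:
    by_inspector[inspector].append(value)

print("pooled mean:", round(fmean(v for _, v in readings), 3))
for inspector, values in sorted(by_inspector.items()):
    print(inspector, "mean:", round(fmean(values), 3), "n =", len(values))
```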
Another problem with large samples closely estimating the population mean is that the population may not have a central tendency at all. After all, the average human being has only one testicle — and one ovary. Most populations I had to deal with were actually mixtures of statistical populations. This might be a lateral mixture (e.g. parts from 32 mold cavities mixed together) or longitudinal (e.g. parts measured by Adam and Betsy on different days). That’s why the size of the sample matters less than how it was selected.