JJ Couey, who hosts a podcast well known to some of you, and friend of the Broken Science Initiative, asked me about the so-called Law of Large Numbers. Jay wonders about sample sizes in experiments, or in observations, and says:

I really want to understand why bigger N doesn’t give you better results. If one just googles this idea, the law of large numbers is forced into your mind.

**Short answer**: the “law” applies to probability models, and not Reality, for nothing has a probability.

**Long answer:**

Let’s do the “law”, which I put in scare quotes one last time, not to emphasize its falsity, for it is true in its place, but to remind us that it applies to *models*, and has nothing to say, directly, about Reality.

The law has a strong and weak form. Both suppose that there exists an *infinite* sequence of something called “random variables”. Since this sequence is infinite, we already know we are talking about math somehow, and not Reality *per se.*

The idea of “random variables” is often made complex, but they are quite simple. Since “random” means *unknown*, and “variable” means *takes more than one value*, a “random variable” is an *unknown value*. The amount of money I have in my pocket now is, to you, a “random variable”, because that amount can vary, and you don’t know what it is. To me it is neither random nor variable, since I know it. Simple, as promised.

We can use math to *quantify* some probabilities. (Not all: ask me about this another day.) So we might suppose the uncertainty we have in our infinite sequence of random variables can be quantified by some parameterized probability distribution, which, in the law, has a parameter to represent its “expected value”, which is a sort of center point (and a value that is not always expected in the English sense).

If we sum up a finite number of variables of this sequence, and divide by the number of terms in the sum, that finite average converges in value to that parameter, in a mathematical sense, *at infinity*, which is a long way away.

That’s it! That’s the law.

Really, that’s it. There is a weaker version of the law, which uses that epsilon-delta business you may recall from your calculus days, to prove the same kind of thing. Mercifully, we’ll ignore this, since the differences are not important here.

Example? Before us is a device that may, and must, take one of m states, where each state has a cause of which we are ignorant. What is the probability, based on this evidence, and *only* this evidence, that the device is in state k? Where k is any of 1, 2, …, m. Right: 1/m.

The “expected value” is calculated by summing the value of each state multiplied by its probability. For this example, it’s 1 x (1/m) + 2 x (1/m) + … + m x (1/m) = [m x (m + 1)] / [2 x m] = (m + 1)/2. If m = 6, then the expected value is 3.5. Which may look familiar to you.
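That little sum can be checked mechanically. A minimal Python sketch (the function name and the use of exact fractions are my own choices for illustration, not anything in the law itself):

```python
from fractions import Fraction

def expected_value(m):
    """Expected value of a device taking states 1..m, each with probability 1/m."""
    return sum(k * Fraction(1, m) for k in range(1, m + 1))

# Matches the closed form [m x (m + 1)] / [2 x m] = (m + 1) / 2.
assert expected_value(6) == Fraction(7, 2)   # 3.5, the familiar die value
assert all(expected_value(m) == Fraction(m + 1, 2) for m in range(1, 50))
```

Exact fractions sidestep floating-point fuzz, so the identity holds on the nose.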

So, in math, an average of an infinite string of values of our mystery state must converge to [m x (m + 1)] / [2 x m].
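In a pseudo-random simulation (which is itself only another model running on a machine, not the real device of the next paragraph), the running average of simulated rolls drifts toward 3.5:

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the run is repeatable
m = 6
rolls = [random.randint(1, m) for _ in range(100_000)]

for n in (10, 100, 1_000, 100_000):
    print(n, round(mean(rolls[:n]), 3))
# The printed averages wander near (m + 1) / 2 = 3.5 as n grows.
```

The simulation only converges because the generator was built to obey the model's premises; no physical device comes with that guarantee.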

In math, I say. What about a *real* device? One in Reality? Who knows! It may do anything. It may converge, it may diverge, it may catch fire and blow up. It may get stuck in one state and remain there forever. It may oscillate in some pattern we did not anticipate. *It may do anything*. *Except* operate indefinitely long.

The law is a model. Reality is on its own.

The law’s value to Reality is therefore the same as with any model. The closer that model represents Reality, the better the model will be. Reality, though, is not constrained by the model. Reality doesn’t know the model exists, because it doesn’t. The model is purely a product of our fevered imaginations. And nothing more. Which doesn’t make it bad—or good. It makes it a model.

Now some will say the law proves that as samples grow, the closer the mean of the sample comes to the population mean. Pause a moment before reading further, for there is a flaw here.

The flaw is: who gets to choose the samples? Why are they samples? Why are these, and not other, observations collected together? Aha. Assumptions about cause are already being made in Reality! That’s how *we* pick samples in the first place. We pick things which we think have some of the same causes.

We’ve already seen coin flips don’t have a probability (nothing does) and can be caused to come up heads every time, if the causes are controlled.

That’s the first problem: that tacit or implicit knowledge of cause (in any of its four forms), which people forget about. Causes may exist in the same way for each sample, or they may differ. It depends on how you take the sample. Which is why increasing a sample doesn’t necessarily bring you any closer to understanding what is going on. The better you control the causes, knowledge of which you (might) gain in taking the samples, the better you’ll do.

The second problem is that the law says *nothing* about finite samples. But we don’t need the law to prove that the mean of samples from a finite population converges to the population mean in Reality. That is obviously true.

Suppose you want the average age of your friend’s sons. He has two. The first sample gives 5, which is the mean of one sample. The second sample gives 7, making the sample mean 6, which is also the population mean! Convergence has been reached, as expected. No law of probability was needed.
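The toy example in code (ages 5 and 7; the mean of the full two-member population is 6):

```python
from statistics import mean

population = [5, 7]  # the two sons' ages
# Running means as the sample grows to exhaust the finite population.
running_means = [mean(population[:n]) for n in range(1, len(population) + 1)]
assert running_means == [5, 6]  # the second sample lands exactly on the population mean
```

Once the sample *is* the population, “convergence” is just arithmetic, as the post says.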

People often complicate these things by thinking immediately of hard cases, and go really wrong by adding the idea that probability has existence. Which it doesn’t.

The final answer of why increasing N doesn’t improve results is because cause is what we’re after, and cause is difficult. And Reality isn’t a model.


The law of large numbers was a long explanation that may not have cleared things up for everybody. Let’s say we have a population of something, say hummingbirds in a particular location, and we want to know their average bill length (for any number of reasons). If we randomly sample 2, 5, 10, 20, 50 and ALL individuals of the population and calculate their averages, we’ll see that the average of the samples gets closer to the true average (ALL individuals) as the sample size gets larger, and, if the variable (such as this one) is normally distributed, 20-50 samples are usually sufficient to give a decent estimate of the true average.
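A quick sketch of that textbook picture (the bill lengths are invented with a pseudo-random generator purely for illustration; no real hummingbirds were measured):

```python
import random
from statistics import mean

random.seed(1)
# Hypothetical population of 200 bill lengths (mm), invented for illustration.
population = [random.gauss(18.0, 1.5) for _ in range(200)]
true_mean = mean(population)

for n in (2, 5, 10, 20, 50, len(population)):
    sample = random.sample(population, n)
    print(n, round(abs(mean(sample) - true_mean), 3))
# The errors tend to shrink as n grows, and hit zero when the sample is ALL,
# though any single run can wobble on the way down.
```

Note the shrinkage is only a tendency across repeated runs, not a guarantee for any one sample.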

Of course, the many caveats you stated are also correct. Samples need to be appropriately sampled, the assumptions of the question have to be clearly delineated and so on. And, we have to understand the implications of the “law.” For example, when I was a young graduate student, I learned about one of the flaws of the idea when I was doing a correlation analysis. First I did the analysis without plotting the data, and I was excited to see that I had a very significant correlation! I was so excited, P < 0.000001 (or some such nonsense), and so I plotted my results and learned my lesson. I had TOO MANY samples. My correlation was extremely weak, and so "explained" nothing (I put it in quotes because correlations seldom explain anything) and my figure was essentially a circle of points. In fact, I learned two lessons. A large sample size may be meaningless, which leads to the fact that a small P-value may also be meaningless.

But, in ecology, it is good to understand the law of large numbers, because it allows us to do a power analysis and see whether the sample we would need is TOO large to be economically or logistically possible; if it is, we know we don't need to, or shouldn't, do the study. In general, we find that a sample size of 20 – 30 (per treatment) in relatively simple questions is sufficient, but in bad science, no matter how large the sample is, or how small the P-value is, GIGO remains true (GIGO – garbage in, garbage out).

OK now I’m confused. Say we are dealing with a manufacturing process with many elements combined to produce a product. Each element has its own error of production that can be accurately measured and characterized by some distribution.

Each of these is combined, let’s say, some in series and some in parallel and our objective is to ensure that the final product consistently meets a certain range of quality, whether or not an element has gone out of calibration and to know the likely number of defectives. We produce 100 million of these objects per month.

Are you saying there is nothing random in this process or that there is no such thing as random variables? This technique is the foundation point of almost all manufacturing processes and it is based upon a fluctuation of error that is random, and the law of large numbers, as far as I can understand.

Larger samples do not get ‘closer’ to the mean. The very first measurement may have been dead nuts on. It is our *confidence* in the closeness that improves. However, as indicated, the drawback is that a larger sample may introduce additional variables. For example, one set of data a client asked me to look at fell into two distinct parts when I stratified it by inspector. In another case, a dispute with a vendor turned out to be due to differences in the design of the instruments used in our respective companies.

Another problem with large samples closely estimating the population mean is that the population may not have a central tendency at all. After all, the average human being has only one testicle — and one ovary. Most populations I had to deal with were actually mixtures of statistical populations. This might be a lateral mixture (e.g. parts from 32 mold cavities mixed together) or longitudinal (e.g. parts measured by Adam and Betsy on different days). That’s why the size of the sample matters less than how it was selected.

As if the LLN is alive and there is a true meaning behind its words. LOL.

The LLN is a theorem with premises and conclusions. It is not one-size-fits-all. In the proof of the theorem, the probability bound depends on the sample size and the first and second moments, which also gives hints as to why the LLN may or may not work in real-life practice.

In real life, we usually cannot know whether a sample size of, e.g., N = 100 would yield a better result than N = 105. Also, the criterion of being better is another story. What makes up a large sample size? However, one is more likely to make decisions based on the results from a larger N.

To pinpoint reasons that a larger N doesn’t give you a better result or doesn’t work, one would need to know the details of the application. For example, the majority of the observed annual hurricane losses in Miami are zero. Actuaries recognize that the sample average is most likely to be less than the so-called theoretical mean of the loss distributions.
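The hurricane point can be illustrated with a deliberately crude loss model (the 95%/5% split and the loss size of 100 are invented for the sketch; real loss distributions are messier):

```python
import random
from statistics import mean

random.seed(42)

def annual_loss():
    # Hypothetical: 95% of years no loss; 5% of years a loss of 100.
    # Theoretical mean loss = 0.05 * 100 = 5 per year.
    return 100.0 if random.random() < 0.05 else 0.0

theoretical_mean = 5.0
runs = 10_000
below = sum(
    mean(annual_loss() for _ in range(10)) < theoretical_mean
    for _ in range(runs)
)
print(below / runs)  # fraction of 10-year averages that undershoot the mean
# A 10-year average undershoots whenever no loss year occurs, which happens
# with probability 0.95**10, i.e. roughly 60% of the time.
```

With skewed, zero-heavy losses, the typical sample average sits below the theoretical mean even though the two agree “at infinity.”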

Now, what does it *really* mean to say ‘the law of large numbers is forced into your mind’?

In the frequentist interpretation the Law of Large Numbers is basically a tautology. Certainly it is in the binary Bernoulli case.

Say you have a random variable X which has a 50% chance of being 1 and a 50% chance of being 0. To begin with, since it can only take on a value of 0 or 1, then if we observe the variable n times and see a (mean) average of A, then it follows that there were nA 1’s that occurred and n – nA 0’s. That is, the average completely determines the number of times that each outcome was observed. And of course knowing how many times each outcome was observed will tell us the average of the observed values.

Now in the frequentist interpretation when we say that the random variable has a 50% chance of being 1 and a 50% chance of being 0, we mean that if we keep track of the number of times that a 1 was observed and divide it by the number of observations n, then as n goes to infinity the term should approach 1/2. But for this particular variable that term is identical to the average, which means that by the very definition of the variable the average must approach the “expected value” of 1/2.
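For 0/1 data the claimed identity is purely arithmetical and easy to check (the simulated coin is only a stand-in; nothing here says real coins have probabilities):

```python
import random

random.seed(0)
obs = [random.randint(0, 1) for _ in range(1000)]  # simulated 0/1 observations

# The average and the relative frequency of 1s are the same number, always.
assert sum(obs) / len(obs) == obs.count(1) / len(obs)
```

So in the Bernoulli case, saying “the relative frequency converges” and “the average converges” is saying one thing twice.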

For other distributions the situation is trickier since there could be more than two outcomes, meaning that the average doesn’t determine how many times that each outcome can occur (and this gets particularly tricky with a continuous distribution.) But the overall idea is that the amount that each outcome can occur as we approach the limit is forced to converge by definition, which in turn forces the average value to converge. (In the finite case this is basically just the sum rule limit law.)

So the law of large numbers comes down to how much you trust your definition of random variables, and as our host has shown us time and time again there are plenty of reasons to be skeptical there.

> Are you saying there is nothing random in this process or that there is no such thing as random variables? This technique is the foundation point of almost all manufacturing processes and it is based upon a fluctuation of error that is random, and the law of large numbers, as far as I can understand.

If by “random” you mean “unknown causes” then certainly there is a great deal that is random in these processes. If by “random” you mean “uncaused” or “a Platonic mathematical variable forcing reality to become closer to its distribution” then no, there is nothing random about these processes. Every error is due to some reason, and these reasons are varied. That’s what leads to the fluctuation. But since the causes are varied and usually unknown, they are “random” for the purposes of the analysis (which necessarily means that what counts as “random” is dependent on our knowledge.)

I find it bizarre how many people insist that “random variables” must have an actual existence in the wild, as if they are something you see every day. Have you ever seen a “random variable”? The only way that their existence can make sense is in terms of a Platonic realm of ideals, but most people who believe in them (in reality, not as mathematical abstractions) are not Platonists.

This is like criticizing people for what they don’t say or cannot do.

The case of sampling from a finite population *without replacement* is another area of study. It is still of interest to study the difference between the sample and population means, since resources and time may be limited in reality. The LLN is not applicable in this case. Of course, when sampling from a finite population without replacement, there are other explanations as to *why bigger N doesn’t give you better results*.

For those talking about “random variables,” there seems to be an important misunderstanding. Saying random variable in itself is misleading. Variables are never random. However, samples of variables can be randomly selected, and that’s what that is supposed to mean. Randomly selected samples of variables chosen by the researcher. We don’t have to agree with the researcher or the conclusion, but we should be able to agree that a sample can be chosen randomly. Also, in this case, random really means independent. That is, the value taken by a sample should be independent of that in any other sample such that the value of one does not in any way influence the value of any other sample. In that case, we can say that one of the sample values is random with respect to the other sample values. And that is what people mean when they ambiguously say “random variable.”

So, criticizing the term “random variable” by misunderstanding it is not really criticism.

James Roper,

I doubt if you are able to fill in any of those definitions in a practical and logical consistent manner.

For example, suppose that I call 200 people and ask them how strongly they agree with a certain political statement on a scale of 1 to 5. What is necessary to make this a “random selection?” What is necessary for the responses to be “independent?”

For example, suppose that I have a phone directory and I simply call the first 200 people. Is that random? Suppose I roll a d20, each time reading n names down the list and calling that person, where n is the roll of the die. Is that random? Relating back to the topic of the article, will following one process lead the law of large numbers to work for me while the other will not? Why?

As for independence, suppose that I end up calling a domineering mother at her house, and her weak-willed son at his apartment. The son shares all his political views with his mother and gives me the exact same answers. I of course do not know anything about this. Are the results still “independent?” Obviously with either the definition that you gave or the one used for random variables they are not; the mother’s response entirely fixes the son’s response and vice versa. But in any survey of this type we are likely to poll people whose opinions are related in some fashion, even if it’s not as strong as this example. So how practically do we deal with this, if “independence” is necessary for our statistical methods to work?

And of course these issues are not unique to polling. You can create similar examples with inspections of a production line. There will always be causes relating events which are unknown to the inspector. Perhaps samples A, B, C, and D were all assembled by the same employee, while the others were not. So these are not “independent.” Perhaps some of the samples used a material that had a defect missed by the supplier and the others did not. Some relation between samples like this will always exist, and the inspector can never know all of them (in fact, if he did know all of them he wouldn’t have to do the inspection.)

Thus how can “independence” exist in any practical way in reality? What is your working definition?

All,

Been about a month now. How do you all think my blog-version of ChatGPT, which I assigned the personality “Roper”, is working out?

Rudolph asked: “For example, suppose that I call 200 people and ask them how strongly they agree with a certain political statement on a scale of 1 to 5. What is necessary to make this a “random selection?” What is necessary for the responses to be “independent?””

And I can indeed answer. The assumption of that kind of study is that the 200 people were randomly selected (that assumption may be wrong, but that’s a different problem). It’s an assumption, and until you demonstrate that it was non-random, the statistics just goes with the assumption. And, if random, the responses are independent. Why is that so hard? And that part is NOT part of the large sample size issue.

And that chatGPT thing is pretty funny!

Briggs, your blog-version of ChatGPT, i.e., a chatbot trained based on the data/information from this blog, would turn out to be more like Harrier than Roper. lol.

I would get more direct answers to my questions with ChatGPT. They would be lies, but they would at least answer my questions.

Saying that a random selection means that you randomly selected people is something else.

> and, if the variable (such as this one) is normally distributed

And one knows a variable is normally distributed … how exactly?

I particularly found this amusing:

> If we randomly sample 2, 5, 10, 20, 50 and ALL individuals of the population and calculate their averages, we’ll see that the average of the samples gets closer to the true average (ALL individuals) as the sample size gets larger

This isn’t reality you are describing, Roper. You never know even the fraction of the population *randomly* sampled, nor even the fraction selected by any method.

Yancey, in response, remember, WE select the samples, and the math ONLY requires (in some statistics) that the resultant variable be reasonably close to normally distributed. It doesn’t have to be perfect, and there are many ways to test the distribution. But, that is another thing about the law of large numbers – when the sample size is too small, the test for normality is not very accurate, but as the sample size gets larger, it is easier to test clearly. This is not an issue, and it only explains the LLN idea a bit better.

Yancey, it is unimportant whether we know what the fraction of the population sampled was. But also remember, out of context, ambiguity makes it difficult to understand. In any case, we’re talking about the law of large numbers, and the other issues weren’t the point (yet, perhaps).

Rudolph Harrier, exactly what was the problem with my answer? I said that randomly selecting samples was to ensure independence among those samples. What is wrong with that? And, what does that have to do with the question about the LLN?

All,

Random sampling does nothing. It does not ensure independence. It is pure voodoo mysticism.

See Uncertainty.

And don’t feed the trolls.

My old boss, Ed Shrock, always said that random samples were useless for anything but eliminating bias in selection; e.g., always picking your sample from the same cavity. Far better to stratify your sample according to factors thought to be potential causes. Sample from each cavity, or from each operator. Pull part of the sample from the top, middle, and bottom of the shipping carton. (One company, puzzled why defective parts showed up in production but were never found in incoming inspection, changed the incoming inspection procedure as follows: 1) Turn carton upside down; 2) open bottom flaps; 3) take sample. Lo! all the defects were ‘salted’ in the bottoms of the shipping boxes, counting on the customer’s inspectors to pull from the more easily accessible top layers.)

*Random variation* means variation due to many causes, no one of which is dominant. *Assignable variation* otoh can be ‘assigned’ to a particular cause. Compare the effort to find ‘the’ cause of a pair of standard dice coming up 12 against the effort to determine the cause of a 13. It is not that the 12 has no cause, but that it has many causes, and it would be fruitless and uneconomical to try to identify and control any one of them. Another 12 could always recur due to some other combination of causes. A 13 otoh suggests something is *not the way it usually is*, and must be set right. The 12 can be eliminated if we redesign the process, most elegantly by whiting out one dot on one of the 6s. This changes the *cause system*.

Briggs, to say so is to tell your readers that all the statistical analyses that you have done in your book *Uncertainty* are wrong.

Rudolph Harrier, it’s only fair that you let us know what “a random selection” means.

Ye Olde Statistician,

That is a good example of how to catch crooks using statistics and how to possibly and efficiently use extra information, i.e., potential causes, to draw better inference, the latter of which is one of the goals in statistical research.

My rice cooker is pretty much useless for anything but making rice. It may not cook perfect rice when the operator does not do it right or adjust the water amount when located at high altitude. It cannot bake a cake or deep-fry chicken wings. However, simply, it does what it is designed to do. An important task for literally billions of Chinese, at least.

So, simply, random samples help reduce the bias. Even in stratified sampling, random samples are taken within each stratum.

My point, sometimes, different wording does give a different impression.

I don’t think that “random selection” is a coherent concept. Most textbooks will say something like “a (uniformly) random selection means that no member of the sample is more likely to be chosen than any other” while the average Joe on the street might say that “random selection” means that you have no way of predicting which things will be picked. But of course every selection method is going to favor some members above others (i.e. the ones that it actually picks.) These ideas only make sense in the context of selecting from some abstract mathematical set an infinite number of times, but of course that never happens in reality. So on a practical level, it isn’t a coherent concept.

Nor is it a particularly useful one. What everyone is after is a *representative* sample. There’s an Isaac Asimov story where a supercomputer is able to find one person in America who can be interviewed to perfectly determine the opinions of everyone in the whole country. So rather than having people vote, they just have the computer ask that one person questions. A more concrete example would be finding a one-person sample who completely agrees with the mean. A poll that contacted that one person would be just as good as contacting everyone, and much quicker. But of course in reality there usually isn’t such a member of the sample, and even if there were we would not know how to find it.

Still, we would like to get as close as possible. If we can only survey 200 people then we would like to survey those whose averages most closely approximate the average of the whole population. If we do not know anything at all about the population, as we often don’t, then picking any 200 arbitrary people is as good as any other selection. In that way a “random” selection (really meaning “arbitrary,” i.e. chosen without any particular intention) could work. But this has nothing to do with “randomness” being good; we really want representative samples.

Seems that the idea of randomness confuses people. The whole idea of random is ONLY that there is no directed selection going on. That is, samples are chosen independently of each other and of any preconceived notion of what is being tested. Sure, like any human endeavor, it can be incorrect. But, the researcher should explain how samples were chosen, and if a reader doesn’t like it, they can write a rebuttal. And, as far as a “representative” sample goes, that is defined in the study as the “universe” of interest to the researcher, which should also be clearly written. We may not agree with the researcher, but if they did their best to have unbiased samples chosen from the correct subgroup of whatever they’re interested in, then the results should be reasonable (assuming all else is also correctly done). Isn’t that true of all human endeavors, just about?

Also, clearly, without a particular question in mind, harping on the utility of “random sampling” is meaningless. If you’re studying where pikas nest, all of your samples will be in places that pikas CAN nest – and not just randomly distributed across the mountainside. If you’re studying knowledge bias in high school students based on the number of siblings they have, you won’t randomly select students, you’ll select among high school students, and you’ll avoid siblings (to maintain independence).

But just saying unilaterally that random sample selection is useless misrepresents what it means.

Interestingly, Wikipedia has separate articles for LLN and LTLN (Law of Large Numbers, Law of Truly Large Numbers).

Roper AI? I did wonder about the possibility. My attempts to pry some underlying details from JJR on the topic of evolution felt very much like an exchange with ChatGPT, with a simple regurgitation of the material it was trained on, and simple repetition of the challenged assertions when cornered for supporting details. But when I was dismissed as suffering from the Dunning-Kruger effect, I knew with certainty that I wasn’t dealing with ChatGPT, since if nothing else, ChatGPT is always incredibly polite to me.

Finally, I’m with Robin on this one – confused. As a statistics-training-challenged engineer, I guess what I find most helpful are real-life case studies. As in “The silly engineer approached the problem this way, and as a result, this disaster ensued. Don’t be a silly engineer.” I have successfully addressed many, many problems using seat-of-the-pants “you’re doing it wrong” statistical methods. Have I been strolling along the cliff edge and just getting lucky?

I just read the story (https://archive.org/details/1955-08_IF/page/n3/mode/2up), it’s called “Franchise”.

Thanks for the pointer, it’s quite thought-provoking.

It takes place in 2008 (written in 1955) and depicts an extreme expertocracy, but everything is normal, and not 2008 normal: 1950s normal. I’m not sure if that was intended at the time, but that’s one of the most thought-provoking things about it, because that didn’t happen in real life.

Milton Hathaway said: “My attempts to pry some underlying details from JJR on the topic of evolution felt very much like an exchange with ChatGPT, with a simple regurgitation of the material it was trained on, and simple repetition of the challenged assertions when cornered for supporting details.”

I’m intrigued. And I’m not AI, by the way. At any rate, what underlying details? What simple regurgitation? I think we had two different conversations. So, let’s try to be nice and open-minded so that the conversation can actually get somewhere, shall we? Go ahead, shoot – a question about the underlying details of evolution.

And, I agree, the problem with challenging things that are sometimes good, sometimes bad, is that without clear examples, we talk in such ambiguities that it’s hard to come to any useful conclusion. I’ve taught statistics to biology grad students for more than 20 years, and it seems like there is a fair amount of ambiguity here that makes the arguments complicated.

JH,

You must not have read Uncertainty. Sad!

YOS,

Your boss was right.

Yancey asked: “And one knows a variable is normally distributed……how exactly?”

I may have answered this, but there are a few easy tests. And, the point is, the math of what is called parametric statistics only requires that the data (well, the residuals) have close to a normal distribution – it doesn’t have to be perfect. The math is robust, so to speak. Checking is simple – any stats program has a Q-Q plot for that (along with formal tests).
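One such easy check can be sketched in plain Python. This is the Jarque–Bera idea (compare sample skewness and kurtosis with a normal’s), written out by hand for illustration rather than taken from any particular stats program; the test data are pseudo-random, not real residuals:

```python
import random
from statistics import mean

def jarque_bera(xs):
    # Jarque-Bera statistic: n/6 * (S**2 + (K - 3)**2 / 4), where S is sample
    # skewness and K sample kurtosis; values near 0 suggest roughly normal data.
    n = len(xs)
    mu = mean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    s = m3 / m2 ** 1.5
    k = m4 / m2 ** 2
    return n / 6 * (s ** 2 + (k - 3) ** 2 / 4)

random.seed(3)
normalish = [random.gauss(0, 1) for _ in range(500)]
skewed = [random.expovariate(1) for _ in range(500)]
print(jarque_bera(normalish))  # small: consistent with normality
print(jarque_bera(skewed))     # large: clearly non-normal
```

As the commenter notes, such checks are unreliable at small n and sharpen as n grows; they test the model's premise, not Reality.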

> Have I been strolling along the cliff edge and just getting lucky?

No, the fact of the matter is that most problems present obvious symptoms and can be resolved with common sense. For example, a study of failed reactor batches, when stratified by reactor A, B, C, and D, revealed that C had far more failed batches than its sisters. No t-test or ANOM was needed to see this. The engineers then did a detailed comparison of C to A, B, and D, looking for distinctions, and found that the catalyst pipe was laid differently. On the other three it came down vertically to the input port, but on C it came down to the rear of the reactor, ran along the reactor, and elbowed into the port. The reaction was exothermic, so the catalyst was being heated and activated prematurely. On that same project, we did a time plot of failures and found that while there were few during the summer, there were many in the winter. Again, the only statistics involved was simply grouping the data and drawing a picture. When they walked the piping, the engineers discovered that the pipes exited the building through the roof then re-entered the building. During the winter, this segment of pipe would sometimes be covered with snow [more so than in summer, lol], chilling the product and spoiling the reaction. They re-piped C and built a shed on the roof that could be kept at building temp during the winter. The plague of batch failures diminished to the typical non-C summer level.

All common sense. But statistics is quantified common sense and will sometimes detect a distinction or change that is NOT obvious, or caution us against ‘seeing’ one that is really just common-cause noise.

Ye Olde Statistician told us about some things that simply do not need statistics. Those examples tell us nothing about when we do need statistics. It has been said that it’s easy to lie with statistics. Less commonly stated is that it’s nearly impossible to tell the truth without them (when they are useful and needed).