Let’s do a little science experiment together. Go into the closet and pull out an opaque sack or bag. Anything will do, even a large sock. If you can fit your hand in, it’s fine.
Now reach in and pull out a random observation. Pinch between your fingers a piece of probability and carefully remove it. Hold on tight! Probability is slippery. We will call this a “draw from a probability distribution”.
What does yours look like? Nothing, you say? That’s odd. Mine also looks like nothing. Let’s try again, because drawing random observations is what statisticians do all the time. If we didn’t manage to find something, the fault must lie with us, and not with the idea.
The idea is that these “random draws” will tell us what the uncertainty in some proposition is (described below). “Random draws” are used in Gibbs sampling, Metropolis-Hastings, Markov Chain Monte Carlo, bootstrap, and the like, which we can call MCMC for short.
Maybe the sack is the problem. Doesn’t seem to be anything in there. Maybe that’s because there is no such thing as probability in the sense that it is not a physical thing, not a real property of objects? Nah.
Let’s leave that aside for now and trade the sack for a computer. After all, statisticians use computers. Reach instead inside your computer for a random draw. Still nothing? Did you have the ON/OFF switch in the OFF position?
You probably got nothing because when you reach into the computer for a random draw you have to do it in a specific way. Here’s how.
Step one: select a number between 0 and 1. Any will do. Close your eyes when picking, because in order to make the magic work, you have to pretend not to see it. I do not jest. Now since this pick will be on a finite, discrete machine, the number you get will be in some finite discrete set on 0 to 1. There is no harm in thinking this set is 0.1, 0.2, …, 0.9 (or indeed only, say, 0.3333 and 0.6666!). Call your number s. (Sometimes you have to pick more than one s at a time, but that is irrelevant to the philosophy.)
Step two: Transform s with a function, f(s). The function will turn s into a “random draw” from the “distribution” specified by f(). This function is like a map, like adding 1 to any number is. In other words, f(s) could be f(s) = s + 1. It will be more complicated than that, of course, but it’s the same idea. (And s might be more than one number, as mentioned.)
Step three: That f(s) is usually fed into an algorithm which transforms it again with another function, sort of like g(f(s)). That f(s) becomes input to a new function. The output of g(f(s)) is the answer we wanted, almost, which was the uncertainty of the proposition of interest.
Step four: Repeat Steps one-three many times. The result will be a pile of g(f(s)), each having a different value for every s found in Step one. From this pile, we can ask, “How many g(f(s)) are larger than x?” and use that as a guess for the probability of seeing values larger than x. And so on.
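Steps one through four can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the particular f (the inverse-CDF map for an exponential with rate 1) and g (squaring) are hypothetical choices made so the exact answer is known, and the names pile and fraction_larger are invented here.

```python
import math
import random

def f(s, rate=1.0):
    # Step two: map s in [0, 1) to a "draw" from an exponential distribution.
    # (Assumed choice of f; any inverse-CDF map would illustrate the same point.)
    return -math.log(1.0 - s) / rate

def g(y):
    # Step three: a hypothetical second transformation of f(s).
    return y ** 2

def pile(n, seed=0):
    # Steps one and four: pick s many times and collect g(f(s)).
    rng = random.Random(seed)
    return [g(f(rng.random())) for _ in range(n)]

def fraction_larger(values, x):
    # "How many g(f(s)) are larger than x?" expressed as a proportion.
    return sum(v > x for v in values) / len(values)

values = pile(100_000)
estimate = fraction_larger(values, 1.0)
# For this f and g, g(f(s)) > 1 exactly when f(s) > 1, which has probability e^-1.
exact = math.exp(-1.0)
print(estimate, exact)
```

The two printed numbers should agree to a couple of decimal places, which is all the pile is ever asked to do.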
Steps two-four are reasonable and even necessary because we cannot often solve for the uncertainty of the proposition of interest analytically. The math is too hard. So we have to derive approximations. If you know any calculus, it is like finding approximations to integrals that don’t have easy solutions. You plot the curve, and bust it up into lots of sections, compute the area of each section, then add them up.
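The calculus analogy can be made concrete. Here is a minimal sketch, assuming the integrand exp(-x^2) on 0 to 1 (a hypothetical example, chosen because it has no elementary antiderivative) and a plain midpoint rule; the helper name midpoint_area is invented for illustration.

```python
import math

def midpoint_area(h, a, b, sections):
    # Bust the interval [a, b] into equal sections, compute the area of
    # each rectangle at its midpoint, then add them up.
    width = (b - a) / sections
    return sum(h(a + (i + 0.5) * width) * width for i in range(sections))

def curve(x):
    return math.exp(-x * x)

coarse = midpoint_area(curve, 0.0, 1.0, 9)
fine = midpoint_area(curve, 0.0, 1.0, 1000)
# Reference value via the error function: sqrt(pi)/2 * erf(1).
reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(coarse, fine, reference)
```

Nothing in this computation is picked with eyes closed, yet the approximation improves as the sections get finer.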
Same idea here. Except for the bit about magic. Let’s figure out what that is.
Now instead of picking “randomly”, we could just cycle through the allowable, available s, which we imagined could equal 0.1, 0.2, …, 0.9. That would give us 9 g(f(s))s. And that pile could be used as in Step four. No problem!
Of course, having only 9 in the pile would have the same effect as slicing an integral only coarsely. The approximation won’t be great. But it will be an approximation. The solution is obvious: increase the number of possible s. Maybe 0.05, 0.10, 0.15, …, 0.95 is better. Try it yourself! (There are all sorts of niceties about selecting good s as part of the steps which do not interest us philosophically. We’re not after efficiency here, but understanding.)
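To try it yourself, here is a minimal sketch that cycles through the s with eyes wide open. The f and g are assumed for illustration (an exponential inverse-CDF map and squaring), and the helper names grid and fraction_larger are hypothetical; only the two grids come from the text.

```python
import math

def f(s):
    # Assumed f: inverse-CDF map to an exponential with rate 1.
    return -math.log(1.0 - s)

def g(y):
    # Assumed g: a simple second transformation.
    return y ** 2

def grid(k):
    # k evenly spaced s strictly between 0 and 1:
    # k=9 gives 0.1, ..., 0.9 and k=19 gives 0.05, ..., 0.95.
    return [i / (k + 1) for i in range(1, k + 1)]

def fraction_larger(k, x):
    values = [g(f(s)) for s in grid(k)]
    return sum(v > x for v in values) / len(values)

coarse = fraction_larger(9, 1.0)   # 3 of the 9 values exceed 1
fine = fraction_larger(19, 1.0)    # 7 of the 19 values exceed 1
exact = math.exp(-1.0)             # the analytic answer for this f and g
print(coarse, fine, exact)
```

For this particular question the finer grid lands closer to the analytic value, with no closed eyes required. How quickly a grid converges in general depends on the f, g, and x in play.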
This still doesn’t explain the magic. To us, random means not predictable with certainty. Our sampling from s is not random, because we know with certainty what s is (as well as what f() and g() are). We are just approximating some hard math in a straightforward fashion. There is no mystery.
To some, though, random is a property of a thing. That’s why they insist on picking s with their eyes closed. The random of the s is real, in the same way probability is real. The idea is that the random of the s, as long as we keep our eyes closed, attaches itself to f(s), which inherits the random, and f(s) in turn paints g(f(s)) with its random. Thus, the questions we ask of the pile of g(f(s))s are also random. And random means real probability. The magic has been preserved!
As long as you keep your eyes closed, that is. Open them at any point and the random vanishes! Poof!
“Are you saying, Briggs, that those who believe in the random get the wrong answers?”
Nope. I said they believe in magic. Like I wrote in the link above, it’s like they believe gremlins are what make their cars go. It’s not that cars don’t go, it’s that gremlins are the wrong explanation.
“So what difference does it make, then, since they’re getting the right answers? You just like to complain.”
Because it’s nice to be right rather than wrong. Probability and randomness are not real features of the world. They are purely epistemic. Once people grasp that, we can leave behind frequentism and a whole host of other errors.
“You are a bad person. Why should I listen to you?”
Because I’m right.
I’d like to note what a remarkable advance it is to discover/realize that “Probability and randomness are not real features of the world.” Every once in a while, I (half) want to make a record of all the genuinely smart people in the past hundred years or so who didn’t understand that, who acted and wrote as if the opposite were true. I recently discovered that Fermi himself, “the man who knew everything,” seemed to have those mistaken ideas.
I wonder what the history of this is. We know that people drew lots in all sorts of circumstances from time immemorial (the new 12th Apostle, Matthias, chosen by lot, e.g.) But what did they think was happening when they did so?
The cognoscenti can now joke about sprinkling “randomization” over an experiment, because in some ineffable way that somehow makes things “better” (though in reality it often makes things worse).
But just the idea that “random” is a very slippery concept, and ultimately one that we cannot and should not take seriously, appears to be a remarkable advance in our understanding. Lots of very smart people didn’t understand that.
We went from trying to ascertain causes and yet still casting lots (whose results are caused by the gods?), to what turned out to be an overly smug Newtonian system in which no dice appear, to sprinkling randomizing dust and “drawing random observations” as an analytical ‘improvement’, to “[solving] for the uncertainty of the proposition of interest.” And that’s the smart people!
I’m going to pose questions for clarity (as opposed to resistance). Hopefully I’m not getting too much wrong here:
Usually, the reasoning given for using MCMC over grid sampling is that MCMC can more efficiently explore the domain of multi-dimensional distributions than can grid sampling. As the number of dimensions grows large, it becomes infeasible to maintain the same level of granularity per dimension using grids. The MCMC samples are not independent; they use feedback to find areas of high density, and so they can provide more information with fewer samples. That’s my understanding of the standard argument, but I’m open to corrections.
So, my questions are: if you are using grid sampling in the domain of s, does that make the above point obsolete? If so, is this substitution always possible where MCMC (MH, Gibbs, NUTS, etc) is used?
“Probability and randomness are not real features of the world. They are purely epistemic. ”
What a wonderfully concise explanation of why I’m uncomfortable with the term “random error,” but much happier with “unexplained variation.”
“To us, random means not predictable with certainty.” Doesn’t this just push the word “random” under the rug of “predictable”? I am in general agreement that randomness cannot be a quality or property of a thing. It plays havoc with causality as broadly accepted by the physicists; which they do at high cost to the coherence of any physical theory.
In the real world, where there are multiple causes with multiple effects, random means “We don’t know enough to make certain predictions”, basically. Take throwing a die. It has six sides and, hopefully, is symmetric enough that throwing it doesn’t bias the result. Someone looking at the throw does not know: 1. how much force was applied laterally, 2. how much force was applied as a torque (twist/rotation), 3. (following from 1 and 2) how far it will fly before it hits something, 4. which side hits at what angle, 5. how ‘elastic’ the collision with what it hit is, 6. how far it will bounce or how many bounces it will have, 7. the same as 3 through 6 for each bounce. So said person can only observe the result and knows that said result can only be one of 6 possibilities. Even with that, the observer does not know how the die was made and knows that bias is possible.
On the other hand, the more you know and/or control, you will be able to make some predictions with more certainty.
This is silly. You know perfectly well how pseudo-random numbers are generated. They are part of an exact and repeatable sequence, which cycles through its (very large) sequence (a grid) just like your 0.1,…., 0.9. If you know the starting point (the seed) and the grid it uses (the generator) it is perfectly predictable. It’s just a permutation of a grid walk on 0-1. No probability or randomness involved.
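The point about seeds and grids can be seen with a toy generator. The sketch below assumes a linear congruential generator with deliberately tiny, hypothetical parameters (a=5, c=3, m=16) so the whole cycle fits on one line. Real generators use far larger state and different algorithms, but the same seed gives the same sequence in the same way.

```python
def lcg(seed, a=5, c=3, m=16):
    # A toy linear congruential generator: state -> (a*state + c) mod m.
    # Parameters are illustrative only, chosen so the period is the full m.
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m  # scale onto the grid 0, 1/16, ..., 15/16

def take(gen, n):
    return [next(gen) for _ in range(n)]

first = take(lcg(seed=7), 16)
second = take(lcg(seed=7), 16)

print(first == second)  # same seed, same sequence
print(sorted(first) == [i / 16 for i in range(16)])  # one full cycle visits every grid point once
```

Both checks hold here: knowing the seed and the generator makes every value predictable, and one full cycle is a reordering of an evenly spaced grid on 0 to 1.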
The general pop might not understand this and make unsound statements, but that has nothing to do with the methodology.
“Random” is one of those magical, slippery words. We think it has one meaning, but act as if it had another. Here are the true meanings of the word “random”.
1. I don’t know.
2. I don’t understand.
3. The math is too hard.
Nice! I wrote this a few years ago to explain PRNGs to some people: statisticool.com/randomnumbers.htm
I agree with much of what you wrote, but since I cannot predict many nontrivial problems in real life with absolute certainty because I cannot measure everything, cannot measure everything accurately, or doing those things might be impossible, even for very simple cases, the distinction between ‘really random’ and ‘not really random’ is a philosophical rather than a practical issue in my opinion. Some experts (of which I am not one) say that there is some possibly really random stuff like thermal noise, quantum stuff, and radioactive decay.
If I want to learn about, say, the population of companies in the US in some way, but cannot take a literal census due to cost, etc., then I take a sample and weight it. To make things fair, one can take a sample using PRNGs. Whether it is really random or not is irrelevant to me in that setting (although the philosophy is very interesting) since the PRNGs don’t fail the standard DIEHARD and NIST tests for nonrandomness.
Another point is that the long term relative frequency of heads for a coin sure seems to settle down on something. For different objects, coins, tacks, dice, it settles down to different values. I wonder what it is settling down to if not what we call ‘probability’.
One cannot draw an observation from a sack because an “observation” (a number) is an example of an “abstract” (theoretical) object. However, one can draw a marble from a sack because a marble is an example of a “concrete” (really existing, physical) object. Further, one can draw a marble from a sack randomly by shaking the sack before withdrawing the marble. Confusion of the notions of “abstract” and “concrete” objects, the fallacy of reification, leads to the erroneous conclusion that the notion of a “random draw” is faulty thus must be discarded.