Ever heard of somebody “simulating” normal “random” or “stochastic” variables, or perhaps “drawing” from a normal or some other distribution? Such things form the backbone of many statistical methods, including bootstrapping, Gibbs sampling, Markov Chain Monte Carlo (MCMC), and several others.
Well, it’s both right and wrong—but more wrong than right. It’s wrong in the sense that it encourages magical thinking, confuses causality, and is an inefficient use of time. It’s right that, if assiduously applied, reasonably accurate answers from these algorithms can be had.
Way it’s said to work is that “random” or “stochastic” numbers are input into some algorithm and out pops answers to some statistical question which is not analytic, which, that is, cannot be solved by pencil and paper (or could, but at too seemingly great a difficulty).
For example, one popular way of “generating normals” is to use what’s called a Box-Muller transformation. It starts by “generating” two “random” “independent” “uniform” numbers U1 and U2 and then calculating this creature:
where Z is now said to be “standard normally distributed.” Don’t worry if you don’t follow the math, though try because we need it for later. Point is that any algorithm which needs “normals” can use this procedure.
Look at all those scare quotes! Yet each of them is proper and indicates an instance of magical thinking, a legacy of our (frequentist) past which imagined aleatory ghosts in the machines of nature, ghosts which even haunt modern Bayesians.
First, random or stochastic means unknown, and nothing more. The outcome of a coin flip is random, i.e. unknown, because you don’t know all the causes at work upon the spinning object. It is not “random” because “chance” somehow grabs the coin, has its way with it, and then deposits the coin into your hand. Randomness and chance are not causes. They are not real objects. The outcome is determined by physical forces and that’s it.
Second, there is the unfortunate, spooky tendency in probability and statistics to assume that “randomness” somehow blesses results. Nobody knows how it works; that’s why it’s magic. Yet how can unknowingness influence anything if it isn’t an ontological cause? It can’t. Yet it is felt that if the data being input to algorithms aren’t “random” then the results aren’t legitimate. This is false, but it accounts for why simulations are so often sought.
Third, since randomness is not a cause, we cannot “generate” “random” numbers in the mystical sense implied above. We can, of course, make up numbers which are unknown to some people. I’m thinking of a number between 32 and 1400: to you, the number is random, “generated”, i.e. caused, by my feverish brain. (The number is hidden in the source code of this page, incidentally.)
Fourth, there are no such thing as “uniforms”, “normals”, or any other distribution-entities. No thing in the world is “distributed uniformly” or “distributed normally” or distributed anything. Distributed-as talk is more magical thinking. To say “X is normal” is to ascribe to X a hidden power to be “normal” (or “uniform” or whatever). It is to say that magical random occult forces exist which cause X to be “normal,” that X somehow knows the values it can take and with what frequency.
This is false. The only thing we are privileged to say is things like this: “Give this-and-such set of premises, the probability X takes this value equals that”, where “that” is calculated via some distribution implied by the premises. (Ignore that the probability X takes any value for continuous distributions is always 0.) Probability is a matter of ascribable or quantifiable uncertainty, a logical relation between accepted premises and some specified proposition, and nothing more.
Fifth, since this is what probability is, computers cannot “generate” “random” numbers. What happens, in the context of our math above, is that programmers have created algorithms which will create numbers in the interval (0,1) (notice this does not include the end points); not in a coherent way, but with reference to some complex formula. This formula which, if run long enough, will produce all the numbers between (0,1) at the resolution of the computer.
Say this is every 0.01; that is, our resolution is to the nearest hundredth. Then all the numbers 0.01, 0.02, …, 0.99 will eventually show up (many will be repeated, of course). Because they do not show up in sequence, many fool themselves into thinking the numbers are “random”, and others, wanting to hold onto the mysticism but understanding the math, call the numbers “pseudo random”, an oxymoron.
But we can sidestep all this and simply write down all the numbers in the sequence, i.e. all the numbers in (0,1)2 (since we need U1 and U2) at whatever resolution we have; this might be (0.01, 0.01), (0.01, 0.02), …, (0.99, 0.99) (this is a sequence of pairs of numbers, of length 9801). We then apply the mapping of (U1, U2) to Z as given above, which produces (3.028866, 3.010924, …, 1.414971e-01).
What it looks like is shown in the picture up top.
The upper plot are the mappings of (U1, U2) to Z, along the index of the number pairs. If you’ve understood the math above, the oscillation, size, and sign changes are obvious. Spend a few moments with this. The bottom plot shows the empirical cumulative distribution of the mapped Z (black), overlayed by the (approximate) analytic standard normal distribution (red), i.e. the true distribution to high precision.
There is tight overlap between the two, except for a slight bump or step in the ECDF at 0, owing to the crude discretization of (U1, U2). Computers can do better than the nearest hundredth. Still, the error even at this crude level is trivial. I won’t show it, but even a resolution 5 time worse (nearest 0.05; number sequence length of 361) is more than good enough for most applications (a resolution of 0.1 is pushing it).
This picture gives a straightforward, calculate-this-function analysis, with no mysticism. But it works. If what we were after was, say, “What is the probability that Z is less than -1?”, all we have to do is ask. Simple as that. There are no epistemological difficulties with the interpretation.
The built-in analytic approximation is 0.159 (this is our comparator). With the resolution of 0.01, the direct method shows 0.160, which is close enough for most practical applications. A resolution of 0.05 gives 0.166, and 0.1 gives 0.172 (I’m ignoring that we could have shifted U1 or U2 to different start points; but you get the idea).
None of these have plus or minuses, though. Given our setup (starting points for U1 and U2, the mapping function), these are the answers. There is no probability attached. But we would like to have some idea of the error of the approximation. We’re cheating here, in a way, because we know the right answer (to high degree), which we always won’t. In order to get some notion how far off that 0.160 is we’d have to do more pen-and-paper work, engaging in what might be a fair amount of numerical analysis. Of course, for many standard problems, just like in MCMC approaches, this could be worked out in advance.
Contrast this to the mystical approach. Just like before, we have to specify something like a resolution, which is the number of times we must “simulate” “normals” from a standard normal—which we then collect and form the estimate of the probability of less than -1, just as before. To make it fair, pick 9801, which is the length of the 0.01-resolution series.
I ran this “simulation” once and got 0.162; a second time 0.164; a third showed 0.152. There’s the first problem. Each run of the “simulation” gives different answers. Which is the right one? They all are; a non-satisfying but true answer. So what will happen if the “simulation” itself is iterated, say 5000 times, where each time we “simulate” 9801 “normals” and each time estimate the probability, keeping track of all 9801 estimates? Let’s see, because that is the usual procedure.
Turns out 90% of the results are between 0.153 and 0.165, with a median and mean of 0.159, which equals the right answer (to the thousandth). It’s then said there’s a 90% chance the answer we’re after is between 0.153 and 0.165. This or similar intervals are used as error bounds, which are “simulated” here but (should be) calculated mechanically above. Notice that the uncertainty in the mystical approach feels greater, because the whole process is opaque and purposely vague. The numbers seem like they’re coming out of nowhere. The uncertainty is couched probabilistically, which is distracting.
It took 19 million calculations to get us this answer, incidentally, rather than the 9801 the mechanical approach produced. But if we increased the resolution to 0.005 there, we also get 0.159 at a cost of just under 40,000 calculations. Of course, MCMC fans will discover short cuts and other optimizations to implement.
Why does the “simulation” approach work, though? It does (at some expensive) give reasonable answers. Well, if we remove the mysticism about randomness and all that, we get this picture:
The upper two plots are the results of the “simulation”, while the bottom two are the mechanical mapping. The bottom two show the empirical cumulative distribution of U1 (U2 is identical) and the subsequent ECDF of the mapped normal distribution, as before. The bump at 0 is there, but is small.
The top left ECDF shows all the “uniforms” spit out by R’s runif() function. The only real difference between this and the ECDF of the mechanical approach is that the “simulation” is at a finer resolution (the first U happened to be 0.01031144, 6 orders of magnitude finer; the U’s here are not truly plain-English uniform as they are in the mechanical approach). The subsequent ECDF of Z is also finer. The red lines are the approximate truth, as before.
But don’t forget, the “simulation” just is the mechanical approach done more often. After all, the same Box-Muller equation is used to map the “uniforms” to the “normals”. The two approaches are therefore equivalent!
Which is now no surprise: of course they should be equivalent. We could have taken the (sorted) Us from the “simulation” as if they were the mechanical grid (U1, U2) and applied the mapping, or we could have pretended the Us from the “simulation” were “random” and then applied the mapping. Either way, same answer.
The only difference (and advantage) seems to be in the built-in error guess from the “simulation”, with its consequent fuzzy interpretation. But we could have a guess of error from the mechanical algorithm, too, either by numerical analysis means as mentioned, or even by computer approximation (one way: estimate quantities using a coarse, then fine, then finest grid and measure the rate of change of the estimates; with a little analysis thrown in, this makes a fine solution).
The benefit of the mechanical approach is the demystification of the process. It focuses the mind on the math and reminds us that probability is nothing but a numerical measure of uncertainty, not a live thing which imbues “variables” with life and which by some sorcery gives meaning and authority to results.