ET Jaynes in his must-have Probability Theory: The Logic of Science said, “It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.”
This is profoundly true. And today I bring you an example. (One which I’ve been meaning to get to for a very long time, but events, etc.)
People use a thing called Markov Chain Monte Carlo, and other similar “random”-number generators, to estimate certain mathematical functions. In probability and statistics, these functions are probability distributions. The Gaussian, or “normal”, is a probability distribution. There are an endless number of distributions, and most cannot be computed analytically, or not with any ease.
That is, it’s easy to write down the formula for a “normal” distribution, and use a computer to approximate it numerically. It has to be an approximation because this distribution gives probabilities to every real number, of which there is an uncountable infinity, and all computers are finite. But we can get as close as we want to the real answer given time. And, really, for most applications getting “close enough” is close enough.
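To make “as close as we want, given time” concrete, here is a minimal sketch in Python. It is only an illustration, not anything from the methods discussed here: the function names are made up, and the trapezoid rule is just one simple choice. It approximates the probability that a standard normal falls between -1 and 1, then compares the result to the value given by the error function.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # The formula for the normal density, written down directly
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid_prob(a, b, n):
    """Approximate P(a < X < b) for a standard normal using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    for i in range(1, n):
        total += normal_pdf(a + i * h)
    return total * h

def reference_prob(a, b):
    # Reference value computed from the error function
    return 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))

# More trapezoids (more computing time) means a smaller error
for n in (4, 16, 64, 256):
    approx = trapezoid_prob(-1.0, 1.0, n)
    print(n, approx, abs(approx - reference_prob(-1.0, 1.0)))
```

Each fourfold increase in the number of trapezoids shrinks the error further: finite effort, and an answer as close as you care to pay for.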
Problem is there are a host of distributions where the answer is hard to write into a computer. The distributions are unwieldy and mathematically complex. It turns out, though, that these complex distributions can be thought of as being built from several smaller, less complex, and easily computed distributions. The end result, we must keep in mind, is just a boring probability distribution expressing uncertainty in some (in science) observable.
It was discovered that if we acted “as if” we were “drawing” numbers from those simpler smaller distributions that make up the larger complex one, fair approximations to the complex distributions could be made. These are the MCMC and similar “random” number methods.
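For readers who have never seen one of these routines, here is a hedged sketch of the simplest kind, a Gibbs sampler, written in Python. The example is my own toy and not taken from any particular package: the “complex” distribution is a two-variable normal with correlation rho, and the two “simpler” pieces are the one-variable normals for x given y and for y given x. The function names and the seed are assumptions for illustration.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_draws, seed=0):
    """Alternate "draws" from the two simple conditional normals."""
    rng = random.Random(seed)  # fixed seed: the "random" numbers are fully determined
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_draws):
        x = rng.gauss(rho * y, cond_sd)  # x given y is N(rho*y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # y given x is N(rho*x, 1 - rho^2)
        draws.append((x, y))
    return draws

def sample_correlation(draws):
    n = len(draws)
    mx = sum(x for x, _ in draws) / n
    my = sum(y for _, y in draws) / n
    sxy = sum((x - mx) * (y - my) for x, y in draws)
    sxx = sum((x - mx) ** 2 for x, _ in draws)
    syy = sum((y - my) ** 2 for _, y in draws)
    return sxy / math.sqrt(sxx * syy)

draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
print(sample_correlation(draws))  # should land near the rho we put in
```

Notice that the seed fixes every number the routine produces. Run it twice and you get the same “draws”, which is a hint about how much randomness is really involved.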
That story can be read in full here: The Gremlins Of MCMC: Or, Computer Simulations Are Not What You Think. Gist: “random” only means “unknown”, and in these methods we turn a blind eye to the known to pretend to get the unknown, so as to bless the results as if they were “random.” No, it doesn’t make sense.
The problem with these methods is not that they don’t work. They do, but inefficiently. The problem is they induce the false idea that Nature “picks” probability distributions, and makes “draws” from them. The problem is they create the false idea that probability exists.
This is such a strange thought to have. Especially for ad hoc probability models. The idea that there is a “true distribution”, that not only caused the observations in the past, but is poised to cause new ones in the future, if all is aligned just right, is weird.
Never mind all that; that’s what the above linked post is for. Instead, let’s turn to the cheerful news that Jaynes was right yet again. There is a way to eschew the ponderous “random” numbers approach, which, in addition to all its philosophical difficulties, is like watching an NRO reader come to realize that that conservative cruise to meet Rich Lowry maybe wasn’t the best way to spend ten thousand dollars. “Random” methods are sloooooooooooooooooow. And resource hogs.
Enter the integrated nested Laplace approximation (INLA). A nonrandomized replacement to approximate those complex distributions.
Now I won’t show the math, since this isn’t a math blog. This article describes it all in nice detail. I will outline the idea, though, because it’s fun and because we have to beat the corpse of “random” methods to as thin a powder as we can to discourage people from the old mistakes.
Reach back and recall your old calculus days. Taylor series approximations to functions ring a bell? Probability distributions are just functions. Idea is you take any complex function and make something like a simple quadratic out of it, tossing out the “higher order” terms. A quadratic is like f(x) = a + bx + cx^2. Easy to compute.
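A Laplace approximation is like that: take the Taylor series of the log of the distribution around its peak, keep terms up to the quadratic, and what is left is a Gaussian centered at that peak. It gets you to close enough scads faster, and orders of magnitude saner.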
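The quadratic idea can be sketched in a few lines of Python. This is a bare Laplace approximation for a single variable, not the full INLA machinery, and every name in it is hypothetical. As a stand-in for a “complex” distribution I assume a Gamma density with shape 20 and rate 2, chosen only because the right answer is known: the peak is at (20 - 1)/2 = 9.5 and the matching Gaussian variance is (20 - 1)/2^2 = 4.75. The code finds the peak of the log density by Newton's method, measures the curvature there, and reads off the Gaussian. No “draws” anywhere.

```python
import math

def log_gamma_density(x, shape=20.0, rate=2.0):
    # Log of the Gamma(shape, rate) density; our stand-in "complex" distribution
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def first_derivative(f, x, h=1e-4):
    # Central finite difference for the slope
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    # Central finite difference for the curvature
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def laplace_approximation(log_density, start, steps=50):
    """Return (mean, variance) of the Gaussian matching the peak and its curvature."""
    x = start
    for _ in range(steps):
        # Newton's method on the log density: climb to where the slope is zero
        x -= first_derivative(log_density, x) / second_derivative(log_density, x)
    variance = -1.0 / second_derivative(log_density, x)
    return x, variance

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

mean, var = laplace_approximation(log_gamma_density, start=5.0)
print("peak:", mean, "variance:", var)
for x in (7.0, 9.5, 12.0):
    print(x, math.exp(log_gamma_density(x)), normal_pdf(x, mean, var))
```

The printed pairs show the actual density next to its Gaussian stand-in at a few points. The fit is best near the peak and loosens in the tails, which is the price of tossing the higher order terms.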
For a simple regression with a sample size of 100,000, the INLA took less than a minute’s computation time, but a standard MCMC routine took 148 minutes, which is 32 minutes shy of 3 hours. Slick.
But think how much easier to explain to somebody what is going on. There is no mysticism whatsoever about “randomness.”