The title, you’ll be shocked to learn, is sarcastic. You’ll forgive the tone after reading NBC’s article
“Big Data’s Big Misses: 2016 Was a Bad Year For Predictions”, which opens:
From the United States to the United Kingdom, the last 12 months will be remembered for missed calls, surprises and upsets that didn’t just beat the odds, but that shattered them – and not just in politics. On the most basic level, 2016 was a bad year for data.
Au contraire, mon Probabiliste: it was a terrific year for data. People from boardrooms to broadcast booths slavered over it. Articles that glowed harder than the Fukushima pile spilled forth, a cataract. Classes both near and far were conducted. Fealties were sworn. Big Data was—and is—in. Why, and why this is inevitable and bound to lead to disappointment, below. First:
Going into Election Day, most odds makers had Trump as a long shot, with a less than 30% chance of winning the presidency. The data mavens at fivethirtyeight.com gave Trump a 28.6% chance of winning. The Upshot at the New York Times gave him a 15% chance of winning. Others had the numbers much lower, 2% or less.
Reminder: these probabilities differ because the evidence differs. All probability is conditional.
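To see how different evidence produces different probabilities, here is a minimal sketch in Python. All numbers are invented for illustration: `win_probability` is a hypothetical toy model, not anything fivethirtyeight or the Upshot actually used. It shows that the very same polling margin, conditioned on different assumptions about how wrong the polls might be, yields win probabilities in roughly the range the forecasters reported.

```python
from statistics import NormalDist

def win_probability(poll_margin, total_sd):
    """Probability the trailing candidate wins, given a polled margin
    (points) and an assumed standard deviation of total polling error.
    Toy model: actual margin ~ Normal(poll_margin, total_sd)."""
    return NormalDist(mu=poll_margin, sigma=total_sd).cdf(0.0)

# Same evidence (Clinton +3 in the polls), different assumptions
# about systematic error -- hence different probabilities:
print(win_probability(3.0, 5.5))  # wide error band  -> Trump ~29%
print(win_probability(3.0, 3.0))  # moderate band    -> Trump ~16%
print(win_probability(3.0, 1.5))  # narrow band      -> Trump ~2%
```

The forecasters did not "disagree about the facts" so much as condition on different models of the evidence; each probability is correct relative to its own premises.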
The article recounts major lapses in sports prognostications, then continues:
All those results made the numbers, and the people who created them, look silly. In other words, many of us may be focused on how pollsters “got it wrong” in the U.S. presidential race, but those data wranglers had plenty of company in Las Vegas and London. (And remember, as many pollsters will remind you, Hillary Clinton did win the popular vote.)
But the misses on Trump and Brexit were more complicated than the sports books. They were about misreading and mis-predicting the behavior of millions of people. And in those two cases, the data tools may be part of the problem.
Most of the traditional data measures on the 2016 election data were pretty consistent in showing a big Clinton win. It could be that 2016 is telling us the electorate now functions differently.
The data tools are part of the problem in two ways: (1) the models themselves, and (2) the false belief that everything can be quantified. We’ll go in inverse order.
Science oft leads to scientism, of which a symptom is the belief that everything can be quantified, or at least approximately quantified. Thus (partly) the phenomenon of Big Data. (I ignore the facet of Big Data that merely involves the storage, processing, and computation of massive buckets of bits, important though it is, for instance in how individuals can be tracked; I am interested here only in prediction.) Big Data is at least the hope that if enough things are measured, prediction will be easy. Cram every tidbit of information into some universal algorithm, ask that algorithm a question, and out will pop the answer.
Computers get better, faster, stronger. And, after all, do not computers now routinely beat humans in chess and go? They do, but those games have known rules which are of trivial complexity. What are the rules that account for human behavior? How do you program them? Nobody knows the answer to either question. The hope that as technology progresses we’ll learn these rules is bound to fail.
Some have the idea that, eventually, the brain and body will be mapped down to the atomic level. Imagine that this is so. A person’s behavior, these people say, can then be projected (and understood) within the uncertainties imposed by quantum mechanics (or whatever might be its replacement). This follows: if we really could, never mind how, know and computerize how every element in a person’s body interacts, via true rules of elemental interaction, as in the equations of motion of a gas, then we could predict that person’s behavior. We could predict his very thoughts!
Alas, the missing assumption is that man is entirely a physical creature, which is false. There isn’t a person alive (though there may be many dead) who understands the rules of spiritual interaction. There is no way to quantify the spiritual, and thus no way to quantify the spiritual-physical interaction. Thus complete persons will ever be closed off to Science.
Besides, we’re never interested in persons, but person-environments. Even if you could map every element in a person, and there was no spiritual dimension, you’d have to also map every element in his universe to capture the person-environment interactions. Meaning, of course, the computer you use would have to be larger than the universe mapped.
There is more hope that we can predict group behavior. People are quirky but mobs are predictable, it is said. Compile a big enough pile of data and we can show just how far ISIS will have spread fifty years hence. Which also means predicting how non-ISIS groups interact with ISIS. Which means we’re still in deep kimchee, accuracy-wise.
Much, much more to say on this topic. Today is only a teaser. About the election predictions themselves, before the outcome I wrote this, cautioning against the idea that we have correctly identified all the things that should—and can—be measured. Computer models, as is obvious, are biased towards that which can be measured. That which can’t be measured, even if it is that which is most important or influential, won’t be computed. This is a tautology with a twist. Because that which is computed is computed on a computer, and labeled inter alia “sophisticated”, the output will be accorded undue weight because of scientism.
About the ineptness of the models themselves, see this book, especially the latter chapters. One must admit (as I do in the book and elsewhere) that computer scientists are geniuses at naming their algorithms. They sing. They beguile. They imbue the same hopefulness in one’s breast as do those infomercials one sees upon waking after falling asleep in front of the tube, about new ways to chop vegetables and cook and serve them with no mess. Yet when the algorithm arrives in the mail and you unwrap it and try applying it to human behavior in real life, it works just as well as those newfangled vegetable peelers.