Word is police are buying statistical algorithms which purport to predict crime. Not just at the neighborhood- or block-level, but for persons—meaning for you. Which of our readers is most likely to snatch up a knife from the drawer and run amok? Some enterprising entrepreneur is willing to charge us to answer.
That’s not the new news. The twist is that there are now algorithms to predict which cops will turn rogue. According to Five Thirty Eight’s “We Now Have Algorithms To Predict Police Misconduct”:
Many police departments have early warning systems — software that tracks each officer’s performance and aims to forecast potential problems. The systems identify officers with troubling patterns of behavior, allowing superiors to monitor these cops more closely or intervene and send them to counseling.
What goes into these computer programs?
Incidents that officers deemed stressful were a major contributor; cops who had taken part in suicide and domestic-violence calls earlier in their shifts were much more likely to be involved in adverse interactions later in the day.
Also: previous incidents of bad behavior on the part of individual cops. These are all, of course, just the sorts of things that feed into the “algorithm” that police supervisors have always used to sniff out bad apples. That informal decision making process was not perfect, but, as everyday experience confirms, it must have at least been adequate, if not superior to good enough. So why muck things up with an official piece of software?
I’ll give you one good reason: bureaucratic cowardice. Scenario 1: “Don’t blame the department, Mr Mayor, the Rogue-o-Matic 3000 only gave Officer Smith a 12.391771% chance of going on the take.” Scenario 2: “I’m sorry, sergeant, we just can’t promote you. Men with PhDs in computer science have developed a sophisticated and powerful program that says you’re likely to beat suspects with your truncheon.”
Bureaucrats love avoiding responsibility; they also adore taking credit. Using some sort of official screening algorithm lets them do both. Avoiding responsibility we’ve just seen. Taking credit is easy, too. “Commissioner, here’s my bi-semi-quarterly amended report, which my office spent the last eight weeks writing, which proves that since we’ve adopted the Screenerator XM-550-FP, we prevented 172.3 crimes, one of which was a rape!, which is as sexist a crime as there is.” That these nonexistent crimes never existed everybody forgets.
Blanket screening does more harm than good. The article tells us when one algorithm was employed to ID bad apples “50 percent of the flagged officers in the data set did not” go rogue. But don’t worry. This is a New & Improved! algorithm which “flags 15 percent fewer officers than the old one.”
False positives—saying good cops will be bad ones—are an enormous problem. Huge number of reputations tainted or ruined, false suspicions raised. False negatives—saying bad cops are good—are even worse. Some cop who otherwise would have been obviously flagged by some supervisor is given a pass because a computer says he’s okay. Or people who would not have been otherwise flagged are never looked at.
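To see how a screen that sounds accurate can still be wrong about half the officers it flags, here is a minimal sketch of the arithmetic. Every number in it (the department size, the base rate of rogue cops, the sensitivity and the specificity) is a hypothetical assumption picked for illustration; none comes from the article or from any real system.

```python
def flag_outcomes(n_officers, base_rate, sensitivity, specificity):
    """Expected counts when every officer is run through the screen.

    All inputs are hypothetical assumptions, not measured values.
    """
    bad = n_officers * base_rate              # officers who would go rogue
    good = n_officers - bad                   # officers who would not
    true_pos = bad * sensitivity              # bad cops correctly flagged
    false_neg = bad - true_pos                # bad cops given a pass
    false_pos = good * (1 - specificity)      # good cops wrongly flagged
    flagged = true_pos + false_pos
    return {
        "flagged": flagged,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "share_of_flags_wrong": false_pos / flagged,
    }


# Hypothetical department: 1000 officers, 5% of whom would go rogue,
# screened by a test that catches 80% of them and clears 95% of the rest.
result = flag_outcomes(n_officers=1000, base_rate=0.05,
                       sensitivity=0.80, specificity=0.95)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these made-up inputs, more than half of the flagged officers are good cops, simply because good cops vastly outnumber bad ones. Change the base rate and the picture changes with it.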
The main problem with using a computer-machine-deep-learning-neural-network-big-data-algorithm is that the results appear better than they are. Worse, they appear “objective.” People trust them too much. Why?
Well, haven’t computers beaten Grand Masters in chess and go? Then they surely can predict which cop will go rogue! Right?
No. To paraphrase Young Frankenstein “Chess and go are Tinkertoys! We’re talking about the central nervous system!” and about the entirety of human lives, facets which outstrip the complexity of these trivial games to the same degree an interplanetary spaceship does compared to a thrown rock. What do I mean?
All these algorithms, including any future ones that might be invented, work in the same way. A list of measurements thought probative of the outcome (a cop going rogue) is compiled. There is no rule on what this list of measurements should be: they can be anything. For cops going rogue these might be: the number of previous incidents of misbehavior, age, job function, number of donuts eaten and whether jelly is preferred over plain, etc., etc.
Don’t scoff at the donuts. It could be that cops who eat more jelly than plain donuts go rogue more often. Who knows?
In the end, the measurements are fed into the algorithm and they mathematically interact with each other and with the outcome. The problem is: nobody knows which measurements are best and, worse, nobody knows how the measurements taken interact with one another. The complexity is too great. The accuracy rate is anemic.
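One crude way to get a feel for that complexity is to count how many distinct interactions a list of measurements could have. The sketch below assumes the simplest possible bookkeeping, in which an interaction is just some subset of two or more measurements acting together. Real algorithms can combine measurements in far more elaborate ways, so treat these counts as a floor, not a description of any actual system. The function name is made up for the illustration.

```python
from math import comb


def interaction_counts(n_measurements):
    """Count candidate interactions among n measurements.

    Returns (pairwise, all_orders), where all_orders counts every subset
    of two or more measurements: 2**n minus the n singletons minus the
    empty set.
    """
    pairwise = comb(n_measurements, 2)
    all_orders = 2 ** n_measurements - n_measurements - 1
    return pairwise, all_orders


for n in (4, 10, 20):
    pairwise, all_orders = interaction_counts(n)
    print(f"{n} measurements: {pairwise} pairs, "
          f"{all_orders} interactions of any order")
```

Twenty measurements, a short list by the standards of these systems, already allow over a million candidate interactions, and a department has only so many rogue cops to learn them from.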
We’ll next week do a part II on the ethics of precrime. So let’s here only discuss the nature of the algorithms themselves.
This is not just in police departments.
There is a huge effort in consulting companies now to market services in “Insider Threat” detection. Edward Snowden is the proximate cause of this. But the bureaucratic terror of whistleblowers and spies drives this. The pitch to intelligence agencies (Snowden, Ames, Nicholson), the military (see Bradley Manning), and some commercial firms (rogue traders) is easy and lucrative.
Insider Threat is the keyword:
“Proactive detection from the inside out.
Monitoring cyber footprints and _behavioral indicators_ in a single platform is the ideal combination to detect and prevent insider attacks before they occur.”
That old bogeyman–“behavioral indicators.”
We can read your mind by watching your actions.
If people use the information to aid decisions I don’t think this is so bad – but that’s the rub, isn’t it? If it is just used by rote, it is just replacing one bureaucracy with another.
I’d challenge the assertion that the traditional way of supervising works all right. The vast majority of cops aren’t that bad – so saying no one is a bad apple is a pretty reasonable strategy for most supervisors. Then when an officer does mess up it is not typically the supervisor who is on the hook anyway.
Departments can compile pretty simple statistics though to indicate officers they should intervene with. Just a simple tally of civilian complaints and resisting arrest charges is what I would recommend (no fancy models needed). What concerns me more about the fancy models as performance metrics to change behavior is that they are so opaque the agents have little to no agency to affect their performance – since a bunch of the inputs are out of their control.
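A simple tally of this kind might look like the sketch below. The incident log, the officer IDs, the category names and the review threshold are all invented for illustration; a real department would have its own records and would have to choose its own cutoff.

```python
from collections import Counter

# Hypothetical incident log of (officer_id, incident_type). Not real data.
incidents = [
    ("A101", "civilian_complaint"),
    ("A101", "resisting_arrest_charge"),
    ("B202", "civilian_complaint"),
    ("A101", "civilian_complaint"),
    ("C303", "resisting_arrest_charge"),
]

# Only these two categories are counted, as suggested in the comment above.
COUNTED = {"civilian_complaint", "resisting_arrest_charge"}


def tally(log):
    """Count the relevant incidents per officer."""
    return Counter(officer for officer, kind in log if kind in COUNTED)


def officers_to_review(log, threshold):
    """Officers whose tally meets or exceeds an assumed review threshold."""
    return sorted(o for o, n in tally(log).items() if n >= threshold)


print(tally(incidents))
print(officers_to_review(incidents, threshold=2))
```

The point of keeping it this plain is that an officer can see exactly what put him on the list, which is what the opaque models take away.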
Mr. Christian, it has come to my attention that there is a 50% chance that you will mutiny within the next week. Therefore I will have you hanged from the yardarm to forestall this possibility.
Will this new method replace the polygraph?
You’re taking me back … way back …
As in Philip K. Dick’s “Minority Report”?
A cop could use it to figure out which symptoms of their roguery need a bit more work to keep secret.
Hire psychics. It’s cheaper and they are probably just as accurate. I just cannot believe how psychic predictions and mind-reading have become mainstream now that numbers are used to “explain” the phenomena.
Answer to “Which of our readers is most likely to snatch up a knife from the drawer and run amok?”
The quiet guy that no one expects. Trust me, that’s most likely.
The non-existent crimes? Like the non-existent early deaths due to “x” and so forth? How cool it is that science can now tell us with certainty what did not happen as well as what did.
Gary in Erko—my thought, also. It didn’t take that long for people to learn to game the social services, court and psychiatric communities. Should be much easier with an algorithm.
I think what needs to be emphasized is that the deeper quibble is with the *decisions* being made with these models. Sure the models are generally crap, but it’s that we begin to rely on them and forget the ramifications of false positives…
On the plus side, a whole new swatch of people will learn about sensitivity and specificity when they get fired and take their bosses to court.
I meant swath. Briggs’s enemies have found me! Darn you algorithms!
If we are to believe in the ability of algorithms to pick bad apples why are they not used before the bad apples are hired?
I suppose it comes down to the skill of the algorithm and what’s an acceptable number of bad apples. If we accept that even the algorithms cannot be perfect the question is how much more skill does an algorithm need to show before it can replace the old methodology. The old methodology will also produce false positives and negatives. Again the question is does the algorithm reduce the number of both.
Is the question of what you feed the algorithm really any different than what criteria a supervisor uses in the normal course of duty? Perhaps what bothers us is turning over the entire decision making process to the algorithm. In other words, if the algorithm identifies a bad apple, a certain course of action must be followed versus using the algorithm output as one input into a human made decision. But now we’re talking ethics.
This piece may not be a terrific start in a discussion of predictive algorithms. You’ve identified some of the problems, but I think they should be separated out.
1. Above all, separate out the human decisions from the formalisms. I mean not only in terms of decision theory, how the result of the algorithm is used: “I, Chief Joe Schmoe (not a computer program), decide to fire Officer Lefty Caruso, on such-and-such a basis.”
Which is critically important, but save for an ethics discussion.
I also mean separate out all the times in the algorithm where a human decision inserts itself into the formalisms, and then invisibly flits out again.
Just like when we decide that the calculated mean of our observed data will serve as the numerical value for the central parameter of a distribution function. WE, our decision, thus generates a probability distribution — not a formalism.
I’m also troubled by the implication in the piece that human behavior is a special class of thing that has the (unique?) property of evermore being too ‘complex’ for any predictive algorithm. I’m not sure you really wanted to say that comprehensively and decisively; and anyway, I’m sure not ready to say that decisively.
You’ve identified other problems that also deserve separate discussion:
2. Knowing which measurements are ‘best’.
3. Knowing how the measurements taken interact with one another.
4. The accuracy rate. (The skill of the model?)
So, for example, if an algorithm’s ‘skill’ were found to be thus-and-so, then it is a matter of decision what to do with that information. I’m suggesting that we should separate out ALL the human decisions in the entire chain of logic and mathematical formalisms from critiques of the logic and formalisms.
There are also the massive ethical, authority, and political issues involved; such as hiding behind an algorithm that you pretend ‘just happened’. And so on.
But it’s not just a matter of ethics. Otherwise, we’re begging humanly-important questions that are not fully answered by recalling ethical principles, such as: What do we decide to do if the accuracy of some future algorithm does indeed turn out to be greater than some percentage that we think of as ‘accurate enough’? Do we then even much care which measurements are ‘best’, when our decision is that the current measurements are ‘good enough’? And so forth.
So, we should discuss (a) the limitations of the formalisms, (b) carefully lay out each instance within the entire process in which a human decision is/must be inserted, (c) debate the practical merits of making a decision based on thus-and-so a result from our algorithm, and (d) understand this all within an ethical framework.
Sounds like a lot of work!
Reminiscent of early Virus checking algorithms which checked for date changes on .exe files. Reportedly, a programmer got fired because the dates on the .exe files he was building changed!
False positives are a problem everywhere
I’d rather have algorithms than psychologists!
JMJ: I’m sure the psychologists would prefer that option also.
Good points. Let’s add the further ethical problem that algorithms need to see positive cases in order to fit to them (although there is work out there to try to deal with that problem, see SMOTE as an example).
In other words, an algorithm designer must allow police brutality in order to predict it. Get that past an IRB!
Some kind of an objective-as-possible warning system is a good idea. I’ve heard quite a few tales of cops sent to deal with things at the worst possible time for them.
Which will go rogue? Rudimentary phrenology will tell us it’s the one pictured just to the left of the mayor. And perhaps the one with his head hidden behind the mayor’s, I can’t say.
“Experts: Obama’s plan to predict future leakers unproven, unlikely to work”
This was among the initial refutations of these systems, in 2013.
“The techniques are a key pillar of the Insider Threat Program, an unprecedented government-wide crackdown under which millions of federal bureaucrats and contractors must watch out for “high-risk persons or behaviors” among co-workers. Those who fail to report them could face penalties, including criminal charges.”
This approach is very good at getting co-workers to snitch on each other. And at generating false positives.
JMJ: How about guidelines for how much stress a cop can be subjected to in a day or week? If they witnessed a suicide or shooting, maybe a couple of days off? It would have to be mandatory and the same for all, since any variation would make cops look “weak” to other cops. We’re still left with the fact that some people deal better than others and that people are far from objective, so an “objective as possible” system is going to be difficult to come up with. I have read stories of cops who should have been sent home after an incident, so work is definitely needed.
Since you don’t like psychologists, who can we send the cop to talk through what happened? What do we do with the cop? You don’t like religion either, or that’s what you are always saying. So do we send them out to get drunk? What do we do?
It’s just not that complicated. Do the algorithms have more skill than supervisors (who, by the way, may be involved in the rogue behavior)?
The argument that the AlphaGo victory is meaningless with respect to the ability of algorithms due to the vastly more complex biological nervous systems involved in human behavior is a poor one. One can as easily argue that “if algorithms are better than humans at such a ‘simple’ problem as Go, they must certainly be even more superior at such a complex problem as human behavior.” And, in fact, there is evidence that this is so. Look at algorithmic trading in the markets. And if ever there was a problem in predicting the behavior of humans, the markets are such a problem.
“if algorithms are better than humans at such a ‘simple’ problem as Go, they must certainly be even more superior at such a complex problem as human behavior.”
My mind boggles at the assumptions implicit in this statement.
Attention Humans. The game is up. Surrender peacefully and you won’t get hurt.
Yes, again: “The love of theory is the root of all evil.” The handprints of the Senior Executive Service (SES) are all over the algorithm; I can smell it all the way to the west coast.
You’ll note, I’m sure, that the quoted sentence is not an argument I made. I stated that it’s no more facile of an argument than is “it’s irrelevant to algorithms’ ability, relative to supervisors, to predict rogue police behavior that they can beat the best Go players because human behavior is so much more complex than Go.” Both arguments are nonsense.
Nevertheless, algorithms, within certain domains, are more successful than humans, even when those domains involve, at one level or another, human behavior. Again, given Dr. Briggs’ position on climate models, I would think that he’d agree that we can theorize and speculate all we want but the question will be answered by actually seeing if the algorithms have more skill than supervisors. He implicitly agrees that there’s value in trying to understand the likelihood of police “going rogue.” Or, to correctly use an oft misstated aphorism, the proof of the pudding is in the eating.
Concerning the matter of using models to forecast what would otherwise be up to humans to forecast, the preponderance of the evidence would suggest that this is a good idea. See here. Also, statistical models will probably improve over time more so than humans will.
Just the same, any such model will probably be underutilized. People are generally averse to trusting models over their own judgement even when the model is more accurate than their own judgement.
Rob, a few days reading zero hedge might reduce your confidence in the algo traders. They use a lot of really underhanded tactics as well as a focus on speed not knowledge in order to arbitrage their way to $. Look up “flash trading”.
I have a system used by a company to track the performance of monthly inspections. They are supposed to complete the inspection and log their results in before the end of the month.
There are two numbers that we plot in that data:
Compliance says whether or not the activity is being done properly.
Completion checks to see that the box was checked.
The important piece of information in this inspection?
When were the boxes checked? At the end of the month at 10pm? The beginning of the month? Periodically over the course of the month?
The goal of the inspection logger was to try and get the folks at the plants off their asses, out into the plant to observe what was happening and fix things that weren’t being done properly.
What was recorded was just whether or not something was done properly. If everything was done properly, we suspect the person of marking everything positive and not checking everything. If everything is marked negatively, WTF are they doing over there. If everything is marked N/A?
People are not computers.
I am not heartless because I do not run up to my child every time he falls down. I do everything I can to behave as if I am heartless though.
From time to time, I fail in both directions though. My assessment of the situation makes me run up faster than I would otherwise. My assessment makes me think that the problem is less bad than it was. When I broke my leg, there was nothing to indicate it happened. I was just a heap on the ground screaming. Nothing had hit me. I had just screwed up a soccer kick. Someone finally said “I think he might have been hurt”….
I use scream level with my kids. When my son says “my leg hurts here”, I listen. If he is screaming a certain way, I act. If he is screaming a different way, I don’t.
I cannot explain the difference.