A British judge has thrown a use of Bayes’s rule out of his court. Not only that, his honor (Lordship?) ruled “against using similar statistical analysis in the courts in future.”
A ruling to which this dedicated Bayesian says, “Hear, hear!”
My opinion may be in the minority: the Guardian quotes Professor Norman Fenton, a mathematician at Queen Mary, University of London: “The impact will be quite shattering.”
“We hope the court of appeal will reconsider this ruling,” says Colin Aitken, professor of forensic statistics at the University of Edinburgh, and the chairman of the Royal Statistical Society’s working group on statistics and the law. It’s usual, he explains, for forensic experts to use Bayes’ theorem even when data is limited, by making assumptions and then drawing up reasonable estimates of what the numbers might be. Being unable to do this, he says, could risk miscarriages of justice.
“From being quite precise and being able to quantify your uncertainty, you’ve got to give a completely bland statement as an expert, which says ‘maybe’ or ‘maybe not’. No numbers,” explains Fenton.
Fenton in his objection has hit upon the key reason I support the judge: no numbers! Let me explain.
Bayes’s rule is so simple that it can be proved using the most elementary of arguments. Even frequentist theory admits Bayes’s rule. It is, given the axioms of probability, simply true. How can any judge ban what is both true and trivially so? Here are the details of the particular case:
In the shoeprint murder case, for example, it meant figuring out the chance that the print at the crime scene came from the same pair of Nike trainers as those found at the suspect’s house, given how common those kinds of shoes are, the size of the shoe, how the sole had been worn down and any damage to it. Between 1996 and 2006, for example, Nike distributed 786,000 pairs of trainers. This might suggest a match doesn’t mean very much. But if you take into account that there are 1,200 different sole patterns of Nike trainers and around 42 million pairs of sports shoes sold every year, a matching pair becomes more significant.
All probabilities, including of course those used in Bayes’s rule, are conditional on given evidence. For instance, we can calculate, using the rule, and assuming the suspect is guilty, the probability that his shoe prints match those from a pair of “random” tennis shoes.
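For reference, here is Bayes’s rule with the conditioning evidence written out. The symbols are an editorial shorthand, not the author’s: G is “the suspect is guilty,” M is “the print matches his shoes,” and E is the background evidence and assumptions (sales figures, the extent of “the area,” and so on).

```latex
\Pr(G \mid M, E) =
  \frac{\Pr(M \mid G, E)\,\Pr(G \mid E)}
       {\Pr(M \mid G, E)\,\Pr(G \mid E) + \Pr(M \mid \neg G, E)\,\Pr(\neg G \mid E)}
```

Every term on the right carries E. Change the assumptions in E and any of those terms, and therefore the answer, can change with them.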
But to do this requires knowing how many shoes are “out there.” And just what does that mean? The evidence that Nike “distributed 786,000 pairs of trainers” in the years 1996 to 2006 was given. That’s fine, and once that information is put into the formula it will give us the probability we want. A deliciously precise number, too, to as many decimal places as we like.
But why use 1996 as the starting year? Why not 1995 or 1997? Why not start in June of 1998? Nike might have distributed that exact number of shoes (and chances are this number is only an approximation), but how many were actually sold? What about other shoes not manufactured by Nike but which are similar? And how many were sold to residents living in just the area in which the suspect lived or the murder took place?
And what is “the area”? Ten blocks? A square mile? How many shoes sold elsewhere were bought on eBay, say, and shipped to the area? Do all the shoes leave prints at the same rate? Some might have deeper treads and thus be more likely to leave a trail.
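Each of these questions changes the inputs to the formula. A minimal sketch in Python shows how far the “precise” answer can move. Every number in it is hypothetical, made up for illustration and not taken from the case: the prior chance of guilt stands in for how big “the area” is, and the random-match probability stands in for how common the shoe is. The function and variable names are likewise invented.

```python
# Hypothetical sketch: how the "precise" answer from Bayes's rule moves
# with the assumptions fed into it. None of these inputs come from the case.


def posterior_guilt(prior_guilt: float,
                    p_match_if_guilty: float,
                    p_match_if_innocent: float) -> float:
    """Bayes's rule for Pr(G | M, E) with two hypotheses, G and not-G."""
    numerator = p_match_if_guilty * prior_guilt
    denominator = numerator + p_match_if_innocent * (1.0 - prior_guilt)
    return numerator / denominator


# Each entry is one made-up "set of assumptions": a prior chance of guilt
# and the chance that an innocent person's shoes would match the print anyway.
ASSUMPTION_SETS = {
    "small area, rare shoe":   {"prior": 1 / 1_000, "random_match": 1 / 10_000},
    "small area, common shoe": {"prior": 1 / 1_000, "random_match": 1 / 200},
    "city-wide, rare shoe":    {"prior": 1 / 100_000, "random_match": 1 / 10_000},
    "city-wide, common shoe":  {"prior": 1 / 100_000, "random_match": 1 / 200},
}


def run_all(assumption_sets=ASSUMPTION_SETS) -> dict:
    """Return Pr(guilt | match) for every assumption set."""
    results = {}
    for name, a in assumption_sets.items():
        # Assume a guilty man's shoes always match his own print.
        results[name] = posterior_guilt(a["prior"], 1.0, a["random_match"])
    return results


if __name__ == "__main__":
    for name, p in run_all().items():
        # Ten decimal places: precise-looking, whatever the inputs were.
        print(f"{name:25s} Pr(guilt | match) = {p:.10f}")
```

Run as written, the four made-up assumption sets give answers ranging from about 0.002 to about 0.91, each dutifully printed to ten decimal places.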
It doesn’t matter which assumptions you make. Any set of assumptions will give you, via the formula, a precise answer. That is, an answer which appears precise and which has the imprimatur of science behind it.
But each set of assumptions will give you a different precise answer. Which set of assumptions is just the right one? I have no idea, and neither do the lawyers. But the jury might know.
They form a combined common sense and can better judge what this kind of evidence might mean. The prosecution and defense can bring up the points which they consider salient, and Bayes’s rule can still be explained without the use of explicit formulae—the difference between the probability of guilt given the shoe prints match and the probability of shoe prints matching given guilt can still be highlighted.
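The difference between the probability of guilt given a match and the probability of a match given guilt can also be shown by counting rather than by formula. A hedged sketch, assuming a hypothetical town of 1,000 possible suspects in which exactly one is guilty (and always matches) and each innocent person’s shoes match by chance with probability 1/200; neither figure comes from the case, and the function name is invented for illustration:

```python
import random


def share_of_matches_that_are_guilty(population: int,
                                     random_match: float,
                                     trials: int,
                                     seed: int = 1) -> float:
    """Estimate Pr(guilt | match) by simulation (hypothetical model).

    In each simulated town one person out of `population` is guilty and
    always matches the print, so Pr(match | guilt) = 1. Every innocent
    person matches independently with probability `random_match`.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    guilty_matches = 0
    all_matches = 0
    for _ in range(trials):
        # Innocent people whose shoes happen to match the print.
        innocent = sum(rng.random() < random_match
                       for _ in range(population - 1))
        guilty_matches += 1
        all_matches += 1 + innocent
    # Among everyone who matched, across all towns, the fraction who were guilty.
    return guilty_matches / all_matches


if __name__ == "__main__":
    share = share_of_matches_that_are_guilty(1_000, 1 / 200, trials=1_000)
    print("Pr(match | guilt) = 1.0")
    print(f"Pr(guilt | match) is roughly {share:.2f}")
```

With these inputs only about one matching person in six is the guilty one, close to the 1/(1 + 999/200) ≈ 0.17 that the formula gives, even though a guilty man’s shoes match every time.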
But showing the jury some impressive mathematical apparatus which when invoked spits out exact numerical results isn’t, as the judge rightly ruled, fair. The math is not and should not be evidence because some might assume that the complexity of the math is itself proof of the results of using the math. And in this case, the assumptions are so varied and so vague that insisting on precise answers is silly.
Actually, the judge did not ban Bayes’s rule: he banned unwarranted precision. He “decided that Bayes’ theorem shouldn’t again be used unless the underlying statistics are ‘firm’.” To which I again say, Amen.
(Incidentally, I would ban p-values for the same reasons.)
Thanks to readers Andrew Kennett and Mr Anonymous who suggested this post.
A very important point well made.
If more people, including politicians, could apply the same logic to other fields (e.g., nutrition studies or climate research, to name only two) by laying out as many as possible of the major and minor assumptions likely to matter to a particular study, a great deal of alleged “certainty” would disappear.
This would allow the rest of us to stop worrying about the possibility that those amongst us taking a multiple vitamin are reducing our life expectancy or that a glass of wine a day can only lead to an early grave.
I can understand the problem if the shoe calculation was the sole (* ahem *) evidence for conviction. If so, its use was outrageous. It’s a bit of a reach, though, to rule out all Bayesian-derived statistics (did His Lordship mean for his ruling to apply to Frequentist methods as well?). It goes to support why the prosecution thought owning the shoes was significant. This should be included along with other evidence. Preponderance and all that. It’s really no worse than being found owning a gun similar to the one used in a murder. What are the chances of that?
As for the calculations: 1996 was probably chosen because that’s where the data started. Is it better that the jurors make up their own statistical probabilities? They will, you know. Want to guess how many will be way off the mark? The defense team may have dropped the ball if they didn’t point out possible problems with the statistics. I suspect the trial itself is often perceived as evidence of guilt. That’s a big hurdle in itself.
In the US faulty statistics are used a lot. For example, where is the study that determines the probability that no two people have the same fingerprints given the methods used to classify them?
Someday I will learn to spell (or at least improve my editing skills).
Some quite shocking miscarriages of justice have occurred in the UK as a result of such ‘expert evidence’ and its spurious exact numbers. Good riddance.
As “they” say, all models (statistical or otherwise) are wrong, but some are more useful than others. The problem with the useful ones is the nature of the usefulness. All too often they are used to support an agenda other than the discovery of the truth, for example, lending a fake certainty to a predetermined conclusion. The problem with fake certainty is that it usually produces a lot of dead bodies. One of which might be your own.
Relying on the results of a statistical calculation alone to expose the truth is simply a variant of “the computer said so, so it must be true.” It is at best nothing more than ancient tribal soothsaying clothed in modern techno-babble. Believe in it if you wish but don’t expect the results to be at all reliable.
Statistical calculations, even Bayesian ones, can be quite precise in telling us the quality of our assumptions WITHIN our set of assumptions. However, they do not and cannot tell us the quality of the connection BETWEEN our assumptions and reality. That can only come from a vast quantity of information outside of the numbers accumulated, the assumptions used, and the calculations performed. A low p-value or high Bayesian probability by itself contains less information than a random phone number extracted from a random phone book.
The bottom line is that if your life depends upon it, you want something more substantial than a low p or high Bayesian value to rely upon.
Keywords in Aitken’s comment: when data is limited.
I have seen AGW calculations where the temperatures were given to a hundredth of a degree, i.e. two digits after the decimal point. The publication of results with such phoney precision is common, and people seem unable to recognize the fallacy of unwarranted precision. Just because your computer can carry seven decimal digits (32-bit computer word) doesn’t mean all seven digits are significant.
“But each set of assumptions will give you a different precise answer.” Briggs, this quote is an exquisite example of why we love you, man (IYKWIM). You can’t imagine, well, maybe you can actually, how often I’ve had to have this conversation with managers and even with other statistical practitioners. I think I first remember hearing GIGO about 1970ish, but I’m sure it must have been around in some form since people have been making assumptions and calculating a result based on those assumptions.
The mere presence of numbers will give people more faith in the assertion than an equivalent statement without numbers. Oooh, those coppers have math, they must have fingered the correct man.
How many different types of shoes did the defendant have in his home?
This is not a new judgement in the UK; see http://homepages.mcs.vuw.ac.nz/~vignaux/docs/Adams_NLJ.html on a 1996 ruling to the same effect.
The prosecution’s technique sounds like begging the question.
How many of you out there have a ten-year-old pair of sneakers?
I couldn’t agree more! There was a celebrated case of deaths in paediatric cardiac surgery in the UK, in which the surgeons involved consulted frequentist statisticians over their complication rates (why shouldn’t they? Surgeons are not statisticians and would seek advice). They were completely shredded on the basis of Bayesian statistics and were sacked, never to practise again. A number of simulations did not support the Bayesian model. It’s trendy and exciting, and has some logical rigour, but not exhaustively.