The terms "prosecutor's fallacy" and "defense attorney's fallacy" were coined by William C. Thompson and Edward Schumann in their classic 1987 article "Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor's Fallacy and the Defense Attorney's Fallacy".
Concrete examples help in understanding the statistical reasoning behind these ideas:
1. Conditional Probability. Consider this case: you win the lottery jackpot and are then charged with having cheated, for instance by bribing lottery officials. At the trial, the prosecutor points out that winning the lottery without cheating is extremely unlikely, and argues that your innocence must therefore be comparably unlikely. This reasoning is intuitively faulty: it could be applied to any lottery winner, even though we know somebody wins the lottery every week. The flaw in the logic is that the prosecutor has failed to take account of the low prior probability that you, and not somebody else, would win the lottery in the first place; since almost every winner wins honestly, the improbability of an honest win is not, on its own, evidence of cheating. One example of this fallacy, once routinely used in Britain by child-care agencies and law enforcement, was Meadow's law, which led to a number of highly publicised wrongful convictions for murder. The "law" held that, for cot deaths (sudden infant death syndrome, SIDS), "One is a tragedy, two is suspicious and three is murder unless there is proof to the contrary."
2. Multiple Testing. In another scenario, assume a rape has been committed and that a DNA sample from the scene is compared against the profiles of 20,000 men held in a database. A match is found and that man is accused; at his trial, it is testified that the probability that two DNA profiles match by chance is only 1 in 10,000. This does not mean that the probability that the suspect is innocent is 1 in 10,000. Since 20,000 men were tested, there were 20,000 opportunities to find a match by chance; assuming the comparisons are independent, the probability that there was at least one chance match is

$$1 - \left(1 - \frac{1}{10\,000}\right)^{20\,000} \approx 0.86,$$

which is considerably more than 1 in 10,000. (The probability that exactly one of the 20,000 men matches by chance is about 27%, which is still rather high.)
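The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that assumes, as the text's calculation does, that each of the 20,000 comparisons is an independent chance match with probability 1 in 10,000; the function names are illustrative, not taken from any library.

```python
def prob_at_least_one_match(n: int, p: float) -> float:
    """Probability of at least one chance match among n independent comparisons."""
    return 1 - (1 - p) ** n


def prob_exactly_one_match(n: int, p: float) -> float:
    """Probability of exactly one chance match (binomial distribution with k = 1)."""
    return n * p * (1 - p) ** (n - 1)


# Figures from the example: 20,000 profiles, 1-in-10,000 random match probability.
n, p = 20_000, 1 / 10_000
print(f"at least one match: {prob_at_least_one_match(n, p):.1%}")  # 86.5%
print(f"exactly one match:  {prob_exactly_one_match(n, p):.1%}")   # 27.1%
```

The exact-one figure uses the binomial formula with k = 1; both results would change if database profiles were correlated (for example, relatives), which the independence assumption ignores.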
Finding a person innocent or guilty can be viewed in mathematical terms as a form of binary classification.
A thought experiment can clarify this. A big bowl is filled with a large but unknown number of balls. Some of the balls are made of wood, and some of them are made of plastic. Of the wooden balls, 100% are white; of the plastic balls, 99% are red and only 1% are white. A ball is pulled out at random and observed to be white. Can the probability that the ball is wooden be calculated from the information given?
The answer is no: without knowledge of the relative proportions of wooden and plastic balls, we cannot tell how likely it is that the ball is wooden. If the number of plastic balls is far larger than the number of wooden balls, for instance, then a white ball pulled from the bowl at random is far more likely to be a white plastic ball than a white wooden ball, even though white plastic balls are only a small minority of all the plastic balls.
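To make the dependence on proportions concrete, the following Python sketch computes P(wooden | white) for two hypothetical bowls. The ball counts are invented for illustration; only the 100% and 1% whiteness rates come from the thought experiment.

```python
from fractions import Fraction


def p_wooden_given_white(n_wooden: int, n_plastic: int) -> Fraction:
    """P(wooden | white) when all wooden balls and 1% of plastic balls are white."""
    white_wooden = Fraction(n_wooden)          # 100% of wooden balls are white
    white_plastic = Fraction(n_plastic, 100)   # 1% of plastic balls are white
    return white_wooden / (white_wooden + white_plastic)


# Hypothetical bowl compositions: equal numbers, then 1,000 times as many plastic balls.
for n_wooden, n_plastic in [(1_000, 1_000), (1_000, 1_000_000)]:
    prob = p_wooden_given_white(n_wooden, n_plastic)
    print(n_wooden, n_plastic, f"{float(prob):.3f}")
# 1000 1000 0.990
# 1000 1000000 0.091
```

The same observation (a white ball) supports "wooden" strongly in the first bowl and weakly in the second; exact fractions are used so the two answers come out as 100/101 and 1/11.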
The significance of this thought experiment becomes clear when we substitute "guilty" and "innocent" for "wooden" and "plastic", respectively, and "evidence is observed" for "white". If we observe a particular type of evidence (for instance, a suspect sharing a rare blood type with a sample left at a crime scene), we may think that, on that information alone, we can judge the suspect very probably guilty. But if innocent people far outnumber guilty ones, then the number of innocent people in whom the evidence is nevertheless observed (people who share the same rare blood type but were not involved in the crime) may be substantially larger than the number of guilty people in whom it is observed.
The fallacy can be analyzed using conditional probability. Suppose E is the observed evidence, I stands for "the accused is innocent" and ~I for "the accused is guilty". We know that P(E|I), the probability that the evidence would be observed if the accused were innocent, is tiny. The prosecutor wrongly concludes that P(I|E), the probability that the accused is innocent given the evidence E, is comparably tiny. However, P(E|I) and P(I|E) are quite different; using Bayes' theorem we see

$$P(I \mid E) = \frac{P(E \mid I)\,P(I)}{P(E)} = \frac{P(E \mid I)\,P(I)}{P(E \mid I)\,P(I) + P(E \mid \lnot I)\,P(\lnot I)}.$$
So the prior probability of innocence P(I) and the overall probability of the observed evidence P(E) need to be taken into account. Note that P(E) is the probability that the evidence is observed regardless of innocence; in the third expression, it is written in the denominator as the sum of the probability that the person is innocent and the evidence is against him (P(E|I) times P(I)) and the probability that the person is guilty and the evidence is against him (P(E|~I) times P(~I), where ~I means the accused is guilty). Since P(I|E) equals P(E|I) multiplied by P(I)/P(E), if P(I) is much larger than P(E) then P(I|E) is much larger than P(E|I), and need not be small at all.
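As a minimal Python sketch of the three-part formula, the function below computes P(I|E) from P(E|I), P(E|~I) and P(I). The inputs are assumptions chosen for illustration: P(E|I) is fixed at one in a million and P(E|~I) at 1, while only the prior P(I) is varied, so the output shows how the same tiny P(E|I) can yield very different values of P(I|E).

```python
def posterior_innocence(p_e_given_i: float, p_e_given_guilty: float, prior_i: float) -> float:
    """Bayes' theorem: P(I|E) = P(E|I)P(I) / (P(E|I)P(I) + P(E|~I)P(~I))."""
    joint_innocent = p_e_given_i * prior_i            # P(E|I) * P(I)
    joint_guilty = p_e_given_guilty * (1 - prior_i)   # P(E|~I) * P(~I)
    p_e = joint_innocent + joint_guilty               # P(E), by total probability
    return joint_innocent / p_e


# Illustrative inputs: P(E|I) = 1e-6 and P(E|~I) = 1 throughout; only the prior changes.
for prior_i in (0.5, 0.999, 1 - 1e-7):
    posterior = posterior_innocence(1e-6, 1.0, prior_i)
    print(f"P(I) = {prior_i:.7g}: P(I|E) = {posterior:.3g}")
```

With an even prior, P(I|E) is close to P(E|I); with a prior of innocence of 1 - 1e-7, P(I|E) is about 0.91.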
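We can also formulate Bayes' theorem with odds:

$$\operatorname{Odds}(I \mid E) = \operatorname{Odds}(I) \cdot \frac{P(E \mid I)}{P(E \mid \lnot I)}$$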
Without knowledge of the prior odds of I, the small value of P(E|I) does not necessarily imply that Odds(I|E) is small. (P(E|~I), the probability that the evidence is observed given the accused is guilty, is assumed to be high.)
The fallacy lies in the fact that the prior probability of guilt is not taken into account. If this probability is small, then the effect of the presented evidence is to increase it dramatically (the odds of guilt are multiplied by P(E|~I)/P(E|I)), but not necessarily to make it overwhelming. (In the example below of a city with 10 million people, the presented evidence raises the prior probability of guilt of 1 in 10 million to posterior odds of guilt of about 1:10, a posterior probability of roughly 1 in 11.)
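The odds form makes the size of this update explicit. The following Python sketch assumes, as the city calculation does, that the evidence is certain if the accused is guilty (P(E|~I) = 1) and has a one-in-a-million chance if he is innocent, and multiplies prior odds of guilt of 1:9,999,999 (a prior probability of 1 in 10 million) by the likelihood ratio P(E|~I)/P(E|I). The helper names are illustrative.

```python
from fractions import Fraction


def update_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds a:b (stored as a/b) into the probability a / (a + b)."""
    return odds / (1 + odds)


prior_odds_guilt = Fraction(1, 9_999_999)      # prior P(guilt) of 1 in 10 million
p_e_given_guilty = Fraction(1)                # assumed: evidence certain if guilty
p_e_given_innocent = Fraction(1, 1_000_000)   # one-in-a-million chance match
lr = p_e_given_guilty / p_e_given_innocent    # P(E|~I) / P(E|I) = 1,000,000

post = update_odds(prior_odds_guilt, lr)
print(f"posterior odds of guilt = 1:{float(1 / post):.2f}")            # 1:10.00
print(f"posterior P(guilt) = {float(odds_to_probability(post)):.3f}")  # 0.091
```

A millionfold increase in the odds still leaves guilt less likely than innocence here, because the prior odds were so small.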
The prosecutor's fallacy is therefore no fallacy if the prior odds of guilt are assumed to be 1:1 or higher. The prior odds in fact depend on the circumstances. Was the person a suspect before the new evidence or not?
In this picture, then, the fallacy is that the prosecutor claims an absolutely low probability of innocence without mentioning that the information he has conveniently omitted would have led to a different estimate.
In legal terms, the prosecutor is operating under a presumption of guilt, as his role obliges him to, but this is contrary to the presumption of innocence that jurors must apply, under which a person is assumed to be innocent unless found guilty. If the person is suspected solely on the basis of this piece of evidence, then a more reasonable value for the prior odds of guilt might be one estimated from the overall frequency of the given crime in the general population.
Suppose there is a one-in-a-million chance of a match given that the accused is innocent. The prosecutor says this means there is only a one-in-a-million chance of innocence. But if everyone in a community of 10 million people is tested, one expects 10 matches even if everyone tested is innocent.
The defendant's fallacy would be to say, "We would expect 10 matches in this city of 10 million people, so this particular piece of evidence suggests there is a 90% chance that the accused is innocent. So this evidence cannot be used to point to a conclusion of guilt, and should be excluded."
The problem with the defendant's argument is that it treats each piece of evidence in isolation: other available evidence may also be inconclusive on its own, yet the pieces can be compelling together. For example, if CCTV cameras surrounding the scene of the crime recorded one hundred people there at the relevant time, one of whom was the accused, then the defendant could claim: "The video suggests a 99% chance that the defendant is innocent. The match suggests a 90% chance of innocence. So the conclusion should be a finding of innocence."
When the video evidence is combined with the match, the two together point strongly towards guilt, since (assuming that, for an innocent person, being in the video and having the match are independent) the chance that the accused is innocent is about 0.0001. Although this is not conclusive proof, and only establishes a low probability of innocence in a simplified model that excludes other explanations such as the person having been framed, it provides a much more compelling argument than either piece of evidence alone.
The calculation runs as follows. The prior probability that the man is innocent is 9,999,999/10,000,000. The likelihood of having the match and being in the video may be taken as 1 if he is guilty; if he is innocent, the likelihood of the match is 1/1,000,000 and the likelihood of being in the video is 1/100,000, so (assuming independence) the likelihood of both happening if he is innocent is 1/100,000,000,000. That gives a posterior probability of innocence of 9,999,999/100,009,999,999, which is 0.000099989991..., or about 0.01%.
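The calculation can be reproduced exactly with Python's `fractions` module. This sketch assumes, as the text does, that the two pieces of evidence are independent for an innocent person and certain for the guilty one; it also evaluates each piece on its own, approximately recovering the defendant's 90% and 99% figures. The function name is illustrative.

```python
from fractions import Fraction
from math import prod


def posterior_innocence(prior_innocent, likelihoods_if_innocent, likelihoods_if_guilty):
    """P(innocent | all evidence), multiplying likelihoods as if the items were independent."""
    joint_innocent = prior_innocent * prod(likelihoods_if_innocent)
    joint_guilty = (1 - prior_innocent) * prod(likelihoods_if_guilty)
    return joint_innocent / (joint_innocent + joint_guilty)


prior = Fraction(9_999_999, 10_000_000)   # one culprit in a city of 10 million
match = Fraction(1, 1_000_000)            # P(match | innocent)
video = Fraction(1, 100_000)              # P(in the video | innocent)

match_only = posterior_innocence(prior, [match], [1])
video_only = posterior_innocence(prior, [video], [1])
both = posterior_innocence(prior, [match, video], [1, 1])

print(f"{float(match_only):.3f}")  # 0.909
print(f"{float(video_only):.3f}")  # 0.990
print(both, f"{float(both):.3e}")  # 9999999/100009999999 9.999e-05
```

Each item alone leaves innocence likely, but multiplying the likelihoods drives the posterior down to about 0.01%, which is exactly the step the defendant's argument skips.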
A notable example of this concept is the case of Sally Clark, a British woman who was accused in 1998 of having killed her first child at 11 weeks of age and then, having conceived another child, of killing that child at 8 weeks of age. The defense claimed that these were two cases of sudden infant death syndrome (SIDS, or "cot death"); neither prosecution nor defense offered any other explanation for the deaths. The prosecution's expert witness, Sir Roy Meadow, testified that the probability of two children in the same family dying from SIDS is about 1 in 73 million. Some press reports at the time presented this as the probability that the deaths were accidental, or that Sally Clark was innocent. But this is incorrect, because it does not take into account the prior probability that an arbitrarily chosen woman would murder two of her children. Mrs Clark was convicted in 1999, prompting a press release from the Royal Statistical Society that pointed out the mistake.
To provide proper context, the figure of 1 in 73 million (or whatever the correct value is) should have been compared with the probability of a mother killing one child, conceiving another and killing that one too. The figure of 1 in 73 million has another flaw: it assumes that SIDS deaths within the same family are statistically independent, which they may not be. If there are common environmental or other factors (for example, an unrecognised and undiagnosed recessively inherited metabolic disease), the true probability of two SIDS deaths in one family may be larger. Likewise, the probability of a double murder may not be as small as the square of the probability of killing one child, because someone with the motivation and capacity to kill once may well have the motivation and capacity to do so a second time. Without further data, we can only speculate about the relative probabilities of the alternative explanations.
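The independence assumption can be made explicit in a short Python sketch. It is widely reported that Meadow's figure was obtained by squaring an estimated rate of about 1 in 8,543 for a single SIDS death in a family like the Clarks'; the `dependence` multiplier below is a hypothetical parameter for illustration, not an estimate from any study.

```python
from fractions import Fraction


def p_two_sids(p_one: Fraction, dependence: Fraction = Fraction(1)) -> Fraction:
    """P(two SIDS deaths in one family).

    `dependence` is the hypothetical ratio P(second SIDS | first SIDS) / P(SIDS).
    A value of 1 means independence, which is what squaring the single-death rate assumes.
    """
    return p_one * (p_one * dependence)


p_one = Fraction(1, 8_543)  # single-death rate reportedly behind Meadow's figure
print(f"independent: 1 in {int(1 / p_two_sids(p_one)):,}")  # 1 in 72,982,849
# Hypothetical: a second death ten times likelier after a first one.
print(f"dependent:   1 in {int(1 / p_two_sids(p_one, Fraction(10))):,}")  # 1 in 7,298,284
```

Neither value is the probability of innocence: as the text explains, it would still have to be weighed against the probability of a double murder, for which no figure is given here.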