It’s a classic in every 101 biostat textbook to do Bayes’s theorem in the fun-scary way. Somebody must have done it for the coronavirus, but I haven’t seen it. Goes like this.
Suppose you cough, realize This Could Be It and rush to the clinic with the other TV viewers. There is a long line of people waiting to be tested for COVID-19, the dreaded coronavirus.
They finally come to you.
They take a swipe or a jab, stick the scooped up gook into a phial, which turns gloomy pink. A positive test for the coronavirus!
Now, given everything you know about the coronavirus, or everything you assume about it, what are the chances you have it and become a two-week coronaleper?
Once you figure those odds comes the Big Question: given all that information, what is the probability you get the jump on Joe Biden and meet St Peter first?
First just the disease. Death second.
We want, in standard notation:
Pr( Have Coronachan | + Test, other info).
That “other info” is in all the textbooks called “the base rate” (the based rate is the proportion of reactionaries in the population). The base rate is, somewhat mysteriously, taken to be the proportion of people in the population who have the disease already. Let’s call this base rate information B. We will come back to it, because there’s more to it than this. But it’s used like this:
Pr( Have Coronachan | B ) = p.
Doesn’t matter what value p is for now, though it’s called the probability a person has coronavirus given we know only B, the background info.
We want, in shorthand:
Pr( CY | +T & B ),
where “CY” means Coronachan Yes and +T is a positive test.
We get this by using Bayes’s theorem (read all about that in this award-eligible book).
Pr(CY | +T & B) = Pr(+T | CY & B) x Pr(CY | B) / Pr(+T | B).
On the right, Pr(+T | CY & B) is the probability of a positive test, either assuming or knowing that a person has coronavirus, and given that we know B. We already saw Pr(CY | B). We also have Pr(+T | B), the probability of a positive test given we know B. If we don’t have access to that, we can use the magic of probability and write:
Pr(+T | B) = Pr(+T | CY & B) x Pr(CY | B) + Pr(+T | CN & B) x Pr(CN | B).
Here “CN” means Coronachan No. We call Pr(+T | CY & B) the sensitivity of the test. We call 1 – Pr(+T | CN & B) = Pr(-T | CN & B) the specificity.
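The two formulas above fit into one small R function. This is only a sketch: the name `posterior` and its argument names are mine, not from any package, and the numbers in the call are placeholders.

```r
# Pr(CY | +T & B) from sensitivity, specificity and base rate.
# `posterior` and its argument names are illustrative, not from any package.
posterior <- function(sen, spe, B) {
  p.pos.cy <- sen        # Pr(+T | CY & B), the sensitivity
  p.pos.cn <- 1 - spe    # Pr(+T | CN & B), one minus the specificity
  # Pr(+T | B) by total probability, using Pr(CN | B) = 1 - Pr(CY | B)
  p.pos <- p.pos.cy * B + p.pos.cn * (1 - B)
  # Bayes's theorem
  p.pos.cy * B / p.pos
}

# Placeholder inputs, just to show the call
posterior(sen = 0.8, spe = 0.9, B = 0.01)
```

Because everything is vectorized, B can also be a whole vector of base rates.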
Neither of these numbers is 100%. Both depend on B, which includes the type of test being used. No medical test (in this category) is perfect. There are errors in both directions: positive tests when no disease is present, and negative tests when it is.
I’ve seen some claims for some coronachan tests of 80% sensitivity, but I don’t know the specificity. Call it 90%. Maybe both of these are high, maybe low. They depend on where you got the test, the kind of test, how carefully your sample was handled, and on and on. All that information is in B.
We know Pr(CY | B) + Pr(CN | B) = 1. But what either of them individually is, is tricky. Just what is the base rate for you? If you’re a worker at the Wuhan meat market you’ll get one number. If you’re in an off-the-grid cabin overlooking Lake Superior, you’ll get another. Should you use the estimate “#people who have it in country A / #people in country A”? Why? What if you live up north and not in Seattle?
Base rates are tricky! There is no unique base rate! There is also no unique sensitivity and specificity! It’s a mess.
If you want to use Bayes, you gots to put somein’ in. But what?
The solution is to use lots of numbers, or none, if you don’t know any. Quantification for the sake of quantification leads to over-certainty—always bad.
Nevertheless, try Pr(+T | CY & B) = 0.8, Pr(-T | CN & B) = 0.9, and thus Pr(+T | CN & B) = 0.1.
Then we get this, for base rates from 0 to 0.1:
    Sensitivity = 0.8; Specificity = 0.9
             B Pr(CY|+TB)
     [1,] 0.00      0.000
     [2,] 0.01      0.075
     [3,] 0.02      0.140
     [4,] 0.03      0.198
     [5,] 0.04      0.250
     [6,] 0.05      0.296
     [7,] 0.06      0.338
     [8,] 0.07      0.376
     [9,] 0.08      0.410
    [10,] 0.09      0.442
    [11,] 0.10      0.471
Amazing, yes? If the base rate is 1%, given a positive test with these characteristics—which is considered not bad in medical circles—you have a 7.5% chance of having coronavirus. Not 100%.
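If the 7.5% looks like a trick, here is the arithmetic for the 1% row, one piece at a time. The variable names are mine.

```r
sen <- 0.8   # Pr(+T | CY & B)
spe <- 0.9   # Pr(-T | CN & B)
B   <- 0.01  # Pr(CY | B)

true.pos  <- sen * B               # 0.8 * 0.01 = 0.008
false.pos <- (1 - spe) * (1 - B)   # 0.1 * 0.99 = 0.099
p.pos     <- true.pos + false.pos  # Pr(+T | B) = 0.107
p.cy      <- true.pos / p.pos      # Pr(CY | +T & B)

round(p.cy, 3)                     # 0.075
```

The false positives from the 99% who don't have it swamp the true positives from the 1% who do.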
Make the tests better, adding 5 points to each.
    Sensitivity = 0.85; Specificity = 0.95
             B Pr(CY|+TB)
     [1,] 0.00      0.000
     [2,] 0.01      0.147
     [3,] 0.02      0.258
     [4,] 0.03      0.345
     [5,] 0.04      0.415
     [6,] 0.05      0.472
     [7,] 0.06      0.520
     [8,] 0.07      0.561
     [9,] 0.08      0.596
    [10,] 0.09      0.627
    [11,] 0.10      0.654
Make them exemplary.
    Sensitivity = 0.95; Specificity = 0.95
             B Pr(CY|+TB)
     [1,] 0.00      0.000
     [2,] 0.01      0.161
     [3,] 0.02      0.279
     [4,] 0.03      0.370
     [5,] 0.04      0.442
     [6,] 0.05      0.500
     [7,] 0.06      0.548
     [8,] 0.07      0.588
     [9,] 0.08      0.623
    [10,] 0.09      0.653
    [11,] 0.10      0.679
Make them so good doctors salute every time they think of them!
    Sensitivity = 0.99; Specificity = 0.99
             B Pr(CY|+TB)
     [1,] 0.00      0.000
     [2,] 0.01      0.500
     [3,] 0.02      0.669
     [4,] 0.03      0.754
     [5,] 0.04      0.805
     [6,] 0.05      0.839
     [7,] 0.06      0.863
     [8,] 0.07      0.882
     [9,] 0.08      0.896
    [10,] 0.09      0.907
    [11,] 0.10      0.917
With a base rate of 1 out of 100, there is still only a 50/50 chance you got the bug! Only 50/50. Flip a burger. Of course, if you’re in Wuhan, or parts of Italy, B = 1% maybe isn’t so realistic. What’s your B? I have no idea. There is no unique B!
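Since there is no unique B, and no unique sensitivity or specificity either, one option is to sweep all three at once and look at the spread. A rough sketch; the grid values are arbitrary picks of mine, not estimates of anything.

```r
# Sweep several test qualities and base rates together.
# Every value in the grid is an arbitrary choice for illustration.
grid <- expand.grid(sen = c(0.80, 0.95, 0.99),
                    spe = c(0.90, 0.95, 0.99),
                    B   = c(0.001, 0.01, 0.1, 0.3))

# Pr(CY | +T & B) for each combination
grid$p.cy <- with(grid, sen * B / (sen * B + (1 - spe) * (1 - B)))
grid$p.cy <- round(grid$p.cy, 3)

# Sort so the effect of the base rate is easy to read down the rows
grid[order(grid$B, grid$sen, grid$spe), ]
```

The answer you report is then a range, not one falsely precise number.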
You can see that there are going to be a lot of mistakes in classifying coronavirus cases—probably a lot of false positives, especially in initial testing. Perhaps not as many misclassifications of deaths due to the bug. Tests for cause of death are better.
The conclusion is that it’s nuts to implement large-scale testing on a population. It will lead to huge numbers of false positives—which will be everywhere painted as true positives—and more panic.
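To put rough head counts on that, here is a sketch of what one round of testing everybody would be expected to produce. The population size N is made up, and the other inputs are just the 1% base rate and the 80%/90% test from the first table.

```r
# Expected counts if a whole population is tested once.
# N is a hypothetical population size; B, sen and spe are assumed values.
N   <- 100000
B   <- 0.01
sen <- 0.8
spe <- 0.9

true.pos  <- N * B * sen               # infected and flagged: 800
false.pos <- N * (1 - B) * (1 - spe)   # not infected but flagged: 9900
false.neg <- N * B * (1 - sen)         # infected but missed: 200

c(true.pos = true.pos, false.pos = false.pos, false.neg = false.neg)

# Share of all positive results that are wrong
false.pos / (true.pos + false.pos)
```

With these inputs more than nine in ten positives are false. Change B and the picture changes with it, which is the whole point.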
Now all this goes for only one test. Usually in medical tests you get one positive on a down-and-dirty test, and you go in for a second, better one. The number you get from calculations like the ones above becomes the new base rate when using the better test. You nest these calculations.
For instance, in the down-and-dirty, you used Sensitivity = 0.85; Specificity = 0.95 and thought your B = 0.01. You get Pr(CY|+TB) = 0.147. They schedule you for a second test, which is salute worthy; i.e. Sensitivity = 0.99; Specificity = 0.99. You use a base rate of 0.147 for this. You calculate and get Pr(CY|+TB’) = 0.945, where B’ = “first test & B”; i.e. the original background information with added information on the first test.
Now 0.945 is still not 1, meaning mistakes will still be made.
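The nesting is short enough to write out. This sketch assumes the second test's sensitivity and specificity don't change because of the first result; the function name `posterior` is mine.

```r
# Pr(CY | +T & B); `posterior` is an illustrative name, not a package function
posterior <- function(sen, spe, B) {
  sen * B / (sen * B + (1 - spe) * (1 - B))
}

# Down-and-dirty test with the original base rate
first <- posterior(sen = 0.85, spe = 0.95, B = 0.01)

# Salute-worthy test, with the first answer as the new base rate
second <- posterior(sen = 0.99, spe = 0.99, B = first)

round(c(first = first, second = second), 3)
```

Carrying the unrounded first answer forward gives 0.944 for the second. The 0.945 in the text comes from plugging in the rounded 0.147.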
Probability of death
Death comes next. What are we calculating?
Pr( Dead | CY & B), or
Pr( Dead | +T & B)?
These are not the same! Be careful. The B is “overloaded.” It now contains information not only on the so-called population base rate, it also has information on death base rates, which vary by B!
Most importantly, Dead is death from coronavirus, not being run over by a car or whatever.
That means knowing you’re 8 and previously healthy gives different information than knowing you’re 80 and have emphysema.
The new dead-base rate is just Pr( Dead | CY & B), which assumes you know with certainty that you have the bug. Then you have to figure your category, 8 vs. 80, and all that. We’ve heard reports of no deaths in ages 0-9, and something like 15% in 80+ year olds, though all these numbers are only good guesses. In any case, Pr( Dead | CY & B) is found by looking things up on the internet and hoping for the best. The internet never lies, right?
The Pr( Dead | +T & B) is different. It doesn’t assume certainty of the bug, only that a test or string of tests said you had it.
Pr(D | +T & B) = Pr(+T|D & B) x Pr(D|B) / Pr(+T|B).
The right hand side has three parts, which we’ll take left to right.
Pr(+T | D & B) = dead test sensitivity;
Pr(D | B) = Pr(D|CY & B)Pr(CY | B) + Pr(D|CN & B)Pr(CN|B);
Pr(+T|B) = Pr(+T | CY & B) x Pr(CY | B) + Pr(+T | CN & B) x Pr(CN | B).
Maybe the dead test sensitivity is high, meaning you’re lying on a slab dead from coronavirus (the doctors say) and they do the test. Call it 0.99. Or even 1, because if the docs are saying you died of coronavirus, they had to have some test to confirm that, even if this “test” is only their own judgement. Or you could figure this +T is the string of tests you had at first.
Pr(CY | B) and Pr(CN | B) are the disease base rates we had above. Pr(D|CY & B) is the death base rate, the stuff we looked up on who, with what age and comorbidities, died and who didn’t.
Be careful! Pr(D|CN & B) will be 0. Because D isn’t just dead, but died from coronavirus. If CN is true, you don’t have coronavirus and can’t die from it—though it’s possible you might die of fright from wondering if you have it. Pr(+T|B) we already did.
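The three parts can be put together in one function. Again only a sketch: `p.dead.given.pos` and its argument names are mine, and the values in the call are only for illustration.

```r
# Pr(D | +T & B) built from the three parts listed above.
# `p.dead.given.pos` and its argument names are illustrative.
p.dead.given.pos <- function(dead.sen, DB, B, sen, spe) {
  # Pr(D | B): DB is Pr(D | CY & B), and Pr(D | CN & B) is 0
  p.d <- DB * B + 0 * (1 - B)
  # Pr(+T | B), same total-probability sum as for the disease
  p.pos <- sen * B + (1 - spe) * (1 - B)
  # Bayes's theorem, with dead.sen = Pr(+T | D & B)
  dead.sen * p.d / p.pos
}

# Illustrative inputs: high-risk person, 1% disease base rate, so-so test
p.dead.given.pos(dead.sen = 0.99, DB = 0.2, B = 0.01, sen = 0.8, spe = 0.9)
```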
Let’s try some numbers, using death-from-coronavirus base rates (DB) from 0 to 20%.
    Test Sensitivity = 0.80; Test Specificity = 0.90
    Death Sensitivity = 0.99; Initial Base rate = 0.01
            DB Pr(D|+TB)
     [1,] 0.00     0.000
     [2,] 0.02     0.002
     [3,] 0.04     0.004
     [4,] 0.06     0.006
     [5,] 0.08     0.007
     [6,] 0.10     0.009
     [7,] 0.12     0.011
     [8,] 0.14     0.013
     [9,] 0.16     0.015
    [10,] 0.18     0.017
    [11,] 0.20     0.019
Even if you’re very high risk (old, smoker, say), then after learning of the initial positive test, you only have a 2% chance of croaking. If you’re low risk, say 20-29 year old and healthy with a background death rate of 2%, then you only have a 2 in a thousand chance of expiring.
Up the initial population base rate to 10%.
    Test Sensitivity = 0.80; Test Specificity = 0.90
    Death Sensitivity = 0.99; Initial Base rate = 0.1
            DB Pr(D|+TB)
     [1,] 0.00     0.000
     [2,] 0.02     0.012
     [3,] 0.04     0.023
     [4,] 0.06     0.035
     [5,] 0.08     0.047
     [6,] 0.10     0.058
     [7,] 0.12     0.070
     [8,] 0.14     0.082
     [9,] 0.16     0.093
    [10,] 0.18     0.105
    [11,] 0.20     0.116
Much bigger chances for the highest risk, and now about 1 in 100 (roughly six times higher) for the lowest.
Now let’s suppose you had your second salute-worthy test after the first, with the B = 0.01, which gave a probability of having the bug at 0.147. Then we get:
    Test Sensitivity = 0.99; Test Specificity = 0.99
    Death Sensitivity = 0.99; Base rate = 0.147
            DB Pr(D|+TB)
     [1,] 0.00     0.000
     [2,] 0.02     0.019
     [3,] 0.04     0.038
     [4,] 0.06     0.057
     [5,] 0.08     0.076
     [6,] 0.10     0.094
     [7,] 0.12     0.113
     [8,] 0.14     0.132
     [9,] 0.16     0.151
    [10,] 0.18     0.170
    [11,] 0.20     0.189
As expected, these probabilities converge to the death-base rate, because the tests are becoming more and more certain you have the disease.
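One way to see the convergence is to divide Pr(D | +T & B) by the death base rate for each of the three setups in the tables and watch the ratio head toward 1. A sketch; the names `p.dead` and `ratio` are mine.

```r
# Pr(D | +T & B), using Pr(D | CN & B) = 0; `p.dead` is an illustrative name
p.dead <- function(dead.sen, DB, B, sen, spe) {
  dead.sen * DB * B / (sen * B + (1 - spe) * (1 - B))
}

DB <- 0.2  # any nonzero death base rate gives the same ratios

ratio <- c(
  one.test.B.01 = p.dead(0.99, DB, 0.01,  0.80, 0.90),
  one.test.B.10 = p.dead(0.99, DB, 0.10,  0.80, 0.90),
  second.test   = p.dead(0.99, DB, 0.147, 0.99, 0.99)
) / DB

round(ratio, 3)
```

The ratio climbs from about 0.09 to about 0.58 to about 0.94 as the evidence that you actually have the bug piles up.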
That’s it. Wash your hands.
Try it yourself. I made no effort to make these pretty.
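This is for the tests; I hope it’s obvious what is what.
    sen = 0.99              # test sensitivity, Pr(+T | CY & B)
    spe = 0.99              # test specificity, Pr(-T | CN & B)
    B = seq(0, .1, .01)     # base rates, Pr(CY | B)
    # Bayes's theorem: Pr(CY | +T & B)
    p.chan = (sen * B) / (sen * B + (1 - spe) * (1 - B))
    cbind(B, round(p.chan, 3))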
This is for dead.
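    dead.sen = 0.99         # dead test sensitivity, Pr(+T | D & B)
    DB = seq(0, .2, .02)    # death base rates, Pr(D | CY & B)
    B = 0.147               # Pr(CY | B) going into the second test
    sen = 0.99
    spe = 0.99
    # Pr(D | +T & B); the 0 is Pr(D | CN & B)
    p.dead = dead.sen * (DB * B + 0 * (1 - B)) / (sen * B + (1 - spe) * (1 - B))
    cbind(DB, round(p.dead, 3))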