Class 91: Try Not To Use the ROC Curve & AUC

Reminder: The Thursday Class is only for those interested in studying uncertainty. I don’t expect everyone to want to read these posts. Please don’t feel like you must. Yet, I have nowhere else to put them. Your support makes this Class possible for those who need it. Thank you. Much math alert!

We don’t need the ROC curve unless we’re designing radio receivers or tests. What we want is the probability of the event given the test: calculate that.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Study

Lecture

When we last met we derived simple optimal decisions, where both the “optimal” and the probability behind these were, as they ever are, conditional on the worthiness criteria and evidence you decide. I had thought of doing examples, but realized I hadn’t yet covered sensitivity, specificity, ROC curves and all that.

I don’t think we need spend much time on these, as they are so well known. Still, there is some value in putting these creaky old terms into our notation and contrasting them with our last lesson. Also, I do not like ROC curves, AUC and all that.

Recall one of our running examples. We have some assay, like a PSA test, which produces (finite, discrete) ordered levels of some measure; call these L = (l_1, l_2, …, l_m). Here, higher levels are thought more indicative of the presence of prostate cancer. We pick a level l_i and say “At values greater than l_i we’ll act like there’s prostate cancer.” We went through the difference between probability and decision last time, so review that.

We’ll call a level greater than (or greater than or equal to, if you like) l_i a “positive test”; else a “negative test.” Let the test be T, and the Disease (cancer or any dichotomous event) be D. The evidence E can include whatever past observations we have made, or it can be purely an assumed model.

With our level in hand, and with whatever additional E we bring to bear so that we can create quantitative probability models, we can compute various things, such as:

Pr(+ Test | + Disease, l_i, E) = sensitivity,
Pr(- Test | + Disease, l_i, E) = false negative test,
Pr(+ Test | – Disease, l_i, E) = false positive test,
Pr(- Test | – Disease, l_i, E) = specificity.

Obviously, one would enjoy a sensitivity and specificity equal to 1, and false negatives and positives equal to 0. But, of course, most simple measures are not direct inspections of cause, and are mere correlations, and so typically none of these probabilities are extreme.

None of these quantities ought to be used to act or decide as if a disease is present or absent. That would be silly, since these are not probabilities of disease states; they instead say something about the qualities of the test. Alas, they often are used to make these very decisions, which is rather distressing. As with the p-value, no amount of argument and proof will dissuade fans of these probabilities.

What’s wanted instead for decisions about the disease state? These:

Pr(+ Disease | + Test, l_i, E) = predictive probability,
Pr(- Disease | + Test, l_i, E) = also predictive probability,
Pr(+ Disease | – Test, l_i, E) = also predictive probability,
Pr(- Disease | – Test, l_i, E) = also predictive probability.
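The same kind of invented 2×2 table can be turned around to give crude estimates of these predictive probabilities. Again this is only a sketch: the counts and the function name predictive_probabilities are hypothetical, chosen to show which denominators change.

```python
# Crude estimates of Pr(Disease | Test, l_i, E) from the same kind of
# 2x2 table. The counts below are invented for illustration only.

def predictive_probabilities(tp, fn, fp, tn):
    positives = tp + fp  # all observations with a + Test
    negatives = fn + tn  # all observations with a - Test
    return {
        "+D|+T": tp / positives,  # Pr(+ Disease | + Test, l_i, E)
        "-D|+T": fp / positives,  # Pr(- Disease | + Test, l_i, E)
        "+D|-T": fn / negatives,  # Pr(+ Disease | - Test, l_i, E)
        "-D|-T": tn / negatives,  # Pr(- Disease | - Test, l_i, E)
    }

probs = predictive_probabilities(tp=90, fn=10, fp=50, tn=850)
for name, p in probs.items():
    print(f"Pr({name}) = {p:.3f}")
```

With these made-up counts, 90 of the 100 diseased tested positive (a sensitivity of 0.90), yet only 90 of the 140 positive tests came from diseased people, so Pr(+ Disease | + Test) is about 0.64. The numerators are the same; the conditioning is what differs.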

In other words, all these are nothing but different forms of Pr(Disease | Test, l_i, E). They are all probabilities concerning the disease (event) itself, which is what everybody (well, almost everybody) is interested in. The exceptions are people creating the tests, who (one hopes) want good ones. Sensitivity and whatnot are good for them, but not for, say, patients who have just been confronted with the results of a test.

On the other hand, the first set are nothing but different forms of Pr(Test | Disease, l_i, E). Also all probabilities. But, unless we’re creating the test (assay), they are not interesting, because they assume you know the state of the Disease. Which you won’t know if you’re using the test to guess that state!

Sharp readers will have noticed the similarities with the depressing p-value. But only similarities, because at least Pr(Test | Disease, l_i, E) does have some uses if you are trying to create a test.

You will have also noticed that Pr(Disease | Test, l_i, E) is not a decision. We (or somebody) might have used l_i as a decision point, as we did last time. But we could also calculate Pr(Disease | Test, l_i, E) for each l_i, i = 1, 2, …, m and use that in our decision calculus, taking into account the gains and losses for making correct and incorrect decisions. Again, this we learned how to do last time, so I needn’t repeat any of it here.
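One way to calculate Pr(Disease | Test, l_i, E) for each level is to tally old observations level by level, rather than at a single cut-off. A minimal sketch, assuming a hypothetical four-level test and invented counts at each level (these are not real PSA data):

```python
# Invented counts of old observations at each ordered test level.
# diseased[i]: disease present at level i; healthy[i]: disease absent.
levels = ["l_1", "l_2", "l_3", "l_4"]
diseased = [5, 10, 30, 55]
healthy = [400, 300, 150, 50]

# Crude estimate of Pr(+ Disease | Test = l_i, E), one level at a time.
pr_disease = {
    lvl: d / (d + h) for lvl, d, h in zip(levels, diseased, healthy)
}
for lvl, p in pr_disease.items():
    print(f"Pr(+ Disease | Test = {lvl}) = {p:.3f}")
```

These four numbers, together with the gains and losses, are the inputs the decision calculus needs.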

Let’s instead look at what other people do, and see if we can figure a way to talk them out of it.

The ROC Curve.

Being lazy, I stole our example from Wokepedia:

These are, they say, “ROC curve of three predictors of peptide cleaving in the proteasome”. Three different tests with differing levels: each point on each curve picks an l_i, and the curve itself is generated by running through all l_i, i = 1, 2, … m.

The “true positive rate” is the crude estimate of Pr(+ Test | + Disease, l_i, E), i.e. the sensitivity. The crude estimate is the proportion of times the test was positive (greater than l_i) among the times the disease (event) was present (here not a disease, but cleaving in the proteasome). This is not the true positive rate of identifying the disease (or event).

The “false positive rate” is the crude estimate of Pr(+ Test | – Disease, l_i, E), i.e. false positive test, not disease. It is got in the same way.
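Here is a sketch of how the points on one such curve are got, assuming a hypothetical four-level test with the invented counts below and the rule “+ Test if level >= l_i”. The function name roc_points is mine.

```python
# Invented counts per ordered level l_1..l_4 (hypothetical test).
diseased = [5, 10, 30, 55]     # disease (event) present at each level
healthy = [400, 300, 150, 50]  # disease (event) absent at each level

def roc_points(diseased, healthy):
    """(false positive rate, true positive rate) for each cut-off l_i,
    where a test counts as positive if its level is >= l_i."""
    n_d, n_h = sum(diseased), sum(healthy)
    points = []
    for i in range(len(diseased)):
        tpr = sum(diseased[i:]) / n_d  # crude Pr(+ Test | + Disease, l_i, E)
        fpr = sum(healthy[i:]) / n_h   # crude Pr(+ Test | - Disease, l_i, E)
        points.append((fpr, tpr))
    return points

for i, (fpr, tpr) in enumerate(roc_points(diseased, healthy), start=1):
    print(f"l_{i}: FPR = {fpr:.3f}, TPR = {tpr:.3f}")
```

Both rates condition on the disease state. Nothing in the loop ever computes Pr(Disease | Test).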

Here’s where it gets weird. The area under any ROC curve is called (ta da!) the Area Under The Curve, or AUC. The larger this is, it is said, the better the test is. But that area, and indeed each curve, is a function of all of the l_i, whereas actual decisions about the disease or cleavage or event would be based on Pr(Disease | Test, l_i, E) and on the gains and losses you specify (as in the last lesson), leading to an optimal l_i. You only pick one. Or, if the gains and losses are generic, because you are considering a suite of them for different users of your model, you might have several l_i, one for each specified gain-loss matrix a (see last time). But still each user picks just one l_i. The model is judged at that l_i, for him.
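For concreteness, here is the usual trapezoidal calculation of that area. It is a sketch assuming the (FPR, TPR) points of a hypothetical four-level test, whose fractions come from invented counts (100 with the disease, 900 without). Notice it sums over every l_i at once, though no single user acts on all of them.

```python
# ROC points (FPR, TPR) for a hypothetical test at cut-offs l_1..l_4,
# computed from invented counts (100 diseased, 900 not).
points = [
    (900 / 900, 100 / 100),  # l_1
    (500 / 900, 95 / 100),   # l_2
    (200 / 900, 85 / 100),   # l_3
    (50 / 900, 55 / 100),    # l_4
]

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule, after adding
    the corners (0, 0) and (1, 1) and sorting by false positive rate."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area

print(f"AUC = {auc_trapezoid(points):.3f}")
```

The single number that comes out carries no gains or losses at all.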

And you will have noticed, because of the language, that Pr(Disease | Test, l_i, E) is no official part of the ROC curve. We could, of course, plot Pr(+Disease | +Test, L, E) or Pr(-Disease | -Test, L, E) for all l_i in L. And then marry that with our a (the gains and losses matrix).
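A sketch of the raw material for such a plot, assuming a hypothetical four-level test with invented counts: for each cut-off l_i it computes the crude Pr(+Disease | +Test, l_i, E) and Pr(-Disease | -Test, l_i, E). At l_1 nobody tests negative, so the second probability is undefined there and is returned as None. The function name predictive_by_cutoff is mine.

```python
# Invented counts per ordered level l_1..l_4 (hypothetical test).
diseased = [5, 10, 30, 55]
healthy = [400, 300, 150, 50]

def predictive_by_cutoff(diseased, healthy):
    """For each cut-off l_i ('+ Test if level >= l_i') return the crude
    Pr(+Disease | +Test) and Pr(-Disease | -Test); None if undefined."""
    rows = []
    for i in range(len(diseased)):
        pos_d, pos_h = sum(diseased[i:]), sum(healthy[i:])  # + Test
        neg_d, neg_h = sum(diseased[:i]), sum(healthy[:i])  # - Test
        ppv = pos_d / (pos_d + pos_h) if pos_d + pos_h else None
        npv = neg_h / (neg_d + neg_h) if neg_d + neg_h else None
        rows.append((ppv, npv))
    return rows

def fmt(p):
    return "undefined" if p is None else f"{p:.3f}"

for i, (ppv, npv) in enumerate(predictive_by_cutoff(diseased, healthy), start=1):
    print(f"l_{i}: Pr(+D | +T) = {fmt(ppv)}, Pr(-D | -T) = {fmt(npv)}")
```

With these numbers Pr(+D | +T) at l_1 is just the overall proportion with the disease, 0.10, and it climbs to about 0.52 at l_4. Which cut-off is worth using still depends on a.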

As useful as a ROC curve might be in creating better tests, it is not useful at all in deciding what to do about the main thing of interest. Nor is it useful in judging models, which, as we learned, has to be done by specifying worthiness propositions (those gains and losses). Just like p-values, which make one-size-fits-all decisions, the ROC curve assumes symmetric costs and gains. And it speaks only to the quality of the test, not to the quality of the decision about the disease/event.
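To see the chosen l_i move with the gains and losses while the ROC curve and its AUC stay put, here is a sketch with a hypothetical four-level test, invented counts, and a deliberately simplified loss matrix: correct decisions cost nothing, a missed disease costs loss_missed, and a false alarm costs loss_false_alarm. Those names and numbers are assumptions for illustration only.

```python
# Invented counts per ordered level (hypothetical test).
levels = ["l_1", "l_2", "l_3", "l_4"]
diseased = [5, 10, 30, 55]
healthy = [400, 300, 150, 50]
n = sum(diseased) + sum(healthy)

def best_cutoff(loss_missed, loss_false_alarm):
    """Cut-off l_i minimizing the estimated expected loss of the rule
    'act as if disease is present if level >= l_i'. Correct decisions
    are assumed to cost nothing (a simplification)."""
    expected = {}
    for i, lvl in enumerate(levels):
        missed = sum(diseased[:i])      # + Disease but - Test
        false_alarm = sum(healthy[i:])  # - Disease but + Test
        expected[lvl] = (loss_missed * missed
                         + loss_false_alarm * false_alarm) / n
    return min(expected, key=expected.get)

for loss_missed, loss_false_alarm in [(1, 1), (10, 1), (50, 1)]:
    print(f"missed = {loss_missed}, false alarm = {loss_false_alarm}: "
          f"act if level >= {best_cutoff(loss_missed, loss_false_alarm)}")
```

Three loss matrices, three different cut-offs, one ROC curve.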

Another criticism. Suppose (as mathematicians say) without loss of generality the test has three levels, l_1, l_2, and l_3. And we have a collection of old observations of disease/event states (dichotomous) and test levels. Each of these levels is used in the ROC curve, which has three points. The first is “Act as if Disease is present if Test >= l_1” (recall, we are assuming higher levels mean more likely disease/events, but this works the other way, too). Well, now, that’s everybody, isn’t it. So that level would never be used. But it is part of the ROC curve. Et cetera.

We just don’t care about questions of the test behavior, unless we’re creating tests. We care about what the model says about the proposition of interest.

BAYES

Time to drag out the hoary textbook example about the probability of disease when confronted with a positive test which boasts a sensitivity of (say) 99%! This thing only shows its wizened head because we abandoned the probability of the event in the first place. Here’s what we want:

Pr(Disease | all evidence, including + Test and test characteristics).

If we had always been giving that, this textbook staple would never have been created. No one would have thought to do so, because no patient ever gives a damn about the sensitivity or specificity of some test. He wants to know what the chances are that he’s sick. Or you want to know the probability of the event.

Positive tests are concerning, and, ceteris paribus, they always increase the chance of having the disease, just as a negative one always decreases it. But the changes are not necessarily large or important. That’s why communicating the probability ought to take precedence. A doc could say, “Sure, you had a positive test, but only 1 out of 100 people with that result have the disease.”
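A sketch of the doc’s arithmetic by Bayes’ theorem, with a made-up prior probability of disease and made-up test characteristics (none of these numbers describe any real test, and the function name is mine). It shows both halves of the claim: the positive test raises the probability and a negative one lowers it, here because the sensitivity exceeds the false positive rate, yet the positive result still leaves the chance near 1 in 100.

```python
def pr_disease_given_test(prior, sensitivity, false_positive, positive=True):
    """Pr(+ Disease | Test, E) by Bayes' theorem, where prior is
    Pr(+ Disease | E) before the test result is known."""
    if positive:
        like_d, like_h = sensitivity, false_positive  # Pr(+T | +D), Pr(+T | -D)
    else:
        like_d, like_h = 1 - sensitivity, 1 - false_positive  # Pr(-T | +D), Pr(-T | -D)
    joint_d = like_d * prior
    return joint_d / (joint_d + like_h * (1 - prior))

prior = 0.002  # invented prior probability of disease
after_pos = pr_disease_given_test(prior, 0.9, 0.18, positive=True)
after_neg = pr_disease_given_test(prior, 0.9, 0.18, positive=False)
print(f"before: {prior:.4f}, after +: {after_pos:.4f}, after -: {after_neg:.5f}")
```

With these numbers, about 1 positive in 100 comes from someone with the disease, which is the sentence the doc should say.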

That probability depends on “all evidence”, which means what it says: all evidence. What is all evidence? It is, as we have known since Day One, the evidence you consider. What is the evidence you ought to consider? That which is probative of the proposition of interest. Ideally, the evidence of cause, which gives extreme probabilities (i.e. 0 or 1).

Nothing has changed for us since that Day One. It was always Pr(Y|X) and decisions downstream from that.

I leave as homework that standard textbook example. I can’t bring myself to write it.

Here are the various ways to support this work:

Subscribe at Substack (paid or free)
Cash App: $WilliamMBriggs
Zelle: use email: matt@wmbriggs.com
Buy me a coffee
Paypal
Other credit card subscription or single donations
Hire me
Subscribe at YouTube
PASS POSTS ON TO OTHERS

Discover more from William M. Briggs
