Class 92: Simple Optimal Decisions: PSA & Cancer Example

Reminder: The Thursday Class is only for those interested in studying uncertainty. I don’t expect everyone wants to read these posts. Please don’t feel like you must. Yet, I have nowhere else to put them. Your support makes this Class possible for those who need it. Thank you. Much math alert!

We flesh out our lectures on simple optimal decisions and why we don’t want or need ROC curves. The running example is the PSA test and cancer.

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Study

Lecture

We finish our simple optimal decisions and anti-ROC curve lectures with an example. I’ll assume you remember all the material from those lectures and won’t repeat any of it here. Review first or you will be lost.

Our example comes from a Kaggle dataset with about 28,000 observations of prostate specific antigen (PSA) test results and biopsy status, benign or malignant. There are also things like age, BMI, smoking status and the like. We’ll use a couple of these for the sake of the example, but we are not here trying to find the best cancer prediction model.

We’re going to put ourselves in the place of doctors and patients who have received data from a PSA test. We’ll use just the PSA level itself, and then the probability of malignant tumor from a simple model which includes PSA. Here, since we know logistic regressions are awful (because over-smoothing) in these situations, we use a random forest model as a function of PSA level, Age, Smoking History, and BMI. I picked these only on the basis of popularity, and they are indeed the kind of measures you’d encounter in a doctor-patient setting.

I repeat: we’re going to take both the PSA levels and the models as given. We are not here going to look at any of the model goodness measures we developed earlier. We are not critiquing the models, but trying to make decisions based on what information we have been shown. Like most people.

Recall the PSA test produces (finite, discrete) ordered levels L = (l_1, l_2, …l_m). Here, higher levels are thought more indicative of the presence of prostate cancer. In the first step, we pick a level l_i and say “At values greater than l_i, we’ll act like there’s prostate cancer.” We do this for each l_i.

Then, in the second step, we build a predictive model Pr(Malignant | L, E) = P. These P are like L: they, too, are ordered: P = (p_1, p_2, …, p_m). We pick a probability p_i and say “At probabilities greater than p_i, we’ll act like there’s prostate cancer.” We’ll do that for each p_i.

With our level or probability in hand, we have our decision. We now only need our gain-loss matrix a. These have the dichotomized values a_{Decision, Cancer State}:

  1. a_{1,1} = 50 (true positive decision),
  2. a_{0,1} = -100 (false negative decision),
  3. a_{1,0} = -20 (false positive decision),
  4. a_{0,0} = 10 (true negative decision).

These are, of course, arbitrary. I picked them only because they seemed plausible. To me. They are my worthiness criteria. They may not match yours, or even come close. But that, we learned in the first lesson, is the great benefit of this approach. You pick what is meaningful to you, knowing that what some dumb statistician picked, or what some doctor picked, can be wildly different. In other words, a model which is useful to one man may be terrible for another, as we also learned.
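To make the bookkeeping concrete, here is a minimal sketch of the gain-loss matrix and of how one batch of decisions gets scored. The names A and average_utility are mine, chosen for illustration, and the four patients are made up, not taken from the Kaggle data.

```python
# Gain-loss matrix from the text, keyed by (decision, cancer_state).
# decision 1 = act as if malignant; cancer_state 1 = actually malignant.
A = {
    (1, 1): 50,    # true positive
    (0, 1): -100,  # false negative
    (1, 0): -20,   # false positive
    (0, 0): 10,    # true negative
}


def average_utility(decisions, states, a=A):
    """Mean gain or loss over paired (decision, state) observations."""
    if len(decisions) != len(states) or not decisions:
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(a[(d, s)] for d, s in zip(decisions, states))
    return total / len(decisions)


# Four made-up patients, one for each cell of the matrix.
decisions = [1, 0, 1, 0]
states = [1, 1, 0, 0]
print(average_utility(decisions, states))  # (50 - 100 - 20 + 10) / 4 = -15.0
```

Swap in your own four numbers and the same decisions can score very differently, which is the whole point.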

We also compute the sensitivity and specificity of these decision-cancer state pairs using L, and then we plot the ROC curve, as discussed last time.
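For those who want to see the arithmetic, a small sketch of sensitivity and specificity for the rule “act if level >= cutoff”, and of the ROC points that follow from sweeping the cutoff. The function names and the six data points are illustrative assumptions, and the sketch assumes both benign and malignant cases are present.

```python
def sens_spec(levels, states, cutoff):
    """Sensitivity and specificity of the rule 'act if level >= cutoff'."""
    tp = fn = fp = tn = 0
    for level, state in zip(levels, states):
        acted = level >= cutoff
        if state == 1 and acted:
            tp += 1
        elif state == 1:
            fn += 1
        elif acted:
            fp += 1
        else:
            tn += 1
    # Assumes at least one malignant and one benign observation.
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(levels, states):
    """(cutoff, false positive rate, true positive rate) per observed level."""
    points = []
    for cutoff in sorted(set(levels)):
        sens, spec = sens_spec(levels, states, cutoff)
        points.append((cutoff, 1 - spec, sens))
    return points


# Made-up PSA levels and biopsy states (1 = malignant), not the real data.
levels = [1.0, 2.5, 4.0, 6.0, 8.0, 12.0]
states = [0, 0, 1, 1, 0, 1]
for cutoff, fpr, tpr in roc_points(levels, states):
    print(cutoff, round(fpr, 3), round(tpr, 3))
```

Note the cutoff is carried along with each point. The ROC picture plots only the last two numbers.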

EXAMPLE

There are about 28,000 observations: about 20,000 are benign and the rest malignant. The PSA levels run from just over 0 to 15 ng/mL. Custom fixes the value 4 ng/mL as a cutoff between benign and malignant.

Here is the histogram of PSA values for the two groups:

One might have hoped the malignant distribution would have shown greater frequency of high PSA values. Indeed, there seems little to distinguish these observed distributions (these are not probabilities!), which is bad news for PSA as a raw marker.

That’s backed up in the ROC curve:

That red dot is supposed to be the optimal decision point based on maximizing true positive – false positive. What is the PSA associated with that red dot? Can’t see it, can you? Another big weakness of the ROC. You don’t know what PSA level is associated with each point on the graph. They are all hidden in the sensitivity etc.

We also see the AUC, but since I reject this measure (why, we discussed last time), I say no more about it, except to note that the AUC is about 0.5, and almost all the points lie on the one-to-one line, which is taken as the “random classifier.” In other words, the 50-50 guess.

Even though we can’t see what the PSA level is for each point in the graph, we can find it other ways. Above, the red dot is true positive – false positive. But other people like sensitivity + specificity (minus one). That’s called Youden’s Index. There are other choices. None of these choices take into account the actual gains and losses of decisions made this way. Anyway, Youden’s gives a PSA = 9.2 as the cutoff.
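A rough sketch of how a Youden-style cutoff can be dug out of the data, assuming the rule “act if level >= cutoff” and the usual definition J = sensitivity + specificity - 1 (which, written in rates, is the same thing as true positive rate minus false positive rate). The function name and the toy data are mine, not from the post.

```python
def youden_cutoff(levels, states):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(states)             # number malignant
    n_neg = len(states) - n_pos     # number benign
    best = None
    for cutoff in sorted(set(levels)):
        pairs = list(zip(levels, states))
        tp = sum(1 for lev, s in pairs if s == 1 and lev >= cutoff)
        tn = sum(1 for lev, s in pairs if s == 0 and lev < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        # Strict '>' keeps the lowest cutoff when two cutoffs tie.
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Made-up PSA levels and biopsy states (1 = malignant), not the real data.
levels = [1.0, 2.5, 4.0, 6.0, 8.0, 12.0]
states = [0, 0, 1, 1, 0, 1]
cutoff, j = youden_cutoff(levels, states)
print(cutoff, round(j, 3))
```

Nothing in this calculation knows what a false negative or a false positive costs anybody.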

We can incorporate those gains and losses using only the PSA in the way we discussed above and in the previous lectures. I.e., for each level l_i, decide cancer if L >= l_i, calculate the average utility of those decisions from the data, and plot this against the level. If we do that, we get this plot:
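The sweep just described can be sketched in a few lines. This is only an illustration under my own naming (utility_by_cutoff, best_cutoff) with made-up data; the real plot uses the roughly 28,000 Kaggle observations. The same function works unchanged if you hand it model probabilities in place of PSA levels.

```python
# Gain-loss matrix from the text, keyed by (decision, cancer_state).
A = {(1, 1): 50, (0, 1): -100, (1, 0): -20, (0, 0): 10}


def utility_by_cutoff(scores, states, a=A):
    """Average utility of 'act if score >= cutoff' for each observed cutoff."""
    n = len(scores)
    results = []
    for cutoff in sorted(set(scores)):
        total = sum(
            a[(1 if score >= cutoff else 0, state)]
            for score, state in zip(scores, states)
        )
        results.append((cutoff, total / n))
    return results


def best_cutoff(scores, states, a=A):
    """The (cutoff, average utility) pair with the highest average utility."""
    return max(utility_by_cutoff(scores, states, a), key=lambda pair: pair[1])


# Made-up PSA levels and biopsy states (1 = malignant), not the real data.
levels = [1.0, 2.5, 4.0, 6.0, 8.0, 12.0]
states = [0, 0, 1, 1, 0, 1]
for cutoff, utility in utility_by_cutoff(levels, states):
    print(cutoff, utility)
print(best_cutoff(levels, states))  # (4.0, 25.0) for this toy data
```

Unlike the ROC, every number here is tied to a level you can read off, and to gains and losses you chose.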

The average utility—based on my worthiness criteria, maybe not yours!—is negative for almost all values of the PSA, until we get near the max. And, in fact, this is the max. Which means, for me, I would never use the PSA test. Unless maybe it maxed out. The PSA on average leads to lousy decisions. For me.

And likely for you, too. I varied the a_{0,1} (false negative decision) by a lot, even down to -10,000 and got versions of this same plot. This is not surprising given the observational distributions of PSA we saw above. You get about the same average number of malignants regardless of PSA level.

This is not proof PSA levels have no value. It could be different levels help predictions, but only when married to other measures, such as Age, Smoking status, and BMI. Which is what we try next, building a random forest model of these measures (and PSA), and looking at average utility for the decision rules “Act like malignant if P > p_i” for each p_i the model produces.

We get this plot:

The best decision point is Pr(Malignant | L,E) = 0.38. But it’s only best by a hair, and we’d have about the same average utility if we started acting at, say, Pr(Malignant | L,E) = 0.25.

The reason this plot goes negative is that for high values of Pr(Malignant | L,E) we get more false negatives, which have large absolute weights. If the probability we insist on is, say, 0.75, those we do act on have a good chance of being cancers, but at the cost of increasing the chances of false negatives among everybody below it.

At this point, any number of things can be done. All the usual model goodness criteria we developed could be checked. A discussion whether this was the best a (matrix) for myself can be had. We can see how different my a is from the doctor’s, who likely puts more weight on false negatives than you do. After all, those make him look bad, whereas asking for more tests feels utterly routine, and makes the system more money (and you less).

We would still want to know if the PSA had any predictive utility. There may be non-linearities or something in it the simple methods missed and maybe random forest can catch. Besides, now I think about it, given the wisdom that “every man dies with prostate cancer, but not of it”, I’m switching my worthiness criteria! The gains are the same, but now the false negative loss is -10, and the false positive loss (given my loathing of “going to the doctor”) soars to -200. Then, using only a random forest model as a function of PSA (and nothing else), we get this:

Now the chance for Pr(Malignant | L,E) has to be greater than or equal to 0.9 before I would act. That, to me anyway, seems right. And shows there is some weak predictability for PSA, at least in this model. For other models? Who knows.
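As a side check on why switching the worthiness criteria moves the decision point so much, here is a sketch of the break-even probability implied by a gain-loss matrix alone. It assumes, strongly, that the stated probability is calibrated, which nothing here claims for the random forest. Acting pays p*a_{1,1} + (1-p)*a_{1,0} and not acting pays p*a_{0,1} + (1-p)*a_{0,0}, and the two are equal at the p computed below. The function name is mine.

```python
from fractions import Fraction


def break_even_probability(a):
    """Probability at which acting and not acting have equal expected utility.

    Solves p*a[1,1] + (1-p)*a[1,0] = p*a[0,1] + (1-p)*a[0,0] for p.
    Assumes the probability fed to the decision is calibrated.
    """
    edge_if_malignant = a[(1, 1)] - a[(0, 1)]  # acting beats waiting by this
    edge_if_benign = a[(0, 0)] - a[(1, 0)]     # waiting beats acting by this
    return Fraction(edge_if_benign, edge_if_malignant + edge_if_benign)


first = {(1, 1): 50, (0, 1): -100, (1, 0): -20, (0, 0): 10}
second = {(1, 1): 50, (0, 1): -10, (1, 0): -200, (0, 0): 10}

print(break_even_probability(first))   # 1/6
print(break_even_probability(second))  # 7/9
```

The plots above are empirical averages over the data, so their maxima need not sit at these values. The direction of the shift is the same, though: make false positives dear and false negatives cheap, and you demand far more certainty before acting.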

DISCUSSION

PSA is in many ways a rotten example. It stinks as an assay, and it seems like we spent a lot of time spinning our wheels when looking at these plots. Which is true. But that is the beauty of the approach. We can see all before us. The focus becomes, as it should, on the worthiness criteria.
