Reminder: The Thursday Class is only for those interested in studying uncertainty. I don’t expect everyone will want to read these posts. So please don’t feel like you must. Yet, I have nowhere else to put them besides here. Your support makes this Class possible for those who need it. Thank you. Math alert!
Calibration is an intuitive requirement and leads to a new kind of modeling called conformal prediction.
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Read!
Lecture
We are still with our model statement Pr(Y|M), where Y is or can be made into something dichotomous. We learned some common Judgment functions for these, made scoring functions because we desired math. And we learned about gaming scores and that possibly, for some W (in S(Y_Reality, Pr(Y|M) | W)), proper scores are a way around gaming. Propriety relied on “expected” scores, which aren’t the only possibility. But they do nicely illustrate gaming.
All that applied to when the model was used to make one probability statement about one Y_Reality. Which is the Y that is observed or believed. But often we use a model more than once, so we have p_i = Pr(Y_i|M), for any number of predictions. Since we have more than one prediction, we can use all of them to see how well the model performs. If—and this is the big if—our W indicates a mathematical way to score them.
Here some really nifty math comes in handy, first written about by Mark Schervish in his book Theory of Statistics. He proved that, for scenarios like ours with proper scoring functions, any set of predictions that is calibrated beats the same set of predictions uncalibrated, conditional on the score.
Calibration is simple. Every time the model states p, we have (if the model is calibrated)
$$f = \frac{\sum_i I(Y_i = 1,\ \Pr(Y_i|M) = p)}{\sum_i I(\Pr(Y_i|M) = p)} = p,$$
and this holds for all the different p the model states. In other words, every time the model says, for example, “p = 0.3”, 30% of the Y_i = 1 (dropping the Reality, but remembering it is there, always a danger). And that holds for every unique p the model stated, like “p_i = 0.6” and “p_i = 0.01” and so on.
This is not, it is most definitely not, some frequentist requirement for all future imagined predictions in some hypothetical universe, but what was observed already. Model statements already made, that is.
Calibration is a nice and intuitive measure of performance. When our scoring function can be made into math, of course, which isn’t (I keep reminding us) always true.
Suppose we have a model that made a number of predictions in which, say, “p_i = 0.9”. If the model was calibrated, we would see 90% of our past predictions having Y_i = 1 (observed). But suppose instead we get f = 0.7 for this set. We lack calibration. But we can “add back” the “missing” 0.2. Why not? That gives us a new model, conditional on the original one as evidence. This new model will of course be calibrated at p = 0.9.
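The “add back” step can be sketched in a few lines. This is an illustration under my own naming, not a definitive implementation: the re-calibrated prediction at p is simply the observed frequency among past predictions where the model said p.

```python
# Sketch of the "add back" step: the model said p = 0.9 on ten occasions,
# but only 7 of those Y_i came true (f = 0.7). The re-calibrated model
# replaces p = 0.9 with the observed frequency 0.7.
def recalibrate(p, past_preds, past_outcomes):
    hits = [y for q, y in zip(past_preds, past_outcomes) if q == p]
    # Fall back to the original p if this value was never stated before.
    return sum(hits) / len(hits) if hits else p

past_preds    = [0.9] * 10
past_outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # 7 of 10 observed as 1
print(recalibrate(0.9, past_preds, past_outcomes))  # 0.7
```

Doing this for every unique p the model stated yields the re-calibrated model described next.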
But of course, we can do that for all the p the first model used. The re-calibrated new model is the basis for conformal prediction. In notation (and many different ways of writing this are possible):
$$p’ = Pr(Y | Pr(Y|M) = p, E).$$
There is also likely other evidence E we use to help us build the New & Improved! model. Included in the E is at least the knowledge that we are re-calibrating. The prime on p only indicates the new prediction is not necessarily equal to p. Though, of course, there may be some values of p in the original model which were already calibrated.
Now it doesn’t quite work like this in practice. What happens instead is that the original model is built, Pr(Y|M). That M is then used to make predictions of the past observations. That is, the very observations that allowed us to build the model M in the first place. All these observations are in M, don’t forget. And also don’t forget that any change—where by “any” I mean any—in the observations also changes M.
In other words, the conformal model is built re-using the same data M used.
It is then that the p’ = Pr(Y | Pr(Y|M) = p, E) New & Improved! conformal model is released into the wild. Which we can also write as p’ = Pr(Y | M’). But doing so makes us forget our roots, and allows over-certainty to creep in.
Re-calibration is simple in the way I’ve painted it in theory. But all kinds of choices have to be made in practice. For instance, with parametric-based M the model may only spit out p that are all unique. The parameters are all “integrated out” by the time we get to Pr(Y|M), of course, but because of those parameters our predictions likely lie on the continuum, i.e. the whole interval (0,1). And never, also because of those parameters, in {0,1}. In any case, if the p are all unique—think of this ordered set of predictions: (0.048, 0.299, 0.460, 0.753, 0.790)—then every f will equal 0 or 1. No way to get re-calibration! (Make sure you see why. Watch the video.)
One possible choice is to use “probability buckets”. Pick probability intervals corresponding to whatever decisions you will make using the model. For instance, if you would make the same decision if p_i = 0.47 as p_i = 0.5, but you would make a different decision with p_i = 0.45, then pick buckets that indicate this. There is no general or universal right answer here. The right answer is all dependent on W, which also now contains whatever evidence you are using to pick the buckets.
For instance, round all p to the nearest tenth—if your decisions, and only your decisions, are not sensitive to probability differences finer than 1/10. Then you might, depending on your sample size, have sufficient Y_i at those probabilities to re-calibrate.
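Here is a sketch of that bucketing choice, with the bucket width (a tenth) being one illustrative option among many, not a universal rule. With all-unique p, each group would hold one observation and every f would be 0 or 1; rounding pools nearby predictions so f can land in between:

```python
# Sketch: round each stated p to the nearest tenth, then re-calibrate
# within each bucket. This only makes sense if your decisions are
# insensitive to probability differences finer than 1/10.
from collections import defaultdict

def bucket(p, width=0.1):
    return round(round(p / width) * width, 10)

def bucketed_recalibration(preds, outcomes, width=0.1):
    groups = defaultdict(list)
    for p, y in zip(preds, outcomes):
        groups[bucket(p, width)].append(y)
    # Observed frequency f in each bucket becomes the re-calibrated p'.
    return {b: sum(ys) / len(ys) for b, ys in sorted(groups.items())}

# All-unique predictions pool into two buckets (0.3 and 0.7), each with
# enough Y_i to give a frequency strictly between 0 and 1.
preds    = [0.31, 0.29, 0.33, 0.27, 0.72, 0.68, 0.71, 0.69]
outcomes = [1, 0, 0, 0, 1, 1, 0, 1]
print(bucketed_recalibration(preds, outcomes))  # {0.3: 0.25, 0.7: 0.75}
```

A wider bucket gives more Y_i per bucket (better frequency estimates) at the cost of coarser probabilities; the right trade-off, as the text says, depends on W.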
Modeling, like every other question of uncertainty, depends on the evidence you have. The more you have, in the form of more past Y, the better information you have. Up to a point. Always up to a point.
Now, there is certainly more to conformal prediction and calibration. But these are the basic motivations. Next time we’ll work through an example which I hope will clarify everything.
Here are the various ways to support this work:
- Subscribe at Substack (paid or free)
- Cash App: $WilliamMBriggs
- Zelle: use email: matt@wmbriggs.com
- Buy me a coffee
- Paypal
- Other credit card subscription or single donations
- Hire me
- Subscribe at YouTube
- PASS POSTS ON TO OTHERS
