Class 88: The Most Neglected Concept: Skill

Reminder: The Thursday Class is only for those interested in studying uncertainty. I don’t expect everyone will want to read these posts. So please don’t feel like you must. Yet, I have nowhere else to put them besides here. Your support makes this Class possible for those who need it. Thank you. Math alert!

Skill is the improvement of one model over another, according to your Judgement function.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Try the code!

Lecture

Wouldn’t you like to know if your stock broker is just guessing? It would be nice to have confidence your doctor knows what he’s talking about when he recommends the New & Improved! Profitol, wouldn’t it? And how about that bureaucrat relying on some new sociology paper to implement an obscure new policy that will affect you? Don’t you hope that study is any good?

Enter skill. The word has connotations with its ordinary English cousin, but here it takes on a technical meaning: Improvement over a rival model.

Goodness is, as ever, measured by our Judgement function, with its Worthiness premises. Right for you is not necessarily right for anybody else, as you now know by heart. Skill is easy. If we have two models, M1 and M2, then M2 has skill over M1, or in shorthand “has skill”, if M2 does better than M1 according to your Judgement function. If M2 is worse, then M2 has no skill with respect to M1, or in shorthand “has no skill.” Simple as that.
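A minimal sketch of the definition, with made-up data and mean squared error standing in for the Judgement function (your Score may differ; here the models are scored in-sample purely to illustrate the definition):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: the observable y depends on one measure x, plus noise.
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

def score(pred, obs):
    """Mean squared error, standing in for your Judgement function."""
    return np.mean((pred - obs) ** 2)

# M1: the simple model, predicting the plain mean of y every time.
pred_m1 = np.full(n, y.mean())

# M2: the rival model, a simple regression using x.
b1, b0 = np.polyfit(x, y, 1)
pred_m2 = b0 + b1 * x

s1, s2 = score(pred_m1, y), score(pred_m2, y)
print(f"M1 score: {s1:.3f}  M2 score: {s2:.3f}")
print("M2 has skill over M1" if s2 < s1 else "M2 has no skill")
```

Swap in whatever Score matches your Worthiness premises; the comparison itself is all that skill requires.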

Usually, but of course not always, M2 is a more complex model than M1, or more costly in some way. Since regression is the most common model type in the world, and we have gone over it in excruciating detail, let’s use it as a running example. But skill works, and ought to be used, everywhere: from Lagrangian models in physics to sports betting.

The most common of regressions are normal models. The uncertainty in some observable is approximated (it is only ever an approximation) using a normal, with its first parameter (the location) usually a function of any number of measures or propositions. For instance:

$$\theta = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p.$$

The “betas” are themselves parameters which do not exist and in which we ought to have no more than technical interest, but (as you remember) they usually, and sadly, become the focus of the models. We won’t make that mistake and are taking our models in a predictive sense. Pace:

$$\Pr(Y \in s | x_1 = a_1, x_2 = a_2, \cdots, x_p = a_p, E)= \Pr(Y \in s | M_2),$$

for some set, or sets, of interest $s$, and with the $a_i$ indicating supposed, measured, or assumed values of the measures x, and where E is all the other evidence that went into creating our model, including usually past observed data D, prescriptions about how to handle the betas, but at the very least the reason why the normal model was chosen in the first place.

The simplest normal model, since you have decided to use one, is where

$$\theta = \beta'_0,$$

where I use the prime to remind us that this $\beta'_0$ is not the same as the $\beta_0$ in the more complex model. And which, you recall, is proof probability is not relative frequency and does not exist. (Recall you believe in magic if you think by every model creation you conjure forth probability from some mystical realm, which is why you speak of measuring “true” values of the parameters. There are no parameters. They do not therefore have “true” values.)

In any case, with this simple model we want

$$\Pr(Y \in s | E)= \Pr(Y \in s | M_1).$$

No measures x are needed, nor were used in building the model.
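To make the two predictive probabilities concrete, here is a small sketch using a plug-in normal approximation (which ignores uncertainty in the fitted parameters; all numbers for the locations and spreads are invented for illustration):

```python
import math

def pr_greater(c, mu, sigma):
    """Pr(Y > c) under a normal with location mu and spread sigma (plug-in)."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2)))

# Invented fitted values: M1 gives location 2.0, spread 1.8 for every Y;
# M2, at supposed values of the measures x, gives location 3.5, spread 1.0.
print("Pr(Y > 4 | M1):", round(pr_greater(4.0, 2.0, 1.8), 3))
print("Pr(Y > 4 | M2):", round(pr_greater(4.0, 3.5, 1.0), 3))
```

Here the set of interest is $s = \{Y > 4\}$; M1 assigns every Y the same probability, while M2’s probability shifts with the supposed values of the x.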

Obviously, M2 ought to beat M1 at $s$, conditional on whatever Judgement function or Score you used. If, in fact, M2 does not beat M1, then there is no reason to use M2 (with a caveat). Why would you? M1 is better!
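A sketch of such a check, scoring each model’s probability for a set $s = \{Y > c\}$ on held-out data with the Brier score (one possible Score function among many; the data are simulated and the normal fits are plug-in):

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Simulated data, split so skill is judged on predictions of new data.
n = 400
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
xtr, xte, ytr, yte = x[:300], x[300:], y[:300], y[300:]

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

c = 2.0  # the set of interest is s = {Y > 2}

# M1: normal with a constant location, fit from the training data.
mu1, sg1 = ytr.mean(), ytr.std(ddof=1)
p1 = np.full(len(yte), 1.0 - norm_cdf((c - mu1) / sg1))

# M2: location a function of x (plug-in fit; parameter uncertainty ignored).
b1, b0 = np.polyfit(xtr, ytr, 1)
sg2 = (ytr - (b0 + b1 * xtr)).std(ddof=1)
p2 = np.array([1.0 - norm_cdf((c - (b0 + b1 * xi)) / sg2) for xi in xte])

obs = (yte > c).astype(float)
brier = lambda p: np.mean((p - obs) ** 2)
print(f"Brier M1: {brier(p1):.3f}  Brier M2: {brier(p2):.3f}")
print("M2 has skill" if brier(p2) < brier(p1) else "M2 has no skill")
```

The lower Brier score wins; a different Score, or a different $s$, can of course declare a different winner.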

There may be any number of practical and political concerns about why you must use M2 anyway, but that is because the Judgement function driving those concerns is either not yours or was foisted upon you. For instance, by some loving beneficent kindly bureaucrat, schoolmarm or the like. But that problem is too big for us, so we stick with assuming one Judgement or Score function.

You may have any number of $s$ under consideration, $s_1, s_2,…$, and sometimes M2 has skill (over M1) and sometimes it does not. Then either you, as part of your W, have different weights on those $s$ and thus can declare a winner in the contest, M2 v M1, or you say “M2 won here, M1 won there.” This brings us to…

Measure (Variable) Importance

We already learned that if we add some $x_j$ to the model, we judge it relevant if

$$\Pr(Y\in s| x_1, x_2, \cdots, x_p, x_j, M_3)\ne \Pr(Y\in s| x_1, x_2, \cdots, x_p, M_2),$$

and irrelevant if the probabilities are equal. An addition can be relevant but not increase skill, and can even decrease or eliminate it. Relevance is not a guarantee of superior predictions, only that adding the measure changes your uncertainty in the observable.

Thus, importance is easy to check. Simply take an x_j out and see how much better or worse the Judgement or Score function is. Step through all x in the same way, and in the end we know how much each x is contributing, relative to our Judgement/Score. This can be done with respect to each x and the overall “big” model, the one with all the other x in it, or with respect to M1, the model with no x in it.
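A sketch of this step-through, dropping each $x_j$ in turn from a least-squares fit and re-scoring against held-out data (simulated data; mean squared error stands in for your Score):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: only x_1 and x_2 matter by construction; x_3 is pure noise.
n, p = 500, 3
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
Xtr, Xte, ytr, yte = X[:400], X[400:], y[:400], y[400:]

def fit_predict(Xa, ya, Xb):
    """Least-squares fit on (Xa, ya); predictions at Xb."""
    A = np.column_stack([np.ones(len(Xa)), Xa])
    beta, *_ = np.linalg.lstsq(A, ya, rcond=None)
    return np.column_stack([np.ones(len(Xb)), Xb]) @ beta

def mse(pred):
    return np.mean((pred - yte) ** 2)

full = mse(fit_predict(Xtr, ytr, Xte))
deltas = []
for j in range(p):
    keep = [k for k in range(p) if k != j]
    deltas.append(mse(fit_predict(Xtr[:, keep], ytr, Xte[:, keep])) - full)
    print(f"drop x_{j+1}: score changes by {deltas[-1]:+.3f}")
```

The bigger the worsening when $x_j$ is removed, the more that measure is contributing, relative to this Score; dropping the noise measure should change the score hardly at all.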

Again, relevance tells us whether the x changes the probabilities, but we don’t know if the changes are in the “right” direction until the model is checked against Reality.

Observation Importance

With a fixed set of x, we can do the same service for observations, that is, the past data that went into building our model. E contained D, with $D = D_1, D_2, \cdots, D_n$, i.e. the data that was used (the old observations). Both relevance and skill can be checked for each $D_i$. It is sometimes useful to know not only that some $x_j$ was relevant or improved skill, but which combinations of x, the combinations found in the $D_i$, were important.

This can be somewhat computationally expensive, because the model has to be refit using all of D except $D_i$, for each i from 1 to n. Doubtless many shortcuts can be discovered for known model types, i.e. those with known or easily approximated analytical solutions. This is a wide-open field for research.
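A brute-force sketch of the refit-without-$D_i$ loop, with a deliberately wild observation planted at $D_1$ so there is something to find (simulated data; mean squared error on new data as the Score):

```python
import numpy as np

rng = np.random.default_rng(4)

# Old data D used to build the model; one wild observation planted at D_1.
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[0] += 10.0

# New data on which all predictions are judged.
xnew = rng.normal(size=200)
ynew = 1.0 + 2.0 * xnew + rng.normal(size=200)

def holdout_mse(xs, ys):
    """Fit a simple regression on (xs, ys); score predictions of the new data."""
    b1, b0 = np.polyfit(xs, ys, 1)
    return np.mean((b0 + b1 * xnew - ynew) ** 2)

full = holdout_mse(x, y)
keep = lambda i: np.arange(n) != i
deltas = np.array([full - holdout_mse(x[keep(i)], y[keep(i)]) for i in range(n)])

# Positive delta: removing D_i improves predictions of new data.
print("most influential observation: D_%d" % (int(np.argmax(deltas)) + 1))
```

This is exactly the expense mentioned above: n refits for n observations, which is why analytical shortcuts for known model types would be so welcome.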

Always Check

Unless your model is, as above, M1, or there really is nothing simpler because you have deduced your model form (which we did often at the beginning of the Class), you will always be able to check skill. Especially if your model is ad hoc, which the majority are. Just as with probability leakage, there is no excuse not to check. And this conclusion is even stronger if it is you who are trying to sell that model, such as in a paper declaring some “novel” discovery. Or when you are reading a paper, you have to ask, “Where is their discussion of skill? Why ought I to trust this model?”

Models Of Skill

Above I mentioned a caveat. This is it. Judgement, Scores and Skill are checked on model predictions of new data. They are akin to times in races: the best man is the fastest.

But sometimes we want to predict Judgement, Scores or Skill for the model we have in hand. That is, we’ve got the creature before us, but no new data yet. What to do then?

That’s a subject for another day.

Next time, as we did for calibration, we will do some skill examples.
