Class 49: Relevance & Importance Of Evidence In Models

First basics of models and the Deadly Sin. Also the superiority of relevance over the misleading idea of independence. WARNING for those reading the email version! The text below might appear to be gibberish. If so, it means the LaTeX did not render in the emails. I’m working on this. Meanwhile, please click on the headline and read the post on the site itself. Thank you.

Video

Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty

HOMEWORK: Given below; see end of lecture.

Lecture

This is an excerpt from Chapter 8 of Uncertainty.

As covered in Chapter 4, an X that is added to a model which, in the presence of the other premises (other “Xs”), does not change the probability of Y is irrelevant or not probative. The data X which are used to “fit” a model are of course themselves premises, e.g. X$_{1,1}$ = “Observed the value 112.2 mm/Hg” (the first observation of the first premise, say systolic blood pressure, from some collection). The importance of each premise, given the presence of the other premises, is judged by how much it changes the probability of Y. If an X does result in extreme probabilities of Y, this X is not necessarily causal, though an injurious, flabbergasting tradition has developed (predominantly in the “soft” sciences) which says or assumes it is.

For example, if $\Pr(\mbox{Y} | \mbox{X}_1)= \Pr(\mbox{Y} | \mbox{X}_1\mbox{X}_2)$ then X$_2$ is irrelevant in the presence of X$_1$, even if $\Pr(\mbox{Y} | \mbox{X}_2)$ is something other than the unit interval. That is, X$_2$ may be separately probative to Y, but it adds no information about Y that is not already in X$_1$. There are thus two kinds of relevance: in-model, which is rather a measure of importance, i.e. how much a premise changes our understanding of Y, and out-model, whether the premise is even needed. A third is a variant of the first: sample relevance.
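This distinction can be sketched numerically. Below is a minimal Python example using a made-up joint distribution (the `joint` table and its numbers are illustrative assumptions, not from the text), constructed so that X$_2$ is probative on its own yet irrelevant once X$_1$ is a premise:

```python
from fractions import Fraction as F

# A toy joint distribution Pr(Y, X1, X2) over binary propositions.
# The numbers are made up for illustration, chosen so that X2 is
# probative alone but irrelevant in the presence of X1.
joint = {
    # (y, x1, x2): probability
    (1, 1, 1): F(36, 100), (1, 1, 0): F(9, 100),
    (0, 1, 1): F(4, 100),  (0, 1, 0): F(1, 100),
    (1, 0, 1): F(1, 100),  (1, 0, 0): F(4, 100),
    (0, 0, 1): F(9, 100),  (0, 0, 0): F(36, 100),
}

def pr(y=None, x1=None, x2=None):
    """Sum the joint over whichever coordinates are left unspecified."""
    return sum(p for (a, b, c), p in joint.items()
               if (y is None or a == y)
               and (x1 is None or b == x1)
               and (x2 is None or c == x2))

# X2 alone is probative: conditioning on it changes the probability of Y...
p_y = pr(y=1)                        # Pr(Y)
p_y_x2 = pr(y=1, x2=1) / pr(x2=1)    # Pr(Y | X2)

# ...but it adds nothing once X1 is a premise: Pr(Y|X1) = Pr(Y|X1 X2).
p_y_x1 = pr(y=1, x1=1) / pr(x1=1)
p_y_x1x2 = pr(y=1, x1=1, x2=1) / pr(x1=1, x2=1)
```

Exact `Fraction` arithmetic is used so that the equality $\Pr(\mbox{Y} | \mbox{X}_1)= \Pr(\mbox{Y} | \mbox{X}_1\mbox{X}_2)$ can be checked exactly rather than to floating-point tolerance.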

Suppose Y itself takes different states (like temperature) and that $\Pr(\mbox{Y}_a | \mbox{X}_1)= \Pr(\mbox{Y}_a | \mbox{X}_1\mbox{X}_2)$ but $\Pr(\mbox{Y}_b | \mbox{X}_1)\ne \Pr(\mbox{Y}_b | \mbox{X}_1\mbox{X}_2)$. X$_2$ in the presence of X$_1$—the condition which must always be stated—is then relevant to Y; or, better, relevant only when Y is Y$_b$.
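A short sketch of this state-dependent relevance, with X$_1$ understood as a background premise throughout. The distribution below is a made-up illustration (not from the text), built so that X$_2$ is irrelevant for the state Y$_a$ but relevant for the others:

```python
from fractions import Fraction as F

# Toy distribution Pr(Y, X2), with X1 as an implicit background premise.
# Y takes three states; X2 is binary. Numbers are illustrative only.
joint = {
    ('a', 1): F(10, 100), ('a', 0): F(10, 100),
    ('b', 1): F(25, 100), ('b', 0): F(15, 100),
    ('c', 1): F(15, 100), ('c', 0): F(25, 100),
}
states = ('a', 'b', 'c')

def pr_y_given_x2(x2):
    """Pr(Y = y | X1, X2 = x2) for each state y."""
    denom = sum(p for (_, x), p in joint.items() if x == x2)
    return {y: sum(p for (yy, x), p in joint.items()
                   if yy == y and x == x2) / denom
            for y in states}

def pr_y():
    """Pr(Y = y | X1), with X2 marginalized away."""
    return {y: sum(p for (yy, _), p in joint.items() if yy == y)
            for y in states}

with_x2 = pr_y_given_x2(1)
without_x2 = pr_y()

# X2 moves the probability of some states of Y but not others.
relevant_states = [y for y in states if with_x2[y] != without_x2[y]]
```

Here adding X$_2$ leaves $\Pr(\mbox{Y}_a|\cdot)$ untouched while changing the probabilities of the other states, so X$_2$ is relevant, but only for those states.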

Suppose $\Pr(\mbox{Y} | \mbox{X}_1)= \Pr(\mbox{Y} | \mbox{X}_1\mbox{X}_2) + \epsilon$, with the obvious constraints on $\epsilon$. Then X$_2$ in the presence of X$_1$ is relevant. Whether the difference $\epsilon$ makes any difference to any decision is not a question probability can answer. “Practically irrelevant” is not irrelevant. Irrelevance is a logical condition. The practice of modeling is the art of selecting those X (premises) which are relevant, in the presence of other premises, to Y. Invariably, some new premise will add “only” $\epsilon$ to the understanding of Y. Whether this is enough to “make a difference” is a question only the modeler and whoever is going to use the model can answer. The only “test” for relevance is thus any change in the conditional probability of Y.

Relevance, as we see next Chapter, is how models should be judged before verification of their predictions arrives. Assessing relevance is hard work—but who said modeling had to be easy? That modeling is now far too easy is a major problem; because anybody can do it, everybody thinks they’re good at it. Suppose Y is simple (yes or no, true or false) and we have a list of premises. The relevance of each X$_i$ (its subscript indicates it is variable) is assessed by holding the other variable X$_j$ at some fixed level and then varying X$_i$. For example, to assess the relevance of X$_1$, which can take the values $a_1$ and $a_2$, compute
$$\Pr(\mbox{Y} | \mbox{X}_1 = a_1, \mbox{X}_2 = b, \dots, \mbox{X}_p = p, \mbox{W}), $$

where W are those premises which are fixed (deductions, assumptions, etc.), and $$ \Pr(\mbox{Y} | \mbox{X}_1 = a_2, \mbox{X}_2 = b, \dots, \mbox{X}_p = p, \mbox{W}). $$

The difference between these two probabilities is the in-model relevance of X$_1$ given the values the other X take. The out-model relevance is assessed by next computing
$$
\Pr(\mbox{Y} | \mbox{X}_2 = b, \dots, \mbox{X}_p = p, \mbox{W}),
$$
and comparing that to the model which retains X$_1$. Note that all the other X have kept their values. Sample relevance is computed by calculating the same probability but for the addition (or subtraction) of a new “data point.” Irrelevance is:
$$
\Pr(\mbox{Y} | \mbox{X}_{n+1}) = \Pr(\mbox{Y} | \mbox{X}_{n}).
$$
For instance, suppose we have $n$ observations and on the $(n+1)$th the probability Y is true remains unchanged. Then this new data point has added no new information, in the presence of the first $n$. Of course, these comparisons may be used to hunt for those data points which are most relevant, or rather most important, and those which are irrelevant (given the others). Those familiar with classical parametric methods will see the similarities; this approach is superior because all measures are stated directly and with respect to the proposition of interest Y.
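The three measures (in-model, out-model, sample relevance) can be sketched together. The joint table is the same kind of made-up illustration as before, and Laplace's rule of succession is used only as a stand-in predictive model for the sample-relevance step; neither is from the text:

```python
from fractions import Fraction as F

# Toy joint Pr(Y, X1, X2); made-up numbers for illustration only.
joint = {
    (1, 1, 1): F(36, 100), (1, 1, 0): F(9, 100),
    (0, 1, 1): F(4, 100),  (0, 1, 0): F(1, 100),
    (1, 0, 1): F(1, 100),  (1, 0, 0): F(4, 100),
    (0, 0, 1): F(9, 100),  (0, 0, 0): F(36, 100),
}

def pr(y=None, x1=None, x2=None):
    """Sum the joint over whichever coordinates are left unspecified."""
    return sum(p for (a, b, c), p in joint.items()
               if (y is None or a == y)
               and (x1 is None or b == x1)
               and (x2 is None or c == x2))

# In-model relevance of X1: hold X2 fixed, vary X1 over its two values.
p_at_a1 = pr(y=1, x1=1, x2=1) / pr(x1=1, x2=1)  # Pr(Y | X1=a1, X2=b, W)
p_at_a2 = pr(y=1, x1=0, x2=1) / pr(x1=0, x2=1)  # Pr(Y | X1=a2, X2=b, W)
in_model = p_at_a1 - p_at_a2

# Out-model relevance: drop X1 entirely, keeping X2 at the same value,
# and compare with the model that retains X1.
p_without_x1 = pr(y=1, x2=1) / pr(x2=1)          # Pr(Y | X2=b, W)

# Sample relevance, via Laplace's rule of succession as a toy
# predictive model: Pr(next Y true | k true in n) = (k+1)/(n+2).
def pr_next(k, n):
    return F(k + 1, n + 2)

before = pr_next(7, 10)   # predictive probability after n = 10 points
after = pr_next(8, 11)    # after an 11th observation (also true)
# before != after: the new point changed the probability of Y, so it
# is relevant in the presence of the first n; equality would have
# meant it added no information.
```

All three measures are plain conditional probabilities of Y, which is the point: nothing is stated about parameters, only about the proposition of interest.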

I should highlight I am not here trying to develop a set of procedures per se, only defining the philosophically relevant constituents of probability models. We want to know what it means to be a probability model—any probability model, and not just one for some stated purpose. Readers interested in working on new problems will discover lots of fertile ground here, though.

It should by now be obvious that each of these probabilities is also a prediction. Each says, “Here is the probability of Y should all these other things hold.” So not only probabilities, but all predictions are conditional, too. This form of model also forms the basis of how statistical methods should work. All effort is concentrated on asking how X influences our knowledge of Y—and even, in rare cases, how X causes or determines Y.

Relevance is made more difficult when Y is allowed to vary, but the underlying idea is the same. Except for the X of interest, which is varied or removed from the model, fix the other Xs, compute the probabilities of the Ys, and see what changes to these probabilities happen. Relevance is when there are changes, irrelevance when not. This is obviously going to be a lot of work for complex Ys and Xs, but nothing else gives a fairer and more complete picture of the uncertainty inherent in the problem. And, again, who said it had to be easy?

Subscribe or donate to support this site and its wholly independent host using credit card click here. Or use the paid subscription at Substack. Cash App: \$WilliamMBriggs. For Zelle, use my email: matt@wmbriggs.com, and please include yours so I know who to thank. BUY ME A COFFEE.
