First basics of models and the Deadly Sin. Also the superiority of relevance over the misleading idea of independence.
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Given below; see end of lecture.
Lecture
This is an excerpt from Chapter 8 of Uncertainty.
As covered in Chapter 4, an X added to a model which, in the presence of the other premises (other “Xs”), does not change the probability of Y is irrelevant, or not probative. The data X used to “fit” a model are of course themselves premises, i.e. X$_{1,1}$ = “Observed the value 112.2 mmHg” (for the first premise, say of systolic blood pressure, first observation from some collection). The importance of each premise, given the presence of the other premises, is judged by how much it changes the probability of Y. If an X does not result in extreme probabilities of Y, this X is not necessarily causal, though an injurious, flabbergasting tradition has developed (predominantly in the “soft” sciences) which says or assumes it is.
For example, if $\Pr(\mbox{Y} | \mbox{X}_1)= \Pr(\mbox{Y} | \mbox{X}_1\mbox{X}_2)$ then X$_2$ is irrelevant in the presence of X$_1$, even if $\Pr(\mbox{Y} | \mbox{X}_2)$ on its own takes any value in the unit interval. That is, X$_2$ may be separately probative to Y, but it adds no information about Y that is not already in X$_1$. There are thus two kinds of relevance: in-model, which is rather a measure of importance, of how much a premise changes our understanding of Y; and out-model, whether the premise is even needed. A third is a variant of the first: sample relevance.
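A small numerical sketch may make this concrete. The joint distribution below is invented purely for illustration: it is built so that Y depends on X$_1$ alone, hence X$_2$ is irrelevant to Y in the presence of X$_1$, even though X$_2$ is separately probative.

```python
from itertools import product

# Hypothetical joint distribution over (Y, X1, X2); all numbers invented.
# Y depends on X1 only, so X2 adds nothing once X1 is known.
p_x1 = {0: 0.4, 1: 0.6}
p_x2_given_x1 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}
p_y_given_x1 = {0: 0.3, 1: 0.7}  # Pr(Y true | X1); X2 appears nowhere

joint = {}
for x1, x2, y in product([0, 1], repeat=3):
    p = p_x1[x1] * p_x2_given_x1[x1][x2]
    p *= p_y_given_x1[x1] if y else (1 - p_y_given_x1[x1])
    joint[(y, x1, x2)] = p

def pr_y(**given):
    """Pr(Y true | given premises), computed by summing the joint table."""
    match = lambda x1, x2: all(v == {"x1": x1, "x2": x2}[k]
                               for k, v in given.items())
    num = sum(p for (y, x1, x2), p in joint.items() if y and match(x1, x2))
    den = sum(p for (_, x1, x2), p in joint.items() if match(x1, x2))
    return num / den

print(pr_y(x2=0), pr_y(x2=1))        # these differ: X2 alone is probative
print(pr_y(x1=1), pr_y(x1=1, x2=0))  # these agree: X2 adds nothing given X1
```

The only “test” used here is the one named in the text: whether adding the premise changes the conditional probability of Y.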
Suppose Y itself takes different states (like temperature) and that $\Pr(\mbox{Y}_a | \mbox{X}_1)= \Pr(\mbox{Y}_a | \mbox{X}_1\mbox{X}_2)$ but $\Pr(\mbox{Y}_b | \mbox{X}_1)\ne \Pr(\mbox{Y}_b | \mbox{X}_1\mbox{X}_2)$. X$_2$ in the presence of X$_1$—the condition which must always be stated—is then relevant to Y; or, better, relevant only when Y is Y$_b$.
Suppose $\Pr(\mbox{Y} | \mbox{X}_1)= \Pr(\mbox{Y} | \mbox{X}_1\mbox{X}_2) + \epsilon$, with the obvious constraints on $\epsilon$. Then X$_2$ in the presence of X$_1$ is relevant. Whether the difference $\epsilon$ makes any difference to any decision is not a question probability can answer. “Practically irrelevant” is not irrelevant. Irrelevance is a logical condition. The practice of modeling is the art of selecting those X (premises) which are relevant, in the presence of other premises, to Y. Invariably, some new premise will add “only” $\epsilon$ to the understanding of Y. Whether this is enough to “make a difference” is a question only the modeler and whomever is going to use the model can answer. The only “test” for relevance is thus any change in the conditional probability of Y.
Relevance, as we see next Chapter, is how models should be judged before verification of their predictions arrives. Assessing relevance is hard work—but who said modeling had to be easy? That modeling is now far too easy is a major problem; because anybody can do it, everybody thinks they’re good at it. Suppose Y is simple (yes or no, true or false) and we have a list of premises. The relevance of each X$_i$—its subscript indicates it is variable—is assessed by holding those other X$_j$ which are variable at some fixed level and then varying the X$_i$. For example, to assess the relevance of X$_1$, which can take the values $a_1$ and $a_2$, compute
$$\Pr(\mbox{Y} | \mbox{X}_1 = a_1, \mbox{X}_2 = b, \dots, \mbox{X}_p = p, \mbox{W}), $$
where W are those premises which are fixed (deductions, assumptions, etc.), and $$ \Pr(\mbox{Y} | \mbox{X}_1 = a_2, \mbox{X}_2 = b, \dots, \mbox{X}_p = p, \mbox{W}). $$
The difference between these two probabilities is the in-model relevance of X$_1$ given the values the other X take. The out-model relevance is assessed by next computing
$$
\Pr(\mbox{Y} | \mbox{X}_2 = b, \dots, \mbox{X}_p = p, \mbox{W}),
$$
and comparing that to the model which retains X$_1$. Note that all the other X have kept their values. Sample relevance is computed by calculating the same probability but for the addition (or subtraction) of a new “data point.” Irrelevance is:
$$
\Pr(\mbox{Y} | \mbox{X}_{n+1}) = \Pr(\mbox{Y} | \mbox{X}_{n}).
$$
For instance, suppose we have $n$ observations and on the $n+1$st the probability Y is true remains unchanged. Then this new data point has added no new information, in the presence of the first $n$. Of course, these measures may be used to hunt for those data points which are most relevant, or rather, most important, and those which are irrelevant (given the others). Those familiar with classical parametric methods will see the similarities; this approach is superior because all measures are stated directly and with respect to the proposition of interest Y.
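The sample-relevance check can be sketched numerically. Here Laplace’s rule of succession stands in for whatever predictive model is actually in use (an assumption for illustration; any model giving $\Pr(\mbox{Y} | \mbox{data})$ would do):

```python
from fractions import Fraction

# Stand-in predictive model (illustrative assumption): Laplace's rule of
# succession with a uniform prior.
def predictive(successes, n):
    """Pr(next Y true | successes observed in n trials)."""
    return Fraction(successes + 1, n + 2)

p_n = predictive(50, 100)    # probability after n = 100 observations
p_n1 = predictive(51, 101)   # after one more "success" is observed

epsilon = p_n1 - p_n
print(p_n, p_n1, epsilon)
# epsilon is tiny but nonzero: the new point is relevant, though a
# decision maker might well judge it "practically irrelevant."
```

This is exactly the distinction drawn above: irrelevance is a logical condition (epsilon exactly zero), while “practically irrelevant” is a decision, not a probability.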
I should highlight I am not here trying to develop a set of procedures per se, only defining the philosophically relevant constituents of probability models. We want to know what it means to be a probability model—any probability model, and not just one for some stated purpose. Readers interested in working on new problems will discover lots of fertile ground here, though.
It should by now be obvious that each of these probabilities is also a prediction. Each says “Here is the probability of Y should all these other things hold.” So not only probabilities, but all predictions are conditional, too. This form of model also forms the basis of how statistical methods should work. Attention is centered on asking how X influences our knowledge of Y—and even, in rare cases, how X causes or determines Y.
Relevance is made more difficult when Y is allowed to vary, but the underlying idea is the same. Fix the other Xs, except for the X of interest, which is varied or removed from the model; compute the probabilities of the Ys and see what changes to those probabilities happen. Relevance is when there are changes, and irrelevance when not. This is obviously going to be a lot of work for complex Ys and Xs, but nothing else gives a fairer and more complete picture of the uncertainty inherent in the problem. And, again, who said it had to be easy?
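The scan just described can be sketched in a few lines: for each state of Y, compare the probability with and without the premise of interest (here X$_2$), all other Xs held fixed. The states and numbers below are invented purely for illustration.

```python
# Hypothetical probabilities for a Y with three states; invented numbers.
with_x2 = {"cold": 0.20, "mild": 0.50, "hot": 0.30}     # Pr(Y = state | X1, X2)
without_x2 = {"cold": 0.20, "mild": 0.45, "hot": 0.35}  # Pr(Y = state | X1)

for state in with_x2:
    change = with_x2[state] - without_x2[state]
    verdict = "relevant" if abs(change) > 0 else "irrelevant"
    print(state, round(change, 2), verdict)
```

Here X$_2$ is irrelevant when Y = “cold” but relevant for the other states: the same state-dependent relevance discussed above for Y$_a$ and Y$_b$.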
What are you playing at, Briggs? Some high-falutin’ model equation junk that’ll get twisted by politicians in the end anyway? Nice try, but you’ll have to get up darned earlier in the morning to get that worm by me, sport.
Probability offers a theoretical framework for statistical modeling. The formal definition of the word ‘independence’, P(A and B) = P(A) * P(B), where A and B are two sets or events in the sigma field, works just fine for studying probability.
Statistical modeling is concerned with identifying patterns and making inferences or predictions based on observed data. In this context, the term ‘relevance’ might be appropriate. However, for example, in real life, even if you believe shoe size is irrelevant to predicting your IQ, it would somehow provide some, not zero, information about your IQ. Do we then conclude that everything is relevant? A series of important questions can arise in statistics, which makes the studying of statistics interesting.
If Y does not vary—for instance, if everyone has the same IQ (Y) as JH in a horrifying world—you wouldn’t need probability or statistics to predict Briggs’s IQ.
Is there evidence that people believe probabilities, parameters, or mathematical objects exist physically in nature? The fact that mathematicians use the phrase “there exists” does not necessarily mean they believe in the physical existence of abstract objects. Similarly, when people say, “it has a probability of 1/2…,” it does not imply that they believe it has a physical property of probability. Can you provide direct survey results to support this notion? If not, why keep harping on such speculation?
JH,
My IQ is simplicity itself to estimate. I have no IQ.
Pick up any stats books and look under parameter estimates for phrases like “true value”. Next, pick up any probability book and look for what they call “unconditional probability”.
Two of a plethora of examples of people believing (or really saying they believe) in probability as a thing.
Always thought-provoking, William. The comments section’s a bit hostile today. To really grasp what Matt is driving at, you have to think outside the box. People put far too much weight on probabilities, distributions, p-values, and hypothesis tests. These are abstractions, not tangible things. Unlike mass or velocity, they don’t exist in the world; they exist in our minds and in our models. Yet researchers often treat them as if they were inherent properties. There’s the rub.
Probability is defined on a sigma field. But what exactly is a sigma field? In simpler terms, it represents the foundational premise for probability, though often undeclared.
The examples you provided do not suggest that people believe probability physically exists in nature. For instance, if I say that the true value of 2 + 4 is 6, does that imply I think the number 6 physically exists in nature?
Yes, you “have” no IQ. Did you see how silly this is?
hey doc
Steven Hayward’s Substack post on “Pope Bob” https://stevehayward.substack.com/p/pope-bob includes an image and summary of a book on “evaluative criteria and Bayes Theorem” by the now Leo XIV. Obviously I haven’t read it—and most likely won’t—but you might find it very interesting.