This week starts Part II of the Class: scientific modeling, in all its glories and, alas, failures. Science is model happy, and too often judges the goodness of models by how beautiful they are, not how well they predict Reality.
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Given below; see end of lecture.
Lecture
This is an excerpt from Chapter 9 of Uncertainty.
Statistical & Physical Models
“The only useful function of a statistician is to make predictions, and thus to provide a basis for action.”—W. Edwards Deming.
Statistical models are probability models and physical models are causal or deterministic or mixed causal-deterministic-probability models applied to observable propositions. It is observations which turn probability into statistics. Statistical and physical models are thus verifiable, and all use statistics in their verification. Statistics are summaries of observations, or are simple observations themselves; statistics are observed propositions (or mathematical functions of them).
Classical frequentist statistical modeling emphasizes hypothesis or “significance” testing and estimation. Testing flows from the desire to know and to decide whether a proposition is true or false. And estimation comes from the wish to know how much or how strong some signal or force is. These are the correct aspirations, so it’s surprising not only that neither goal is met using classical approaches, but that everybody thinks they are. Physical models usually are on the side of the angels here, engaging reality often and then, as is proper, mercilessly culling models which don’t match observations. Politics, however, in both statistics and physics, has been known to save unrealistic models.
Frequentist hypothesis testing and estimation are parameter-centric. And so is classical Bayesian prior-posterior analysis. These methods tell us about things inside models, but they are silent on what’s going on outside; that is, they are mute about reality. It is all but forgotten that we can be as certain as we like about what’s happening inside a model while remaining largely ignorant of what is happening in reality. To assume, as nearly everybody does, that the certainty which applies to a model’s guts applies equally to reality results in gross, systematic, and, at times, ridiculous over-certainty. Entire fields make their living based on misunderstandings of classical statistical theory—in both its frequentist and Bayesian forms.
However careful, say, an academic statistician is and however diligently he admonishes his readers not to over-interpret results, it is true and confirmed endlessly that his students will over-interpret and misbehave shockingly. Why? Because classical methods do not answer questions ordinary people ask, and nobody can remember the arcane and esoteric interpretations given to the questions they can answer. If you doubt this, ask any, say, sociologist for the interpretation of a hypothesis test or confidence interval. It won’t even be close to correct. Or, better, look to journals which use statistics routinely. Over-certainty abounds. The blame for this situation ultimately rests on those who invent and promulgate theories, or, in other words, us. This is because the misuses of statistics are so egregious and pervasive that the reaction from the top should have long ago been abject horror, instead of the complacency we see. It’s well past the time for a fundamental change in practice.
If we’re doing it wrong, how do we do it right? The answer is obvious: answer questions people ask. Look outside our (mostly ad hoc and in many cases dubious) models and speak of reality. We must come to a proper understanding of causality in probability models. And, as will become clear, we must guard against the we-must-do-something fallacy, a disease which largely affects academics anxious to produce research. In this and the next Chapter I’ll often use the shorthand “statistical model”, which applies to any kind of model of observables, since statistics are the main way to judge them.
The Idea
This section is addressed primarily to those who use probability and statistics but who had no part in the development of its methods; developers can skip ahead to the next Section.
The idea is this: (1) Look, (2) Don’t model. Very many times, simple summaries and plots of data are superior to models. But if you’re going to model, follow Deming’s advice: make predictions. Predictions are verifiable.
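As a token of what “look” can mean in practice, here is a minimal sketch using invented measurements: plain summaries and a crude text histogram, with no model fitted to anything.

```python
# A minimal "look first" sketch: plain summaries of a batch of invented
# measurements, with no model fitted to them.
import statistics

measurements = [4.2, 3.9, 5.1, 4.8, 4.4, 6.0, 3.7, 4.9, 5.3, 4.1, 4.6, 5.0]

print("n      :", len(measurements))
print("min/max:", min(measurements), max(measurements))
print("median :", statistics.median(measurements))
print("mean   :", round(statistics.mean(measurements), 2))

# Crude text histogram: one row of stars per half-unit bin.
for tenths in range(35, 65, 5):
    low, high = tenths / 10, tenths / 10 + 0.5
    count = sum(low <= x < high for x in measurements)
    print(f"[{low:.1f}, {high:.1f}): {'*' * count}")
```

Often a display like this answers the question that prompted the analysis, with nothing further needed.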
Misunderstanding statistics is causing much harm. The failings of the current methods of practicing statistics are known but largely unheeded. Arguments by critics are now so common that I do not reproduce any but the most relevant here. Significance chasing, a.k.a. wee p-value hunting, is parameter- and hypothesis-centric, which inverts the normal order of scientific questioning and usually involves unobservable entities. The solution to the problem is not a simple replacement of frequentist with Bayesian prior-posterior distribution analysis. Neither classical frequentist nor Bayesian methods allow the assessment of model performance and usefulness. Both answer questions nobody but mathematicians ask, questions which are almost always irrelevant for real decisions.
What are relevant questions? You go to the doctor and he recommends a new pill. What do you want to know? The sensible thing: what are the chances this pill cures your disease? Statistics as currently designed stubbornly won’t answer that question. It’s worse than it sounds, because the doctor is basing his recommendation on studies which, for instance, compared this new pill against an old one, studies which pronounced whether observed differences in the pills’ effects were “significant”.
What did the researcher designing the study on which the doctor relied want to know? The sensible thing: what are the chances more people get better taking the new pill rather than the old? Again, statistics won’t answer that. Statistics will only announce whether the results were “significant”. Suppose they were. What happens next? Everybody wrongly, falsely, incorrectly, mistakenly, and without warrant assumes that significance is equivalent to knowledge that the new pill is better than the old, or that all the differences in the observations were caused by the new pill. “Significance” is the criterion of evidence the FDA, the EPA, and every other rule-making body uses. This is a dismal state of affairs.
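Purely to show what an answer to the researcher’s real question could look like, here is a sketch under loud assumptions: the trial counts are invented, and the model is a simple beta-binomial predictive one of my own choosing, not the analysis any study or regulator actually ran. It estimates the chance that, in a future group of patients, more improve on the new pill than on the old.

```python
# Sketch with invented counts and an assumed beta-binomial predictive model:
# the chance that, in a future group of patients, more improve on the new
# pill than on the old one.
import random

random.seed(1)

# Hypothetical trial results (invented for illustration).
old_improved, old_total = 45, 100
new_improved, new_total = 55, 100

future_group = 100   # size of the future group of patients we care about
n_sims = 20_000

new_wins = 0
for _ in range(n_sims):
    # Draw a plausible improvement rate for each pill (a flat prior plus the
    # observed counts gives a Beta(successes + 1, failures + 1) distribution),
    # then simulate a future group of patients on each pill.
    p_old = random.betavariate(old_improved + 1, old_total - old_improved + 1)
    p_new = random.betavariate(new_improved + 1, new_total - new_improved + 1)
    future_old = sum(random.random() < p_old for _ in range(future_group))
    future_new = sum(random.random() < p_new for _ in range(future_group))
    new_wins += future_new > future_old

print(f"Pr(more future patients improve on the new pill) ≈ {new_wins / n_sims:.2f}")
```

The answer is a single probability of an observable outcome, which is the kind of statement the doctor and the patient can actually use.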
Here is how statistical modeling should, but does not, work: Compile evidence probative of some proposition of interest, and then calculate the probability this proposition is true given this evidence.
That’s all of statistics in twenty words. How is this done? Simplest example in the world. Proposition of interest: “This coin comes up heads in a flip.” What evidence is probative? Well, the coin is two-sided, one side labeled heads, the other tails, which when flipped (in such-and-such a manner) must show one of these two. The probability the proposition is true given this evidence is 1/2, as expected. But we’re not restricted to this evidence. A physicist might step up and measure the coin’s spin, the force of the flip, and so on. Using that evidence, he can come to a different probability the coin lands heads. Indeed, and as has been done, with precise enough measurements, the physicist can predict with something close to certainty what will happen.
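In symbols, writing $E_1$ for the bare two-sided-coin evidence and $E_2$ for the physicist’s detailed measurements (the labels are mine, purely for illustration):

$$\Pr(\text{heads} \mid E_1) = \frac{1}{2}, \qquad \Pr(\text{heads} \mid E_2) \approx 0 \ \text{or} \ \approx 1.$$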
Or we might have a two-valued “coin” or coin-like object the physics of which are unknown or where the knowledge of its properties is limited. We might compile the results from a sequence of “experiments”, the causal nature of which is known in varying degrees. From this, and from the starting knowledge that we have a two-valued “process”, like the coin flip, we can deduce the probability the process is in one of its states in this next experiment, or in the next $n$ experiments, or whatever. We can form any observable proposition we like about the to-be-observed process. And then we can check whether our model “works” in the sense of giving useful probabilities.
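One standard deduction of this kind (assuming only that the process is two-valued, that nothing else is known of its causes, and that the order of past results carries no extra information): if the state of interest has shown $k$ times in $n$ past experiments, then

$$\Pr\big(\text{state shows on experiment } n+1 \,\big|\, k \text{ of } n,\ \text{two-valued process}\big) = \frac{k+1}{n+2}.$$

With no observations at all this gives $1/2$, matching the bare coin evidence above; with, say, 7 showings in 10 experiments it gives $8/12 = 2/3$. Whether such probabilities are useful is then checked against new observations.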
In coin flips, the difference between, say, a referee on the field and a physicist is that they have different information; they have different models of the situation, as it were—as it is. Different models give different probabilities. This is not a profound statement, yet all evidence suggests remembering it is monumentally difficult.
The trial for the new pill—or for any other situation—works in the same way. Gather evidence relevant to the question at hand; this evidence becomes a model from which probabilities are calculated. Now this might happen in mathematically complicated ways, but that is nothing. We don’t need to understand the math to comprehend the answers. “The probability of this with respect to that” makes sense to everybody. This is statistics-as-argument. It is a predictive approach.
This form of statistics is actually used, and in more places than you might have guessed. Every time you drive through an automated toll booth the machine that takes a picture of your license plate must take the evidence it has—the picture itself, the characteristics of the kind of images stored, and the like—and calculate the probability the license is “YAC 893” or whatever. It must make a classification (a decision) based on a calculated probability and an understanding of the consequences of the decisions. If you’ve ever made a bet, say on the stock market or a sports team, you’ve used this form of statistics. Here you’re painfully aware that if only you had had better evidence (a better model), you would have formed superior probabilities. And here you understand there is a difference between the probability the team will win and the decision or bet you make.
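Here is a sketch of that last step, with made-up losses (the numbers and the function are mine, not any real toll system’s): the decision follows from the calculated probability together with the costs of acting wrongly, not from the probability alone.

```python
# Sketch with assumed costs: turn a calculated probability into a decision by
# weighing the consequences of acting on it.
def decide_plate(prob_match: float,
                 loss_wrong_charge: float = 50.0,   # assumed cost of billing the wrong driver
                 loss_manual_review: float = 2.0):  # assumed cost of a human review
    """Charge automatically only if the expected loss of charging is smaller
    than the (fixed) cost of sending the image for manual review."""
    expected_loss_charge = (1 - prob_match) * loss_wrong_charge
    return "charge" if expected_loss_charge < loss_manual_review else "review"

for p in (0.90, 0.97, 0.999):
    print(p, decide_plate(p))
```

Change the assumed losses and the same probability can lead to a different decision, which is the point: the probability and the decision are distinct things.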
This reality-based statistics is sometimes called predictive statistics because it predicts what will happen (but this term has multiple meanings, so be careful); there are parallels with so-called objective Bayes. In this scheme, models are checked against the world. Bad models are rejected, good ones cherished. This sounds like science; or how science used to be. Science uses predictive statistics in many fields, usually those closer to engineering, but it is also, or at least used to be primarily, found in physics. The idea is to make testable predictions so that model goodness can be assessed. This is impossible in hypothesis testing or Bayesian parameter prior-posterior analysis.
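And here is a sketch of the checking itself, using invented forecasts and outcomes: score each model’s stated probabilities against what actually happened, here with the Brier score (one common proper score; lower is better), and prefer the model that scores better.

```python
# Sketch with invented data: verify probability forecasts against observed
# outcomes using the Brier score (mean squared error of the probabilities).
def brier(forecasts, outcomes):
    """Lower is better; a perfect forecaster scores 0."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]                      # what happened (invented)
model_a  = [0.8, 0.2, 0.7, 0.9, 0.3, 0.6, 0.4, 0.1, 0.8, 0.7]  # one model's forecasts
model_b  = [0.5] * 10                                          # a know-nothing benchmark

print("model A:", round(brier(model_a, outcomes), 3))
print("model B:", round(brier(model_b, outcomes), 3))
```

A model which never commits to probabilities of observables cannot be scored this way at all, which is the complaint against hypothesis testing and parameter-only analysis.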