Submitted for your approval, yet another paper. On Probability Leakage, posted at arXiv. Once you, my beloved readers, have had a go with it, I’ll incorporate your comments in an updated version and send it to a peer-reviewed journal, because there is no better guarantor of truth than peer review.
Only a sketch of the paper is given here. Please read it before commenting: it’ll probably save me re-typing what’s already there.
The probability leakage of model M with respect to evidence E is defined. Probability leakage is a kind of model error. It occurs when M implies that events, which are impossible given E, have positive probability. Leakage does not imply model falsification. Models with probability leakage cannot be calibrated empirically. Regression models, which are ubiquitous in statistical practice, often evince probability leakage.
The exact definition is this: we model observables y via some model M, possibly conditional on explanatory data x and indexed on (unobservable) parameters. We can derive a posterior distribution on the parameters given the old data (call it z) and assuming the truth of M. Ordinary (and inordinate) interest settles on the posterior of the parameters.
The posterior, or functions of it, are not observable. We can never know whether the posterior says something useful or nonsensical about the world. But because we assume M and we have seen z, it logically follows that the posterior predictive distribution p(y|x,z,M) exists (this “integrates out” the parameters and says something about data not yet seen). This distribution makes observable statements which can be used to verify M.
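The “integrates out” step can be sketched numerically. Here is a minimal Monte Carlo sketch: the posterior N(5, 1) and the data model y|θ ~ N(θ, 2) are invented for illustration, not taken from the paper.

```python
# Minimal sketch of forming the posterior predictive p(y|z,M) by
# "integrating out" the parameter. The posterior N(5, 1) and data
# model y|theta ~ N(theta, 2) are hypothetical, for illustration only.
import random

random.seed(1)

# Draw parameter values from the (hypothetical) posterior...
theta_draws = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# ...then draw y from the data model at each parameter value.
# The result is a sample from the posterior predictive: statements
# about data not yet seen, observable and therefore checkable.
y_draws = [random.gauss(t, 2.0) for t in theta_draws]

mean_y = sum(y_draws) / len(y_draws)
print(round(mean_y, 2))  # predictive mean, close to the posterior mean 5
```

The point of the sketch is that, once the parameters are integrated out, every statement the model makes is about observable y, and so can be checked against the world.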
Now suppose we know, via some evidence E, that y cannot take values outside some set or interval, such as (ya,yb). This evidence implies Pr(y < ya | E) = Pr(y > yb | E) = 0. But if, for some value of x (or when x is null), Pr(y < ya | x, z, M) > 0 or Pr(y > yb | x, z, M) > 0, then we have probability leakage; at least, with respect to E.
The probabilities from the predictive posterior are still true, but with respect to M, z, and x. They are not true with respect to E if there is leakage. This probability leakage is error, but only if we accept E as true. Leakage is a number between 0 (the ideal) and 1 (the model has no overlap with the reality described by E).
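In code, the leakage number is just the predictive probability the model assigns outside the possible set. A minimal sketch, assuming a normal posterior predictive with invented mean 10 and standard deviation 5, and evidence E that y must lie in (0, 30):

```python
# Probability leakage for a normal posterior predictive, defined as
# Pr(y < ya | x,z,M) + Pr(y > yb | x,z,M). The mean, sd, and bounds
# are hypothetical, chosen for illustration.
from math import erf, sqrt

def normal_cdf(v, mu, sigma):
    """CDF of a N(mu, sigma) distribution evaluated at v."""
    return 0.5 * (1.0 + erf((v - mu) / (sigma * sqrt(2.0))))

def leakage(mu, sigma, ya, yb):
    """Predictive probability outside (ya, yb): 0 is the ideal,
    1 means the model has no overlap with the set E allows."""
    return normal_cdf(ya, mu, sigma) + (1.0 - normal_cdf(yb, mu, sigma))

# Evidence E: y must lie in (0, 30); model predictive: N(10, 5).
print(round(leakage(10.0, 5.0, 0.0, 30.0), 4))
```

Here the leakage is small (about 2%), but it is not zero: the model insists that impossible values remain possible.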
The term falsified is often tossed about, but in a strange and loose way. A rigorous definition is this: if M says that a certain event cannot happen, and that event happens, then M is falsified. That is, to be falsified M must say that an event is impossible: not unlikely, but impossible. If M implies only that some event is unlikely, then no matter how small that probability, M is not falsified when the event obtains. If M implies that the probability of some event is ε > 0, then if this event happens, M is not falsified, period.
Probability leakage does not necessarily falsify M. If there is incomplete probability leakage, M says certain events have probabilities greater than 0, events which E says are impossible (have probabilities equal to 0). If E is true, as we assume it is, then the events M said were possible cannot happen. But to have falsification of M, we need the opposite: M had to say that events which obtained were impossible.
Box gave us an aphorism which has been corrupted to (in the oral tradition), “All models are wrong.” We can see that this is false: all models are not wrong, i.e. not all are falsified.
Calibration has three components: probability, exceedance, marginality. All three are proved impossible if there is probability leakage. If M is to be evaluated by a strictly proper scoring rule, the lack of calibration guarantees that better models than M exist.
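The scoring-rule claim can be illustrated with the log score, which is strictly proper. A sketch under invented numbers: take a leaky normal predictive N(3, 10) when evidence E says y > 0. The same normal, renormalized onto (0, ∞), scores strictly better at every possible observation, so a better model than the leaky M always exists.

```python
# Sketch: under the log score (a strictly proper scoring rule), a model
# that renormalizes its probability onto the possible set (0, inf)
# beats the leaky model at every possible y. The predictive N(3, 10)
# is hypothetical, for illustration only.
from math import erf, sqrt, log, pi

MU, SIGMA = 3.0, 10.0

def normal_logpdf(y):
    """Log density of the leaky normal predictive at y."""
    return -0.5 * log(2.0 * pi * SIGMA ** 2) - (y - MU) ** 2 / (2.0 * SIGMA ** 2)

def normal_cdf(v):
    return 0.5 * (1.0 + erf((v - MU) / (SIGMA * sqrt(2.0))))

# Probability the leaky model keeps on the possible set y > 0.
Z = 1.0 - normal_cdf(0.0)

def truncated_logpdf(y):
    """Same normal, renormalized onto (0, inf); its log score is
    higher than the leaky model's by exactly -log(Z) > 0."""
    return normal_logpdf(y) - log(Z)

# At every possible observation, the renormalized model wins.
for y in (0.5, 3.0, 25.0):
    assert truncated_logpdf(y) > normal_logpdf(y)
print(round(-log(Z), 4))  # the constant advantage in log score
```

The advantage is exactly the leaked probability pulled back onto the possible set, which is why any leakage at all guarantees a strictly better model exists.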
Statistics as she is practiced—not as we picture her in theoretical perfection—is rife with models exhibiting substantial probability leakage.
Regression is ubiquitous. The regression model M assumes that y is continuous and that uncertainty in y, given some explanatory variables x, is quantified by a normal distribution the parameters of which are represented, usually tacitly, by “flat” (improper) priors. This M has the advantage of mimicking standard frequentist results.
…The logical implication of M is that, for these values of x, there is about a 38% chance of values of y less than 0 (impossible values) at Location A. Even for Location B, there is still a small chance (2%) of values of y less than 0 (impossible values).
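The paper’s fitted regression values are not reproduced here; the means and standard deviation below are invented solely so that a normal predictive gives roughly the quoted leakages (~38% and ~2%):

```python
# Hypothetical predictive means/sds chosen only to reproduce the
# leakages quoted in the text; the paper's actual fitted values
# are not given in this post.
from math import erf, sqrt

def pr_y_below_zero(mu, sigma):
    """Pr(y < 0 | x, z, M) under a normal posterior predictive."""
    return 0.5 * (1.0 + erf((0.0 - mu) / (sigma * sqrt(2.0))))

print(round(pr_y_below_zero(3.0, 10.0), 3))   # "Location A": roughly 38%
print(round(pr_y_below_zero(20.5, 10.0), 3))  # "Location B": roughly 2%
```

Any normal predictive whose mean sits within a few standard deviations of the boundary at 0 will leak like this; the categorical regressor merely shifts the mean, and with it the amount of leakage.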
An objection to the predictive approach is that interest is solely in the posterior; in whether, say, the hypothesis (H) that absentees had an effect on abandonment is true. But notice that the posterior does not say with what probability absentees had an effect: it instead says, assuming M is true and given z, what the probability is that the parameter associated with absentees takes this-and-such value. If M is not true (it is falsified), then the posterior has no bearing on H. In any case, the posterior does not give us Pr(H|z); it gives Pr(H|M,z). We cannot answer whether H is likely without referencing M, and M implies the predictive posterior.
The area to the left of the vertical line represents probability leakage. The “normal model” says our uncertainty in y is characterized by a normal distribution. The “Location A” and “Location B” curves come from a larger regression model in which one regressor is a categorical variable indicating one of two locations.