I apologize for the abruptness of the notation. It will be understandable only to a few. I don’t like to use it without sufficient background because the risk of reification is enormous and dangerous. But if I did the build up (as we’re doing in the Evidence thread), I’d risk a revolt. So here is the alternative to p-values—to be used only in those rare cases where probability is quantifiable.
Warning two: for non-mathematical statisticians, the recommendations here won’t make much sense. Sorry for that. But stick around and I’ll do this all over more slowly, starting from the beginning. Start with this thread.
Note, in a vain attempt to ward off reification: discrete probability, assumed here, is always preferred to continuous, because nothing can be measured to infinite precision, nor can we distinguish infinite gradations in decisions.
$$\Pr(Y \mid X_1, X_2, \dots, X_k, \text{other evidence})$$

where we are interested in the proposition Y = “We see the value y (taken by some thing)” given, or conditioned on, the propositions X1 = “We assume a”, etc., and “other evidence”, which is usually but need not be old values of y and the “Xs”.
The relationship between the Xs and Y, and the old data, is usually specified by a formal probability model itself characterized by unobservable parameters. The number of parameters is typically close to the number of Xs, but could be higher or lower depending on the type of probability model and how much causality is built into it. The “other evidence” incorporates whatever (implicit) evidence suggested the probability model.
P-values are born in frequentist thinking and are usually conditioned on one of these parameters taking a specific value. Bayesian practice at least inverts this to something more sensible, and states the “posterior” probability distribution of the “parameter of interest.”
Problem is, the parameter isn’t of interest. The value of y is. Asking a statistician about the value of y is like asking a crazed engineer what the temperature of the room is and all he will talk about is the factory setting of the bias voltage of some small component in the thermostat.
The goal of the model is to say whether X1 etc. is important in understanding the uncertainty of Y. P-values and posteriors dance around the question. Why not answer it directly? Instead of p-values and posteriors, calculate the probability of y given various values of the Xs. One way is this:
$$p_1 = \Pr(Y = y \mid X_1 = a_1, X_2 = b, X_3 = c, \dots, X_k = z, \text{other evidence})$$

$$p_2 = \Pr(Y = y \mid X_1 = a_2, X_2 = b, X_3 = c, \dots, X_k = z, \text{other evidence})$$

where a1 and a2 are values of X1 that are “sensibly different” (enough that you can make a decision on the difference), and where the values b, c, …, z make sense for the other Xs in the model. Notice the absence of parameters: if they were there once, they are now “integrated out” (actually summed over, since we’re discrete here). They are not “estimated” here because they are of zero interest.
If p1 and p2 are far apart, such that it would alter a decision you would make about y, then X1 is important and can be kept in consideration (in the model). If p1 and p2 are close, and would not cause you to change a decision about y were X1 to move from a1 to a2, then X1 is not important. Whether it’s dropped from the model is up to you.
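Here is a minimal sketch of that comparison in Python. It is an illustration under loud assumptions, not a recipe: Y is taken to be binary, X1 is the only X, the single unobservable parameter `theta` lives on a discrete grid with uniform prior weights, and the counts in `old_data` are made up. The function name `predictive_prob` and the labels `"a1"` and `"a2"` are hypothetical.

```python
def predictive_prob(successes, trials, grid_size=101):
    """Pr(Y = 1 | old data), with the parameter summed out over a discrete grid."""
    # Candidate parameter values 0, 0.01, ..., 1 (assumption: uniform prior weights).
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised weight of each candidate: constant prior times likelihood of the old data.
    weights = [t**successes * (1 - t) ** (trials - successes) for t in thetas]
    total = sum(weights)
    # Sum the parameter out: sum over theta of Pr(Y = 1 | theta) * Pr(theta | old data).
    return sum(t * w for t, w in zip(thetas, weights)) / total


# Made-up old data: (successes, trials) seen at two sensibly different values of X1.
old_data = {"a1": (30, 100), "a2": (45, 100)}

p1 = predictive_prob(*old_data["a1"])
p2 = predictive_prob(*old_data["a2"])
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, difference = {p2 - p1:.3f}")
```

No parameter is reported anywhere in the output; only probabilities of the observable are. Whether the printed difference counts as “far apart” is still a decision for whoever uses the numbers.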
No Easy Answers
Gee, that’s a lot of work. “I have to decide about a, b, c and all the rest, as well as the two values of X1 to compare, and I have to figure out how far apart p1 and p2 must be to count as ‘far’ apart?” Well, yes. Hey, it was you who put all those other Xs into consideration. If they’re in the model, you have to think about them. All that stuff interacts with, or rather affects, your knowledge of y. Tough luck. Easy answers are rare. The problem was that people, using p-values, thought answers were easy.
All this follows from the truth that all probability is conditional. The conditions are the premises or evidence we put there, and the model (if any) that is used. Whether any given probability is “important” depends entirely on what decisions you make based on it. That means a probability can be important to one person and irrelevant to another.
Now it’s easy enough to give recommendations about picking the two values of X1 and all the rest, but I’m frightened to do so, because these can attain mythic status, like the magic number for p-values. If you’re presenting a model’s results for others, you can’t anticipate what decisions they’ll make based on it, so it’s better to present results in as “raw” a fashion as possible.
Why is this method preferred? Decisions made using p-values are fallacious; p-values, and even Bayesian posteriors, do not answer the questions you really want answered; and, best of all, this method allows you to directly check the usefulness of the model.
P-values and Bayesian posteriors are hit-and-run statistics. They gather evidence, posit a model, then speak (more or less) about some setting of a knob of that model as if that knob were reality. Worst, the model and conclusions reached are never checked using new information. Using this new observable method, as is in use in physics, chemistry, etc. (though they might not know it), allows one to verify the model. And, boy, would that cut down on the rampant over-certainty plaguing science.
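One way to picture such a check, sketched in Python: state the probabilities before the new observations arrive, then score them against what actually happened. The Brier score used here is an assumption of this sketch (the text names no particular score), and every number below is invented for illustration.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared gap between stated probabilities and what happened (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)


# Invented predictions of Pr(Y = 1), stated before the new data were seen.
with_x1 = [0.30, 0.45, 0.30, 0.45]   # hypothetical model that uses X1
without_x1 = [0.375] * 4             # hypothetical model that ignores X1
new_outcomes = [0, 1, 0, 0]          # invented new observations of Y

score_with = brier_score(with_x1, new_outcomes)
score_without = brier_score(without_x1, new_outcomes)
print(f"with X1: {score_with:.5f}, without X1: {score_without:.5f}")
```

Lower is better for this score. The point is only that predictions about observables can be held up against new observables, which a statement about a parameter cannot.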
Variation On A Theme
Note: another method for the above is:
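$$\Pr(Y_{a_1} > Y_{a_2} \mid X_2 = b, X_3 = c, \dots, X_k = z, \text{other evidence})$$

where the subscripts mark the value of y under two sensibly different values a1 and a2 of X1, and assuming (the notation changes slightly here) y can take lots of values (like sales, or temperature, etc.). If the probability of seeing larger values of y under a1 is “large” then X1 is important, else not.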
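A rough Python sketch of this variation, for a y that takes a handful of discrete values. It assumes the two predictive distributions (one for each value of X1 being compared) have already been computed with the parameters summed out, and that the two draws of y are independent given the evidence. The dictionaries `pred_a1` and `pred_a2` and their numbers are made up.

```python
def prob_greater(dist_a, dist_b):
    """Pr(Y_a > Y_b) for two independent discrete distributions (dicts: value -> probability)."""
    return sum(
        pa * pb
        for ya, pa in dist_a.items()
        for yb, pb in dist_b.items()
        if ya > yb
    )


# Made-up predictive distributions of y (say, sales) at two values of X1.
pred_a1 = {10: 0.2, 20: 0.5, 30: 0.3}
pred_a2 = {10: 0.5, 20: 0.4, 30: 0.1}

p_larger = prob_greater(pred_a1, pred_a2)
print(f"Pr(y under first value > y under second value) = {p_larger:.2f}")
```

Ties contribute to neither direction, so the two one-sided probabilities need not sum to one. As before, what counts as “large” depends on the decision at hand.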