Solved: The Best Bayesian Prior To Use In Every Situation

Kevin Gray is back with another question, this time about priors. His last led to the post “Was Fisher Wrong? Whether Or Not Statistical Models Are Needed.” (The answer was yes and no.)

Here’s his new one: “If our choice of priors substantially affects our coefficient estimates, this just means our sample was too small. After 25 years using Bayesian statistics, my answer is…”

Bruno De Finetti, as most of us know, shouted, in bold print and in Boomer all caps (ellipsis original):

PROBABILITY DOES NOT EXIST

The abandonment of superstitious beliefs about the existence of the Phlogiston, the Cosmic Ether, Absolute Space and Time, … or Fairies and Witches was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs.

He was exactly perfectly beautifully succinctly correct in this.

True, there were many sad souls who did not believe him, and a scattering of kind folk who nodded along with him. But almost nobody understood him, not then, not now. If people had grasped the full implications of his simple statement, science wouldn’t be in the mess it’s in now.

Allow me to repeat: probability does not exist. If we knew this, really knew it, then we also would know it makes no sense to speak of “coefficient” or “parameter estimates”. Coefficients, a.k.a. parameters, are probability parameters, and since probability does not exist, neither do parameters, and since parameters do not exist, it makes no sense to speak of “estimating” them.

You cannot estimate what does not exist.

Believing we can, and therefore believing in probability, is what caused us to believe in our models, as if they were causal representations of Reality. This is why causal and semi-causal language about parameters (coefficients) saturates science discourse. It is all wrong.

Probability, like logic, is only a measure of uncertainty in propositions, given a set of assumptions. It is epistemic only. It has no physical existence; and yes, I include the quantum world in this.

I haven’t forgotten the question about priors, which is answered like this.

Almost everybody picks for their analysis a parameterized probability model. This model will be ad hoc, chosen for convenience or custom, or by some vague hand-waving hope in some central limit theorem, which is mistaken as proof that probability exists (even if this were so, it would only be at infinity, which will never be reached).

Nothing has more effect on the outcome of the analysis than this ad hoc model. Often, even the data is not as important as the ad hoc model. Change the ad hoc model, change the analysis.
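
To see the size of this effect, here is a minimal sketch (a toy illustration of my own, with invented data, not anything from Gray’s question): the same observations run through two different ad hoc models give two different probabilities for the same question about a new observable.

```python
# Toy illustration (assumed data): same data, two ad hoc models,
# two different answers to "what is Pr(new Y > 8)?"
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.lognormal(mean=0.0, sigma=1.0, size=30)  # hypothetical observations

# Ad hoc model A: normal likelihood, fit by maximum likelihood
p_a = 1 - stats.norm.cdf(8.0, loc=y.mean(), scale=y.std(ddof=1))

# Ad hoc model B: lognormal likelihood, same data
shape, loc, scale = stats.lognorm.fit(y, floc=0)
p_b = 1 - stats.lognorm.cdf(8.0, shape, loc=loc, scale=scale)

print(f"Pr(new Y > 8 | model A) = {p_a:.4f}")
print(f"Pr(new Y > 8 | model B) = {p_b:.4f}")  # can disagree by an order of magnitude
```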

Enter the Bayesians, who not only write down an ad hoc model, but realize they must specify other ad hoc models for the uncertainty in the parameters of that model. This is a step in the right direction, a promotion over frequentism, a theory which insists probability exists, and therefore parameters also exist.

Bayesians are almost always frequentists at heart, just as all frequentists cannot help but interpret their analyses as Bayesians. The reasons are that Bayesians are all first trained as frequentists, and that frequentist theory is incoherent; or rather, that it is impossible to use it in a manner consistent with its own theory. If you doubt this, just ask any frequentist how their interpretation of their confidence interval accords with theory.

Being frequentists at heart, Bayesians fret that picking a prior, as your question suggests, is “informative”; that is, its choice affects the answer. It does. So does, and to a larger degree, choosing the ad hoc model. Yet for some reason no such fretting attends the choice of model.

Anyway, great efforts are spent showing how little influence the priors have. It’s well enough, in a completist sense, and there is some mathematical fun to be had. But it’s beside the point, and doesn’t help answer which prior is best.

Here is the best prior in all circumstances: the one that, given the ad hoc model, makes the best predictions.

That turns out to be the same answer as what makes the best model. Amazing.

Only not so amazing when you consider probability doesn’t exist, and the whole point of modeling is to quantify uncertainty in some observable. The point of modeling isn’t, and shouldn’t be, but too often is, parameter “estimation”. Because you cannot estimate what does not exist.
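
A hedged sketch of what “makes the best predictions” means in practice (my own toy setup, assuming a simple beta-binomial model; none of the numbers come from the post): form the predictive probability under each candidate prior, with the parameter integrated out, and score those predictions on observations held back from fitting. The prior earning the best score is, for that model, the best prior.

```python
# Toy illustration: score candidate priors by out-of-sample prediction
import numpy as np

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.3, size=40)   # hypothetical binary observations
train, test = data[:20], data[20:]
s, n = train.sum(), len(train)

def log_score(a, b):
    # Predictive Pr(next Y = 1 | train) with the parameter integrated
    # out: the beta-binomial gives (a + s) / (a + b + n)
    p = (a + s) / (a + b + n)
    probs = np.where(test == 1, p, 1 - p)
    return np.log(probs).mean()   # higher is better

for a, b in [(1, 1), (0.5, 0.5), (10, 10)]:   # flat, Jeffreys, skeptical
    print(f"Beta({a},{b}) prior: mean log score = {log_score(a, b):.4f}")
```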

In other words, and without going into the math, which all who want to know already know: specify the ad hoc model and parameters and data, integrate out the parameters, and produce the so-called predictive probability distribution, i.e. the model, the complete whole model, the point of the exercise.
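
For those who want to see “integrate out the parameters” concretely, here is a minimal sketch (my illustration, assuming the simplest possible setup: binomial data with a beta prior; the counts are invented):

```python
import numpy as np
from scipy import stats

s, n = 7, 20   # hypothetical: 7 successes in 20 trials
a, b = 1, 1    # Beta(1, 1) prior on the parameter theta

# Analytic integration: Pr(next Y = 1 | data) = E[theta | data]
# = integral of theta * Beta(theta | a + s, b + n - s) dtheta
p_analytic = (a + s) / (a + b + n)

# The same integral by Monte Carlo: draw theta from its posterior,
# average the probability of the observable over the draws
theta = stats.beta.rvs(a + s, b + n - s, size=100_000, random_state=0)
p_mc = theta.mean()

print(f"Pr(next Y = 1 | data): analytic {p_analytic:.4f}, MC {p_mc:.4f}")
# theta is gone from the answer: what remains is a probability of an
# observable -- the predictive distribution, i.e. the model.
```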

Except in diagnosing model success or failure, ignore the “posteriors” on the parameters. Instead, vary the measure associated with the parameter and see how it changes the probability of the observable. For example, if you want to know how changes in X change the probability of Y, then change X and watch the change in the probability of Y. Amazing.
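
As a sketch of reading a model through its predictions instead of its coefficients (my own toy example; it uses a plug-in maximum likelihood fit for brevity, where a fully Bayesian version would average these probabilities over posterior draws as in the block above):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # the logistic function

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = rng.binomial(1, expit(0.5 + 1.2 * x))   # hypothetical data

def neg_log_lik(beta):
    p = expit(beta[0] + beta[1] * x)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta = minimize(neg_log_lik, x0=[0.0, 0.0]).x   # plug-in fit for brevity

# Don't quote beta[1]; change X and watch Pr(Y) change
for x_new in (0.0, 1.0, 2.0):
    print(f"Pr(Y = 1 | X = {x_new}) = {expit(beta[0] + beta[1] * x_new):.3f}")
```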

Use the (whole) model to make predictions of observations never before seen or used in any way, and then see how well the (whole) model does against possible competitors (i.e. calculate skill). Either the (whole) model makes useful predictions or it doesn’t.
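
One common way to calculate skill (a sketch only; the outcomes and forecasts below are invented for illustration) is to compare the model’s probability predictions against a naive competitor, such as the base rate, using the Brier score; positive skill means the model beats the competitor.

```python
import numpy as np

# Hypothetical never-before-used outcomes and two sets of forecasts
y_new   = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
p_model = np.array([0.8, 0.2, 0.3, 0.7, 0.6, 0.1, 0.4, 0.2, 0.9, 0.3])
p_naive = np.full_like(p_model, 0.4)   # competitor: a base rate from past data

def brier(p):
    return np.mean((p - y_new) ** 2)   # lower is better

# Skill > 0 means the model's predictions beat the naive competitor's
skill = 1 - brier(p_model) / brier(p_naive)
print(f"Brier skill score vs base rate: {skill:.3f}")
```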

Simple as that. That’s old school science.

Buy my new book and own your enemies: Everything You Believe Is Wrong. Buy my old book, Uncertainty, to learn more about probability.


13 Comments

  1. Robin

    SSG Briggs, I get the point of your argument (and agree) but there is a lot to chew on with respect to your description of procedure/method, particularly this part:

    “… specify the ad hoc model and parameters and data, integrate out the parameters, and produce the so-called predictive probability distribution, i.e. the model, the complete whole model, the point of the exercise.”

    Would love to see a simple example of “integrate out the parameters”. Not sure what you mean by this statement – or how it would be applied.

    Thank you again for a very deep and challenging composition. Each post is a learning experience for me.

  2. Bill_R

    Isn’t that called nominalism? (specifically fictionalism?)

    > Fictionalism, on the other hand, is the view that (a) our mathematical sentences and theories do purport to be about abstract mathematical objects, as platonism suggests, but (b) there are no such things as abstract objects, and so (c) our mathematical theories are not true.

  3. Kevin Gray

    Thanks, Matt!

    Ofttimes the outcome is a quantity and the estimated “effect” of changing X on Y can readily be seen from the coefficient estimates (which do exist; let’s not be silly).

    With categorical outcomes, marginal analyses (aka sensitivity analyses) have, as you say, been routine in applied statistics for decades. Packaged, these are called DSTs (Decision Support Tools), which became the rage in the business world 25 years ago. (Many are of limited utility owing to underlying issues with the data and model, however.)

    In time-series analysis, it is routine to make forecasts for future periods based on various scenarios and then, once the future has arrived, assess our model based on what actually happened.

    (Probability is a mathematical concept and in that sense, of course, does exist.)

    WRT my original question, my answer is…try and see for yourself on multiple (real) data sets with multiple model types.

  4. Rudolph Harrier

    The only way that you can call this point of view nominalism is if you think that reality consists of nothing but probabilities, which only doubles down on the problem.

    A metaphysical realist does not deny the existence of names. That is, suppose that I say that my cat is now named Scruffy McWhiskers, based on a whim. The name has absolutely no bearing on the actual essence of my cat. But it is the name of the cat, though a completely arbitrary one. There is nothing nominalist in saying that the arbitrary name is arbitrary. Nominalism enters the picture when you start saying that calling my cat a cat is an arbitrary name with no basis in reality.

    Similarly saying that probabilities do not have real existence does not mean that nothing has a real existence.

  5. Academic publications such as the Journal of the American Statistical Association and Structural Equation Modeling regularly feature articles on Bayesian statistics.

    A few popular textbooks I can recommend are:

    Doing Bayesian Data Analysis (Kruschke)
    Bayesian Methods (Gill)
    Bayesian Data Analysis (Gelman et al.)
    Statistical Rethinking: A Bayesian Course (McElreath)
    Applied Bayesian Hierarchical Methods (Congdon)

    Many excellent books on specialized topics have also been written, such as:

    Bayesian Econometric Methods (Chan et al.)
    Bayesian Psychometric Modeling (Levy and Mislevy)
    Bayesian Models for Astrophysical Data (Hilbe et al.)
    Bayesian Inference of State Space Models (Triantafyllopoulos)
    Bayesian Reasoning and Machine Learning (Barber)

    Just FYI for those looking to dig more deeply into this subject.

  6. Bill_R

    @Rudolph
    Fictionalism is a form of mathematical nominalism, the position that there are no universals and that math is just making stuff up, a fiction. This would mean no probability, no infinity, no real line or numbers and so on.

    Plato has a nice summary and there’s always Field’s book “Science Without Numbers”, IIRC.

  7. Rudolph Harrier

    There are at least three ideas that you are conflating:

    1.) Whether theorems in probability theory are true
    2.) Whether mathematical theorems in general need to describe something “in the real world” to be true. (For example, suppose that the universe is finite, but large, and there are exactly N objects in the real world. Would “N+1” then be an “actual” number? Would it be true to say “The number N+1 is larger than N?”)
    3.) Whether probabilities in particular exist in the real world, in the sense that whenever a probability is measured there is a “random variable” which the real world just gets “instances” of following an ideal “probability distribution.”

    I submit that no one believes point 3, not really. But they conflate it with point 2 and then with point 1 to try to argue for the conclusion that “if probabilities are not a feature of the universe, then probability and statistics cannot have any use since it is all arbitrary.” But this does not follow.

  8. Hagfish Bagpipe

    Briggs, how am I supposed to make my usual stupid comment when I don’t even know what the hell you’re talking about, huh? Think of the midwits, man. I mean, yeah, I get it that this Bayes devil is impeding the frequency of the phlogiston’s travel through the luminiferous aether, and the probability of parameterized priors’ posteriors butts into useful predictions about ad hoc models, sure! — but what about all the nominalist nitwits, huh? Your theory doesn’t account for that.

    Swordfish, back me up, bro.

  9. V

    RW Hamming’s approach to probability in The Art of Probability is novel, I think. He proposes symmetry as fundamental, versus frequency.

  10. Bill_R

    @Rudolph
    I think we are talking past one another. If you look at my first post, I quoted a short definition of fictionalism. De Finetti, Briggs, and many others have argued that probability does not exist, and hence by the above it cannot be true or false. The underlying assumption is that truth is a property of the real world. This is your point #2. Briggs goes further and paints distributions and their properties with the same brush (and I agree).

    With regard to #1, theorems can logically follow from their premises, but their truth or falsity in a particular material application depends on how well the assumptions match the physical properties involved. Sometimes they do and other times they don’t.

    I don’t see where the “cannot have any use …” conclusion follows from this. I found probability quite useful for counting measures and finite sets, e.g. I constructed the sets (people/volunteers), the measures, and assigned the treatments.

  11. Uncle Mike

    We’re talking in circles. If probability theory was wrong, then we’d have no casinos. The dice fall as they do consistently (over a large number of trials) with the calculated parameters. I wish I owned a casino. If I did, I probably wouldn’t be chatting here with you folks.

    But probability is not a force of nature. Probability can’t make your lunch or walk your dog. It can’t make atoms spin or planets orbit. It does not drive financial markets.

    When a model makes forecasts or predictions, they either happen or not. If they do, the model was right! If they don’t, the model was wrong. Some models, such as those used by casinos, are right quite a bit, in the aggregate over many, many trials. Other models are right once in a while but also often wrong. Some models are consistently wrong. Some models make one-off predictions that may or may not happen sometime in the future if we wait long enough.

    Testing, awaiting the predicted outcome and comparing it to the model’s prediction, is the only way to know if the model was right or wrong. And if it was right once, that doesn’t mean the model will always be right. Mega testing is best.

    Sensitivity analysis, tweaking the inputs, can reveal some features of the model. But it can’t tell you if the model is right or wrong. The proof of the pudding is in the eating.

    PS — You might, if you’re clever and have a winning way, get somebody to pay you for monkeying with data sets. Lots of folks in that racket. But monkeys don’t write Shakespeare, and probability doesn’t make the sun come up.

  12. Bill_R

    Casinos were profitable before probability theory. The games are close-ended and all the outcomes known (either by frequency or direct enumeration). They get to assign the payouts and the take. So probability theory is just a post-hoc explanation for observables. Similarly for insurance.

    For custom bets (e.g. sports) they play both sides, have a take, and lay off any excess risk. For example, it’s not terribly difficult to “win” at the dog track, it’s the house and state taxes that make it difficult to come out ahead.
