Forget Priors!

From Bill Raynor:

Hello Matt,

You like to discuss P(Y|M) a lot, but haven’t spent much time talking about its practical construction.

A topic I’d like to see: a constructive development of P(Y|X) for some real problem involving real objects…Every time I’ve chatted with a Bayesian about priors, I get a lot of handwaving and mathematical idealization (hello, mathematical Platonism) but very little in the way of real examples. The resulting math is very pretty and elegant, yadda, yadda. The part where I ask “what does that physically imply…” gets rather vague, quickly….

I have in mind something rather Kolmogorovian:

1. Define a finite reference set of objects and measurements on those objects (e.g. body weights usually have a body attached…). You can use a finite sampling frame if you wish.
2. Define a set of observable propositions on those objects and measurements (mutually exclusive and exhaustive, no infinities or absolute continuity, etc.), including the means of measurement (e.g. the Brewers beat the Yankees, the mean difference between two (blocked) partitions of objects). If it involves means, show how the objects/measurements really are additive — e.g. weights of grain from a field plot.
3. Assign an additive measure, and hence a probability, to that.

There are innumerable practical uses of Pr(Y|X). The worn stock examples were made for this. Let X = ‘This interocitor must take one of n states’, then if Y = ‘This interocitor is in state s = i’, we have Pr(Y|X) = 1/n.

This is as practical as it gets. Casinos use it; everybody does. And without priors! Most probability is done in this uncomplicated way. Pure probability, no formal models, no parameters.
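
The deduction is so simple it can be typed out. Here is a minimal sketch in Python (the interocitor and its n states being the stock hypothetical, not real data):

```python
from fractions import Fraction

# X = "this interocitor must take one of n states"
# Y = "this interocitor is in state s = i"
# The deduction from X alone is Pr(Y | X) = 1/n.
def pr_state(n: int) -> Fraction:
    return Fraction(1, n)

print(pr_state(6))  # 1/6, the same deduction a casino makes for one face of a die
```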

Now let’s run through your Kolmogorovian scheme to see it’s exactly the same thing. Take two blocked objects, A and B, which have finite, discrete measurements taken on them, as all measurements are.

Next, since you’re interested in the means, or presumably some function of the means, we can compute A’ and B’, which are the means (also discrete and finite). We want Y = f(A’,B’).

Finally, we gather X, evidence probative about Y, call it some additive measure if you like, and compute Pr(Y|X).

Done!

See, I told you it was the same.
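
If it helps to see those steps with numbers attached, here is a minimal sketch with invented measurements (the values, the proposition Y, and the evidence X are all assumptions for illustration, not anybody’s real experiment):

```python
from itertools import product
from fractions import Fraction
from statistics import mean

# Invented setup: two blocked objects A and B each get three discrete
# measurements, and X says each measurement must take one of these values.
possible_values = [1, 2, 3, 4]
n_per_block = 3

# Y = f(A', B') = "the mean of A's measurements exceeds the mean of B's"
def Y(a, b):
    return mean(a) > mean(b)

# With X saying nothing more, every joint outcome in the reference set
# counts equally, and Pr(Y | X) is a ratio of counts.
outcomes = list(product(possible_values, repeat=2 * n_per_block))
favorable = sum(Y(o[:n_per_block], o[n_per_block:]) for o in outcomes)
print(Fraction(favorable, len(outcomes)))  # Pr(Y | X)
```

Everything is finite and discrete, and the answer is just a count over the reference set that X defines.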

You can make it more complicated, but that doesn’t change the end result. Make it complicated by making X into math.

It could be, and it would be ideal, that you can deduce from X the probability of Y, as we did with the interocitor. The strategy is to consider the nature of the measurements. I give some examples in Uncertainty. Mostly, folks are too impatient for the deduction, which won’t be simple, and start cramming ad hoc models into X instead.

Any number of models are used, most of them continuous approximations. There could be one model for A, another for B, which then implies some sort of model for A’ and B’, which in turn implies a model for Y.

All these models are usually parameterized. These parameters, being part of the models, are just another part of X. The models of the parameters, the priors, are also part of X.

Past observations, if any, are also part of X. In the end, it is still Pr(Y|X).

“Why no specifics, Briggs? We want concrete examples.”

Did you try the many examples in the class? Lots of common cases. There’s no general solution, but there are many, many particular ones.

“Not yet. I saw them, but didn’t try. Too busy. What really worries me are the priors. They’re so much nonsense.”

Sure, like the ad hoc models themselves to which the priors are attached. But they might, if you do it right, be reasonable nonsense, good approximations.

Here’s the idea. Start with a model for A and B, or start with A’ or B’, or even start with Y. It makes no difference. In the end we still get Pr(Y|X). If there are parameterized models, and you want to try different parameters or different models with their different parameters, then you have X_1, X_2, X_3, …, each of these being the full X for that choice of model and parameter uncertainty; including, of course, any past observations and whatever other evidence went into suggesting the models used.
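
For instance, here is a hedged sketch with a made-up Y (“the next observation is a success”), made-up past data, and two candidate X’s built from a binomial model with two different priors (the conjugate Beta setup is my choice of illustration, not a prescription):

```python
from fractions import Fraction

# Made-up past observations (part of X): 7 successes in 10 trials.
n, k = 10, 7

# Each X_i = binomial model + Beta(a, b) prior on its parameter + the data.
# The prior is just more of X, like everything else.
def pr_Y_given_X(a, b, n, k):
    """Predictive Pr(Y | X) that the next observation is a success."""
    return Fraction(k + a) / Fraction(n + a + b)

print(pr_Y_given_X(1, 1, n, k))                            # X_1: uniform prior  -> 2/3
print(pr_Y_given_X(Fraction(1, 2), Fraction(1, 2), n, k))  # X_2: Jeffreys prior -> 15/22
```

Two X’s, two answers; each is still just Pr(Y|X_i), priors and all.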

Which is “best” depends on the uses to which you put the model, the decisions you make with it. X_17 might be great for you, lousy for me; maybe I prefer X_2. Who knows?

None of these are the true model, the cause of Y. If we knew the cause, we wouldn’t be worried about all this other nonsense. And the model isn’t true because we haven’t deduced it like we did with the interocitor.

“This is not a satisfying answer.”

Is it not? It is the true answer. It’s complete. There was no point going on about specific examples (which are in the class anyway). The idea is what counts.

13 Comments

  1. Bill_R

    Matt,

    details, details, details….

    Finally, we gather X, evidence probative about Y, call it some additive measure if you like, and compute Pr(Y|X).

    I was hoping for an example with specifics. Specifics about the set of Y’s and the X’s followed by specifics on the construction of P(Y|X) for all (discrete, finite) Y. I’m already familiar with the “assume a spherical unicorn” approach and have used it on many occasions.

    As an example, you could consider R. A. Fisher’s example of paired pots of plants in Chapter 21 of Statistical Methods For Research Workers (p.44ff). Very finite, discrete measurements, additive and so on. He shows how to construct a randomization distribution for a mean of differences. How would you do it in a practical case?
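
    To make the request concrete, here is a minimal sketch of the kind of randomization distribution I mean, with invented paired differences (not Fisher’s numbers):

    ```python
    from itertools import product
    from statistics import mean

    # Invented paired differences (treated minus control, one per pair of pots).
    diffs = [6, -2, 5, 1, 3, -1, 4, 2]
    observed = mean(diffs)

    # Under "treatment did nothing", each difference could as easily have had
    # the opposite sign, so every pattern of signs gets equal weight.
    sign_patterns = list(product([1, -1], repeat=len(diffs)))
    rand_means = [mean(s * d for s, d in zip(signs, diffs)) for signs in sign_patterns]

    # One-sided tail of the randomization distribution of the mean difference.
    p = sum(m >= observed for m in rand_means) / len(rand_means)
    print(observed, p)
    ```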

  2. I’m sorry, still not enough concrete examples for me to know what you’re specifically talking about and how it is claimed to have an advantage over standard methods.

    Justin

  3. Yonason

    “assume a spherical unicorn” – Bill_R

    One spherical unicorn coming up.
    https://youtu.be/nQlF-dpU5lw

    Would you like fries with that?

    More seriously, I’m with both Bill and Justin above asking for more specifics. Are there any? Or is the reason we are referred to Interociters because there aren’t?

  4. Bill_R

    Yonason,
    Good one! I’ll skip the fries, though. Do spherical unicorns eat keto fries?

    Matt, is this a sufficient outcry from the masses? The sheep look up…

  5. Briggs

    All,

    If you guys have already seen the class examples, then it could be fun to provide links to ready-to-use data, with code for traditional models, if available.

  6. Yonason

    I’m coming to the conclusion that statisticians are to math what lawyers are to the general population.
    ====================================
    RE – A case from Briggs’ excellent book.

    3 balls in a bag. What are the odds you have all the same color? It depends on how you got there.

    (A) – 3B and 3W in an urn. Remove 3, one at a time, and insert in bag. Chance of all 3B = 1/20.
    (B) – 6B and 6W in urn. Remove 3 as in A. Chance of all 3B = 1/11
    (C) – general case for XB and XW is P=(1/4)[(X-2)/(2X-1)], which for large X approaches all 3B = 1/8.

    Other scenarios (models?) can be devised which result in different probabilities. I don’t know exactly what Bill and Justin are looking for, but I hope that short illustration gives an idea of the kind of example I need to begin to see how the most general abstract case can be applied to a potentially real scenario (a brute-force check is sketched below).
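
    For what it’s worth, here is a brute-force check of those three numbers, again of my own devising:

    ```python
    from fractions import Fraction
    from itertools import combinations

    def pr_all_black(n_black, n_white, draw=3):
        """Chance that `draw` balls taken without replacement are all black."""
        balls = ['B'] * n_black + ['W'] * n_white
        picks = list(combinations(range(len(balls)), draw))
        hits = sum(all(balls[i] == 'B' for i in pick) for pick in picks)
        return Fraction(hits, len(picks))

    print(pr_all_black(3, 3))    # 1/20  (case A)
    print(pr_all_black(6, 6))    # 1/11  (case B)
    print(pr_all_black(50, 50))  # 4/33, approaching 1/8 as X grows (case C)
    ```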

  7. Yonason

    P.S. – the example in my last comment was of my own devising. It was not provided in the book. Sorry if that wasn’t clear.


  8. 3 balls in a bag. What are the odds you have all the same color? It depends on how you got there.

    (A) – 3B and 3W in an urn. Remove 3, one at a time, and insert in bag. Chance of all 3B = 1/20.

    (3/6)x(2/5)x(1/4)=1/20
    Yes.

    But Yonason, that’s the point: it depends on the sample space, reference set, X, whatever you want to call it. These examples actually support frequentism and parameters, the standard methods.
    I have several books on urn theory, for example.
    You’d need to show an example that only Briggs can solve, or can solve better, using his method.

    Justin

  9. Briggs

    Justin,

    Frequentism is not needed. Just pure probability, with deductions based on whatever assumptions you bring. Such as numbers, colors, whatever.

  10. Yonason

    “You’d need to show an example that Briggs can solve only, or better, using his method.” – Justin

    I would be happy if he gave me detailed enough examples of ANY model/method, hopefully contrasting their effectiveness and application, with real world examples (not interociters!). One thing he does in the book is to say, of the case of one black and two white (BWW), that there are 3 ways of getting it (i.e., permutations) [(BWW), (WBW), (WWB)]. However, in the model I have used, permutations are irrelevant. You don’t need them to get a probability. And that is one thing I want to understand: WHEN do you apply permutations, and when not? For my case…
    (B,W,W) – [(X/2X)*(X/(2X-1))*((X-1)/(2X-2))]
    (W,B,W) – [(X/2X)*(X/(2X-1))*((X-1)/(2X-2))]
    (W,W,B) – [(X/2X)*((X-1)/(2X-1))*(X/(2X-2))]

    All give the same answer. Permutations are irrelevant here, though I know that in some cases it is essential to use them to calculate a correct result (energy states in quantum chemistry, e.g.).
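
    To make my confusion concrete, a sketch of my own (my numbers, my framing): each single ordering has the same probability, and the factor of 3 only enters when the question is “one black, in any order”:

    ```python
    from fractions import Fraction

    def pr_sequence(seq, n_black, n_white):
        """Chance of drawing exactly this ordered colour sequence,
        without replacement, from n_black black and n_white white balls."""
        b, w = n_black, n_white
        p = Fraction(1)
        for colour in seq:
            total = b + w
            if colour == 'B':
                p *= Fraction(b, total)
                b -= 1
            else:
                p *= Fraction(w, total)
                w -= 1
        return p

    X = 5  # any number of each colour
    probs = [pr_sequence(o, X, X) for o in ['BWW', 'WBW', 'WWB']]
    print(probs)       # all three orderings have the same probability
    print(sum(probs))  # "one black, in any order" = 3 times any one ordering
    ```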

    So, the problem for me isn’t that Briggs is wrong, he’s not, only that the information he gives is so sparse that only a statistician can decode it. My request is that he descend from his Mt. Olympus and impart a bit of wisdom to us commoners, i.e., speaka da english, palease, and thank you.

  11. Yonason

    “Frequentism is not needed. Just pure probability, with deductions based on whatever assumptions you bring. Such as numbers, colors, whatever.” – Briggs

    Smoke and mirrors?

    But, from my example, I got a probability of between 1/20 and 1/8 for all the same color, depending on model and conditions, while your pure probability for 3 of the same color yielded only 1/8, regardless of how it was arrived at.

    Another scenario…
    After reading about 3 balls in a bag in your excellent book, the desire to possess them becomes so widespread that many companies incorporate to meet the demand. Here’s how one company rises to the challenge.

    “Balls in Bags” hires Chuck, Duane, Bob and Alice. Chuck fills bags with 3 black balls. Duane with 3 white; Bob with 1 black and 2 white; Alice with 1 white and 2 black. They all fill bags at the same rate. The probability of any configuration is then 1/4, regardless of the mix.

    One day, Bob is out with the flu, so Alice has to sub for him. Even though she produces as many bags as before, only half are configured as hers and half as Bob’s. Now the probability is 1/3 each for all black and for all white, and 1/6 each for (b,w,w) and (w,b,b).
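
    Put as arithmetic (my own sketch of my own invented scenario):

    ```python
    from fractions import Fraction

    def pr_configs(rates):
        """Chance of each bag configuration, given the share of bags
        each filling scheme produces."""
        total = sum(rates.values())
        return {cfg: Fraction(r, total) for cfg, r in rates.items()}

    # All four fillers at work, equal rates: 1/4 for each configuration.
    print(pr_configs({'BBB': 1, 'WWW': 1, 'BWW': 1, 'BBW': 1}))

    # Bob out sick, Alice splits her usual output between her mix and his:
    # 1/3 each for all black and all white, 1/6 each for the mixed bags.
    print(pr_configs({'BBB': 2, 'WWW': 2, 'BWW': 1, 'BBW': 1}))
    ```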

    Of course it depends on what the most likely method of filling the bags is, and the extent to which any model applies, but I see no way of generalizing from what Briggs says in the book (or here) to be able to arrive at an understanding of what might happen in the real world.

    If I could do it myself, I wouldn’t need to be reading here. I don’t need to be made to feel inadequate because I can’t do it, especially since I’ve been given the impression that if I read this material I’ll somehow magically be able to. Not gonna happen without some concrete examples, though.

    What a disappointment when the thrill is gone…
    https://www.youtube.com/watch?v=CzUgX-HB9tA
    …when I ain’t never had it to begin with.

  12. Bill_R

    Yonason,
    Agreeing with Justin on this. The method specifies the reference set and the weights, which in turn define the probability. I call it a reference set because it can be derived from a sampling framework, a permutation or randomization framework (my favorite), a prior distribution, and so on. If you don’t define the method, then the probability can be ill-defined.

    Yes, we can be like lawyers sometimes. I was asking Matt to demonstrate how he does it, in a purely logical fashion (without reference to sampling distributions or permutation/randomization distributions), for practical problems.

    Bayesian approaches can be handy if you have prior data to define stuff, e.g. empirical Bayes and shrinkage estimates.

  13. Yonason

    @Bill_R

    Thanks Bill. I see, as I thought might be the case, that we were posing different questions. As you can see, my concerns are more pedestrian than yours. Not being able to fly, yet, I need scaffolding to ascend for a more panoramic view. In the meantime, I just wanted to give my current view from ground level. Hard to tell the junk from the items of value from this vantage point, without some assistance.

    So, basically, you were asking a more advanced question. That still doesn’t help me, but thanks for giving me a straight answer.
