Jelle de Jong writes in to ask:
Working as a quant analyst in finance I recently got interested in the Briggsian/Jaynesian/Bayesian interpretation of probability but am still struggling a bit with it. When reading your book/blog I was wondering what you mean when you say the ‘true value of a parameter.’ For a situation where we can imagine a (clearly defined) underlying population (say a population of people of which we have measured some property for only a sample) it seems clear what the connection is between the model (parameters) and the data-generating system, but if you would ‘estimate’ a binomial parameter how would you interpret this ‘estimated’ probability? Jaynes writes in his book that estimating a probability is a logical incongruity (Jaynes’ Probability Theory 9.10 on p. 292). Do you interpret the estimated parameter as a property of the (hypothetical) underlying distribution (i.e. the fraction of successes in an infinite sample) that can be estimated with corresponding quantification of uncertainty? But as this is just a model, in what sense can we speak of a true value of this parameter? (The only truth is that the process will generate a number of successes.) Can we give the estimated binomial parameter such a physical interpretation, or is it only possible to assign a success probability, in which case it would be incoherent to assign a distribution to this estimated probability?
I hope you can take the time to shed some light on my question.
With Kind Regards,
I’d start by putting my name last, in smaller font, and in parentheses, and then prefixing Laplace, Keynes (yes, that one), and especially David Stove who all took a logical view of probability. Historically, this turned out to be wise because the logical view is the correct one.
Consider the evidence, assumed as true, that E = “We have a six-sided object which when tossed shows only one side, and just one side is labeled 6.” Given this evidence, I deduce the following:
Pr( ‘6 shows’ | E ) = p = 1/6.
Let’s add to our evidence by saying we’re going to A = “toss this six-sided object n times.” Then we can ask questions like this (to abuse, as they say, the notation):
Pr( ‘k 6’s show’ | E & A) = binomial(n,p,k)
where we again have deduced what the probabilities are. The ‘n’, ‘k’, and ‘p’ are all parameters of the binomial; and they are the true values, too. They follow from assuming as true E and A and by assuming we’re interested in k ‘successes’, i.e. k 6’s showing. And this is not the only time where we can deduce the value of a parameter, i.e. have complete knowledge of it; many situations are similar.
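To make the deduction concrete, here is a minimal sketch that evaluates binomial(n,p,k) with p = 1/6 as deduced from E. The function name binomial_pmf and the choice n = 3 are assumptions made only for illustration; nothing here is estimated from data.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, k):
    """Pr('k successes in n tosses') when each toss succeeds with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = Fraction(1, 6)  # deduced from E, not estimated
n = 3               # hypothetical number of tosses, chosen for illustration

# Exact probabilities of seeing k sixes, for every possible k.
for k in range(n + 1):
    print(k, binomial_pmf(n, p, k))
```

Exact fractions are used so that the results stay deductions from E and A and do not pick up floating-point rounding.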
Now suppose instead we observe a game in which a ball is tossed into a box the bottom of which has holes, only some of which are colored blue. The box is a carnival game, say. We want to know, given all this information which we’ll label F, this:
Pr( ‘ball falls in blue hole’ | F ) = θ
From just F the only thing we can deduce is that 0 < θ < 1: θ isn’t 0 because some of the holes are blue, and it isn’t 1 because we know that not all holes are blue; beyond that, F tells us nothing. The point to emphasize is that we have deduced the true value (in this case values) of the parameter, which is 0 < θ < 1. (Actually, we do know more; we know the number of holes is finite, and this is actually a lot of information; however, for the sake of this post, we’ll ignore that information: but see this paper which works out this entire point rigorously.)
If we add to F another “A”, and consider n tosses of the ball, we deduce this:
Pr( ‘k blue holes’ | F & A) = binomial(n,θ,k)
where again 0 < θ < 1. We have complete knowledge of two parameters, n and k, but θ remains (mostly) a mystery.
And here we must stop unless we gather more evidence. We can make this evidence up (why not? we’ve done so thus far) or we can add evidence in the form of observational propositions: “On the first toss, the hole was not blue,” “On the second toss, the hole was blue,” and so on.
Given F and A and this new observational evidence we can call “X” (where the number of tosses in X is finite), we can deduce:
Pr( θ = t | F & A & X) = something(t)
for every possible value of t (where we have already deduced t can only live between 0 and 1; the value of ‘something’ relies on t). Very well, but this only gives us information about θ, which is only of obscure interest. It says nothing, for instance, about how many balls will go into blue holes, or the probability they will fall into blue holes. It’s just some parameter which assumes F, A, and X are true.
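A minimal sketch of what ‘something(t)’ might look like follows. It rests on an assumption the post does not state: that knowing only 0 < θ < 1 is represented by a flat weighting over that interval. Under that assumption the result is the standard Beta form, and because θ is treated as continuous, the function returns a density at t, not a probability. The name posterior_density and the observations in X are hypothetical.

```python
from math import comb


def posterior_density(t, blues, tosses):
    """Density of theta at t after seeing `blues` blue holes in `tosses` tosses.

    ASSUMPTION: a flat weighting over 0 < theta < 1 before any tosses,
    which gives the Beta(blues + 1, tosses - blues + 1) density.
    """
    if not 0 < t < 1:
        return 0.0  # F already rules out theta = 0 and theta = 1
    # (tosses + 1) * C(tosses, blues) is the normalising constant 1 / B(a, b).
    norm = (tosses + 1) * comb(tosses, blues)
    return norm * t**blues * (1 - t) ** (tosses - blues)


# Hypothetical observational evidence X: 1 = blue hole, 0 = not blue.
X = [0, 1, 1, 0, 1]
for t in (0.2, 0.4, 0.6, 0.8):
    print(t, posterior_density(t, sum(X), len(X)))
```

With no tosses at all the function returns 1 everywhere on the interval, which is just the flat weighting that was assumed.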
To get the probability of actual balls going into k actual (new) holes, we’d have to take our binomial(n,θ,k) model and hit it with Pr( θ = t | F & A & X), which you can think of as a weighted average of the binomial over every possible value of θ. Mathematically, we say we integrate out θ because the result of this operation is
Pr( ‘k new blue holes’ | F & A & X & n new tosses) = something(k)
where you can see there is no more (unobservable) θ and where the ‘something’ relies on k. This works even if n = k = 1 (new tosses).
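As a rough sketch of integrating out θ: if we assume (and this is an assumption, not something deduced in the post) a flat weighting over 0 < θ < 1 before the tosses in X, the weighted average of the binomial has a closed form, the beta-binomial. The names beta_fn and predictive and the example counts are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def predictive(k, new_tosses, blues, tosses):
    """Pr('k new blue holes' in `new_tosses` | `blues` blue in `tosses` old tosses).

    ASSUMPTION: flat weighting on theta before X, so theta given X is
    Beta(a, b) with a = blues + 1 and b = tosses - blues + 1.
    Theta is averaged out and does not appear in the result.
    """
    a, b = blues + 1, tosses - blues + 1
    return comb(new_tosses, k) * beta_fn(a + k, b + new_tosses - k) / beta_fn(a, b)


# Hypothetical X: 3 blue holes in 5 tosses; then a single new toss (n = k = 1).
print(predictive(1, 1, 3, 5))  # 4/7
```

For a single new toss this reduces to (blues + 1) / (tosses + 2), which is Laplace's rule of succession; the function takes only observable counts as arguments.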
It’s not useful to speak of θ as the “probability” of a ball going through a blue hole: that last equation gives that, and there is no θ in it.
Now, all statistics problems where new data is expected can and should be done in this manner. Almost none are, though.
Hope this helps!