Laplace infamously calculated the probability, given his evidence, that the sun would rise in the east tomorrow at (something like) 0.9999995. Many objected, saying it’s much higher. Who was right? Laplace!
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Given below; see end of lecture.
Lecture
This is an excerpt from Chapter 8 of Uncertainty.
There’s more to this example. Suppose we’ve taken our sample, with $n_0$ failures and $n_1$ successes, and want to know the possibilities for the remaining number of successes $d$. This is $d = N\theta - n_1$, so $N\theta = d + n_1$ ($N$ is finite here). Then
$$
\Pr(D=d | n,n_1,N, \mbox{E}) \propto \frac{1}{n+1}\frac{{n_1+d \choose d}{N-n_1-d \choose N-n-d}}{{N+1 \choose n+1}}.
$$
This is an unnormalized form of the negative hypergeometric, which is an alternate name for the beta-binomial distribution. Before we have removed any of the balls from the urn, we can compute the prior predictive distribution of $n_1$ in the initial sample. This quantity is critical since it forms the normalization constant of the posterior distribution in eq. [for $\Pr(\theta = j/N | n,n_1,N, \mbox{E})$]. To derive this we simply sum over the possible values of $\theta$ in expression [$\Pr(n_1 = j | n,\theta,N, \mbox{E})$], which is equivalent to summing a beta-binomial mass function over its range. The result is
$$
\Pr(n_1=j|n,N) = 1/(n+1), \qquad j \in \{0, \ldots, n\}.
$$
If we reach in and grab out just one ball, the chance that it is a 1 or a 0 is 1/2, no matter what $N$ is. This result is well known and forms the basis of Laplace’s rule of succession. Furthermore, if we grab $n>1$ balls, the result says that $n_1$ is equally likely to be any value in $\{0,1,\ldots,n\}$; knowledge of $N$ is again irrelevant. This is intuitive, since we began with all proportions $\theta$ being equally likely a priori and have not yet collected data that suggests otherwise.
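As a numerical sanity check, here is a minimal sketch (mine, not from Uncertainty) that mixes the hypergeometric sampling probability over a uniform prior on the urn’s composition $S = N\theta$; the values of $N$ and $n$ are arbitrary illustrative choices.

```python
# Minimal check that Pr(n_1 = j | n, N) = 1/(n+1) under a uniform prior
# on the urn's success count S = N*theta. N and n are illustrative only.
from scipy.stats import hypergeom

N, n = 20, 5                       # urn size and sample size (assumed)
for j in range(n + 1):
    # Average the hypergeometric sampling probability over all possible
    # compositions S = 0, 1, ..., N, each with prior weight 1/(N+1).
    p = sum(hypergeom.pmf(j, N, S, n) for S in range(N + 1)) / (N + 1)
    print(j, round(p, 10))         # every j prints 1/6 = 0.1666666667
```

Changing $N$ changes nothing: each $j$ still comes out at $1/(n+1)$, which is the irrelevance of $N$ just noted.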
Given the original sample $A$, whose size is now denoted by $n_a$, with $n_{1a}$ the number of successes and $n_{0a}$ the number of failures, let a new sample of size $n_b \le N-n_a$ be collected. We want to know the distribution of successes, $n_{1b}$, in this new sample. The number of successes remaining in the urn is $d = N\theta - n_{1a}$, so $0 \le n_{1b} \le \min(n_b, d)$. Given $d$, the distribution of $n_{1b}$ is clearly hypergeometric.
This allows us to compute the posterior predictive distribution of $n_{1b}$. It turns out to be:
$$
\Pr(n_{1b}|n_b,n_{1a},n_{a},N,\mbox{E}) = {n_b \choose n_{1b}}\frac{\beta(n_{1a}+n_{1b}+1,n_{0a}+n_{0b}+1)}{\beta(n_{1a}+1,n_{0a}+1)}
$$
which is a beta-binomial distribution with parameters $(n_b,n_{1a}+1,n_{0a}+1)$, where $n_{0b} = n_b - n_{1b}$. Knowledge of $N$ is irrelevant here, too, except in the weak sense that the total sample $n_a+n_b\le N$. Also, $\theta$ does not appear. This quantity is far more interesting than anything we have to say about $\theta$. It conforms to the true goal of statistical modeling, which is to say sensible things about that which can be measured. Interestingly, this is the same answer one gets starting from a “flat” prior on a continuous $\theta$, integrating out the uncertainty in $\theta$, and forming the regular posterior predictive distribution.
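To see the irrelevance of $N$ concretely, here is a hedged sketch (my illustration, not the book’s) computing the predictive distribution of $n_{1b}$ two ways: by mixing a hypergeometric over the negative hypergeometric posterior on $d$, and from the closed-form beta-binomial. All sample sizes are assumed for illustration.

```python
# Compare the finite-urn predictive for n_1b with the closed-form
# beta-binomial(n_b, n_1a + 1, n_0a + 1). All numbers are illustrative.
from math import comb
from scipy.stats import betabinom, hypergeom

N, n_a, n_1a, n_b = 50, 10, 7, 8   # assumed urn and sample sizes
n_0a = n_a - n_1a

# Finite-urn route: unnormalized negative hypergeometric weights on the
# remaining successes d, then a hypergeometric draw from the N - n_a
# balls left in the urn.
ds = range(N - n_a + 1)
w = [comb(n_1a + d, d) * comb(N - n_1a - d, N - n_a - d) for d in ds]
tot = sum(w)
urn = [sum(w[d] / tot * hypergeom.pmf(k, N - n_a, d, n_b) for d in ds)
       for k in range(n_b + 1)]

# Closed-form route: beta-binomial, with no d and no theta in sight.
bb = betabinom(n_b, n_1a + 1, n_0a + 1)
for k in range(n_b + 1):
    print(k, round(urn[k], 10), round(bb.pmf(k), 10))  # columns agree
```

Raising or lowering $N$ (keeping $n_a + n_b \le N$) leaves the two columns identical, which is precisely the weak sense in which $N$ matters.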
We did not start with any parameters and we did not end with any. Parameters aren’t needed, as promised.
Laplace’s rule of succession follows the same course, although he stated it differently. Laplace derived the probability that the next observation in some ill-defined process is a success, given we have seen $n_{1a}$ successes in the first $n$ instances. The well-known answer is
$$
\Pr(x_{n+1} = 1|n_{1a},n,\mbox{fuzzy}) = \frac{n_{1a}+1}{n+2}
$$
where the “fuzzy” indicates Laplace was not quite clear about his premises. It is easy to check that this is equivalent to [$\Pr(n_{1b}|n_b,n_{1a},n_{a},N,\mbox{E})$]: put $n_b = 1$ and $n_{1b} = 1$, and the beta-binomial reduces to $(n_{1a}+1)/(n_a+2)$. That means the fuzziness is replaced by our firm premises E. If Laplace had started with E, all would have been well. But he chose an unfortunate example, that of the sun rising. He used the rule of succession to calculate the probability that the sun will rise tomorrow, given that it has risen every day for the past however many years. Call it 6,000 years. That makes $n = 2{,}191{,}500$ (about $6{,}000 \times 365.25$ days), which makes the probability about 0.9999995.
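A quick sketch (with assumed numbers) confirms both the reduction of the beta-binomial to the rule of succession and Laplace’s sunrise figure:

```python
# The n_b = 1, n_1b = 1 case of the posterior predictive reduces to
# Laplace's rule of succession; the sample counts here are assumed.
from scipy.stats import betabinom

n_1a, n_0a = 7, 3                          # say, 7 successes in 10 tries
n = n_1a + n_0a
print(betabinom(1, n_1a + 1, n_0a + 1).pmf(1))  # 0.6666...
print((n_1a + 1) / (n + 2))                     # (7+1)/(10+2): matches

n = 2_191_500                              # 6,000 years of daily sunrises
print((n + 1) / (n + 2))                   # ~ 0.9999995
```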
Large, but many objected, saying it surely ought to be higher because we know a lot more about sunrises than Laplace admits in his formula. This is true. But it is also beside the point. Laplace’s formula is the right answer to a different question, that’s all. It is a deduced probability given the premises fixed in E. That we apply these premises to a sunrise is our (and Laplace’s) mistake. It doesn’t make the formula wrong.
Jaynes has an example similar to this, but for the normal, showing the derivation of the normal distribution from premises starting with something like “The error in the measurement can be any value, positive or negative.” But a hidden, or rather tacit, premise derivable from that is that the measurement can be continuous, that it has infinite gradations, which is always impossible in practice. Instead, a superior working premise is that the error in measurement can be one of only a set of values, where the values are specified by the apparatus at hand. If this set is allowed to go to the limit, then it is likely (I haven’t made the calculations) the normal would result. But if the set is fixed by the apparatus, we don’t need to go to the limit. The resulting predictive distribution will be parameter-free, a fully deduced model, just as above (it will resemble, I am guessing, a multinomial in the limit, or something like it in finite form). As I said above, this kind of thing is ripe with research topics.
Subscribe or donate to support this site and its wholly independent host using credit card click here. Or use the paid subscription at Substack. Cash App: \$WilliamMBriggs. For Zelle, use my email: matt@wmbriggs.com, and please include yours so I know who to thank. BUY ME A COFFEE.