Just a very crude sketch today: it is not complete by any stretch. Naturally, the students in the summer class don’t receive this level of information.
Best we can tell, the universe is, at base, discrete. That is, space comes to us in packets, chunks of a definite size, roughly 10^-35 meters on a side. You may think of quantum mechanics; quantum, after all, means discrete.
Now, even if this isn’t so, that is, even if the universe proves to exist as an infinitely divisible continuum, it will still be true that we cannot measure it except discretely.
Take, for example, a physician reading blood pressure with an ordinary sphygmomanometer, the cuff with a pump and the small analog dial. At best, a physician can reliably, at a glance, gauge blood pressure to within 1 millimeter of mercury. Even digital versions of this instrument fare little better.
But, of course, these instruments can improve. The readout can continue to add decimal places as the apparatus better discerns the amount of mercury forced through a tube, even to the point of counting individual molecules, but no further. Fractional or continuous molecules aren’t in it.
Further, every measurement is also constrained by certain bounds, which are a function of the instrument itself and the milieu in which it is employed. That is, actual measurements do not, and cannot, shoot off to infinity (in either direction).
Every measurement we take is like this: discrete and bounded. This means that when we are interested in some observable, particularly in quantifying the uncertainty of this observable, we know that it can take only one value out of a finite set of values. That is, the observable can only take one value at a time.
I am considering what is called a “univariate” observable; also called a point measurement. It doesn’t matter if the observable is “multivariate”, also called a vector measurement. If a vector, then each element in the vector can take only one out of a set of values at any one time.
We also know that any set of measurements we take is finite. Finite can be very large, of course, but large is always short of infinite. We might not know, and often do not know, how many measurements we can take of any observable, but we always know that this count will be finite.
The situation of measuring any observable at discrete levels a finite number of times is exactly like the following situation: a bag contains N objects, some of which may be labeled 1 and the others something else. That is, any object may be a 1 or it may not be. That statement is a tautology; and based on the very limited information in it, all we can tell is that an object with a 1 on it is possible.
In this bag, then, there can be no objects labeled 1, one such object, two such objects, and so on up to all N objects. We want the probability that no objects have a 1, just one does, and so on. Through the theorem of the symmetry of individual constants (which we can prove another day), it is easy to show that the probability of any particular outcome is 1 / (N + 1), because there are N + 1 possible outcomes.
This is, of course, the uniform distribution, in line with what people usually call an “ignorance” or “flat” prior. But it is not a prior in the usual sense. It is different because there are no parameters here, only observables. This small fact becomes the fundamental basis of the marriage of finite measurement with probability.
Suppose we take a few (something less than N) objects from the bag and note their labels. Some, none, or all of these objects will have a 1. Importantly, the number of 1s we saw in our sample gives us some information about the possible values of the rest of the objects left in the bag.
No matter the value of N, we can work out the probability that no remaining objects are labeled 1, that just one is, and so on. Again, no parameters are needed. We are still talking about observables and observables only.
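Here is a rough sketch of that calculation in Python, for those who like to see the arithmetic. It assumes the 1 / (N + 1) starting probabilities from above and weights each possible total by the chance of seeing the sample we actually saw. The function name and its arguments (N objects in the bag, n removed, k of those labeled 1) are mine, chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def remaining_ones_probs(N, n, k):
    """Probabilities that j of the N - n objects left in the bag are
    labeled 1 (j = 0 .. N - n), after seeing k ones among n removed."""
    if not (0 <= k <= n <= N):
        raise ValueError("need 0 <= k <= n <= N")
    # A bag with m = k + j ones in total yields the observed sample with
    # weight comb(m, k) * comb(N - m, n - k). The constant factors
    # 1 / (N + 1) and 1 / comb(N, n) cancel when we normalize.
    weights = [comb(k + j, k) * comb(N - k - j, n - k)
               for j in range(N - n + 1)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


# Illustrative numbers only: 10 objects, 4 removed, 1 of them labeled 1.
for j, p in enumerate(remaining_ones_probs(10, 4, 1)):
    print(j, p)
```

Nothing here but counts of objects. With no objects removed (n = 0, k = 0) the function returns 1 / (N + 1) for every outcome, as it should.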
We can continue this process by removing more, but not yet all, objects from the bag. This gives us updated information, which we can use to update the probability that no objects remaining are labeled 1, that just one is, and so on. (For those who know, this is a hypergeometric distribution.)
Once more, we still have no need of parameters; we still talk of observables. This assumes we know N, and that N is finite. But if we do not know N, but do know it is “large”, we can take it to the limit, and then use the resulting probabilities as approximations to the true ones. (This limit is the binomial.) The limiting distribution then speaks of parameters; it is important to understand that they only arise because of the limiting (approximating) operation.
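A small numerical check of the limiting claim, offered as a sketch and nothing more. It compares the exact finite-bag (hypergeometric) probabilities of seeing k ones in n draws against the binomial, as N grows. To do this I hold the fraction of 1s in the bag fixed at 0.3 and draw 5 objects; both numbers are my simplifying assumptions, not anything derived above, and the function names are made up for the occasion.

```python
from math import comb


def hypergeom_pmf(k, n, M, N):
    """Chance of k ones in n draws, without replacement, from a bag of
    N objects of which M are labeled 1."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """Binomial chance of k ones in n draws with parameter p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def max_gap(N, n=5, frac=0.3):
    """Largest difference over k between the exact finite-bag
    probability and its binomial approximation."""
    M = round(frac * N)   # number of 1s in the bag (assumed fraction)
    p = M / N             # the "parameter" appears only here
    return max(abs(hypergeom_pmf(k, n, M, N) - binom_pmf(k, n, p))
               for k in range(n + 1))


for N in (20, 200, 2000, 20000):
    print(N, max_gap(N))
```

The gap shrinks as N grows, which is the sense in which the binomial stands in for the true finite probabilities. Notice that p shows up only in the approximation; the exact calculation uses counts alone.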
Well, you might have the idea. If we do not know N, and cannot say it is “large”, we can apply the same logic to its value as we did to the labels. Point is, all of probability can fit into a scheme where no parameters are ever needed, where everything starts with the simplest assumptions, and ends quantifying uncertainty in only what can be measured.