I am always struggling with (my limited ability of) finding ways to describe the philosophy behind logical probability, especially to people who have a difficult time unlearning classical frequentist theory. This post is more for me to test a sketch of an explanation than to be complete explication of that theory. I am writing to those who already know statistics.
If a theory—or hypothesis, argument, whatever—cannot be deduced it remains uncertain. Formally or informally, probability is used to quantify this uncertainty.
Consider your trial for murder. Your guilt must be established in the collective mind of the jury “beyond reasonable doubt.” That phrase acknowledges that the certainty of your guilt or innocence is unattainable—in their minds—but its probability can still be established.
Incidentally, whether the phrase “beyond reasonable doubt” is a historical accident or the result of careful logical reasoning is irrelevant. It’s common meaning is enough for us.
Through an obviously informal process, each jury member begins the trial with an idea of your guilt, which is modified as each new piece of evidence arises, and through discussions with other jury numbers. There are mathematical ways to model this process, but at best these models are crude idealizations.
Bayes’s formula can illustrate how jurors update their evidential probabilities, but since nobody—even the jurors—knows how to verbalize how each piece of evidence relates to the probability of guilt, these models aren’t much help.
A probability model can be written in the form of a statement: “Given this background evidence, my uncertainty in this observable is quantified by this equation.”
The background evidence is crucial: it is always there, usually (unfortunately!) tacitly. All statements made, no matter how far downstream in analysis, are conditional on this evidence.
You possess different background evidence than does the jury. Your evidence allows you to state “I did it” or “I did not do it.” In other words, this special case allows you—and only you—to say, “Given my experience, the probability that I did the deed is zero”; or one, as the case might be.
Deduction then, is a trivial form of probability model.
The observable is in this case is the crime: in statement form “You committed the murder.” The truth of that observable is not knowable with certainty to the jurors given any probability model.
The probability model itself is where the confusion comes in. It cannot exist for this, or for any, unique situation.
A classical model a statistician might incorrectly use to analyze the jury’s collective mind is “Their uncertainty in your guilt is quantified by a Bernoulli distribution.” This model a simple mathematical form, which is this: θ, where 0 < θ < 1. Notice that those bounds are strict. People mistakenly call the parameter θ of the Bernoulli "the probability" (of your guilt). It is not---unless the parameter θ has been deduced and equal to a precise number (not a range). If we do not know that value of θ, then it is just a parameter. In itself, it means nothing.
The probability (of your guilt) can be found by accounting for the uncertainty in the parameter. This is accomplished by integrating out the uncertainty in θ—essentially, the probability (of guilt) is a weighted average of possible values of θ given the evidence and background information.
But just think: we do not need θ. We have a unique situation—your trial!—and the jury, not you, ponder your probability of guilt; they certainly do not invoke an unobservable parameter. The trial evidence modifies actual probabilities, not parameters.
Now, if we wanted to analyze specific kinds of trials—note the plural: that “s” changes all—then and only then, and as long as we can be exact in the kinds of trials we mean, we can model trial outcomes.
This model is useless for outcomes of trials we have already observed. And why? Because the evidence we have is the outcomes of those trials—whose outcomes we know! Silly to point out, right? But surprisingly, this easy fact, and its immediate consequences, is often forgotten.
Another way to state this: We only need to model uncertainty for events which are uncertain. We can model your trial, but only assuming it is part of a set of (finite!) trials, the nature of which we have defined. The nature of the set-model tells us little, though, about your trial with your unique evidence.
The key is that there does not exist in the universe a unique definition for kinds of trials. We have to specify the definition in all its particulars. This, of course, becomes that background information we started with. Under which definition—exactly!—does your trial lie?
It is the uncertainty of ascriptions of guilt in those future trials that is of interest, and not of unobservable parameters.
Oh, remind me to tell you how mathematical notation commonly used in probability & statistics interferes with clear thinking.