These next posts are in the way of being notes to myself.
Logic is the study of the relation between statements. For example, if “All green men are irascible, and Bob is a green man”, then we know that “Bob is irascible” is certainly true. It isn’t true because we measured all green men and found them irascible, for there are no green men; it’s true because syllogisms like this produce valid conclusions.
We know that “there are no green men” because we know that “all observations of men have produced no green ones”, and we know that based on further evidence, extending in a chain to the a priori. As it is not necessary for what follows, this proof is left for another day.
Probability is no different from logic: it, too, is the study of the relation between statements. Given the premises assumed above, the probability the conclusion is true is 1. Change the first word of the first premise from “All” to “Most”, and the probability of the conclusion becomes less than 1 and greater than 0 (the exact range depending on the definition of “Most”).
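The move from “All” to “Most” can be made concrete with the statistical syllogism. A minimal sketch in Python, assuming (as one common reading) that “Most” names some fraction greater than one half; the function name is my own, for illustration:

```python
def probability_of_conclusion(fraction_irascible: float) -> float:
    """P("Bob is irascible" | "X of green men are irascible, and Bob is
    a green man"), by the statistical syllogism: the probability of the
    conclusion equals the stated fraction."""
    if not 0.0 <= fraction_irascible <= 1.0:
        raise ValueError("fraction must be a probability")
    return fraction_irascible

# "All green men are irascible" -> the conclusion is certain:
assert probability_of_conclusion(1.0) == 1.0

# "Most green men are irascible": if "Most" means a fraction in (0.5, 1),
# the conclusion's probability lies in that same interval.
for frac in (0.6, 0.75, 0.9):
    assert 0.5 < probability_of_conclusion(frac) < 1.0
```

Nothing here is measured from the world; the number falls out of the premises alone, which is the point.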
In either case, logic or its generalization probability, we cannot know the status of a conclusion without reference to a specific set of premises. We cannot know the probability of so simple a statement as “This coin will show a head when tossed” without reference to some set of premises—which might include observational statements. Thus, all probability is, just as all logic is, conditional.
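How the same coin statement takes different probabilities from different premise sets can be sketched with two explicit sets: a bare symmetry premise, and a set that adds observations plus a uniform prior, which together yield Laplace’s rule of succession. Both the rule and the choice of prior are illustrative assumptions, not anything argued for above:

```python
from fractions import Fraction

def prob_head(heads_seen: int, tosses_seen: int) -> Fraction:
    """P("this coin will show a head when tossed" | premises).

    Premises assumed: the coin has two sides, exactly one a head; tosses
    are exchangeable; a uniform prior on the head-probability. With those
    premises, Laplace's rule of succession gives (h + 1) / (n + 2).
    """
    return Fraction(heads_seen + 1, tosses_seen + 2)

# With no observational premises, symmetry alone gives 1/2 ...
assert prob_head(0, 0) == Fraction(1, 2)
# ... but add the premise "7 heads in 10 tosses" and the probability changes:
assert prob_head(7, 10) == Fraction(2, 3)
```

Neither number is "the" probability of the statement; each is the probability conditional on its premise set.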
This background is necessary to emphasize that we cannot know whether a given model or theory is true, regardless of whether that model is wholly probabilistic, deterministic, or somewhere in between, without reference to some list of premises. In classical (frequentist) statistics, that premise is (eventually) always “Model M is true”; therefore we know with certainty, given that premise, that “Model M is true”. This premise is usually adopted post hoc, in that many models may be tried, but all are discarded except one.
The “p-value” is the probability of getting a larger (in absolute value) ad hoc statistic than the one actually observed, given the premises: (1) the observed data, (2) a statement about a subset of the parameter space, and, most importantly, (3) the model, which is assumed true. If the model is false, the p-value still makes sense because, like our green men, it only assumes the model is true. “Making sense” is not to be confused with being useful as a decision tool. The p-value makes sense in just the same way our green-men argument makes sense, but it has no bearing on any real-world decision.
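The conditional calculation can be shown concretely with a standard two-sided z-test; the Normal model and the z statistic are my illustrative assumptions here, not anything the argument above commits to:

```python
import math

def two_sided_z_pvalue(z_observed: float) -> float:
    """P(|Z| > |z_observed|), computed under premise (3): the model is true.

    Given the model, the statistic follows a standard Normal distribution
    exactly, and the tail probability is erfc(|z| / sqrt(2)). The answer
    is to a conditional question only; it holds whether or not the Normal
    model describes the world.
    """
    return math.erfc(abs(z_observed) / math.sqrt(2))

# With no departure from the assumed model's center, the p-value is 1:
assert two_sided_z_pvalue(0.0) == 1.0
# The familiar z = 1.96 gives a p-value of about 0.05:
assert abs(two_sided_z_pvalue(1.96) - 0.05) < 1e-3
```

The arithmetic is a relation between statements, exactly as in the syllogism: assume the model, deduce the tail probability.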
Importantly, despite perpetual confusion, the p-value says nothing about whether (3) the model is true; nor does it say anything about (2) whether the statement about the subset of the parameter space is true. The theory or model is always assumed to be true: not just likely, but certain.
I leave aside here the argument that a theory leads to a unique model: my claim is that the two words are synonymous. Whether or not this is so, a model is a unique, fixed construct (e.g., every addition or deletion of a regressor in a regression is a new theory/model). The ad hoc statistic or hypothesis test of frequentist statistics forms part of the theory/model (in this way, there are always two theories under frequentist contention, with one being accepted as true, the other false).
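The claim that every addition or deletion of a regressor yields a distinct theory/model can be counted directly. A small sketch, with hypothetical regressor names:

```python
from itertools import combinations

# Hypothetical regressors for some regression.
regressors = ("x1", "x2", "x3")

# Per the claim above, each subset of regressors is a distinct,
# fixed theory/model: enumerate them all.
models = [subset
          for r in range(len(regressors) + 1)
          for subset in combinations(regressors, r)]

# 3 regressors yield 2**3 = 8 distinct models, from the empty
# (intercept-only) model up to the full one.
assert len(models) == 2 ** len(regressors)
```

Even this toy count shows how quickly the universe of candidate models grows, which matters for the model-selection discussion below.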
In Bayesian statistics, there is a natural apparatus for assessing the truth of a model. There is always an element of post hoc model selection in practice, but I’ll assume purity for this discussion. If we begin with the premise “Models M1, M2, …, Mk are available”, and join it with “Just one model is labeled Mi”, then the prior probability “Model Mi is true” given these premises is 1/k. It is important to understand that if the premise were merely “Model Mi is either true or false”, then the probability “Model Mi is true” is greater than 0 and less than 1, and that is all we can say. This makes sense (and it differs from the frequentist assertion/premise that “Model Mi is true”) because, again, all logic/probability is concerned with the connections between statements, not the statements themselves (this is the major mistake made in frequentism).
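The uniform 1/k prior and the closed list of models combine by Bayes’ theorem in the obvious way. A minimal sketch; the likelihood numbers are hypothetical, only to show the mechanics:

```python
def posterior_model_probs(likelihoods: list[float]) -> list[float]:
    """Posterior P("Model Mi is true" | data, the closed list M1..Mk).

    Premises assumed: "Models M1, ..., Mk are available" and "just one
    model is labeled Mi", which give each model the prior 1/k.
    """
    k = len(likelihoods)
    prior = 1.0 / k
    joint = [prior * like for like in likelihoods]   # P(data, Mi)
    evidence = sum(joint)                            # P(data | the closed list)
    return [j / evidence for j in joint]

# Hypothetical P(data | Mi) for k = 3 candidate models:
post = posterior_model_probs([0.02, 0.10, 0.08])
assert abs(sum(post) - 1.0) < 1e-12
assert abs(post[1] - 0.5) < 1e-9  # M2 is most probable, given these premises
```

Note that the posterior probabilities are over the listed models only: the calculation says nothing about models outside the universe we chose to close.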
That last assertion means that the list of models under contention is always decided externally; that is, by premises which are unrelated to whether the models are true, or even good or useful. There might be some premise which says, “Given our previous knowledge of the subject at hand, these models are likely true”; that premise might go on to assign prior probabilities different from 1/k for each model under consideration. But it is of the utmost importance to understand that it is we who close the universe of acceptable models. In practice, this universe is always finite: that is, even though we can make statements about them, we can never consider an infinity of models.
In Part II, model selection and what falsifiability is.