Warning number two: This is not an incremental change in demonstration, but a fundamental rethinking. So slow down.
All of statistics is this: the art of showing the probabilities of propositions p with respect to evidence q (also propositions). If you’re a fan of equations, and there’s no reason you should be, it’s this:

$$\Pr(p \mid q)$$
Once you’ve memorized that, you’ll have mastered most of what you need. The remainder is in this tidbit: the reason equations are not always to be loved is that statistics is showing the probabilities, not quantifying the probabilities. This is essential. Not all, and perhaps not even most, probabilities are quantifiable. But we’re sure good at fooling ourselves into thinking they are. This is where the plague of over-certainty begins. Equations give the idea that numbers are happening here and this usually isn’t true.
Usually isn’t true. Consider q = “Most tweets about politics are querulous” and p = “Tweet 430838597776715778 (an actual tweet about politics, which you haven’t yet seen) is querulous”. The probability of p given this q and only this q is not quantifiable, except in the weak sense that “most” means “more than half but not all”; thus the probability is not a unique number, but the interval greater than 0.5 and less than 1. This “weak sense” interpretation of q is, if it was not obvious, part of q, the baggage that all propositions possess, baggage which includes our understanding of the words and grammar, the context, and so forth (as is true in any logical argument).
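For those who like to see such things spelled out, here is a minimal sketch in Python of that deduction. The names `OpenInterval`, `QUANTIFIER_INTERVALS` and `probability_given` are hypothetical, invented only for this illustration. The single entry in the lookup table is just the “weak sense” reading of “most” given above. Notice that the answer is an interval and not a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenInterval:
    """An interval that excludes both of its endpoints."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        # Strict inequalities: the endpoints themselves are excluded.
        return self.low < x < self.high


# Assumption for illustration: "most" read as "more than half but not all".
QUANTIFIER_INTERVALS = {"most": OpenInterval(0.5, 1.0)}


def probability_given(quantifier: str) -> OpenInterval:
    """Deduce Pr(p | q) when q says '<quantifier> things of this kind are so'."""
    return QUANTIFIER_INTERVALS[quantifier]


pr = probability_given("most")
print(pr.contains(0.75))  # a value strictly inside the interval
print(pr.contains(0.5))   # the endpoint 0.5 is excluded
```

Nothing beyond q went into the lookup, which is the point: the sketch cannot return a single number because q does not supply one.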
Now in assessing the probability of p using this q you can (this is such a simple rule, but one which can’t seem to be remembered) only use q. Of course, if you’re a maverick and want to tell people you used q but actually use some q’, well, you’re at least in good (or large) company. Example: q’ is where you plug the tweet into Twitter, learn about p, then judge p with respect to this different knowledge. That’s cheating if you claim to have relied on q alone. But if your intent was to issue the probability of p given q’, well and fine.
“But this isn’t statistics,” I hear you saying, “Where’s the data? Where’s the model?”
Ah. Data. My dears, q is data, or, more precisely, a datum. You’re not used to seeing data like this, but data it is. Data with which you’re familiar are propositions, too. “A man, then a woman, then two men, and they were 34, 42, 36 and 58 years old, and …” Any statement of observations is a proposition: usually complex, compound, difficult, strung-out propositions, but propositions nonetheless. Prove this to yourself before continuing.
The model? It was there, standing and waving for attention. You didn’t notice it because it had dropped its bangles and adjustable jewelry, i.e., its parameters. What is a model? Well, the mechanism to assign probabilities of propositions p given evidence q. Here that mechanism was a direct deduction from premises, our q, to the probability that p was true. Given (only) q there were no other possibilities for the probability of p. Deductions are the cleanest and best kind of models. They are clean in the sense that no ad hoc evidence was added to q, as it nearly always is, and as it certainly is if parameters are present.
In typical practice, the evidence, or premises, q are extended by pasting “I believes”, such as “I believe p is represented by a normal distribution with these two parameters.” Beliefs are nice and sometimes they can even be true but ardency is rarely a guide; tradition might be helpful, but not in fields with a history of screwy ideas (sociology, psychology, education, etc., etc.). The “I believes” are the models.
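To see an “I believe” at work, here is a small sketch using Python’s standard-library `statistics.NormalDist`. Everything specific in it is made up for illustration: the proposition p = “the next person is older than 50”, the helper name `prob_exceeds`, and both sets of parameter values. None of it comes from real data.

```python
from statistics import NormalDist


def prob_exceeds(threshold: float, mu: float, sigma: float) -> float:
    """Pr(value > threshold | 'I believe the value is normal(mu, sigma)')."""
    return 1.0 - NormalDist(mu=mu, sigma=sigma).cdf(threshold)


# Two different "I believes" pasted onto the same question.
# Both parameter sets are hypothetical, chosen only for illustration.
pr_first = prob_exceeds(50, mu=40, sigma=10)
pr_second = prob_exceeds(50, mu=45, sigma=10)

print(round(pr_first, 4), round(pr_second, 4))
```

Same p, different pasted-on beliefs, different probabilities. Each number is the right answer to its own augmented q, and neither says a thing about whether the belief that produced it is any good.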
Now no matter how or where q originated, the probability of p with respect to the (augmented) q is (barring calculation mistakes) the probability of p with respect to q. Meaning, the probability is correct: it’s the right answer conditional on assuming q is true.
It is a separate question whether q itself is true or likely true. And the only way to tell that is to condition q on other evidence which is not p or q; that is, q becomes ‘p’ in a new calculation. If we had conclusive evidence of q’s truth, then we’d be able to deduce the model, as in the example above. If we knew q was false, we could still calculate the probability of p, but why bother?
Indeed, why bother. For that, read Part II.