I claimed, and it is true, that all statistical problems could be written Pr(p | q), where p is a proposition of interest and q is our evidence, or premises, or data, or data-plus-model, whatever you like to call it. Recall q is a compound proposition, including the data and whatever other knowledge we assume or possess.
I also claimed that q often contains “I believes”, in the form of “I believe the uncertainty in p is represented by this parameterized probability ‘distribution’.” Regardless of whether these beliefs are true, as long as there are no calculation errors, Pr(p | q) is the true probability—because it assumes q is true, but does not seek to prove it. This is no small distinction; it must be kept continuously in mind or mistakes will be (and are) made. (More on this in the last Part.)
So let’s separate the “I believes” from q and call them m (for “models”). Thus we have Pr(p | q, m), where q is as before sans the arbitrary model. Now, we don’t always need models. The example I showed last time didn’t need one. Here is another where a model is not needed. Example: p = “At least 7 4’s will show in the next n throws” and q = “We have a k-sided object (where k is at least 4) which when tossed must show only one side, with sides labeled 1, 2, …, k.” We deduce the probability of p directly (it is binomial).1
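That deduction can be carried out by summing the binomial terms directly. A minimal sketch; the particular choices of a 6-sided object and 20 throws in the usage line are my own illustrative numbers, not from the original:

```python
from math import comb

def pr_at_least(m, n, k):
    """Pr(at least m '4's in n throws | q), deduced directly from the
    premises: a fair k-sided object shows a '4' on each throw with
    probability 1/k, throws independent given q."""
    p = 1.0 / k
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m, n + 1))

# e.g. a 6-sided object thrown 20 times: Pr(at least 7 fours | q)
print(pr_at_least(7, 20, 6))
```

Nothing here is guessed: every number follows from the premises in q alone.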
It turns out, at least in theory, that we can always deduce probabilities when p and q speak of finite, discrete things, which are really all the things of interest to civilians.2 Mathematicians, statisticians, and the odd physicist, however, insist on stretching things to limits to invoke continuity. Noble tasks and worthy goals; the only real mistake these folks make in pursuing them is impatience. Because the “I believes” are usually stated in continuous, infinite forms as if given to us from on high, and are not themselves deduced or inferred from the evidence on hand. And—as one of my favorite jokes has it—that’s when the fight started.3
The m’s, the “I believes”, are the cause of (rightful) contention between the two main sects of statisticians, the frequentists and the Bayesians. To give you an example: p = “Tomorrow’s high temperature will be 72F”; q is any sort of data we have on the subject; and m = “The uncertainty in p is characterized by a normal distribution with parameters a and b.” The parameters of this model, as in most, are themselves continuous and unobservable; indeed, they are just fictions necessary to compute the probability of p.
Which in this case is 0 regardless of the values of a and b. That’s because a normal distribution, like all continuous distributions, gives 0 probability to every single observable. (Don’t forget this probability is true assuming q and m.) This is why we can’t ask normal questions of normals (a pun!). You can see this is the point where adherence to a lovely theory can screw with reality. Anyway, if we want to use continuous distributions we must change our propositions so that they become answerable: let p = “Tomorrow’s high temperature will be greater than 72F”. This will have some non-zero value no matter what a and b are.
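The point-versus-interval distinction can be seen in a couple of lines. The values a = 70 and b = 5 below are hypothetical stand-ins (nothing in q or m tells us what they are, which is the whole problem):

```python
from math import erf, sqrt

def pr_greater(x, a, b):
    """Pr(temperature > x | q, m) where m says 'normal with mean a,
    standard deviation b'.  Computed from the normal survival function."""
    return 0.5 * (1 - erf((x - a) / (b * sqrt(2))))

# Pr(temperature = exactly 72F | q, m) is 0 for any a, b,
# but the interval proposition gets a non-zero probability:
print(pr_greater(72, a=70, b=5))
```

Swap in any other a and b you like: the probability of "greater than 72F" changes, but it never collapses to 0 the way the point proposition does.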
And just what are a and b? Nobody knows. There is no evidence in q or m to tell us. But since knowing what they are is absolutely necessary to solve the problem, we have to make some evidence up. Bayesians start talking about “flat” or “non-informative” or “improper” priors; some like to say “maximum entropy!” (the exclamation mark is always there). This move baffles the frequentists who say, and say truly, “You’re just making it up! How do you know it’s the right answer for this problem?” The Bayesian demurs and starts discussing “objectivity” and so forth, all different names for the same maneuver he just pulled.
So the frequentists go their own way and say, “I don’t know a or b either, so I’ll just guess them using one of several functions, or test their values against this null hypothesis.” Now it’s the Bayesians’ turn to demand accountability. “But you have no idea if your guesses are right in this problem! And, anyway, nobody in the world believes your so-called null hypothesis.” The frequentists retort, “Well, maybe we don’t know if the guesses are right in this instance, but they will be if we do problems exactly like this an infinite number of times. And nobody ever believes null hypotheses, sort of.”
The steaming opponents—who, you will have noticed, ignore that both made up m out of whole cloth—leave the field of battle and head back to their encampments to produce their guesses, which—surprise!—are usually not that different from each other’s. This is partly because all or almost all statisticians start as frequentists and only see the light later, so everybody uses the same kind of math, and partly because there’s usually a lot of good, meaty knowledge in q to keep people from going too far astray.
But the criticisms of both are right: from the arbitrariness of the m to the arbitrary guesses of the parameters, there’s a lot of mystery. Both sides are guessing and don’t like to say so.
The alternative? Restate the problem in discrete, finite terms and then use q to deduce the probabilities in p—if they even exist as single numbers, which most times they don’t. For most applications this would be enough. For instance, do we really care about 72F in particular? Maybe the temperature at the levels (‘below 60’, ‘between 60 and 70’, ‘between 70 and 75’, ‘above 75’) is all we really care about. After all, we can’t make an infinite number of decisions based on what the temperature might be, only a finite number. This move gives us only four categories, some good observations in q, and we won’t be adding anything arbitrary. Everything is deduced starting with premises that make sense to us, and not to some textbook.
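Here is a minimal sketch of the discrete restatement. The temperatures in `past_highs` are made-up illustrative data, and using the observed relative frequencies as the deduced probabilities is a simplifying assumption standing in for the full finite, discrete deduction:

```python
# Hypothetical observations standing in for the data portion of q.
past_highs = [58, 63, 66, 71, 72, 74, 76, 78, 68, 73]

# The four finite categories we actually make decisions on.
levels = [("below 60",          lambda t: t < 60),
          ("between 60 and 70", lambda t: 60 <= t < 70),
          ("between 70 and 75", lambda t: 70 <= t < 75),
          ("above 75",          lambda t: t >= 75)]

# Relative frequency of each category among the observations.
probs = {name: sum(in_level(t) for t in past_highs) / len(past_highs)
         for name, in_level in levels}
print(probs)
```

Four numbers, each traceable to the evidence in hand; no unobservable a or b anywhere in sight.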
Well, this works. And if we really are enthusiastic, we work out all the math and then, and only then, take things to the limit and ask what would happen.
See this poorly written paper for an example of the typical “unknown probability of success”.
Next, and last, time: how do we learn about q?
1I’m not going to prove it here, but we don’t need information about “uniformity”, “symmetry”, “priors” or any of that stuff. See the statistics and probability philosophy papers for more details. Just believe it for now.
2I’m not proving this here either, but if you disagree I challenge you to state a measure of interest not of the categories listed above that isn’t discrete and finite.
3My wife and I were out to eat and there was a drunk at the next table. My wife said, “That’s the guy I used to date before we were married. He started drinking the day we broke up and hasn’t stopped since.” “My God,” I said, “Who would’ve thought a guy could go on celebrating for that long!” And that’s when the fight started.