Statistics is the collection and modeling of data. By “modeling” I mean using probability to describe our uncertainty in the values that data may take. Statistics, then, is applied probability. Probability is the quantitative branch of epistemology. Data are propositions of the sort, “We observe X to take the value x,” where X is usually some tangible, real-world object. We use probability to quantify the chance these propositions are true, i.e. that X takes the value x.
When we observe data, we assume that something caused the data to take the values they did. Our knowledge of this causality ranges from none to complete, depending on the circumstance. We call this knowledge a model, which may be anywhere from purely mathematical-logical to completely probabilistic. If our model is purely mathematical-logical, the values the data will take are rigidly determined; there is no uncertainty. If our model is completely probabilistic, the values the data will take are unknown to a specified extent. Most models are somewhere in between. This is general and applies to models of everything from electrons to electorates.
Call the model for your data (X) at hand M, where X = “The data takes a specified value x.” Probability is used to say things like this:
(1) Pr(X | M);
that is, given the model, this is the probability the data takes certain values. If a rival model is proposed, it is not guaranteed that Pr(X | M1) equals Pr(X | M2) for all the possible values X can take, but even if these probabilities do match it could be that M1 and M2 are not logically equivalent.
It is extremely important to understand that the choice of the model is subjective. That is, there may be external evidence about X (call it E; evidence which is not X) which dictates a form or partial form of M, but in practice people are free to choose whatever M they wish. This is because the E that (supposedly) gives credence to M is also subjectively chosen. That is, we usually reason Pr(M | E) = 1. Nevertheless, however M is decided, the probability statements (1) are fixed, true, and are not subjective.
Now, it is the case in formal statistics that models are usually indexed by unobservable parameters. Values have to be supplied for these parameters before equations like (1) can be calculated. That is, for a fixed M, indexed by parameters, equation (1) takes different values for every different value of the parameters.
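To make the role of parameters concrete, here is a minimal sketch. I assume, purely for illustration, that M is a binomial model with n trials and an unobservable parameter theta; the function name and the numbers are hypothetical, not part of the argument above. The point is only that the same M and the same x give a different value of (1) for each theta supplied.

```python
from math import comb

def prob_data_given_model(x, n, theta):
    """Pr(X = x | M, theta), assuming M is a binomial model with n trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Same model form M, same data value x = 7 out of n = 10,
# but three different (made-up) values of the parameter theta.
probs = {theta: prob_data_given_model(x=7, n=10, theta=theta)
         for theta in (0.3, 0.5, 0.7)}

for theta, p in probs.items():
    print(f"theta = {theta}: Pr(X = 7 | M, theta) = {p:.4f}")
```

Until a value of theta is supplied, there is no single number to report for (1).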
There are now two situations: (A) no new data will ever be taken, and (B) new data will be taken. By “new” I do not necessarily mean data that will arise in the future, though this is the most usual case; “new” is data that was not used before.
(A) No new data
If no new data will ever be taken, we again have two possibilities. We might want to know how the data arose, or we might have competing models that we want to assess in light of X.
Now it might make sense to ask how X arose. But it might not, either. After all, if no new data is coming, then everything we need to know about X is in X itself. If we want to know how many of the X are less than this number, or greater than that, all we need do is look. Was X increasing? Just look. Was it decreasing? Just look. This approach has been greatly underused.
It is often the case that we have to decide which of a set of competing models is most likely given X. For example, consider a jury trial. We have two competing models, M0 = “The guy didn’t do it” and M1 = “The guy did it”. We use X (the trial data, evidence, and arguments) to compute
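“Just looking” needs no model at all, only counting. A small sketch, with made-up observations and a made-up threshold (both hypothetical):

```python
# Hypothetical observed data X, in the order it was recorded.
x = [3.1, 4.7, 2.2, 5.9, 6.4, 6.8]
threshold = 5.0  # arbitrary cut-off chosen for illustration

n_below = sum(v < threshold for v in x)   # how many values are less than the threshold
n_above = sum(v > threshold for v in x)   # how many values are greater than it

# Was X increasing? Compare each value with the one that follows it.
increasing = all(a < b for a, b in zip(x, x[1:]))
decreasing = all(a > b for a, b in zip(x, x[1:]))

print(n_below, n_above, increasing, decreasing)
```

Every answer here is a plain fact about the observed X; no probability statement is involved.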
(2) Pr( M1 | X ),
with the assumption that Pr( M1 | X ) + Pr( M0 | X ) = 1 (probabilities of models sum to one over any set of models). Notice that the set of M is chosen by us in advance, supplied by external evidence E (such as “There is a man in the dock who is on trial, and either he did the deed or did not”).
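When numbers can be assigned, (2) follows from Bayes's theorem over the chosen set of models. The sketch below assumes, for illustration only, that we can state Pr(M | E) and Pr(X | M) for each model; the function name and every number in it are invented, and a real trial would rarely supply such figures.

```python
def posterior(priors, likelihoods):
    """Pr(M_i | X) for each model in a finite set, via Bayes's theorem.

    priors[m]      : Pr(m | E), our belief in model m before seeing X
    likelihoods[m] : Pr(X | m), probability of the observed X under m
    """
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(joint.values())  # Pr(X | E), summed over the chosen models
    return {m: j / total for m, j in joint.items()}

# Hypothetical inputs: both models equally credible beforehand,
# and the evidence X far more probable under M1 than under M0.
priors = {"M0": 0.5, "M1": 0.5}
likelihoods = {"M0": 0.02, "M1": 0.30}

post = posterior(priors, likelihoods)
print(post)
```

Because the division is by the total over the chosen set, the results sum to one over that set and no other: add a third model and every number changes.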
The usual mistake here (in science, not courtrooms) is to assume that exact quantifications of (2) always exist. They surely sometimes do exist, but not always. Now, if there are an infinite number of Mi, then it is possible that (2) will equal 0 (the usual case). Thus, in order to make sense of the world, we need to impose finiteness and select from a limited number of explanations for any X.
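One way to see why an unlimited set of models is troublesome is to suppose (an assumption of mine, for illustration only) that X supports N rival models equally well. Each then receives probability 1/N, which shrinks toward 0 as N grows:

```python
def prob_each_model(n_models):
    """Pr(M_i | X) when X supports n_models rival models equally well.

    Equal support is an assumption made for this sketch only.
    """
    return 1.0 / n_models

# The probability of any single model shrinks as the set of rivals grows.
sizes = (2, 10, 1_000, 1_000_000)
shares = [prob_each_model(n) for n in sizes]

for n, p in zip(sizes, shares):
    print(f"{n:>9} models: Pr(M_i | X) = {p:.6f}")
```

Restricting attention to a finite, small set of candidate explanations is what keeps (2) away from zero.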
Next: classical statistical procedure; clarifications, because I’m not entirely happy with the language.