A crime has been committed. Evidence—probative background information—points to a list of suspects, a list which, as always, might be incomplete. A most likely candidate is made to stand trial. Evidence for his guilt is provided by a vigorous prosecutor. His defense is conducted by a meek public defender. The jury must estimate the probability that the accused is guilty, and if they feel this probability exceeds reasonable doubt, they must vote to convict him. Before any evidence is heard, many jurors reason that the accused is likely guilty, else he would not be on trial. Thus, instead of considering the question of whether the accused is guilty, the jury decides to opine on the likelihood of the evidence were the accused innocent. Since the evidence nearly always appears rare or unusual were the defendant innocent, most trials result in a conviction.
That is what classical statistics—in both its frequentist and Bayesian incarnations—is like. Some thing has happened, and a hypothesis is put forth which, based on probative background information, the investigator thinks likely caused the thing. Evidence—mostly in the form of “data”, but non-quantitative information too—for this hypothesis is put forth. Very little is offered in rebuttal. The hypothesis is also one of many possible, the list of which is probably incomplete; that is, the true hypothesis might not be considered in a given trial.
The jury consists of mathematical formulae whose duty is to report solely on the likelihood of the evidence given the hypothesis is false. And since the evidence nearly always appears rare assuming the hypothesis is false, most trials result in a conviction—meaning, the investigator’s viewpoint is confirmed.
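To make the courtroom analogy concrete, here is a minimal sketch in Python, on simulated data of my own invention, of what the jury of formulae actually computes. The choice of a two-sample t-test and all the numbers are assumptions for illustration, not anything prescribed above.

```python
# Minimal sketch of the classical "trial": a two-sample t-test on
# invented, simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups, differing slightly on average.
control = rng.normal(loc=10.0, scale=2.0, size=50)
treated = rng.normal(loc=10.8, scale=2.0, size=50)

# The p-value is the probability of evidence at least this extreme,
# computed assuming the investigator's hypothesis is FALSE (the
# "null" of no difference is true). It is not the probability that
# the hypothesis is true, though it is routinely read that way.
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```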
What should be done
The problem is that most events cannot, or should not, be coerced into the format of a criminal trial. It is often less important to pin down the exact cause of non-unique events than to understand and predict them. Anyway, events—the “things” above—are rarely unique in the way particular crimes are. We do not always want to say that this variable caused this unique, singular, one-of-a-kind event. Instead, most events of interest are part of a larger structure, a stream of similar things.
For example, the weather or climate. Trying to fit carbon dioxide into the frame as the culprit for a historically observed increase in temperature is vastly less interesting and useful than being able to predict accurately what will happen. It is easy enough to find evidence to convict our poor gas classically. But if it were part of a gang of gases, the others would go free. Meanwhile, in celebrating our conviction, we would remain ignorant of what will happen, or would issue poor predictions because we were so intent on assigning blame.
Or take comparing fertilizers. The classical way would be to say, seemingly authoritatively, that the hypothesis that fertilizer A is the “same” as fertilizer B has been “rejected.” What is more important is to say how much better fertilizer B is than A, and under what circumstances.
Just think: in the data we have collected, in the particular historical circumstances which led to their collection, we know everything there is to know. We know whether fertilizer A was better than B: just count the yields for both brands! I use the word “know” in the sense of rigorous proof, and do not mean “likely”—I mean certain. Classical statistics summons its forces to say something about the cause of this past data, when we should be trying to say something about data we have not yet seen, and about which we are still in the dark.
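A sketch of the distinction, with invented yield numbers: the backward-looking comparison is mere counting, while the forward-looking question calls for a prediction. The bootstrap resampling used here is only one crude way to form such a prediction, an assumption of mine rather than a prescription.

```python
# Invented yields (bushels per acre) for six plots of each brand.
import numpy as np

rng = np.random.default_rng(1)

yield_a = np.array([51.2, 48.7, 53.1, 49.9, 50.4, 52.0])
yield_b = np.array([54.3, 52.8, 55.9, 53.1, 56.2, 54.0])

# In the data we HAVE, there is nothing to infer: just count.
print("Observed mean A:", yield_a.mean())
print("Observed mean B:", yield_b.mean())
print("B out-yielded A in these plots:", yield_b.mean() > yield_a.mean())

# The forward-looking question: how likely is a future plot on B to
# out-yield a future plot on A, and by how much? A crude bootstrap
# predictive, one of many possible approaches:
n = 100_000
future_a = rng.choice(yield_a, size=n, replace=True)
future_b = rng.choice(yield_b, size=n, replace=True)
diff = future_b - future_a
print(f"Pr(future B plot beats future A plot) ~ {np.mean(diff > 0):.2f}")
print(f"Central 80% of the B-minus-A difference: "
      f"[{np.quantile(diff, 0.1):.1f}, {np.quantile(diff, 0.9):.1f}]")
```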
If investigators reported statistical results in terms of explicit predictions, instead of blank announcements of culpability (tables of variable names and p-values), then we would easily be able to see whether the culprits they fingered were guilty. It’s easy to do this, too. A paper in sociology could announce, “Input the values of these variables—chosen from this and that source—and then the outcome of interest is likely to be in such and such bounds.” (The technicalities of this we can discuss later.)
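Here is one hedged sketch of what such a predictive announcement could look like in code, using an ordinary linear model from statsmodels. The predictor, the data, and the choice of a 90% interval are all invented for illustration.

```python
# Sketch: report a result as an explicit prediction with bounds.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Pretend historical data: one predictor, one outcome.
x = rng.uniform(0, 10, size=80)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, size=80)

model = sm.OLS(y, sm.add_constant(x)).fit()

# "Input the values of these variables, and the outcome of interest
# is likely to be in such and such bounds."
x_new = np.array([[1.0, 6.0]])  # columns: [intercept, x], at x = 6
frame = model.get_prediction(x_new).summary_frame(alpha=0.10)
lo = frame["obs_ci_lower"].iloc[0]
hi = frame["obs_ci_upper"].iloc[0]
print(f"At x = 6, a new outcome is likely (90%) to fall in [{lo:.1f}, {hi:.1f}]")
```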
Those interested in the model would quickly discover whether the predictions had any value—because they could check them on new and independent data. If the models worked, then the causes asserted by the authors would carry more weight. But if the models failed—and many would—then the authors’ theories could be rejected. Which itself is a tremendous service.
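And a sketch of the verification step, continuing the same invented setup: draw new, independent data and check how often outcomes actually land inside the claimed 90% bounds.

```python
# Sketch: verify the predictive claims on new, independent data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

def simulate(n, rng):
    """Invented data-generating process, standing in for the world."""
    x = rng.uniform(0, 10, size=n)
    y = 3.0 + 1.5 * x + rng.normal(0, 2.0, size=n)
    return x, y

# Fit on old data, as before.
x_old, y_old = simulate(80, rng)
model = sm.OLS(y_old, sm.add_constant(x_old)).fit()

# Fresh data the model has never seen.
x_new, y_new = simulate(500, rng)
frame = model.get_prediction(sm.add_constant(x_new)).summary_frame(alpha=0.10)
lo = frame["obs_ci_lower"].to_numpy()
hi = frame["obs_ci_upper"].to_numpy()

# A sound model's 90% bounds should capture roughly 90% of new
# outcomes; a large shortfall is grounds to reject the theory.
coverage = np.mean((y_new >= lo) & (y_new <= hi))
print(f"Coverage on new data: {coverage:.2f} (nominal 0.90)")
```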
Once more, we’ve reached our limit of words, and I have not done an adequate job explaining this. But stick around; the words might come.