By moi. Abstract:
There is no reason to use traditional hypothesis testing. If one’s goal is to assess past performance of a model, then a simple measure of performance with no uncertainty will do. If one’s goal is to ascertain cause, then because probability models can’t identify cause, testing does not help. If one’s goal is to decide in the face of uncertainty, then testing does not help. The last goal is to quantify uncertainty in predictions; no testing is needed there, and testing is instead unhelpful. Examples in model selection are given. Use predictive, not parametric, analysis.
Some meat from the Testing Versus Deciding section, mostly de-LaTeXified.
Suppose we have the situation just mentioned, two normal models with different priors for the observable $y$. We’ll assume these models are probative of $y$; they are obviously logically different, and practically different for small $n$. At large $n$ the difference in priors vanishes.
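As a concrete sketch of that setup (my notation and choices, not necessarily the paper’s exact models), take $y \sim \mathrm{N}(\mu, \sigma^2)$ with $\sigma^2$ known, and let the two models differ only in their priors on $\mu$:

$$M_1:\ \mu \sim \mathrm{N}(m_1, \tau_1^2), \qquad M_2:\ \mu \sim \mathrm{N}(m_2, \tau_2^2).$$

After seeing past data $D = (y_1, \dots, y_n)$ with mean $\bar y$, each model gives a predictive distribution for a new observable,

$$\tilde y \mid D, M_j \sim \mathrm{N}\!\left( \frac{m_j/\tau_j^2 + n\bar y/\sigma^2}{1/\tau_j^2 + n/\sigma^2},\ \sigma^2 + \frac{1}{1/\tau_j^2 + n/\sigma^2} \right),$$

and as $n$ grows the prior terms $m_j/\tau_j^2$ and $1/\tau_j^2$ are swamped by the data terms, which is the sense in which the difference in priors vanishes.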
A frequentist would not consider these models, because in frequentist theory all parameters are fixed and ontologically exist (presumably in some Platonic realm), but a Bayesian might work with these models, and might think to “test” between them. What possible reasons are there to test in this case?
First, what is being tested? It could be which model fits D, the past data, better. But because it is always possible to find a model which fits past data perfectly, this cannot be a general goal. In any case, if this is the goal—perhaps there was a competition—then all we have to do is look to see which model fit better. And then we are done. There is no testing in any statistical sense, other than to say which model fit best. There is no uncertainty here: one is better tout court.
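A throwaway R illustration of that first point (my toy example, not from the paper): a degree-$(n-1)$ polynomial will “fit” any $n$ past points perfectly, which by itself tells you nothing about which model to trust.

```r
# Toy illustration: perfect fit to past data is always available.
# A degree-(n-1) polynomial interpolates any n points exactly.
set.seed(1)
n <- 10
x <- 1:n
y <- rnorm(n)                      # "past data" with no real structure
fit <- lm(y ~ poly(x, n - 1))      # saturated polynomial model
max(abs(resid(fit)))               # ~0: perfect past fit, useless for prediction
```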
The second and only other possibility is to pick the model which is most likely to fit future data better.
Fit still needs to be explained. There are many measures of model fit, but only one counts: the measure aligned with the decision the model user is going to make. A model that fits well in the context of one decision might fit poorly in the context of another. Some kind of proper score is therefore needed which mimics the consequences of the decision; this is a function of the probabilistic predictions and the eventual observable. Proper scores are discussed in [paper]. It is the user of the model, and not the statistician, who should choose which definition of “fit” fits.
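As a rough sketch of what scoring predictions looks like (the log score here is my stand-in for whatever score matches your decision; the priors and data are invented for illustration), one computes each model’s predictive distribution from the past data and scores it against observables the models haven’t seen:

```r
# Sketch: score two normal-model predictives with the log score (a proper score).
# The priors, data, and choice of score are illustrative assumptions only.
set.seed(42)
sigma <- 1                                      # known observation SD (assumed)
y_old <- rnorm(20, mean = 0.5, sd = sigma)      # past data D
y_new <- rnorm(50, mean = 0.5, sd = sigma)      # future observables

# Posterior predictive for a conjugate normal prior mu ~ N(m, tau^2)
predictive <- function(y, m, tau) {
  prec <- 1 / tau^2 + length(y) / sigma^2       # posterior precision
  mu_n <- (m / tau^2 + sum(y) / sigma^2) / prec # posterior mean
  list(mean = mu_n, sd = sqrt(sigma^2 + 1 / prec))
}

# Two models differing only in their priors
p1 <- predictive(y_old, m = 0, tau = 1)
p2 <- predictive(y_old, m = 2, tau = 0.5)

# Mean log predictive density of the new observables: higher is better
log_score <- function(p, y) mean(dnorm(y, p$mean, p$sd, log = TRUE))
c(model1 = log_score(p1, y_new), model2 = log_score(p2, y_new))
```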
There is a sense that one of these models might do better at fitting, which is to say predicting, future observables. This is the decision problem. One model, or one subset of models, must be chosen from the set of possible models, perhaps because of cost or other considerations.
There is also the sense that if one does not know, or does not know with sufficient assurance, which model is best at predictions, or if decisions among models do not have to be made, then the full uncertainty across models should be incorporated into decisions.
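Continuing the same toy setup (again, my illustrative numbers, not the paper’s), carrying the full uncertainty means mixing the models’ predictive distributions, weighting each by how well it predicted the past data, rather than crowning one winner:

```r
# Sketch: instead of picking one model, mix the predictive distributions,
# weighting each model by its (prequential) marginal likelihood of the past
# data. All priors and data are illustrative assumptions only.
set.seed(42)
sigma <- 1
y_old <- rnorm(20, mean = 0.5, sd = sigma)      # past data D

# Posterior predictive for a conjugate normal prior mu ~ N(m, tau^2)
predictive <- function(y, m, tau) {
  prec <- 1 / tau^2 + length(y) / sigma^2
  mu_n <- (m / tau^2 + sum(y) / sigma^2) / prec
  list(mean = mu_n, sd = sqrt(sigma^2 + 1 / prec))
}

# log p(D | M): product of one-step-ahead predictive densities
log_marg <- function(y, m, tau) {
  sum(vapply(seq_along(y), function(i) {
    p <- predictive(y[seq_len(i - 1)], m, tau)  # predictive given data so far
    dnorm(y[i], p$mean, p$sd, log = TRUE)
  }, numeric(1)))
}

# Posterior model probabilities, assuming equal prior weight on each model
lm12 <- c(log_marg(y_old, 0, 1), log_marg(y_old, 2, 0.5))
w    <- exp(lm12 - max(lm12)); w <- w / sum(w)

# Mixture predictive density for a new observable
p1 <- predictive(y_old, 0, 1); p2 <- predictive(y_old, 2, 0.5)
mix_dens <- function(y) w[1] * dnorm(y, p1$mean, p1$sd) + w[2] * dnorm(y, p2$mean, p2$sd)
mix_dens(c(0, 0.5, 1))                          # evaluate at a few points
```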
The two possibilities are handled next.
You may download this peer-reviewed wonder here. It will soon appear (March?) in a Springer volume Behavioral Predictive Modeling in Econometrics. I don’t have the page numbers for a citation yet, but it will be in the “Springer-Verlag book series ‘Studies in Computational Intelligence’ ISSN: 1860-949X (SCOPUS)”.
When I have some more time, I’ll post the R code and make it a statistics class post so those interested can follow along.