This is extracted from this set of papers, which should be read in full.
An ordinary regression model is written
$$\mu = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,$$
where mu is the central parameter of the normal distribution used to quantify uncertainty in the observable. Hypothesis tests are used to hone the eventual list of measures appearing on the right-hand side. The point here is not about regression per se, but about all probability models; regression is a convenient, common, and easy example.
For every measure included in a model, an infinity of measures have been tacitly excluded, exclusions made without benefit of hypothesis tests.
Suppose in a regression the observable is patient weight loss, and the measures are the usual list of medical and demographic states. One potential measure is the preferred sock color of the third-nearest neighbor to the patient’s main residence. It is a silly measure because we judge, using outside common-sense knowledge, that this neighbor’s sock color cannot have any causal bearing on our patient’s weight loss. The point is not that nobody would add such a measure (nobody would) but that it could have been added, yet was excluded without the use of hypothesis testing.
Sock color could have been measured and incorporated into the model. That it wasn’t proves two things: (1) that inclusion and exclusion of measures in models can be, and are, made without the guidance of p-values and hypothesis tests, and (2) that since there is an infinity of possible measures for every model, and hence an infinity of potential null-alternative hypothesis pairs, we must always make many judgments without p-values. There is no guidance in frequentist (or Bayesian) theory that says use p-values here, but use your judgment there. One man will insist on p-values for a certain X, and another will use judgment. Who is right? Why not use p-values everywhere? Or judgment everywhere? (The predictive method uses judgment aided by probability and decision.)
The only measures put into models are those which are at least suspected to be in the “causal path” of the observable. Measures which may, in part, be directly involved with the efficient and material cause of the observable are the obvious candidates. Sex, for example, is added to models of medical observables because it is known that differences in biological sex cause different things to happen to many observables.
But those measures which might cause a change in the direct partial cause, or a change in the change and so on, like income in the weight loss model, also naturally find homes (income does not directly cause weight loss, but might cause changes which in turn cause others etc. which cause weight loss). Sock color belongs to this chain only if we can tell ourselves a just-so story of how this sock color can cause changes in other causes etc. of eventual causes of the observable. This can always be done: it only takes imagination.
The (initial) knowledge or surmise of material or efficient causes comes from outside the model, or the evidence of the model. Models begin with the assumption that the included measures are in the causal chain. A wee p-value does not, however, confirm a cause (or cause of a cause etc.) because non-causal correlations happen. Think of seeing a rabbit in a cloud. P-values, at best, highlight large correlations.
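The claim that non-causal correlations happen can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: the sample size, the number of measures, the 0.05 threshold, and the Fisher z normal approximation for the p-value are all choices made for illustration, not anything taken from the papers. Every measure is generated independently of the observable, so none can have any causal bearing on it, yet some still earn a wee p-value.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r using the Fisher z-transform.

    This is a normal approximation (an assumption of this sketch),
    not an exact t-test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_wee_p_values(n_obs=50, n_measures=1000, alpha=0.05, seed=1):
    """Count measures with p < alpha when no measure is related to y."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # the observable
    wee = 0
    for _ in range(n_measures):
        # Each x is pure noise, generated without reference to y.
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        if approx_p_value(pearson_r(x, y), n_obs) < alpha:
            wee += 1
    return wee


wee = count_wee_p_values()
print(f"{wee} of 1000 causally unrelated measures had p < 0.05")
```

Roughly one in twenty of the noise measures should pass the threshold, which is what the threshold is designed to allow. Nothing in the p-value distinguishes these from a measure with a real causal connection.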
It is also common that measures with small correlations, i.e. with large p-values, are not expunged from models when there are known, or highly suspected, causal chains between the X and Y; i.e. they are kept regardless of what the p-value said. These are yet more cases where p-values are ignored.
The predictive approach is agnostic about cause: it accepts conditional hypotheses and surmises and outside knowledge of cause. The predictive approach simply says the best model is that which makes the best verified predictions.
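One way to picture "best verified predictions" is to fit competing models on one set of observations and score them on observations they have never seen. The sketch below is a minimal illustration and not the method of the papers: the data are simulated, the measure names (`calories`, `sock`) are hypothetical, and mean squared error on a held-out half is an assumed score chosen for simplicity. A full predictive treatment would score probability predictions with a score matched to the decision at hand.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on new data."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(7)
n = 400
# Hypothetical measures: one tied to the observable, one not.
calories = [rng.gauss(0, 1) for _ in range(n)]
sock = [rng.gauss(0, 1) for _ in range(n)]  # numeric stand-in for sock color
weight_loss = [2.0 - 1.5 * c + rng.gauss(0, 1) for c in calories]

# Fit on the first half, verify predictions on the second half.
train, test = slice(0, 200), slice(200, n)
model_cal = fit_line(calories[train], weight_loss[train])
model_sock = fit_line(sock[train], weight_loss[train])

mse_cal = mse(model_cal, calories[test], weight_loss[test])
mse_sock = mse(model_sock, sock[test], weight_loss[test])
print(f"held-out MSE using calories:   {mse_cal:.2f}")
print(f"held-out MSE using sock color: {mse_sock:.2f}")
```

No hypothesis test is consulted at any point. The sock color model is rejected only because its predictions of new observations are worse.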
“The predictive approach simply says the best model is that which makes the best verified predictions. ”
Where ‘best’ can be judged in a fairly non-biased manner by looking at statistics of the residuals and using p-values. 😉 Go through the Lady Tasting Tea example, say with multiple ladies. Maybe a lady gets some right, more right than another, but that could occur by correct guessing and not skill. How does your predictive approach take that into account?
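For reference, the chance of a lady getting some cups right by guessing alone can be worked out directly. The sketch below assumes Fisher's classic design, which the comment does not specify: eight cups, four with milk poured first, and a lady who knows this and must name the four. The function name `guess_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb


def guess_probability(k, cups=8, milk_first=4):
    """P(exactly k milk-first cups are named correctly) under pure guessing.

    Hypergeometric: choose k of the true milk-first cups and the
    remaining picks from the other cups.
    """
    ways = comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
    return Fraction(ways, comb(cups, milk_first))


dist = {k: guess_probability(k) for k in range(5)}
for k, p in dist.items():
    print(f"{k} correct: {p} (about {float(p):.3f})")

p_at_least_3 = dist[3] + dist[4]
print(f"3 or more correct by guessing: {p_at_least_3}")
```

Under these assumptions a perfect score has probability 1/70 by guessing, and three or more correct has probability 17/70, so one lady outscoring another by a cup or two is weak evidence of skill on its own.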
Also, I’d be more interested in the best model that uses the fewest variables to make the predictions (not just the one that makes the most correct predictions). Otherwise, you invite the well-known statistical problem of overfitting.