The Alternative To Null Hypothesis Significance Testing

This article jumps right in with continuous-valued probability models, which are explained in great detail in Uncertainty, as is the subject of this article. What follows is only a rough summary.

The idea of null hypothesis significance testing (NHST) is to “test” whether a continuous-valued parameter or parameters of a given probability model for some observable proposition (or just observable) equals some value, usually zero (which for simplification is the value assumed here). If the “test” “confirms” the zero value, the observable associated with the parameter(s) is said or thought not to cause or to be “linked to” the observable of the probability model.

One example will serve (it is the one used in Uncertainty): a regression model for college grade point average (CGPA) based on SAT scores, high school GPA (HGPA), and time spent studying. The observable of interest is CGPA, and there are continuous-valued parameters associated with each observable “predictor” (there are also other parameters which make the regression mathematically “work”).

A test might be conducted on the parameter associated with study time. “Associated with” is stressed because the parameter is not the observable, though in practice the two objects are conflated: the parameter is reified into the observable predictor. Two hypotheses are put forward, a “null” and an “alternate”. The null is that the parameter associated with the predictor equals zero, and the alternate is that it does not.

Frequentist theory does not allow the calculation of the probability either hypothesis is true; in other words, it is forbidden to calculate the probability, conditional on any evidence, that the parameter equals any value. Disregarding that curious restriction, the probability that the continuous-valued parameter equals any single value, conditional on information available in the probability model, is always 0 (because of the continuity of the parameter); thus the null is always falsified, and the alternate always true.
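In symbols (θ for the parameter and E for the conditioning evidence are notation introduced here, not the article's): whatever continuous density p(θ | E) is used, the probability of any single value is an integral over an interval of zero width, and so is zero:

$$\Pr(\theta = 0 \mid E) = \int_{0}^{0} p(\theta \mid E)\, d\theta = 0, \qquad \Pr(\theta \neq 0 \mid E) = 1 - 0 = 1.$$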

Since the parameter probabilities (which must be conditioned on something, since all probability is conditional) cannot be calculated, NHST takes another tack. A non-unique, ad hoc function of the observables called a “test statistic” is calculated. The probability this statistic takes certain values can be calculated, conditional on aspects of the model and on the assumption the null hypothesis is true (yes, NHST assumes what it seeks to prove or disprove).
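To make the mechanics concrete, here is a minimal Python sketch of the most common NHST for a single regression parameter: the t test of a zero slope in a simple regression of CGPA on study time alone. The toy data are invented, the normal-error model is an assumption, and the helper names (`student_t_sf`, `slope_t_test`) are hypothetical; the Student t tail probability is computed by numerical integration so the example needs only the standard library. Note that the p-value is computed conditional on the null being true.

```python
import math


def student_t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom.

    Uses Simpson's rule on the density over [0, |t|]; `steps` must be even.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = density(0.0) + density(a)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    middle = total * h / 3  # P(0 < T < |t|)
    return 0.5 - middle if t >= 0 else 0.5 + middle


def slope_t_test(x, y):
    """Test statistic and two-sided p-value for H0: b = 0 in y = a + b*x + normal error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    t = b / se_b
    # Probability of a statistic at least this extreme, computed *assuming* H0 is true.
    p = 2 * student_t_sf(abs(t), n - 2)
    return t, p


# Hypothetical toy data (not from the article): weekly study hours and college GPA.
hours = [2, 5, 1, 8, 4, 6, 3, 7]
cgpa = [2.6, 3.1, 2.9, 3.4, 2.7, 3.0, 3.2, 3.3]

t_stat, p_value = slope_t_test(hours, cgpa)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # 0.05: the conventional cutoff
```

Nothing in this output says anything about the probability that study time matters for CGPA; it is only the probability of the statistic, given the null.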

If this probability is below a set threshold, the magic number (conventionally 0.05), the null hypothesis is “rejected”, which is to say, it becomes (as the inventor of the method admitted) an act of will to believe the parameter is not equal to zero. At this point, reification again enters and it is believed the observable predictor is causative of or “linked to” the main observable. Ascertaining this is, as is clear, beyond the powers of the test; still, it is what is believed. “Linked to” in every mind equals “cause”, or something vague which means “almost cause.”

If the test statistic probability does not sink below the magic value, the null is not “accepted”, as this would violate the odd prescription that propositions can never be truly believed, only rejected. The inconsistency should be obvious. Instead, the null has “failed” to be rejected. At this point, the predictor observable is thought not to cause, or not to be “linked to”, the main observable. Again, this logical jump is beyond the powers of the test, but few are bothered by it.

Since the test statistic is ad hoc, others may be chosen; others often are, with a mind to finding one that rejects the null. The over-certainty produced by this method is beyond monumental: it is staggering. It is beyond staggering: it is inconceivable. The number of studies using NHST, and therefore arriving at decisions in an improper manner, is larger than can be counted practically.
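The non-uniqueness is easy to see in code. This hedged sketch applies two different, equally ad hoc test statistics to the same invented study-time and CGPA data, each with a permutation test (shuffling CGPA to mimic “no association”): the sample correlation, and the difference in mean CGPA between students above and below the median study time. The two statistics generally give different p-values for identical data; trying several and keeping whichever rejects is the practice described above. Function names, seed, and data are illustrative assumptions.

```python
import random
import statistics


def pearson_r(x, y):
    """Sample correlation between x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_split_diff(x, y):
    """Mean y where x is above its median, minus mean y where x is at or below it."""
    med = statistics.median(x)
    high = [b for a, b in zip(x, y) if a > med]
    low = [b for a, b in zip(x, y) if a <= med]
    return statistics.fmean(high) - statistics.fmean(low)


def permutation_p_value(stat, x, y, n_perm=5000, seed=1):
    """Two-sided permutation p-value for `stat`, assuming H0: y is unrelated to x."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(stat(x, shuffled)) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: weekly study hours and college GPA.
hours = [2, 5, 1, 8, 4, 6, 3, 7]
cgpa = [2.6, 3.1, 2.9, 3.4, 2.7, 3.0, 3.2, 3.3]

for name, stat in [("correlation", pearson_r), ("median split", median_split_diff)]:
    print(name, round(permutation_p_value(stat, hours, cgpa), 4))
```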

None consider the causative implications of probability models; they are satisfied with assuming causality has been proved by the test, or rest with the vague “linked to”. But probability models can never discover cause (as is proved in Uncertainty), “linked to” has no definite or defensible meaning, and, worse, when the null is not rejected, “chance” or “randomness”, which have no physical existence or causative power, are said to have caused the main observable, as in the oft-used phrases “due to chance” or “due to randomness”. This is great nonsense.

The alternative

The alternative is to eschew testing altogether and rely on decisions based on direct assessment of probabilities of (measurable) observables. Since decisions depend on outcomes, different people can come to different conclusions using the same probability models. Just like real life.

This is sometimes known as the predictive approach (but be careful: this term has wide variance, and can even mean the opposite of the definition given next).

What happens is this: a decision maker (not a statistician) makes judgments about values of the main observable and predictor observables which are of interest to him, and possibly to him alone. The probabilities that the main observable takes these values, conditional on the model and set values of the predictor observables, are calculated. These can be calculated with models incorporating some but not all predictors, and with other models including the neglected predictors.

In the example, suppose a decision maker was interested in CGPA greater than 3. Using one model, with SAT and HGPA but without time spent studying, the probabilities of new CGPAs greater than 3 are calculated, conditional on the model and at set or varying levels of SAT and HGPA. These can then be acted upon like any other probability; i.e. decisions made using these probabilities are taken, and the consequences met.

A second model, like the first but adding time spent studying, is fit, and the probabilities of CGPA greater than 3 are again calculated, again at levels of the (now three) predictors thought important. If these new probabilities do not differ in any actionable way from those of the first model, the new predictor is not important to this decision maker.
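The two-model comparison above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's own procedure: both models are normal linear regressions with the usual noninformative prior p(β, σ²) ∝ 1/σ², under which, once the parameters are integrated out, the predictive distribution of a new CGPA is a Student t with n - p degrees of freedom, centered at the fitted value and scaled by s·sqrt(1 + x′(X′X)⁻¹x). The data, the new student's predictor values, and the 0.05 “actionable” margin are all invented; `invert`, `t_sf`, and `prob_above` are hypothetical helper names. Only the standard library is used.

```python
import math


def invert(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def t_sf(t, df, steps=2000):
    """P(T > t) for Student's t, via Simpson's rule on [0, |t|] (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def dens(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    s = dens(0.0) + dens(a) + sum((4 if k % 2 else 2) * dens(k * h) for k in range(1, steps))
    middle = s * h / 3  # P(0 < T < |t|)
    return 0.5 - middle if t >= 0 else 0.5 + middle


def prob_above(X, y, x_new, cutoff):
    """Pr(y_new > cutoff | data, x_new, model) with the parameters integrated out."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sse = sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2 for i in range(n))
    s2 = sse / (n - p)
    lev = sum(x_new[a] * inv[a][b] * x_new[b] for a in range(p) for b in range(p))
    loc = sum(x_new[a] * beta[a] for a in range(p))
    return t_sf((cutoff - loc) / math.sqrt(s2 * (1 + lev)), n - p)


# Hypothetical toy data: SAT (in hundreds), HGPA, weekly study hours, CGPA.
sat = [10.5, 12.0, 11.2, 13.5, 9.8, 12.8, 11.9, 14.1, 10.9, 12.4]
hgpa = [3.0, 3.4, 3.1, 3.8, 2.7, 3.5, 3.3, 3.9, 2.9, 3.6]
hours = [3, 6, 2, 9, 4, 5, 7, 8, 1, 6]
cgpa = [2.7, 3.2, 2.8, 3.7, 2.5, 3.3, 3.2, 3.8, 2.6, 3.4]

X1 = [[1.0, s, g] for s, g in zip(sat, hgpa)]  # model 1: SAT and HGPA
X2 = [[1.0, s, g, h] for s, g, h in zip(sat, hgpa, hours)]  # model 2: adds study time

# A new student of interest to the decision maker: SAT 1200, HGPA 3.3, 5 hours a week.
p_without = prob_above(X1, cgpa, [1.0, 12.0, 3.3], 3.0)
p_with = prob_above(X2, cgpa, [1.0, 12.0, 3.3, 5.0], 3.0)
print(f"Pr(CGPA > 3): without study time {p_without:.3f}, with study time {p_with:.3f}")

margin = 0.05  # hypothetical: smallest change this decision maker would act on
print("study time matters here" if abs(p_with - p_without) >= margin else "study time not actionable")
```

A real decision maker would vary the predictor values over the ranges that matter to him and set the margin from the consequences of his own decision; with ten invented rows the particular numbers carry no weight.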

Several items are apparent. The predictive method is much more labor intensive. There is no one-size-fits-all answer. Different decision makers will find that different models, built from the same data, have different utility.

Any model can be checked against truth, and its distance from the truth measured. This is impossible using NHST. The predictive approach is thus like bridge building. A model is put forward that predicts, under certain set conditions (predictor variables), a high probability the bridge will stand (or fall). The model is put to the test: the high probabilities are verified against experience. Anybody can check for themselves whether the bridge stands or falls.

Same with probability models. Any probability model of observables can and should be verified against reality. Publishing models in the predictive, and not NHST, form allows anybody to ascertain the usefulness of a model.
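One way to do that checking for a probability forecast of a yes/no observable such as “CGPA greater than 3” is a proper score. The sketch below uses the Brier score (the mean squared difference between stated probabilities and what happened) and a skill score against a naive forecast that always states the base rate. This is our choice of measure, not one the article specifies, and the forecasts and outcomes are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared distance between forecast probabilities and what happened (1 = yes, 0 = no)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill(probs, outcomes):
    """Improvement over a naive forecast that always states the observed base rate.

    1 is perfect, 0 is no better than the naive forecast, negative is worse.
    Undefined (division by zero) if every outcome is the same.
    """
    base = sum(outcomes) / len(outcomes)
    reference = brier_score([base] * len(outcomes), outcomes)
    return 1 - brier_score(probs, outcomes) / reference


# Hypothetical: published forecasts of Pr(CGPA > 3) for five new students, and what later happened.
forecasts = [0.8, 0.3, 0.6, 0.9, 0.2]
happened = [1, 0, 1, 1, 1]
print(round(brier_score(forecasts, happened), 3), round(brier_skill(forecasts, happened), 3))
```

A skill at or below zero means the model's published probabilities did no better than quoting the base rate, something anybody can verify once the outcomes are in.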

There are ways, using assumptions or outside information, to gauge the probability that a given model from a set of models is true, but these are of limited use, especially when a given set of predictor observables must be chosen for some decision.

There is no word about cause in the predictive approach. People being people, cause will still be inferred; not always incorrectly, but always without direct warrant. However, since the models must meet reality, cause is far less likely to be falsely ascribed. Scores upon scores of regression models that now purport via NHST to have discovered cause or “links” would be rejected out of hand in the predictive approach.

In practice, few understand NHST; the concepts are too difficult to keep in mind. Probabilities about observables conditioned on other observables, on the other hand, are easy to comprehend. Everything is observable! There are no parameters (they are “integrated out”). Predictive models are no different than betting on sporting events.
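In symbols (notation introduced here, not the article's): writing D for the observed data, M for the model and its assumptions, θ for its parameters, and c for a value of interest such as a CGPA of 3, the predictive probability averages over every possible parameter value, weighted by its posterior probability. Only observables and the model remain on the left side:

$$\Pr(y_{\text{new}} > c \mid x_{\text{new}}, D, M) = \int \Pr(y_{\text{new}} > c \mid x_{\text{new}}, \theta, M)\, p(\theta \mid D, M)\, d\theta$$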

Against their adoption is the additional labor, which is significant. But it takes the decision out of the hands of an uninterested statistician and puts it back where it belongs, with the person who has an interest in the main observable. The deficits of public models will be obvious to all, a tremendous benefit, but a factor that would discourage many who push sketchy theories, people who will ask, “What’s so bad about wee p-values after all? We just have to be careful how we use them.” Of course, there is no way to be “careful” about their use. They should be forever eschewed.


  1. Anon

    I can’t believe there are no comments. Everyone must agree.

  2. DAV

    Strange. I count two comments.

  3. oldavid

    W’M Briggs should put all his statistics and modeling and hypothesis “no-testing” to work on the fantastic assumption of “Evolution” and deliver us a case for the certainty that observation and logic are wrong according to some obscure mathemagical formula.

    I am not convinced that reality can be altered by ideological prejudice “backed up” by mathemagical rationalism and highbrow red herrings.

    If Reality, and Nature, and Humanity, and Truth, and Virtue, and Reason, and Faith are in the process of “becoming” what are they “becoming” from, and to, and why, and how?

  4. John Shade

    Quote “If the “test” “confirms” the zero value, the observable associated with the parameter(s) is said or thought not to cause or to be “linked to” the observable of the probability model.”.
    This is not true. Such a result merely indicates that the evidence for a contrary result (i.e. the alternative hypothesis) is found to be weak.
    Hypothesis testing does get a tough time of it these days, and of course this is largely due, I suppose, to the widespread misinterpretation of what such tests mean. Their contribution, when ‘reasonably’ valid (and that is another matter), is quite modest, but nevertheless a useful one. All you get is an assessment of the strength of evidence for the alternative hypothesis. You do not get the last word on the matter. You do not get a conclusive result. You merely get a result which says something in the range weak to strong in favour of the alternative.

  5. Another attempt to point at the inscrutable. NHST is bad, but it is an attempt to get at an idea. The idea is built on multiple other ideas. It is really easy for such ideas to be misunderstood. I look at NHST and think that the meaning is pretty clear. It is when I look at the outcome of NHST that I think others miss the pretty clear meaning.

    Somehow long time smokers don’t get lung cancer.

    Somehow with a lack of public smoking, asthma is on the increase.

    Somehow people who go to public schools actually make something of themselves.

    There is a difference between folks who want to learn and the folks who want to be taught. The difference is hard to point at, but it is there in loud blinking lights, but NO CHILD SHALL BE LEFT BEHIND.

  6. DAV

    Quote “If the “test” “confirms” the zero value, the observable associated with the parameter(s) is said or thought not to cause or to be “linked to” the observable of the probability model.”.
    This is not true.

    Actually, it is true. The idea behind the test is to eliminate hypotheses by showing absence of correlation (i.e., no correlation = no causal link). Somewhere along the way, this and correlation does not imply causation have been forgotten. Unfortunately, correlation is too easy to find. With the chi-square test, for example, larger sample sizes tend toward lower p-values.

    The above link though makes the all-too-common mistake of saying correlation found allows rejection of the null hypothesis when correlation means that one can’t summarily reject the alternative. After all, the correlation may be spurious which is the equivalent of saying non-predictive.
