Teaching Journal: Day 8—Hypothesis Testing: Part I

Hypothesis testing nicely encapsulates all that is wrong with frequentist statistics. It is a procedure which hides the most controversial assumption/premise. It operates under a “null” belief which nobody believes. It is highly ad hoc and blatantly subjective. It incorporates magic p-values. And it ends all with a pure act of will.

Here is how it works. Imagine (no need, actually: go to the book page and download the advertising.csv file and follow along; to learn to use R, read the book, also free) you have run two advertising campaigns A and B and are interested in weekly sales under these two campaigns. I rely on you to extend this example to other areas. I mean, this one is simple and completely general. Do not fixate on the idea of “advertising.” This explanation works equally well on any comparison.

I want to decide which campaign, A or B, to use country-wide, and I want to base this decision on 20 weeks of data in which I ran both campaigns and collected sales (why 20? it could have been any number, even 1; although frequentist hypothesis testing won’t work with just one observation each; another rank failure of the theory).

Now I could make the rule that whichever campaign had higher median sales is the better. This was B. I could have also made the rule that whichever campaign had higher third-quartile sales is better. This was A. Which is “better” is not a statistical question. It is up to you and relates to the decisions you will make. So I could also rule that whichever had the higher mean sales was better. This was B. I could have made reference, too, to a fixed number of sales, say 500. Whichever had a greater percentage of weeks with sales above 500 was “better.” Or whatever else made sense to the bottom line.
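To see how the choice of rule changes the winner, here is a minimal sketch in Python (the post itself follows along in R). The sales figures are invented for illustration and are not the values in advertising.csv; the names `sales_a`, `sales_b`, `third_quartile`, `share_above`, and `winners` are hypothetical helpers, not anything from the book.

```python
import statistics

# Hypothetical weekly sales, invented for illustration only.
# These are NOT the values in advertising.csv.
sales_a = [420, 510, 480, 610, 350, 530, 470, 640, 450, 500]
sales_b = [460, 490, 505, 520, 475, 515, 495, 530, 485, 510]


def third_quartile(values):
    # quantiles(..., n=4) returns the three quartile cut points;
    # the last one is the third quartile.
    return statistics.quantiles(values, n=4)[2]


def share_above(values, threshold=500):
    # Fraction of weeks with sales strictly above the threshold.
    return sum(v > threshold for v in values) / len(values)


# Each rule is one possible definition of "better".
rules = {
    "median": statistics.median,
    "third quartile": third_quartile,
    "mean": statistics.mean,
    "share above 500": share_above,
}


def winners(a, b):
    # Apply every rule to both campaigns and record which one scores higher.
    result = {}
    for name, rule in rules.items():
        score_a, score_b = rule(a), rule(b)
        if score_a > score_b:
            result[name] = "A"
        elif score_b > score_a:
            result[name] = "B"
        else:
            result[name] = "tie"
    return result


if __name__ == "__main__":
    for name, winner in winners(sales_a, sales_b).items():
        print(f"{name}: {winner}")
```

With these made-up numbers the rules disagree, just as they did with the real data: which campaign "wins" depends entirely on the rule you picked, and picking the rule is a decision, not a calculation.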

Anyway, point is, if all I did was to look at the old data and make direct decisions, I do not need probability or statistics (frequentist or Bayesian). I could just look at the old data and make whatever decision I like.

But that action comes with the implicit premise that “Whatever happened in the past will certainly happen, with the same characteristics, in the future.” If I do not want to make this premise, and usually I don’t, I then need to invoke probability and ask something like, “Given the data I observed, what is the probability that B will continue to be better than A?” if by “better” I mean higher median or mean sales. Or “What is the probability that A will continue to be better?” if by “better” I mean higher third-quartile sales. Or whatever other question makes sense to me about the observable data.

Hypothesis testing (nearly) always begins by assuming that we can quantify our uncertainty in the outcome (here, sales) with normal distributions. When I say “(nearly) always” I mean statistics as she is actually practiced. This “normality” is a mighty big assumption. It is usually false because, as here, sales cannot be less than 0, while a normal distribution gives positive probability to every number, negative ones included. Often sociologists and the like ask questions which force answers from “1 to 5” (which they magnificently call a “Likert scale”). Two (or more) groups will answer a question, and the uncertainty in the mean of each group is assumed to follow a normal distribution. This is usually wildly false, given that, as we have just said, the numbers cannot be smaller than 1 nor larger than 5.
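A rough way to see the trouble is to ask a normal distribution how much probability it puts on values that cannot occur. The sketch below uses Python's standard-library `NormalDist`. The parameters are assumptions chosen only to make the point visible, and for simplicity the normal is applied to the observable values themselves rather than to a group mean.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
sales_model = NormalDist(mu=500, sigma=250)

# Probability the model assigns to sales below 0, which is impossible.
p_negative_sales = sales_model.cdf(0)

# A normal placed on a 1-to-5 answer scale (again, made-up parameters).
likert_model = NormalDist(mu=4.2, sigma=1.0)

# Probability the model assigns to answers below 1 or above 5.
p_outside_scale = likert_model.cdf(1) + (1 - likert_model.cdf(5))

if __name__ == "__main__":
    print(f"P(sales < 0)          = {p_negative_sales:.4f}")
    print(f"P(answer outside 1-5) = {p_outside_scale:.4f}")
```

Under these assumed parameters the sales model puts roughly 2% of its probability on negative sales, and the answer-scale model puts roughly 21% on answers that cannot be given. How bad the mismatch is in any real problem depends on the actual center and spread.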

Normal distributions, then, are often wrong, and often wrong by quite a lot. (And if you don’t yet believe this, I’ll prove it with real data later.) This says that hypothesis testing starts badly. But ignore this badness, or the chance of it, like (nearly) everybody else does and let’s push on.

If it is accepted that our uncertainty in A is quantified by a normal distribution with parameters m_A and s_A, and similarly B with m_B and s_B, then the “null” hypothesis is that m_A = m_B and (usually, but quietly) s_A = s_B.

Stare at this and be sure to understand what it implies. It DOES NOT say that “A and B are the same.” It says our uncertainty in A and B is the same. This is quite, quite different. Obviously—as in obviously—A and B are not the same. If they were the same we could not tell them apart. This is not, as you might think, a minor objection. Far from it.

Suppose it were true that as the “null” says m_A = m_B (exactly, precisely equal). Now if s_A were not equal to s_B, then our uncertainty in A and B can be very different. It could be, depending on the exact values of s_A and s_B, that the probability of higher sales under A was larger than B, or the opposite could also be true. Stop and understand this.
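Here is a small sketch of that point, reading "higher sales" as "sales above some fixed threshold" (an interpretation adopted here for illustration). The central parameters are set exactly equal and only the spreads differ. All the numbers are hypothetical.

```python
from statistics import NormalDist

# Same central parameter for both campaigns, as the "null" says.
m = 500

# Different spread parameters (hypothetical values).
model_a = NormalDist(mu=m, sigma=50)
model_b = NormalDist(mu=m, sigma=150)


def prob_above(model, threshold):
    # Probability the model assigns to sales exceeding the threshold.
    return 1 - model.cdf(threshold)


if __name__ == "__main__":
    for threshold in (400, 600):
        p_a = prob_above(model_a, threshold)
        p_b = prob_above(model_b, threshold)
        print(f"P(sales > {threshold}): A = {p_a:.3f}, B = {p_b:.3f}")
```

With these values B is far more likely than A to beat 600 in a given week, while A is far more likely than B to beat 400. The central parameters are identical, yet which campaign looks "better" flips with the threshold you care about.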

Just saying something about the central parameters m does not tell us enough, not nearly enough. We need to know what is going on with all four parameters. This is why if we assume that m_A = m_B we must also assume that s_A = s_B.

The kicker is that we can never know whether m_A = m_B or s_A = s_B; no, not even for a Bayesian. These are unobservable, metaphysical parameters. This means they are unobservable. As in “cannot be seen.” So what do we do? Stick around and discover.


2 Comments

DAV

June 26, 2012, 11:10 am

The blog bounced around a lot this morning. Having a bad code day?
Briggs

June 26, 2012, 6:26 pm

The road is sometimes not smooth.
