The objection which will occur to those, Lord help them, who have had some statistical training is that “increased” means a combination of “linear increase” and “significance.” These objections, as we’ll next see, are chimeras, but the fault that they are made at all lies with me. Mea culpa! I hereby accept blame for the poor statistical education most people receive. We statisticians often do a terrible job teaching our subject to outsiders (ask any student and they will agree with this). We know we do poorly because scarcely anybody remembers what these and other statistical concepts mean once they leave the classroom (however, their ignorance rarely affects their confidence). For my penance, this article of clarification.
Suppose you say that “increased” meant that the data did not decrease in a statistically significant linear fashion. That is, you are willing to allow that the actual data “had a downward trend”, but that this trend—by which you mean a straight line drawn through the data—was not “significant,” and that therefore an “increase” of some kind was still a possibility. The data is shown below, with a regression line drawn through.
First we have to focus on eq. (13). We must start, continue, and end with the idea firmly in mind that the actual data did not in fact “increase” (by our definition).
Second, the regression line is a model: call it Mr. It sounds as if we want to compute
(18) Pr(X decreased | Mr),
but this is not what people want when they think about classical statistics. If we mean by “decreased” the opposite of “increased”—that is, that X went down more often than it increased or stayed the same—we can calculate (18) (or any other function of the observed data), but nobody does. They instead calculate one of two different things, depending on whether they are a frequentist or Bayesian.
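That counting definition of “decreased” can be checked directly from the observations, with no model in sight. A minimal sketch in Python; the series here is invented for illustration, since the article’s actual data are not reproduced:

```python
# Check the counting definition of "decreased": the observed series went
# down more often than it went up or stayed the same. No model is assumed;
# this is purely a statement about the data in hand.

def decreased(x):
    """True if x has more down-steps than up-or-flat steps."""
    downs = sum(1 for a, b in zip(x, x[1:]) if b < a)
    ups_or_flat = len(x) - 1 - downs
    return downs > ups_or_flat

# An invented series (NOT the article's data): it wobbles but steps down
# more often than not.
x = [5.0, 4.8, 4.9, 4.5, 4.6, 4.1, 3.9, 4.0, 3.7]
print(decreased(x))
```

Nobody needs a parameter, a p-value, or a prior to run this; that is the point.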
Before we get to that, we first have to understand what Mr means. We don’t have to get overly specific; all we have to know is that Mr is indexed by unobservable parameters, only one of which (for this simple regression) has to do with the “trend”: call this parameter θ. It helps to think of it as the slope of the line we drew. High school geometry tells us that if θ > 0, then the line will go up, and that if θ < 0 then the line will go down. If θ = 0 then the line will be flat.
A frequentist will calculate
(19) Pr( F(X) > f(X) | Mr, θ = 0),
where f(X) is an ad hoc function of the observed data, F(X) is the same function over data never seen, and where both are subjectively chosen from a very large supply of functions (usually the absolute value of the functions is taken). The probability assumes that the “experiment” that gave rise to X will be repeated indefinitely, and that for each repetition a new F(X) will be calculated. (19) is thus the probability of seeing a larger F(X) than the actual f(X) in these repetitions, assuming Mr is true but with its “slope” parameter set equal to 0. If (19) is “small”, θ is said to be “not zero”; if instead (19) is “large”, θ is said to be 0 and the trend “not statistically significant.”
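As a concrete sketch of (19): in practice the ad hoc F is usually the t-statistic for the slope, and software reports the resulting p-value directly. Here is the calculation in Python on an invented series of 156 points with a built-in downward drift (the real data are not shown, so the numbers are illustrative only):

```python
# A sketch of the frequentist calculation (19), using scipy's simple
# linear regression. The series is invented: a downward drift plus noise.
import numpy as np
from scipy.stats import linregress

t = np.arange(1, 157)                              # time index 1..156
rng = np.random.default_rng(0)
x = 10.0 - 0.05 * t + rng.normal(0, 1, t.size)     # invented data, NOT the article's

res = linregress(t, x)
print(f"slope estimate: {res.slope:.4f}")
print(f"p-value for theta = 0: {res.pvalue:.2e}")
# A "small" p-value leads the frequentist to declare theta "not zero".
```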
A Bayesian will calculate1
(20) Pr( θ < 0 | Mr & X & E),
which is the probability that the slope is less than 0, but still assuming that the model is true and given the old data and something called “E”, which is the evidence we need to tell us about the parameters before we see any data. We call this information the “prior”, but we needn’t spend any time on it, because happily for simple regression models like Mr the frequentist and Bayesian will agree about θ. For when (20) is “large”, (19) will be “small”, and θ will be declared not to be 0; and when (20) is “small”, (19) will be “large”, and θ will be declared to be 0.
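The agreement between (19) and (20) for simple regression can be checked numerically. Under a flat (“uninformative”) prior, the posterior for θ is a Student-t distribution centered at the least-squares estimate, so when the fitted slope is negative, Pr(θ < 0 | Mr & X & E) works out to exactly 1 − p/2. A sketch, again on invented data:

```python
# Check the frequentist-Bayesian agreement for simple regression: under a
# flat prior, the posterior probability (20) equals 1 - p/2 when the fitted
# slope is negative. The data are invented for illustration.
import numpy as np
from scipy.stats import linregress, t as student_t

time = np.arange(1, 157)
rng = np.random.default_rng(0)
x = 10.0 - 0.05 * time + rng.normal(0, 1, time.size)

res = linregress(time, x)
df = time.size - 2                                  # residual degrees of freedom
# Posterior for theta: Student-t centered at res.slope with scale res.stderr.
pr_theta_neg = student_t.cdf((0.0 - res.slope) / res.stderr, df)

print(f"Pr(theta < 0 | Mr & X & E) = {pr_theta_neg:.6f}")
print(f"1 - p/2                    = {1 - res.pvalue / 2:.6f}")
```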
It turns out that for this data, (19) is about 10^-16, which is “small”, and (20) is about 1 – (19)/2, which is “large.” For this data, classical statisticians would announce, “X did in fact decrease” or “There was a statistically significant decrease in X.” A person ignorant of any statistics will have calculated (13) long ago and concluded that, yes, the data did in fact decrease, because it did.
But suppose instead that (19) was “large” and (20) “small”, but that (13) still holds. Then the statistician would say, “The decrease in X was not statistically significant.” Unfortunately for the statistician, this is not equivalent to “X did not decrease”, because we have already agreed that it did. This situation is thus somewhat akin to Congresspersons who say “The budget is being cut” when what they mean is “We are reducing the amount of increase, but there is still an increase.” Well, that’s statistics for you.
Now, as stated earlier, we could have computed (18) and said something about the probability of the actual observable X itself decreasing given the model2. (18) is not (19) nor is it (20) (but all assume the model is true), and in general these probabilities won’t match. It turns out that (18) is easy to calculate, but in order to do so we must first supply a guess of the parameters of Mr or, if you are a “predictive” Bayesian, to guess the parameters and then “integrate them out.” That is, Mr = Mr(parameters), so before we can compute (18) we need to plug in guesses of the parameters. Bayesians can actually integrate out all uncertainty in the parameters; frequentists do something else. It doesn’t matter which method you choose—pick maximum likelihood if you’re a frequentist, or say a BLUP estimator, or on and on, all the way to frequentist predictive techniques, which sometimes mimic the Bayesian predictive techniques. All we need understand is that a guess for parameters has been made and that the uncertainty in these guesses has been accounted for. (18) can then be calculated.
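A plug-in sketch of how (18) might be computed: fit Mr, plug in the parameter guesses, simulate many fresh series from the fitted model, and count how often a simulated series “decreases” by the counting definition. (A predictive Bayesian would also integrate over the uncertainty in the guesses; this simple version skips that step.) Data and model here are invented:

```python
# A plug-in sketch of (18): simulate new series from the fitted regression
# and count how often a simulated series "decreases" by the counting
# definition. The data are invented for illustration.
import numpy as np
from scipy.stats import linregress

def decreased(x):
    """True if x has more down-steps than up-or-flat steps."""
    downs = np.sum(np.diff(x) < 0)
    return downs > (len(x) - 1 - downs)

time = np.arange(1, 157)
rng = np.random.default_rng(0)
x = 10.0 - 0.05 * time + rng.normal(0, 1, time.size)

res = linregress(time, x)
resid_sd = np.std(x - (res.intercept + res.slope * time), ddof=2)

# Plug the parameter guesses in and simulate 2000 fresh series.
sims = [decreased(res.intercept + res.slope * time + rng.normal(0, resid_sd, time.size))
        for _ in range(2000)]
p18 = float(np.mean(sims))
print(f"Pr(X decreased | Mr) is roughly {p18:.3f}")
```

Note that this number is conditional on Mr being true; it says nothing new about whether the observed data decreased, which (13) already settled.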
Suppose, after you’ve done this, (18) is “small”. We earlier saw that (18) was like (14) or (15), so just because (18) is “small” (or “large”), this does not change (13), which states that, given the observations and our definition of “decrease”, the data did in fact decrease. (18) is conditional on a model, which we assume is true. (13) is conditional on the observations, which we assumed were error free.
If you’re unhappy about this, you have two statistical options. The first is to change the model. We pulled Mr out of a hat anyway, so why not try different M? You’re bound to find one that agrees with what you wanted, which was to say that the data did not decrease. That is, you will surely, if you search hard enough, find an M which gives pleasing results for (18) — (20). After all, who said Mr was true? Nobody. We just assumed it. We’ll talk later about how to tell how good Mr is. All we have to understand here is that we can’t talk about “significance” or “trend” without assuming a model: you can’t have one without the others. It is an impossibility.
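The model-dependence is easy to exhibit. The p-value attached to “the trend” is a property of the model, not the data: swap Mr for a rival M (below, one with an extra quadratic term) and the number changes. Everything here, data included, is invented for illustration:

```python
# Show that "significance" of the trend depends on the assumed model.
# Fit the same invented series under Mr (straight line) and a rival M
# (line plus quadratic term) and compare the trend p-values.
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(3)
n = 156
tt = np.arange(1, n + 1)
x = 10.0 - 0.005 * tt + rng.normal(0, 1, n)   # invented series, weak downward drift

def coef_pvalue(design, y, col):
    """Two-sided p-value for coefficient `col` in an OLS fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = design.shape[0] - design.shape[1]
    s2 = resid @ resid / df
    cov = s2 * np.linalg.inv(design.T @ design)
    tstat = beta[col] / np.sqrt(cov[col, col])
    return 2 * student_t.sf(abs(tstat), df)

m_r = np.column_stack([np.ones(n), tt])            # Mr: straight line
m_q = np.column_stack([np.ones(n), tt, tt ** 2])   # rival M: add a quadratic term

p_line = coef_pvalue(m_r, x, 1)
p_quad = coef_pvalue(m_q, x, 1)
print(f"trend p-value under Mr:      {p_line:.4f}")
print(f"trend p-value under rival M: {p_quad:.4f}")
```

Same data, different model, different number; keep searching through models and you can land on whichever verdict you prefer.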
The second option is to include more data. Reject the original question, which was “Did X increase from time 1 to 156?”, or say it wasn’t really what you meant, and that you instead meant, “X increased over the longer term.” That’s certainly vague enough, and gives you room to play because it frees you from saying exactly what you mean by “longer term.”
But, invariably, there will be somebody who pins you to the wall and insists that you define, exactly, precisely what you mean by “over the longer term.” At this point you’re stuck3, for when you pick an exact start date, say X-n where negative indexes indicate time before 1, all of what was outlined above still holds. That is, all we need do is glance at the data and compute the new (13) and even the new (18) — (20) for this X-n. Depending on the start date, these four numbers will either agree or not: they will anyway change with every new start date. And for every new model M, including physical models. And all the while, (13) remains fixed and unbudgeable.
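The start-date problem is just as easy to exhibit. Below, an invented series rises for a stretch and then falls; running the same calculation (19) from different start dates changes both the fitted slope and its p-value:

```python
# Show that the slope and p-value of (19) change with the chosen start date.
# The series is invented: it rises for 100 points, then falls for 156.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
early = 5.0 + 0.05 * np.arange(100) + rng.normal(0, 1, 100)         # rising stretch
late = early[-1] - 0.05 * np.arange(156) + rng.normal(0, 1, 156)    # falling stretch
x = np.concatenate([early, late])

results = {}
for start in [0, 50, 100]:        # three candidate "start dates"
    seg = x[start:]
    results[start] = linregress(np.arange(seg.size), seg)
    res = results[start]
    print(f"start={start:3d}  slope={res.slope:+.4f}  p={res.pvalue:.3g}")
```

The fit from each start date tells a different story, yet for any fixed stretch of the observations, whether that stretch decreased by the counting definition never budges.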
Oh, my. We still haven’t gotten to what to do if X is measured with error, or what the physical models mean, or what is a good or bad model. Stick around.
1A Bayesian might calculate a “Bayes factor” instead of (20), but this difference does not matter here, because the conclusion, i.e. the interpretation of what follows, would be the same.
2We might have to modify the notation of (18) to indicate whether we’re computing this probability before seeing X or after it.