There is a subset of professional statisticians—defined as folks formally trained in the field by other professionals—who feel that the field of statistics is useless. That if you have to prove anything using statistics, what you have “proven” stands a good chance of being a chimera. That if you need “statistically significant” findings, what you’ve “found” might be nothing more than the reflection of your desires. (A climate example is below.)
Now, if I hadn’t gone to one too many Finger Lakes wineries on Saturday, not only would I expand that theme, but I would quote from a famous twentieth-century physicist who put the matter better than I ever could. As it is, you’re on your own.
We already know that the only crucial thing in any experiment is control. “Randomness” is not only not needed, it can be harmful. Again, I beg of you to consider physics.
If I desire, say, to measure the spin of a certain particle, I will design an apparatus that controls for every possible variable that might affect my results. I will leave as little as possible to “randomness,” and I will certainly not purposely infect my experiment with it.
It is true—it is always true for any experiment of any kind—that I might miss controlling for the variable or variables that are responsible for my results. But if I am careful, diligent, and have paid strict attention to the prior evidence, it is less likely I will miss these variables.
Anyway, I conduct the experiment and publish my results. And that’s when other physicists try to reproduce what I have done; such reproduction further reduces the risk that I missed controlling variables. The most commonly missed controlling variable is my own bias, which causes me to misinterpret what I have seen, such that I write down not quite what happened, but what I wanted to happen. The people re-conducting the experiment (usually) won’t have my biases; thus, they stand a better chance of reporting what actually occurred.
Think now of standard sociological experiments, where I might be interested in how sex or race (are there any other topics?) affects some measure such as answers to a survey question. I then “randomize” (i.e., introduce noise into my survey) by calling “random” people on the phone, or more usually by grabbing data from a publicly available database, itself gathered by calling people on the phone, etc.
I then statistically model how each survey question is a function of sex or race. If the software spits out a p-value less than the magic, never-to-be-questioned number of 0.05, I announce that my worst fears have been realized and that sex and race are associated with attitudes towards…whatever it was I was asking about.
It is absurdly easy to generate publishable p-values. I often say that if you can’t find a small p-value in your data, then you haven’t tried hard enough. A small p-value, which means you have found “statistically significant” results, does not say what you think it says. It says nothing about how people of different races or sexes think about your survey question. It instead says how improbable a certain function of your data is, but only if you assume some very dicey premises which have nothing to do with the data.
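How easy is it? Here is a toy simulation, a sketch assuming Python with numpy and scipy, in which every number is fabricated: generate pure noise for forty survey questions, split respondents by an arbitrary label, and test each question for a “difference.”

```python
# A toy demonstration, not anyone's real study: fabricate pure-noise
# answers to 40 survey questions, split people by a meaningless label,
# and t-test each question. Some "significance" almost always appears.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_people, n_questions = 200, 40

group = rng.integers(0, 2, n_people)                 # arbitrary "sex" label
answers = rng.normal(size=(n_people, n_questions))   # pure noise

p_values = [
    stats.ttest_ind(answers[group == 0, q], answers[group == 1, q]).pvalue
    for q in range(n_questions)
]
print(f"smallest of {n_questions} p-values: {min(p_values):.4f}")
# At the 0.05 level, the chance of at least one "finding" among 40
# independent noise questions is 1 - 0.95**40, roughly 0.87.
```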
Worst of all is that once other sociologists look at your results, they will almost immediately agree with them. Reproductions of sociological results are rarer than unicorns. Instead, you will find the same experiment carried out in different sub-populations, announced with, “Although much is known about X, nobody has yet written about X among under-served Eskimos…” What a depressing field.
Climate Temperature Trends
Here I must beg indulgence and will jump right to the math. This is for my climatological colleagues. If y is some temperature and t time, a common model fragment might be
$$ y_t = \beta_0 + \beta_1 t + \text{other stuff} $$
where if \( \beta_1 \) is positive we can announce that there is a “trend” in the data. Of course, there is no reason in the world to create this model. All we have to do is look at the data and ask whether y has or hasn’t increased over the years we have observed. Perhaps that question is considered too simple, and thus no learned, peer-reviewed paper can be written answering it.
Anyway, the statistical model will spit out a p-value on the assumption that \( \beta_1 = 0 \) and on the assumption that the model is perfectly true. The p-value will tell you how unlikely some statistic is, given those assumptions.
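To make the ritual concrete, here is a minimal sketch, with Python, numpy, and statsmodels assumed, and with the “temperatures” entirely fabricated:

```python
# A minimal sketch of the frequentist ritual: fit y_t = b0 + b1*t to
# fabricated "temperatures" and read off the p-value for b1, which is
# computed assuming b1 = 0 and assuming the model is perfectly true.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
t = np.arange(50)                                  # fifty "years"
y = 14.0 + 0.02 * t + rng.normal(0, 0.3, t.size)   # made-up series

fit = sm.OLS(y, sm.add_constant(t)).fit()
print(f"b1 estimate: {fit.params[1]:.4f}")
print(f"p-value for b1: {fit.pvalues[1]:.4f}")
```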
The p-value does not tell you

$$ \Pr(\beta_1 > 0 \mid \text{model and observed data}). $$
Nor will it tell you the probability that \( \beta_1 \) takes any value. It tells you nothing whatsoever about \( \beta_1 \). Not one damn thing. Nothing. I hope it is plain that a p-value gives no evidence of any kind about \( \beta_1 \).
Whether or not y has increased over time we could have seen with just a glance. But perhaps we’re interested in predicting whether y will continue increasing (or decreasing). Then we need our model. Again, the p-value gives no evidence whether a trend will continue. As in none. As in zero.
But we might turn into Bayesians and compute
Pr (β1> 0 | model and observed data).
And if that probability is sufficiently high—say it is 99%—we might claim there is a 99% chance that the trend will continue. If we did, we would be saying what is not so. For the posterior probability of \( \beta_1 \) tells us nothing directly about future values of y. As in nothing. As in not a thing. As in knowledge of the parameters is not knowledge of the actual future temperatures.
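For the record, here is what that Bayesian computation might look like on the same fabricated series, a flat-prior sketch only: under a flat prior and normal errors, the posterior for \( \beta_1 \) is a Student-t centered at the least-squares estimate.

```python
# A flat-prior Bayesian sketch on the same fabricated series: under a
# flat prior and normal errors, b1's posterior is Student-t centered
# at the least-squares estimate, with its standard error as the scale.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
t = np.arange(50)
y = 14.0 + 0.02 * t + rng.normal(0, 0.3, t.size)
fit = sm.OLS(y, sm.add_constant(t)).fit()

# Pr(b1 > 0 | model and observed data), flat-prior approximation
pr_positive = 1 - stats.t.cdf(0, fit.df_resid, loc=fit.params[1], scale=fit.bse[1])
print(f"Pr(b1 > 0 | model, data) = {pr_positive:.3f}")
# A statement about a parameter, note: not about any future temperature.
```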
What we can do—if we really are as interested in y as we say we are—is to compute things like this:
Pr (yt+1> yt | model and observed data).
where \( t+1 \) is a time beyond what we have already observed. Or this:
Pr (yt+1> yt + 1oC | model and observed data).
or any other question we have about observable data. I hope it is unnecessary to say that we need not restrict ourselves to just \( y_{t+1} \), and that all these probabilities are conditional on the model we used being perfectly true.
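Here is a sketch of what answering such a question might look like, again on the fabricated series, with the same flat-prior assumptions, and with the error variance treated as known, a simplification a fuller treatment would drop:

```python
# Sketch: simulate posterior-predictive draws of y at time t+1 and
# count how often they exceed the last observed y. Flat prior, normal
# errors, error variance treated as known (a simplification).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
t = np.arange(50)
y = 14.0 + 0.02 * t + rng.normal(0, 0.3, t.size)
fit = sm.OLS(y, sm.add_constant(t)).fit()

n_draws = 100_000
t_new = t[-1] + 1
betas = rng.multivariate_normal(fit.params, fit.cov_params(), n_draws)
y_new = betas[:, 0] + betas[:, 1] * t_new \
        + rng.normal(0, np.sqrt(fit.scale), n_draws)

print(f"Pr(y_(t+1) > y_t | model, data) = {np.mean(y_new > y[-1]):.3f}")
# A probability about an observable at last -- still conditional on
# the model being perfectly true.
```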
But nobody ever does this. Why they don’t is a great psychological mystery. I have no complete answer, except to say that this lack is consistent with my theory that 95% of the human race is insane (a probability sufficiently high that your author likely belongs to this majority).