Today, a convincing argument proving the current practice of probability and statistics is mostly wrong, or wrong-headed.
The picture above does, and does not, show herbicide use in several countries by year.
The dots are the measured values. I know nothing about how the values were measured, whether the measurements have uncertainty in them, whether they all use the same measurement method by country or time, what the mix of herbicides is, or anything else. Doubtless these should not be dots but blobs to indicate the uncertainty in the measurements. But let that pass. Assume, as the graph asks us to, that the dots are the data. Know what that means?
THE DOTS ARE THE DATA!
Know what else the dots being the data means?
It means all that other material on the graph did not happen. The smoothed lines, the gray tubing are a statistical model. The dots happened. The statistical model did not happen.
The data are reality. The model is fantasy. Why substitute fantasy for reality?
Well, fantasy is more scientific. Science happens when ordinary data is turned into a model. Science happened here, and lots of it. Now let me hastily say that the folks who did this are the nicest people; highly intelligent, with good motives. They were only doing what everybody else does.
Only problem is, everybody else is wrong.
There was no reason to impose a statistical model on this data, and there was even less than no reason to use the model as a replacement for the data. And the model did just that: it replaced the data. Don’t think so? What was the first thing your eye was drawn to? You bet: the model. The model made the first impression.
This is natural, because the model is so smooth and lovely. It tells you what you want to hear. That something is going on and you know what it is. That’s the seductive lie of statistical models, that they know the cause of things. They don’t. They never do. They cannot. The model cannot tell you why those dots are there and why those dots took the values they did. The model replaces reality, remember.
Statistical and probability models are silent on cause. All of them. No probability model (statistics is just a subset of probability) gives information on cause. This is proved in the book Uncertainty: The Soul of Modeling, Probability & Statistics. That probability models cannot discern cause is why, not incidentally, all hypothesis testing should be eliminated (in both its frequentist and Bayesian implementations).
Instead of testing and replacing data with what didn’t happen, probability models should only be used to characterize the uncertainty of that which we do not know. Pay attention, now. It’s going to get complicated.
We do not know the future. Thus, probability models of the past can be used to make guesses of the future. Not in a causative way, mind you. In a purely probabilistic way only. We can—and should—make statements like this, “Given our past measurements and given these assumptions about characterizing their uncertainty, the probability future data will look like such-and-such is X.”
That’s all probability models can do!
Probability models are only useful in making predictions. That’s it. Nothing more.
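To make that kind of statement concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the numbers are invented (they are not the herbicide data), the function name is hypothetical, and the uncertainty is characterized by one particular choice, namely treating year-over-year changes as exchangeable "up" or "not up" outcomes with a uniform prior on the chance of "up". Under those assumptions the predictive probability is Laplace's rule of succession, (k + 1) / (n + 2). Change the assumptions and X changes with them.

```python
from fractions import Fraction


def prob_next_is_increase(values):
    """Predictive probability that the next value exceeds the last one seen.

    Assumption (for illustration only): year-over-year changes are
    exchangeable up / not-up outcomes with a uniform prior on the chance
    of "up". That yields Laplace's rule of succession: (k + 1) / (n + 2),
    where k is the number of observed increases out of n changes.
    """
    changes = [later > earlier for earlier, later in zip(values, values[1:])]
    n = len(changes)   # number of observed year-over-year changes
    k = sum(changes)   # how many of those changes were increases
    return Fraction(k + 1, n + 2)


# Hypothetical measurements, invented for this example.
past = [2.0, 2.3, 2.1, 2.6, 2.8, 2.7, 3.0]

x = prob_next_is_increase(past)
print(f"Given these data and these assumptions, "
      f"Pr(next value exceeds the last) = {x} = {float(x):.3f}")
```

Note what the output is: a probability about data not yet seen, conditional on the measurements and on the stated assumptions. It says nothing about why the values moved.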
Sometimes we do not know the past. We know some of it, but not all of it. Probability models can make predictions of the parts we don’t know, just like they make predictions of the future we don’t know. Again, all we can and should say are things like, “Given our past measurements and given these assumptions about characterizing their uncertainty, the probability the data in the past we don’t know looked like such-and-such is X.”
If you cannot see how vast and consequential a disruption this is from the current uses of probability models, then I have failed to convey just how shocking a change this is. Current usage focuses on the past and invents fallacies about discerning cause. The use I advocate is pure probability: its only use is to characterize (I do not say measure) uncertainty in the unknown.
Turns out the creators of the graph wanted to say something about increasing and decreasing use of herbicides. Reality (assuming again the dots are the real data without uncertainty) would have sufficed. We start with a definition of ‘increase’ or ‘decrease’, then just look: the definition will be true or false. Models aren’t needed.
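A short sketch of "define, then look" in Python. The two definitions and the numbers are assumptions chosen for illustration; they are not the graph creators' definitions and not the actual herbicide measurements. The point is only that once a definition is fixed, the dots answer the question directly.

```python
def increased(values):
    """One possible definition: the last measurement exceeds the first."""
    return values[-1] > values[0]


def strictly_increasing(values):
    """A stricter definition: every measurement exceeds the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Hypothetical dots, invented for this example.
dots = [2.0, 2.3, 2.1, 2.6, 2.8, 2.7, 3.0]

print(increased(dots))            # True: 3.0 > 2.0
print(strictly_increasing(dots))  # False: 2.3 -> 2.1 is a drop
```

The same dots give different answers under different definitions, which is why the definition has to come first. No smoothing or fitted line is involved at any step.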
Here, the model only increases over-certainty in our judgement of what happened. Increasing and decreasing aren’t so easy to see in just the dots. They’re far too easy to see in the models. But we’re seeing what wasn’t there.