Here it is! The one, the only, the peer-reviewed (and therefore true) “Reality-Based Probability & Statistics: Solving the Evidential Crisis” (the link is to the pdf, which is 11 MB; there are many pictures).
This is a large review paper, summing up the problems I see in statistics, with a guide to how to escape from the void.
There is no question computer scientists are kicking statisticians’ asses. Hard. As we saw yesterday. The reason, anyway, is simple: “AI”, which is nothing but lists of if-then statements, at least divorces itself from, or does not concern itself much with, parameters. Statistics believes these strange entities have life! All practice, frequentist or Bayes, revolves around them. We are in orbit around a fiction.
Enough already! Let’s turn our eyes toward Reality. Here’s how.
Section 2: NEVER USE HYPOTHESIS TESTS
These are refinements to “Everything Wrong With P-values Under One Roof”, here with a mind toward cause.
P-values are now officially dead. The sooner we stop talking about them and start talking about Reality, the better.
Section 3: MODEL SELECTION USING PREDICTIVE STATISTICS
Do we need hypothesis tests? No. And we only need model selection sometimes. Most models are free in the sense they are only bits of code, so we don’t always have to pick. But sometimes models cost money, because observations cost money, and then we will need to select. When we’re forced to pick between models, we should pick with a Reality-based metric and nothing else. We do this based on Reality, not parameters.
Regular readers will be familiar with the mechanics of predictive inference, probability leakage, and all that. So you can skim this section, but pay some attention to the example.
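For readers who want to see the idea in miniature, here is a rough sketch of picking between two models by how well their predictions match observations, with no parameter in sight. The Brier score is used only as one common proper score; it is an assumption here, not necessarily the metric used in the paper. The model names and all the numbers are invented for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def pick_model(predictions_by_model, outcomes):
    """Score every model's predictions against what actually happened; return the best name and all scores."""
    scores = {name: brier_score(p, outcomes) for name, p in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Invented observations (1 = the event happened) that neither model saw when it was built.
outcomes = [1, 0, 1, 1, 0]

# Invented predictive probabilities from two hypothetical models.
predictions = {
    "model_a": [0.8, 0.3, 0.7, 0.6, 0.2],
    "model_b": [0.5, 0.5, 0.5, 0.5, 0.5],  # always says "coin flip"
}

best, scores = pick_model(predictions, outcomes)
print(best, scores)
```

The only inputs are predictions and observations. Whatever parameters sit inside either model never enter the comparison.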
Section 4: Y CAUSE
This is it! This is the missing element. The lack of focus on cause.
Parameter estimates are often called “effect size”, though the causes thought to generate these effects are not well specified. Models are often written in causal-like form (to be described below), or cause is conceived by drawing figurative lines or “paths” between certain parameters.
Parameters are not causes, and causes don’t happen to parameters. Probability is not real. Thus cause cannot operate on it. Parameters aren’t real: same deal.
Cause in probability and statistics is mixed up, to say the least; right ideas mix with wrong and swap places. There is no consistency.
Cause is conditional. Three small words packed with meaning. All probability is conditional, too, and in the same way. Once this is understood, we have made a great leap, and we can see what is possible to know about cause and what is not.
Section 5: TRUST BUT VERIFY
“Scarcely any who use statistical models ever ask does the model work? Not works in the sense that data can be fit to it, but works in the sense that it can make useful predictions of reality of observations never before seen or used in any way. Does the model verify?”
Then some ways this can and must be done.
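One such check, sketched under assumptions: compare a model's predictions of never-before-seen observations against a naive guess, and ask whether the model adds anything. The skill formula below (one minus the ratio of errors) is a standard construction and not necessarily the paper's; the function names and the numbers are made up for illustration.

```python
def mean_abs_error(predicted, observed):
    """Average absolute miss between predictions and observations."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def skill(model_pred, naive_pred, observed):
    """Positive: the model beats the naive guess. Zero or negative: it does not."""
    naive_error = mean_abs_error(naive_pred, observed)
    if naive_error == 0:
        raise ValueError("naive prediction is already perfect; skill is undefined")
    return 1.0 - mean_abs_error(model_pred, observed) / naive_error


# Invented observations that were not used in any way to build the model.
new_observations = [10.0, 12.0, 9.0, 11.0]

# Invented predictions from a hypothetical model, and from a naive "always guess the old average" rule.
model_predictions = [10.5, 11.0, 9.5, 11.0]
naive_predictions = [10.0, 10.0, 10.0, 10.0]

print(skill(model_predictions, naive_predictions, new_observations))
```

A model that fits old data beautifully but shows no skill on new observations has not verified.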
Section 6: THE FUTURE
“No more hypothesis testing. Models must be reported in their predictive form, where anybody (in theory) can check the results, even if they don’t have access to the original data. All models which have any claim to sincerity must be tested against reality, first in-sample, then out-of-sample. Reality must take precedence over theory.”
This is in the inaugural edition of the Asian Journal of Economics and Banking, which does not yet have a web site (it’s that new). Paper copies are available at all better libraries.