Update: See this post on the definition of confidence and credible intervals.
Submitted for your approval, a new paper. A polemic describing in nascent terms the paradise that awaits us once we cease teaching frequentist statistics to non-statisticians. This Great Bayesian Switch is to occur in August of 2013.
The paper is already in the hands of editors and referees at American Statistician, a general interest journal issued by the American Statistical Association. But the paper has also appeared on Arxiv (arXiv:1201.2590v1), a repository for “preprints” submitted by those who have passed the weak test of being recommended by other Arxiv authors.
(If I were able to draw, the graphic that would accompany this post would feature R.A. Fisher putting on his hat, suitcase in hand, heading out the door and casting a sidelong glance at the Rev. T. Bayes, who is looming in the foreground, a beatific, Mona-Lisa smile on his face.)
It is written to be understood and spoken to by trained statisticians, especially those who teach this grand subject. But all are welcome to read it. Be forewarned that if you do not know the precise, technical definitions of p-values and confidence intervals, information that mystifyingly appears to be a well-guarded secret, then you will gain little from reading this paper.
My reasons for calling for the Bayesian Switch are outlined in the paper, so I won’t repeat them here. You may ask what are the chances that its recommendation will be implemented, and I will answer: somewhere well south of the chance that the EPA decides it has regulated its citizens enough and returns its budget unspent to Congress. The retort is: you have to start somewhere and my hope is that this paper will start a discussion (the probability of that is also low, I’m guessing).
I anticipate (from experts) a few counter arguments, which I peremptorily answer here:
You are claiming that Bayes is superior to frequentism?
Are you saying those that hold with frequentism are bad people?
But you are claiming that a theory I happen to believe is wrong. This is rude.
It is not.
Even if Bayes is better—and I say it isn’t—then professional statisticians who use frequentist techniques do a fine job.
Then what is your problem?
Most statistics are done by non-professional statisticians, and those who use frequentist techniques arrive at a level of certainty in their results that is unwarranted. This is because they fail to understand (or remember) what the results mean within frequentist theory. Plus, nearly everybody, including not a few statisticians, interprets frequentist results in Bayesian terms. We should eliminate this confusion.
They do not.
But they do. I challenge you to find me, in any published statistical analysis outside of an introductory textbook, a confidence interval given the correct interpretation. If you can find even one instance where the confidence interval is not interpreted as a credible interval, then I will eat your hat.
This isn’t fair. Confidence intervals are tricky things; even Neyman had difficulties defining them. Anyway, most people don’t use them.
True. They use p-values. Do you really want to rehash that discussion?
Look here. It’s too much trouble to change. Coordinating such a thing would be impossible. All my slides are already made up! The books are already ordered. Exams are written. And I don’t mean mine. Do you know of all the frequentist statistics questions that are asked everywhere, from medicine boards to bar exams?
You have me there.
Well, then, that’s fine. It’s good to see even you can be reasonable.
Even if everybody can’t change, you can.
I won’t. Because I suspect that you have ulterior motives in your Great Switch beyond just teaching a simple theory of probability.
I do. I would eliminate forthwith the peculiar and misplaced focus on parameters that is the basis of both frequentist and most Bayesian analyses. This is the true source of the rampant over-confidence that plagues science. If we required people to speak not of unobservable parameters and unverified assertions but of tangible, measurable objects, then we would cut in half the flow of nonsense. I’m so confident of the correctness of this view that I’ll even let you have the last word.
And I’ll take it. You are out of your mind.
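The confidence-interval exchange above turns on what the frequentist definition actually promises: the procedure, repeated over many samples, covers the true value at the stated rate, and no probability is attached to any single computed interval. A minimal simulation sketch of that long-run reading follows. The true mean, the known sigma, the sample size and the number of repetitions are all illustrative assumptions, not anything taken from the paper.

```python
import random
import statistics


def coverage(n_experiments=2000, n=30, true_mean=10.0, sigma=2.0, z=1.96, seed=0):
    """Fraction of nominal 95% z-intervals (sigma assumed known) that contain true_mean."""
    rng = random.Random(seed)
    # With sigma known, every interval has the same half width.
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Each individual interval either contains true_mean or it does not.
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / n_experiments


rate = coverage()
print(f"fraction of intervals covering the true mean: {rate:.3f}")
```

The printed fraction should land near 0.95. That number describes the interval-making procedure over repetitions; reading it as the probability that one particular interval contains the parameter is the credible-interval interpretation the dialogue complains about.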
On this you are absolutely correct. Having recently observed (at second hand) the teaching of a statistics course for psychology and education graduate students at a top tier school, the level of self-awareness of real issues in statistical inference in the real world was close to zero. The course was taught by a statistics faculty member, but clearly as a chore to be got through as quickly as possible. Perhaps it is simply assumed that this is what is wanted by the psychology and education departments (and perhaps that is not wrong), but the whole exercise was embarrassingly inept.
There’s some interesting publicity for Bayesian methods here.
Gav, the old cliche has it that even a broken clock is right twice a day!
It is no doubt a triumph of hope over experience that leads me to think that you can be right more than twice if you would only try.
I’m a Bayesian myself (though I’m not a professional statistician), but here are a few remarks (I’ve read the paper):
1. I’d like to spend more time assessing the fit of my models and getting rid of p-values. But reviewers don’t like this and ask for p-values.
2. Even if “civilians” start talking about the posterior probability of a hypothesis, they still will be overconfident, since most of them will not include the uncertainty of the model. Thus, it seems to me that the problem is more with the machinery of hypothesis testing than with frequentism.
3. Last, but not least, I’d like to hear you talking more about abandoning the focus on non-observable variables. I’m thinking about models (which aren’t observable) and latent variables. I guess you’re talking about non-observable parameters, but what
sorry. What’s the difference with models and latent variables?
Double entendre! I love it.
A latent variable is a presumed hidden cause or at least a way to account for the interaction between two or more variables. A model is a description of the result arising from the variables and may or may not include a latent variable. The model doesn’t cause anything (to the subject under study, that is). Some people do confuse models with reality though. Note: a description of the result is mostly useless if it can’t predict future results.
Is Bayes and Switch legal in NY?
“wee p-value” ? Was that juxtaposition on page four intentional?
It’s good to see you and Gavin agreeing on something.
So is there a textbook you’d recommend on the sort of Bayesian statistics you favour? One with examples worked from basics would be nice.
Have to admit as a non-professional statistician that I am still trying to get my head around Bayes. Saw a very good explanation of some software which does (what they call) a bayesian switch to better understand probabilities with the example of medical testing for very rare occurrences. Apparently, when presented to a room full of cancer specialists, not one of them was aware of the impact of false positive rates that are similar to actual incidence levels! That in itself is a worry and I can see your point in this.
Will try to get through the paper, but have to admit I am worried by the stated need to know the definition of ‘p’ – it’s been a long time…. I guess I really am one of the people who shouldn’t be using frequentist stats!
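The medical-testing point in the comment above can be made concrete with Bayes' theorem. The sketch below is only an illustration: the prevalence, sensitivity and false positive rate are hypothetical numbers chosen so that the false positive rate equals the incidence, and the function name is invented for this example.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = prevalence * sensitivity                  # P(disease and positive)
    false_pos = (1 - prevalence) * false_positive_rate   # P(no disease and positive)
    return true_pos / (true_pos + false_pos)


# Hypothetical figures: 1% incidence, 99% sensitivity, 1% false positive rate.
p = posterior_given_positive(prevalence=0.01, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(disease | positive) = {p:.3f}")
```

With these assumed figures a positive result leaves the chance of disease at about one half, even though the test is "99% accurate" in both directions. Lowering the prevalence further, with the same test, pushes the posterior well below one half.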
Hear hear! For a similar appeal, see this open letter, and the book linked at the website with my name. With the advent of MCMC and software for using it, the great Bayesian switch is underway!
you said: “the model doesn’t cause anything”: agreed. However, all models are wrong, and our results are conditional on the models we used (why, for instance, use logistic regression instead of another model? why a two level hierarchical models instead of 3 level? and so on). So, we should assess the uncertainty of our estimates by assessing the uncertainty induced by our model. And no, I don’t think that using the posterior probability of the model being true will do the trick (see for instance what Christian Robert said: http://xianblog.wordpress.com/2012/01/13/on-using-the-data-twice/). Mainly because I know in advance that my model is wrong, so the probability of it being true is zero.
Again, I’m no expert on Bayesian matters and I may be wrong here. I’m most likely the above mentioned civilian (though I do like to understand the math behind the statistics and I take a couple of graduate-level courses at the statistics department during my PhD).
Readers may find these interesting.
“R.A. Fisher putting on his hat, suitcase in hand, heading out the door and casting a sidelong glance at the Rev. T. Bayes,”
I rather like the idea of the Reverend Bayes exorcising the spirit of Fisher.
It’s probably best to think of models as approximations. What a good approximation is depends on your goals. Presumably your goal isn’t merely the construction of a model, although doing so may serve to show a possible mechanism. So, “why a two level hierarchical models instead of 3 level” depends on how close you need to come. If you don’t know ahead of time what “close enough” means, it may be time to re-evaluate what the goals of the model are.
“all models are wrong” and “I know in advance that my model is wrong, so the probability of it being true is zero”
In advance of what? Before you start? After you’ve built it, you can see if you’re getting answers within the needed (hopefully, predefined) bounds. That would imply the model is “true enough”, if only for the current dataset. However, the approximation isn’t going to be useful to anyone for long if it departs from what can be expected in all cases (meaning “in future data collections” and not just the present one), even if the goal is to demonstrate a possible mechanism.
This is drifting a bit from my previous post, which was to more or less describe the difference between a “latent variable” and a “model parameter”. Another answer might be: a “model parameter” is something inherent in the model. Most, if not all, tests will tell you if you’ve set the model parameters correctly, which is NOT the same thing as meaning the model correctly describes “reality”. A “latent variable” OTOH is presumably a feature of “reality” that can be observed, at least theoretically.
For those who haven’t already seen it: John Kruschke’s open letter on the same topic
and also by JKK, “Why Frequentist Statistics is Voodoo”
To the author of this post: if you consider that error bars form a 68% confidence interval, then I think you will find that most physics papers correctly use CI’s in the frequentist sense.
Also, hearteningly, Wikipedia gives the correct definition (today) before the jump, and distinguishes confidence intervals from credible intervals.
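For reference, the coverage attached to an error bar can be computed directly. The sketch below assumes the error bars are plus or minus k standard errors on a normally distributed estimate, which is an assumption of this example rather than something the comment states.

```python
import math


def normal_coverage(k):
    """Probability that a normal variate falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1.0, 1.96, 2.0):
    print(f"+/- {k} sigma covers {normal_coverage(k):.4f}")
```

Under that normal assumption, one-sigma bars cover roughly 68% in the long-run frequentist sense, and 1.96 sigma gives the familiar 95%.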
Mr. Briggs, that’s a good start, but the entire field needs an overhaul. We are using methods designed for an environment that lacked computing power and data to try to explain trends and behaviours with sub-atomic precision.
Personally my skin crawls a little every time I see a linear regression, a Gaussian distribution, or someone assuming I should stop asking questions as soon as I see a number smaller than 0.05.
The lay person should be taught about gambling, simulation, and data compression. Save the easily abused things for the pros. 🙂
Ray, to be fair – none of the fathers of frequentist methods (were there mothers as well?) asked people to interpret their statistical tests improperly….
I think about Pearson for example, coming up with a very abstract thought experiment that could apply to any experiment from any science, and solving a special case with assumptions. That was before we could Monte Carlo priors to posteriors to our heart’s content, I’d also note.
It is certain that there is at least one error in the paper. “students look to up” should be “students to look up”
As long as you’re influentially reforming academia, maybe you could pass along this suggestion:
* A course on “data janitor” tasks — the annoying stuff statisticians get stuck with at companies because they’re the “numbers people” — could be extremely practical.
* As could a course on plainly working with numbers in applications. (Not mathematical theory, perhaps a little bit of EDA; more just things like “Divide by # capitae to make relative comparisons meaningful” or “Subtract off Y to look only at the relevant part of X”. Could discuss sqrt’s and log’s and perhaps even a bit of the grammar of graphics.)
I’m actually not saying either of these things should be a full course. I’m more saying that statistics is seen as a relevant subject everyone should take because it’s the closest course to common-sense-working-with-numbers as a university gets. But neither H0 vs Ha nor Pr(θ|D) are nearly as relevant as looking-at-numbers-and-doing-something-sensible-to-them.
Lindley once heralded 2020 as the date by which everyone would become Bayesian, but you know what? That could be thought to happen only by the Bayesians achieving an utterly Pyrrhic victory. There are essentially no true-blue subjective Bayesians around (Kadane is one of a few), and the most popular Bayesianism, default Bayesianism, violates the likelihood principle and considers sampling distributions, without which its users would be unable to arrive at methods that have decent error probabilities. That they then turn around and call their results Bayesian is just because they’ve been taught that deep down on the philosophical level any sensible method must be Bayesian. But everything that is done in between the introductory and concluding praise to the Bayesian gods is utterly non-Bayesian. If these users of Bayesian methods (Berger calls them “casual Bayesians”) actually understood and articulated what was doing the work for them, they’d realize they were frequentist error statisticians.
There are no statistical questions on bar exams. You may be thinking of the LSAT, the test for general reasoning ability that prospective law school applicants take.