My mailbag is filling up. Today two questions from readers, both about statistics. Feel free to send yours in on any subject. Tomorrow is a doozy.
From reader Michael H. comes this email:
Hi Dr. Briggs,
I’ve been thinking about statistics much more now that I am required to take an applied statistics class for my actuarial certification.
I’ve watched your video on the crisis of evidence a couple of times. My understanding is that statistics cannot determine that something is a cause, but it may be able to say something of how it is a cause, for example the magnitude of the effect it will have given that we know it is a cause. Is this the case?
Moreover, your argument (at least as concerns the banana example) does not seem to imply that statistics cannot in principle determine cause, but because of the preponderance of possible causes it cannot distinguish between these. Would it be possible to determine a cause were the possible causes limited sufficiently, or is this a problem in principle? If in principle, what is the principle?
Probability models only tell us how our uncertainty changes in some proposition given varying assumptions. Therefore, probability models are not causal or deterministic. An example of a deterministic, but not causal, model is y = a + b*x. This says the value of y is determined by the values of a, b, and x. It says nothing directly about cause. Knowledge of cause comes from understanding a thing’s powers, and its nature or essence. These are not matters of probability.
Statistics, or probability models, in principle cannot determine cause because they remain mute about powers, natures, and essences. Understanding of these comes from induction (in its various types). Probability models aren’t even, except in trivial cases, deterministic. Consider regression. There the equation describes the central parameter, not the observable itself: the central parameter is said to be a function of various explanatory variables. The central parameter says nothing about any cause, therefore any function of it is silent on cause.
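To make the contrast concrete, here is a minimal sketch, not taken from the paper or the book. It assumes an ordinary normal regression with made-up values for a, b, and the spread sigma, and the function names are hypothetical. The first function is the deterministic model: give it x and it hands back y. The second returns only a probability that y exceeds some value, because a + b*x is the central parameter and not y itself. Neither function says anything about cause.

```python
import math


def deterministic_y(a, b, x):
    """Deterministic model: y is fixed once a, b, and x are known."""
    return a + b * x


def prob_y_exceeds(c, a, b, sigma, x):
    """Normal regression (an assumption made for illustration).

    a + b*x is the central parameter, and y is uncertain around it
    with spread sigma. Returns Pr(y > c | a, b, sigma, x).
    """
    mu = a + b * x  # central parameter as a function of x
    z = (c - mu) / sigma
    # Normal survival function via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Made-up values, for illustration only.
a, b, sigma, x = 1.0, 2.0, 1.5, 3.0

print(deterministic_y(a, b, x))              # the model fixes y exactly
print(prob_y_exceeds(7.0, a, b, sigma, x))   # only a probability for y
```

Changing the assumed sigma changes the second answer but not the first, which is one way to see that the regression equation pins down the central parameter rather than the observable.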
There’s lots more to say about this. I have some details in the paper “The Crisis Of Evidence: Why Probability And Statistics Cannot Discover Cause”, and much more in my (forthcoming?) book, which proves all these things.
Our second question—and here readers can help—comes from Miha.
My name is Miha [personal information removed]. I also teach “analytics” in our executive program and have done a number of lectures at the business school here as well…
Yesterday I listened to your outstanding podcast on frequentist and Bayesian statistics…twice. It is fantastic! The best summary of the major differences I have heard/read. I do have a question I hope you can help me with.
When speaking about subjective Bayesians you mentioned that you had seen – in writing – cases where they have made up wild probabilities (you were using the die example and that they might say the probability of getting a six is “95%”). I am curious if you have any such examples at hand. I would love to read a piece or two where this was done as I would like to understand the rationale behind such an “absurd” choice (if there is one). This request is simply out of curiosity.
I plan to listen to more of your podcasts during my next long bike ride. Definitely wish I had found your work before I started teaching analytics. Fantastic stuff!
I was an associate editor on an American Meteorological Society journal at one time and an author submitted a paper which purported to demonstrate how certain Bayesian methods worked. For one example, this author used a prior which, as they say in the lingo, was hugely informative. The example usually called for a “flat” prior, which I pointed out. The author responded to me that, as priors were subjective, he could use any he wished. This reasoning convinced the chief editor and the odd example was allowed. The paper was eventually published. Only the Lord knows how influential this was to the largely non-statistical readership.
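To see how much the choice of prior can matter, here is a small sketch. It is not the example from that paper: the prior parameters and the counts are made up, and the function name is hypothetical. It assumes a conjugate beta prior on the probability parameter for a six, with binomial data, so the posterior mean is (alpha + sixes) / (alpha + beta + rolls).

```python
from fractions import Fraction


def posterior_mean(alpha, beta, sixes, rolls):
    """Posterior mean of a Beta(alpha, beta) prior after binomial data.

    The beta-binomial setup is an assumption made for illustration.
    """
    return Fraction(alpha + sixes, alpha + beta + rolls)


# Made-up data: 10 sixes in 60 rolls.
rolls, sixes = 60, 10

# "Flat" prior: Beta(1, 1).
flat = posterior_mean(1, 1, sixes, rolls)

# Hugely informative prior centered on 95%: Beta(950, 50).
informative = posterior_mean(950, 50, sixes, rolls)

print(float(flat))         # close to the observed 10/60
print(float(informative))  # still above 0.9 despite the data
```

With the flat prior the data dominate. With the informative one, sixty rolls barely move the answer away from 95%.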
Most professional statisticians wouldn’t make that kind of mistake. But then again, the nature of the source of parameters is rarely explored. Many “priors” aren’t priors at all, since they are “improper”, meaning they are not probabilities. And many so-called empirical Bayes analyses use priors that depend on the same kinds of dicey assumptions and data found in frequentist studies.
The die example I used proves (or rather, strongly suggests) that subjective probability is not a correct interpretation of probability. Given “Just 2 out of 3 Martians wear a hat and George is a Martian”, the probability of “George wears a hat” is 2/3. But a subjectivist can say 0.01115%. How can you prove him wrong? Answer: you cannot, not empirically. So the empirical interpretation of probability is also wrong. You can prove the 2/3 is right, however, by use of the statistical syllogism, which relies on the more fundamental idea of “symmetry of logical constants”, which, even though it uses the word, has nothing to do with any physical symmetry. I prove, as in prove, this in my book.
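The counting behind the 2/3 can be sketched in a few lines. This is only an illustration, not the proof from the book. It assumes exactly three Martians with hypothetical labels 0, 1, and 2, lists every way two of them can wear hats, and lets George be any one of the three. The premises give no reason to favor one arrangement over another, so each is counted once.

```python
from fractions import Fraction
from itertools import combinations


def prob_george_wears_hat(n_martians=3, n_hats=2):
    """Count the arrangements consistent with the premises.

    Labels and the enumeration are illustrative assumptions.
    """
    favorable = 0
    total = 0
    # Every way of choosing which Martians wear hats.
    for hat_wearers in combinations(range(n_martians), n_hats):
        # George may be any one of the Martians.
        for george in range(n_martians):
            total += 1
            if george in hat_wearers:
                favorable += 1
    return Fraction(favorable, total)


print(prob_george_wears_hat())  # 2/3
```

The subjectivist's 0.01115% appears nowhere in this count. It has to come from outside the stated premises.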