Example Of How To Eliminate P-values & Their Replacement

Many, many more details are available in Uncertainty: The Soul of Modeling, Probability & Statistics and at this page.

Last time we learned that the way to do probability models was this:

(1) Pr( Y | X D M )

where Y is the proposition of interest, X an assumption or supposition, D some past observations, and M a group of premises which comprise our model, propositions which “map” or relate the X and D to Y. Nearly always, M is not causal, merely correlational. Causal models are as rare [as me remembering to fill in a hilarious simile].

As a for-instance we assumed Y = “The patient improves”, and X_0 = “The old treatment”, X_1 = “The New & Improved! treatment.” D are a group of observations of treatment, whether the patient improved, and any number of things we think might be related in a correlational way to Y. By “correlational” way we mean something in the causal path, or a partial cause, or something related to a cause or partial cause. If we had the causes of Y, that would be our model, and we would, scientifically speaking, be done forevermore.

M is almost always ad hoc. The usual excuse is laziness. “We’re using logistic regression,” says researcher. Why? Because the people before him used logistic regression. M can be deduced in many cases, but it is hard, brutal work—though only because our best minds have not set themselves to creating a suite of these kinds of models as they have for parameter-centric models.

Parameters do not exist (parameters in a logistic regression relate the X to the Y, etc.). They are not ontic. Because M is ad hoc, parameters are ad hoc. Which is what makes the acrimony over “priors” on parameters so depressing. By the time we’ve reached thinking about priors, we are already two or three levels of ad hociness down the hole. What’s a little more?

As I say, M can be deduced, which means there are no parameters anywhere ever. But, as it is, we can “integrate them out”, and we must do so, because (again) parameters do not exist, and because certainty in some unobservable non-existent parameters in some ad hoc model does not, it most certainly does not, translate into certainty about Y. But, of course, everybody acts as if it does.
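To make “integrating out” concrete, here is a minimal sketch under assumptions that are mine and not part of the original argument: M is a plain binomial model with a Beta(a, b) prior on its parameter, and the data (12 improvements in 20 patients) are invented. The function names are hypothetical. The point is only that the final probability of Y mentions no parameter at all.

```python
from math import exp, log


def predictive_closed_form(improved, n, a=1.0, b=1.0):
    """Pr(next patient improves | D, M) with the parameter integrated out.

    Assumed M for this sketch: binomial model, Beta(a, b) prior.
    For that choice the integral has this closed form.
    """
    return (a + improved) / (a + b + n)


def predictive_numeric(improved, n, a=1.0, b=1.0, grid=20000):
    """The same probability by brute-force integration over the parameter."""
    numerator = 0.0
    normalizer = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid  # midpoint of the i-th slice of (0, 1)
        # Unnormalized posterior weight of this parameter value.
        weight = exp((a + improved - 1) * log(theta)
                     + (b + n - improved - 1) * log(1 - theta))
        numerator += theta * weight
        normalizer += weight
    return numerator / normalizer


# Made-up data: 12 of 20 past patients improved.
closed = predictive_closed_form(12, 20)
numeric = predictive_numeric(12, 20)
print(closed, numeric)
```

Both routes give a statement about the observable Y alone. A different ad hoc M (or a different prior) would give a different number, which is the bookkeeping point of (1).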

So our cry is not only “Death to P-Values!” but “Death to Parameters!”

If we are using a parameterized model, as all regression models are, the propositions about which priors we are using are just part of M; they are part of the overall ad hociness. Point is, our bookkeeping in (1) is complete.

Enough introduction. Let’s get down to a fictitious, wholly made up, imaginary example using our scenario.

M contains a list of correlates; these are the X (M is more than the X, of course). As is usual, we suppose there are p of them, i.e. X is the compound proposition X_1 & X_2 & … & X_p. Just to hammer home the point, ideally X are those observations which give the cause of Y. Barring that, they should be related to the cause or causes. Barring that, and as is most usual, X will be—can you guess?—ad hoc.

With so much ad hociness you might ask, “Why do people take statistical models so seriously?” And you would be right to ask that—just as you are right suspecting the correct answer to that question.

Anyway, suppose X_j = “Physician’s sock color is blue”, a 0-1 “variable”. We can then compute these two probabilities:

(1) Pr( Y | X D M ),

(2) Pr( Y | X_(-j) D_(-j) M_(-j) ) = Pr( Y | [X D M]_(-j) ).

Equation (1) is the “full” M, and eq. (2) is the model sans socks. Which of these two probabilities is the correct one?

THEY BOTH ARE!

Since all probability is conditional, and we pick the X and the X are not the causes, both probabilities are correct.

Suppose we observed (1) = 0.49876 and (2) = 0.49877. This means exactly what the equations say they mean. In (1), it is the probability the patient gets better assuming all the old data including physician sock color; in (2) it is the probability the patient improves assuming all data but socks. Both assume the model.
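How two such numbers could arise can be sketched in a few lines. Everything here is an assumption made for illustration: the counts are invented, the helper names are hypothetical, and the model is the simplest one available (a uniform prior within each group, parameter integrated out), not the logistic regression a researcher might reach for. The resulting numbers are not the 0.49876 and 0.49877 above.

```python
# Made-up counts: (treatment, blue_socks) -> (number improved, number observed)
counts = {
    ("new", True): (24, 50),
    ("new", False): (26, 50),
    ("old", True): (20, 50),
    ("old", False): (19, 50),
}


def predictive(improved, n):
    """Predictive probability under a uniform prior, parameter integrated out."""
    return (improved + 1) / (n + 2)


def prob_with_socks(treatment, blue_socks):
    """Analogue of (1): condition on treatment and sock color."""
    improved, n = counts[(treatment, blue_socks)]
    return predictive(improved, n)


def prob_without_socks(treatment):
    """Analogue of (2): drop sock color, pooling over it."""
    improved = sum(k for (t, _), (k, _n) in counts.items() if t == treatment)
    n = sum(m for (t, _), (_k, m) in counts.items() if t == treatment)
    return predictive(improved, n)


p_full = prob_with_socks("new", True)
p_no_socks = prob_without_socks("new")
print(p_full, p_no_socks)
```

Neither number is “the” probability of improvement. Each is correct given what it conditions on.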

Now I ask you the following trick question, which will be very difficult for those brought up under classical statistics to answer: Is there a difference between (1) and (2)?

The answer is yes. Yes, 0.49876 does not equal 0.49877. They are different.

Fine. Question two: is the difference of 0.00001 important?

The answer is there is no answer. Why? Because probability is not decision. To one decision maker, interested in statements about all of humanity, that difference might make a difference. To a second decision maker, that difference is no difference at all. Fellow number two drops socks from his model. The statistician has nothing to say about the difference, nor should he. The statistician only calculates the model. The decision maker uses it.
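A small sketch of why the same two probabilities can lead to different decisions. The population sizes and the “at least one patient” threshold are hypothetical choices of mine, standing in for whatever each decision maker actually cares about. The statistician supplies only the two probabilities.

```python
def expected_patient_difference(p_a, p_b, n_patients):
    """Expected difference in number of improved patients between two models."""
    return abs(p_a - p_b) * n_patients


def difference_matters(p_a, p_b, n_patients, min_patients):
    """One decision maker's rule: act only if enough patients are affected."""
    return expected_patient_difference(p_a, p_b, n_patients) >= min_patients


p_full, p_no_socks = 0.49876, 0.49877

# Decision maker one: statements about all of humanity (assumed 8 billion).
humanity = difference_matters(p_full, p_no_socks, 8_000_000_000, min_patients=1)

# Decision maker two: a single clinic (assumed 200 patients).
clinic = difference_matters(p_full, p_no_socks, 200, min_patients=1)

print(humanity, clinic)
```

Same probabilities, different thresholds, different actions. Nothing in the probability calculation says which threshold is the right one.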

That’s it. That’s how all of statistics should work. There remains only one small thing to note about the Xs.

Which X?

It is this: unless we are dealing with causes, the list of X is infinite. Infinite is a big number. Who gets to decide which X to include and which to leave out? Who indeed. To include any X is to assume implicitly that there is a causal connection, however weak or distantly related, to Y. These implicit premises are in M, but of course are not written out. (The mistake most make is reification; the mathematical model becomes more important than reality.)

Sock color could be causally related, weakly and distantly, to patient health. It could be that more of those docs with blue socks wear manly shoes (i.e. leather) and since manly shoes cost more, some of these docs have more money, and perhaps one reason some of these docs have more money is because they are better docs and see more or wealthier patients.

You can always tell stories like this; indeed, you must, and you do. If you did not, you would have never put the X in the model in the first place. The most important thing to recognize is this: probability is utterly and forever silent on the veracity of any causal story (unless cause is complete and known). This is why hypothesis testing (p-values, Bayes factors, etc.) is always fallacious. It mixes up probability with decision.

5 Comments

JohnK

July 31, 2017, 9:01 am

How I wish that this seemingly-simple post of Matt’s would be universally read and understood!
As far as I remember, “.. though only because our best minds have not set themselves to creating a suite of these kinds of models as they have for parameter-centric models” is not in Matt’s book, and is thus an additional innovation (free!). Though in his book Matt does drop a few hints here and there to smart people who want to do research — to go on and go further than he has — he never out-and-out says how much the rest of us really, really depend on smart people solving the kind of problem on view here.
Thanks in advance to the smart people who are successfully able to go on and go further in the numerous ways that Matt suggests in Uncertainty. We really, really need you.

Ken

July 31, 2017, 12:46 pm

The American Statistical Association (ASA) has taken and published a position regarding use of p-values consistent with Briggs’ views; see page 4 of the PDF at this link for the ASA’s Principles on the subject: http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108

That’s the consensus position. Statisticians involved in that much still disagree on just how far and in what directions further p-value guidance ought to go; here’s a number of such views from those involved in the ASA’s consensus: http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108

Regarding those different opinions beyond the basic ASA statement:

“The p-value was never intended to be a substitute for scientific reasoning,” the ASA’s executive director, Ron Wasserstein, said in a press release. On that point, the consensus committee members agreed, but statisticians have deep philosophical differences about the proper way to approach inference and statistics, and “this was taken as a battleground for those different views,” said Steven Goodman, co-director of the Meta-Research Innovation Center at Stanford. Much of the dispute centered around technical arguments over frequentist versus Bayesian methods and possible alternatives or supplements to p-values. “There were huge differences, including profoundly different views about the core problems and practices in need of reform,” Goodman said. “People were apoplectic over it.”
That quote is from: http://fivethirtyeight.com/features/statisticians-found-one-thing-they-can-agree-on-its-time-to-stop-misusing-p-values/

The ASA statement has been shared by many others. One can search using as keywords American Statistical Association p-value and find numerous hits.
–> The necessary guidance re use of p-values is ‘out there’ and shared by many.

Implicitly assuming that instances of p-value misuse are due to ignorance by a given practitioner seems to be a problem separate from ignorance regarding the value & limitations of the analytical tool that is a p-value. That the limits of this analytical tool ARE widely known AND disseminated suggests that many instances of misuse are not evidence of ignorance, but rather are evidence of willful intent to deceive by a given author(s).

One way to get that problem resolved, or minimized, [better than merely pointing out the flaws in given analyses] is arguing [or at least openly speculating] that the author(s) of a publication applied the statistical tool(s) so egregiously because they meant to do so, not that they didn’t know any better. Obviously, that kind of direct assault needs to be reserved to specific instances of egregious analysis/conclusion; more ambiguous analytical tools (such as the Bayesian v Frequentist debates still raging) are not so clear cut. Hold them accountable for knowing what they ought to know rather than serving them with ignorance as a defense where misuse is undeniable, AND, proper use is well communicated.

Jim Fedako

July 31, 2017, 12:48 pm

Sorry, but this sounds like an exercise in mansplaining.

Seriously, you did not seek to dialogue with me (the reader) and you never once validated my need for self-affirmation. Instead, you sought to dehumanize me through your oppressor-oppressed dichotomy, attempting to bank your so-called truths in an act of hegemonic colonization.

Shame on you.

Mactoul

August 1, 2017, 1:21 am

“Sock color could be causally related, weakly and distantly, to patient health”

Perhaps but the illustrative causal chain given is not causal at all but correlative.

brian (bulaoren)

August 1, 2017, 2:46 am

So, let me see… How could sock color be causative and not merely correlative… Ok; our doctor has blue socks (wool) but he doesn’t really like the way they look with his “Alden” cordovans. So, rather than putting them on his feet, he stuffs them down the front of his pants. Later in the day he realizes that wool can be itchy. An itchy crotch can make anyone impatient, our doctor being no exception. An impatient doctor might not provide optimal patient care QED
