The paper is “Researcher Requests for Inappropriate Analysis and Reporting: A U.S. Survey of Consulting Biostatisticians” by Min Qi Wang, Alice F. Yan, and Ralph V. Katz in Annals of Internal Medicine.
As one headline put it, “1 In 4 Statisticians Say They Were Asked To Commit Scientific Fraud.”
That’s the wrong headline, though. It should read “Of the three out of four who chose to answer, one out of four biostatisticians admitted being asked to commit fraud.”
How many biostatisticians committed fraud they do not say. Smart money says at least one. Perhaps there is a way to get a p-value on that?
Anyway, our authors went online and dangled one hundred bucks minus one in front of some ASA members, got over 500 takers (out of 4,000 asked), of which just under 400 answered the questions. We’ll never know what happened to the statisticians who vanished or to those who never bothered. Perhaps some found the questions too painful? We’ll have to agree that their missing answers don’t count—which is, after all, the standard trick. We might title the maneuver Wish Replaces Data.
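Those missing answers matter more than a footnote. As a rough sketch (my own arithmetic, using the post’s approximate figures of 4,000 invited, just under 400 answering, and roughly 24% reporting the headline item), here is how far nonresponse could swing the proportion under the extreme assumptions that every nonresponder did, or did not, receive such a request:

```python
# Nonresponse sensitivity bounds, using the post's rough figures
# (4,000 invited, ~390 answering, ~24% reporting the "1 in 4" item).
invited = 4000
answered = 390
reported = round(0.24 * answered)  # roughly 94 of those who answered

# Observed rate among responders only
observed = reported / answered

# Extreme bounds: assume every nonresponder did / did not get such a request
low = reported / invited
high = (reported + (invited - answered)) / invited

print(f"among responders: {observed:.1%}")
print(f"bounds over all invited: {low:.1%} to {high:.1%}")
```

The bounds are absurdly wide, which is the point: Wish Replaces Data only works if nobody computes them.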
Concentrate instead on (of those who answered) the top or “most severe” complaint. Which we’ll highlight.
Falsify the statistical significance (such as the P value) to support a desired result
Golly. But of those that answered—a circumlocution I will now drop, but it’s there; it’s always there—only a few say they were asked to do this. That the item was rated so severe is proof enough that p-values are magic. Or are seen as magic by most. Our refrain hasn’t changed: it’s time for the p-value to go.
If I read the table right, it looks like the most common actual fraud request was “Stress only the significant findings, but underreport nonsignificant ones”, which just over half said happened to them. This has certainly happened to me. Often.
It’s usually subtle. “I notice the graphs cross for large x,” ran one recent request, the crossing indicating increasing uncertainty in some measures for large x. “So can we stop the plot at x=y to show only the significant parts?”
Now this isn’t fraud, per se, and the people that asked are fine folks. They wanted to accentuate the positive and eliminate the negative, that’s all. Scientists, our younger readers may be shocked to hear, are like every other human being.
Or maybe that fits under “Do not show plot because it did not show as strong an effect as you had hoped”, which also happened to about half.
Next was “Report results before data have been cleaned and validated”; also about half. What happens here is usually more subtle. This could be laziness or anxiousness and nothing more.
Many of the other requests for fraud have to do with p-values. “Request to not properly adjust for multiple testing when ‘a priori, originally planned secondary outcomes’ are shifted to an ‘a posteriori primary outcome status’.” “Conduct too many post hoc tests, but purposefully do not adjust [alpha] levels to make results look more impressive than they really are.”
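To see why unadjusted post hoc testing flatters results, here is a toy simulation (mine, not the paper’s): twenty comparisons run on pure noise, counting how often at least one clears the unadjusted 0.05 bar:

```python
# Toy simulation: twenty post hoc two-sample tests on pure noise,
# no multiplicity adjustment. How often does "significance" appear?
import math
import random

random.seed(42)

def two_sample_z(n=50):
    """z statistic comparing two pure-noise samples of size n each."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)

tests = 20      # twenty post hoc comparisons, all on noise
trials = 2000
hits = sum(
    1 for _ in range(trials)
    if any(abs(two_sample_z()) > 1.96 for _ in range(tests))
)

# Theory says the chance of at least one "significant" result on pure
# noise is 1 - 0.95**20, about 64%.
print(f"at least one 'significant' finding: {hits / trials:.0%}")
```

Wee p-values for free, no effect required.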
That whole swath of cheating can be eliminated, or its worst effects limited, by switching to predictive methods. Put out a model in a form that can be checked by anybody, and it will be checked. Plus, you have to work (massage, tweak, manipulate) about four to eight times harder to make the results look good. I mean, anybody can get a wee p-value, but it takes a real man to get a strong predictive result.
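A minimal sketch of what “put out a model that can be checked” might look like (an invented example, not the post’s method): fit on one half of the data, then score predictions on the held-out half, where anybody can rerun the check:

```python
# Sketch of a predictive check on invented data: y depends weakly on x.
# The model is judged by held-out prediction error, not an in-sample p-value.
import random

random.seed(7)

xs = [random.uniform(0, 10) for _ in range(200)]
ys = [0.3 * x + random.gauss(0, 2) for x in xs]

train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Ordinary least squares slope/intercept on the training half.
n = len(train_x)
mx = sum(train_x) / n
my = sum(train_y) / n
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / \
        sum((x - mx) ** 2 for x in train_x)
intercept = my - slope * mx

# Anyone can rerun this scoring step on the held-out data.
mse_model = sum((y - (intercept + slope * x)) ** 2
                for x, y in zip(test_x, test_y)) / len(test_x)
mse_naive = sum((y - my) ** 2 for y in test_y) / len(test_y)
print(f"model MSE {mse_model:.2f} vs naive MSE {mse_naive:.2f}")
```

The fraud-resistant part is the last step: the scoring is public, mechanical, and repeatable, so massaging it takes real work.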
The only other thing of real interest is the “discovery” that fraud “requests were reported most often by younger biostatisticians.”
This implies that either the fraudsters looked at the younger biostatisticians and thought them vulnerable, or the older biostatisticians more often gave in (and did not admit coercion).
How sad to think that scientists are not as they are portrayed in the movies!
Did the survey examine (or test) the congruence of the respondents to the survey population, at least by the usual demographic variables (age, sex, favorite color, etc.)? Some valid indication of similarity would quell suspicions that the results are totally worthless.
I didn’t check because the article is behind a registration wall and I don’t want my email given to yet another site that will annoy me with unwanted pleas for a subscription.
Several years ago, 12 double-blind randomized studies, which examined 52 questions posed on the basis of results from observational studies, found no significant results in the expected direction and five significant in the wrong direction. Additional RCTs continue to fail to replicate claims coming from observational studies, e.g. vitamin D, trans fat, etc.
There appear to be systematic problems with observational studies.
I’ve been stating this for a very long time — it’s not that people are applying statistics wrong out of ignorance that’s the real problem; it is willful deceit. Why? Because favorable results, or minimization of adverse results, translate to $$ in almost every instance this occurs.
According to the article, the numbers are 522 queried and 390 responding, with the following summary observed:
“The absolute worst offense (i.e., being asked to fake statistical significance) occurred to 3% of the survey respondents. Another 7% reported being asked to change data, and a whopping 24% — nearly 1 in 4 — said they were asked to remove or alter data.” The data is parsed into far more detail than that.
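For what it’s worth, a rough back-of-the-envelope interval (my arithmetic, normal approximation) around that 3% figure among the 390 respondents:

```python
# Rough 95% interval for the "3% asked to fake significance" figure,
# treating the 390 respondents as a simple random sample.
import math

n = 390
p = 0.03
se = math.sqrt(p * (1 - p) / n)
low, high = p - 1.96 * se, p + 1.96 * se
print(f"95% CI: {low:.1%} to {high:.1%}")
```

Even at the low end, that is a nonzero rate of being asked to fake significance outright.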
Briggs has previously explained, with some justification, that extrapolating from such survey data to other cases has credibility issues, on top of the dubious value of the direct results (I’m paraphrasing and summarizing; both are basic points). That doesn’t mean stop, or shouldn’t mean stop.
Consider such survey data analogous to a Grand Jury review: only one-sided ‘evidence’ [for prosecution] is presented to the Grand Jury to see if a trial might be warranted. There is no guarantee of a conviction on that basis, but if the evidence is deemed sufficient to proceed to a more rigorous inquiry, a trial, the prosecution and law enforcement continue to research and build a case.
IF the survey was reasonably objective [if the respondents were reasonably honest], the findings sure seem to indicate a basis for further research to fill in some details with more objective data … as the figures suggest unscrupulous use of statistics is rampant. And this is hardly surprising given the other indicators observed elsewhere.
Consider what a 3% measure on something this significant implies: An analogy might be a company’s quarterly forecast for earnings per share (EPS). If that’s off by 3% the market will re-value the stock price within moments of the announcement; if that’s off by 7% [or more] the resulting surprise will typically prompt the board of directors and market analysts to start questioning the executive team’s competence/ethics (review the news about GE’s $ numbers for an ongoing saga).
We are victims. We need legislation and grants! Perhaps TOF can join in.
Just remember that biostatistics is to statistics as astrology is to astronomy. Biostatisticians believe you can prove causality with statistics, they believe that confidence intervals are significant and they believe that you can extrapolate the study results from the study population to the general population. They just don’t call it extrapolation, they call it attributable risk.
One of my math professors used to warn us students that the only thing more risky than extrapolation was predicting the future.