This is in the I-wish-I-had-thought-of-it category. A simple tool that suggests where fraud or major malfunctions in statistical research might exist.
First, a description of the tool; second, a description of the suspicious studies which called for its use. The tool is SPRITE, Sample Parameter Reconstruction via Iterative TEchniques, thought up by James Heathers.
Research reports statistical results in all sorts of forms, but a common one is the sample size together with its mean and standard deviation. Since these are simple calculations, with a given n, given mean, and given SD, only certain sets of data can support them. For instance, with n = 2 and a mean of 3, the sample (1000, 1) is impossible regardless of what the SD is, because the two numbers would have to sum to 6. Specifying the SD further restricts the possible samples, as does other information, such as the requirement that each number be positive.
SPRITE takes the mean, SD, and other specifications and runs through the possible samples. Think of it as a form of reverse engineering. Once the possible samples are arrayed in front of you, interesting things might emerge.
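To see the idea at toy scale, take the n = 2, mean = 3 case from above, restricted to nonnegative integers for simplicity. The candidates are few enough to enumerate outright:

```python
from itertools import product
from statistics import stdev

# With n = 2 and a mean of exactly 3, the pair must sum to 6; list every
# nonnegative-integer candidate and the sample SD each one implies.
pairs = [(a, b) for a, b in product(range(7), repeat=2) if a + b == 6]
for a, b in pairs:
    print((a, b), "SD =", round(stdev([a, b]), 2))
```

Quote a mean of 3 and an SD of, say, 1.41, and only (2, 4) or (4, 2) can be the data. SPRITE does the same kind of reconstruction at scale, where exhaustive enumeration is impossible and iterative search takes over.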
Now for the study which led Heathers to SPRITE. It is one of a set of curious, which is to say suspicious, papers coming out of the Food and Brand Lab at Cornell University. This is the opinion of, inter alia, New York magazine and Slate.
The paper here in question is “Attractive Names Sustain Increased Vegetable Intake in Schools” by Wansink et al. (2012). See Retraction Watch for words with Wansink. Heathers says the paper “presents a simple thesis: change the name of ‘carrots’ and ‘beans’ and ‘broccoli’ to something exciting that the kids are doing (I don’t know, ‘Buzz Lightyear chard’ or ‘Pokemon kale’ etc.) and children will eat more of it.”
In the control group (at some elementary school), which called carrots carrots, the number of carrots served by the lunch lady had a reported mean of 19.4 and SD of 19.9, with n = 45 (kids). The number eaten of those carrots served was small, at least compared to the group whose carrots were called “X-Ray Vision Carrots”. Wee p-values confirmed the “findings.”
Wait. Did Wansink say kids in the control group were served on average almost 20 carrots? He did. X-Ray carrot kids were served an average of 17 (but reportedly ate more).
Well, if the mean was 19.4 and the SD was 19.9, what are the possibilities for the maximum number of carrots, given a sample size of n = 45 and the constraint that “you can’t have less than zero carrots (there are no negative carrots, this isn’t Star Trek)”?
Heathers’s SPRITE showed the minimum max-carrots was 53 and the maximum 73, with the most likely values in the neighborhood of 60-some carrots.
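For the curious, here is a minimal Python sketch of the SPRITE idea applied to the carrot numbers. This is my reconstruction, not Heathers's actual implementation: start everyone near the mean, then shuffle carrots between kids (which leaves the mean fixed) until the sample SD rounds to the reported value, never letting any kid go below zero carrots.

```python
import random
import statistics

def sprite_sample(n, mean, sd, max_iter=200_000, seed=1):
    """Hunt for a nonnegative integer sample of size n whose mean is
    exact and whose sample SD rounds to the reported value."""
    rng = random.Random(seed)
    total = round(mean * n)              # 19.4 * 45 = 873 carrots served in all
    xs = [total // n] * n                # start everyone at (roughly) the mean
    for i in range(total - sum(xs)):     # hand out the remainder
        xs[i] += 1
    for _ in range(max_iter):
        cur = statistics.stdev(xs)
        if round(cur, 1) == sd:
            return xs
        i, j = rng.randrange(n), rng.randrange(n)
        if xs[j] == 0:                   # no negative carrots
            continue
        xs[i] += 1                       # move one carrot from kid j to kid i;
        xs[j] -= 1                       # the sum (hence the mean) is unchanged
        if abs(statistics.stdev(xs) - sd) > abs(cur - sd):
            xs[i] -= 1                   # that made the SD worse; undo it
            xs[j] += 1
    return None

sample = sprite_sample(45, 19.4, 19.9)
print("max carrots served to one kid:", max(sample))
```

Any run that succeeds hands back a sample whose largest value is implausibly big; Heathers's fuller version surveys many such samples, which is how the 53-to-73 range for the maximum emerges.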
Given what we know of lunch ladies, serving trays, and size of carrots, is it plausible to suggest some kid was really served 60 carrots? Only if, according to Heathers, “at least one of [the students] is a Clydesdale horse.”
What makes the story cute is that Heathers assembled 60 baby carrots to see what the pile looked like. (I suppose Wansink could have meant slices of carrots and not carrots, but there’s no indication of this that I could discover.)
This was not the only difficulty of the Wansink study; Heathers details more. And it’s not the only difficulty with the research group. New York magazine called their work “really shoddy”.
…Wansink published a strange blog post last month, which led to the subsequent discovery of 150 errors in just four of his lab’s papers, strong signs of major problems in the lab’s other research, and a spate of questions about the quality of the work that goes on there. Wansink, meanwhile, has refused to share data that could help clear the whole thing up.
Here come the wee p-values.
Wansink was acknowledging, with surprising openness, taking a “failed study which had null results,” slicing and dicing the data until something interesting came out, and then publishing not one but four papers based on said slicing and dicing…One of the truisms of statistics, after all, is that if you analyze enough data from enough angles, you will discover relationships that are “significant,” in the statistical sense of the term, but that don’t actually mean anything.
God bless the magazine for its scare quotes around “significant”. If you can’t find a wee p-value in your data, you’re not trying hard enough. And God bless them for these final true words:
“Many of psychology’s most exciting ‘This One Simple Trick Can X’—style findings have turned out to be little more than statistical noise shaped sloppily into something that, in the right light and if you don’t look too hard, looks meaningful.”