A new peer-reviewed paper in the journal Pediatrics shows that girls are luckier than boys in avoiding spankings, that those who were spanked as children went on to greater education than those unfortunates never biffed on the butts, and that those spanked had higher incomes as adults.
Yet in the paper’s abstract, we read:
Harsh physical punishment in the absence of child maltreatment is associated with mood disorders, anxiety disorders, substance abuse/dependence, and personality disorders in a general population sample. These findings inform the ongoing debate around the use of physical punishment and provide evidence that harsh physical punishment independent of child maltreatment is related to mental disorders.
And then there are these summary headlines in the popular press:
The paper is “Physical Punishment and Mental Disorders: Results From a Nationally Representative US Sample” by Afifi and others. What Afifi did was to have a browse through the National Epidemiologic Survey on Alcohol and Related Conditions and glean from that data those who self-reported mental maladies and those who self-reported being spanked as kids. This data was the result of face-to-face interviews with U.S. Census workers (surely having government workers probe for intimate, for-the-record details was no bar to honesty?).
Afifi, a Canuck, starts by telling us that, “The parent or caregiver’s right to use physical punishment has currently been abolished in 32 nations.” And is there a hint of lamentation when she continues, “Canada and the United States are not included among these countries”?
Adults who answered at least “sometimes” to the question, “As a child how often were you ever pushed, grabbed, shoved, slapped or hit by your parents or any adult living in your house?” were classified as “having experienced harsh physical punishment.” Before questioning the use of “harsh” for something as small as being “grabbed,” let’s recall that Canada is a foreign country and they use words differently up there, eh?
A more rough-and-tumble interpretation of “harsh” would be “severe physical abuse, sexual abuse, emotional abuse, physical neglect, emotional neglect, or exposure to intimate partner violence.” But we can’t use this definition because the poor folks who admitted to suffering this kind of “harsh” treatment “were excluded from the current sample.” Sigh.
Afifi and crew then checked off whether each remaining individual scored highly on various questionnaires for “major depression, dysthymia, mania, hypomania, any mood disorder, panic disorder with or without agoraphobia, social phobia, specific phobia, generalized anxiety disorder, posttraumatic stress disorder, agoraphobia, any anxiety disorder, any alcohol abuse/dependence, and any drug abuse/dependence.” She even grouped various of these into “clusters.” According to Tables 2 and 3 in the paper, they have 25 separate ways to be (to use a common Canadian phrase) a jelly donut short of a filling.
Anyway, the whole lot was fed into a series of logistic regression models, first adjusting for this, and then later adjusting for that. We can be grateful that Afifi eschewed the normally sacrosanct 95% confidence intervals and instead called “significant” only those results which had p-values less than 0.01.
Unfortunately, after this promising start, Afifi forgot to adjust for all the different tests she ran. Using, for example, the Bonferroni method to maintain that “0.01” level of significance across all 25 outcomes, individual p-values would have to be 0.0004 (that is, 0.01/25) or lower. That means a lot of the mental maladies Afifi thought were associated with “harsh” punishment would not pass her own test of significance. Ah, well.
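For anybody who wants to check the arithmetic, here is a minimal Python sketch of the Bonferroni cutoff, along with the chance of at least one spurious “significant” result when no correction is made. The 25 is the count of outcomes in Tables 2 and 3. The second figure assumes the 25 tests are independent, which they are not (“any anxiety disorder” overlaps with panic disorder and the rest), so treat it as a rough guide only; the Bonferroni cutoff itself does not depend on independence.

```python
def bonferroni_threshold(alpha, m):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / m

def familywise_error(alpha, m):
    """Chance of at least one false 'significant' result among m tests when
    every null hypothesis is true. Assumes the tests are independent."""
    return 1 - (1 - alpha) ** m

alpha, m = 0.01, 25  # the paper's cutoff and the 25 outcomes in Tables 2 and 3
print(f"Bonferroni per-test cutoff: {bonferroni_threshold(alpha, m):.4f}")  # 0.0004
print(f"Uncorrected chance of a spurious hit: {familywise_error(alpha, m):.3f}")  # 0.222
```

In words: test 25 outcomes at 0.01 with nothing going on at all, and you have better than a one-in-five chance of “discovering” something.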
Then it appears she has sometimes mixed up confidence intervals and p-values. For example, in Table 1, “Widowed/divorced/separated” is given three asterisks (supposedly significant) with a confidence interval that includes 1 (which means not significant). The same thing happens in Table 2.
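Here is a small Python sketch of the check anybody can run on a table like this. It assumes (my assumption, not something I am taking from the paper) that each odds-ratio interval is a 99% Wald interval, symmetric on the log scale, to match the 0.01 cutoff. Under that assumption an interval excludes 1 exactly when p < 0.01, and the p-value can be backed out of the interval’s two endpoints. The interval in the usage lines is hypothetical, not a number from the paper.

```python
import math
from statistics import NormalDist

def p_from_or_ci(lower, upper, level=0.99):
    """Back out a two-sided Wald p-value from an odds-ratio confidence interval.

    Assumes the interval is exp(log(OR) +/- z * SE), i.e. symmetric on the
    log scale, as is usual for logistic-regression odds ratios.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for a 99% interval
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_or = (log_lo + log_hi) / 2                  # midpoint = log of the point estimate
    se = (log_hi - log_lo) / (2 * z_crit)           # standard error of log(OR)
    return math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided normal p-value

def flag_is_consistent(lower, upper, starred):
    """A starred (significant) result must have an interval that excludes 1,
    and an unstarred one must not."""
    excludes_one = lower > 1 or upper < 1
    return excludes_one == starred

# Hypothetical row: starred as significant, but the interval straddles 1.
lo_ci, hi_ci = 0.95, 1.40
print(flag_is_consistent(lo_ci, hi_ci, starred=True))  # False
print(round(p_from_or_ci(lo_ci, hi_ci), 3))            # about 0.058, well above 0.01
```

An interval whose lower end sits exactly at 1 gives back p = 0.01, which is the whole point: a 99% interval and a 0.01 test are two ways of saying the same thing, so they cannot disagree.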
There were 20,607 individuals in the database (after culling). But only 1,258, or 6%, remembered, or were willing to admit to a government worker, “harsh” treatment. Only 6%? Really? I emphasize this to show that measurement error is in play here, which means (in frequentist theory) that p-values are too high.
The mean age of these folks was 48.4 plus-or-minus 0.2 years. Odd, that. And, for example, of those 1,258 “harshies” just 53 reported “Schizoid personality disorder,” yet this was “significantly” higher than among the “un-grabbed.” Small numbers here.
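To put “small numbers” in perspective, here is a rough Python sketch of a 99% Wilson score interval for the proportion of “harshies” reporting schizoid personality disorder, 53 of 1,258. It treats the respondents as a simple random sample, which the survey is not (it has a weighted, clustered design), so the real uncertainty is probably larger. Even so, the plausible rate runs from about 3% to about 6%, a factor of two.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, level=0.99):
    """Wilson score interval for a binomial proportion.
    Assumes simple random sampling (no survey weights or clustering)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(53, 1258)
print(f"{53/1258:.3f} observed; 99% interval {low:.3f} to {high:.3f}")
# prints: 0.042 observed; 99% interval 0.030 to 0.059
```

The Wilson interval is used here rather than the textbook “p plus-or-minus z standard errors” because it behaves better when the proportion is small.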
The authors also try to forget the results which appear at the start of this post: effects which show that “harsh” treatment can be good for some (the first link recognizes this). The big question any defender of this study (even you) must answer is: what other effects were positively associated with “harsh” treatment?
It is a disservice (at the least) to go into a database, look only for what you hope to find, and ignore evidence which does not support your theory. Yet that is what appears to have happened here.