Single mother, meet jobless man. pic.twitter.com/H1ECDIU0wZ
— Ninja Economics (@NinjaEconomics) May 12, 2016
The axes are “Percent of men non-employed” by “Percent of births to unmarried women”. The green dots, as indicated by the main heading, are state-wide estimates. I know nothing about the estimates, but given what we know about measuring these kinds of things, it’s a good bet that not every citizen in each state was measured, and that some kind of survey was done instead.
Point one, a minor note: the dots aren’t real; there is uncertainty in them not shown. That’s one form of reification: when the estimate becomes the thing estimated. This is more or less harmful depending on the nature of the measurement. We can guess, given our experience with estimates of this type, that the error is not large here; but it is just a guess on our part.
Men north of 70 and south of 20 are rarely employed. They’re presumably part of the green dots. And women north of 40 and south of 15 (or so) rarely give birth. They’re also presumably part of the green dots. We’re not sure exactly who is measured!
Point two, and the main point: there is no reason in the world for that red line. That red line is the real Deadly Sin of Reification.
The red line does not exist. It is not real. It is not part of any employed or non-employed man. It is not an attribute of any mother of a legitimate or illegitimate child. It is a fiction. It is unobservable.
Drawing the red line draws the eye to where it doesn’t belong. The red line pushes out the green dots, which themselves are already a bit of a fiction, and replaces them with a thing that is far too sure of itself, and isn’t even real.
The red line is—are you ready?—itself an estimate of a parameter of a regression model, which in this case is a model of the central parameter of a normal distribution representing uncertainty in the percent illegitimate births. So it is a parameter of a probability model, and parameters don’t exist. It is only one of three parameters in this model, at that, the other two being suppressed.
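To see how little of the model the line shows, here is a minimal sketch in Python. It assumes the red line came from an ordinary least-squares fit, which the graph does not say. The numbers are invented for illustration and are not the data behind the tweet, and the name `fit_line` is my own. On this common reading the model has an intercept, a slope, and a spread; the drawn line is the fitted centre and says nothing about the spread.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x + noise, with noise ~ Normal(0, sigma).

    Returns all three parameter estimates: intercept, slope, and spread.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope of the drawn line
    b0 = mean_y - b1 * mean_x       # intercept of the drawn line
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Spread of the dots around the line: the part a bare line never shows.
    sigma = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return b0, b1, sigma

# Invented numbers, NOT the data from the graph.
pct_men_non_employed = [20.0, 25.0, 30.0, 35.0, 40.0]
pct_births_unmarried = [30.0, 38.0, 35.0, 45.0, 44.0]

b0, b1, sigma = fit_line(pct_men_non_employed, pct_births_unmarried)
print(f"intercept={b0:.2f} slope={b1:.2f} spread={sigma:.2f}")
```

None of the three numbers returned is a measurement of any man or any birth. All of them are estimates of parameters, and the line on the graph reports only its centre.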
The parameter isn’t a causal agent, which is important. Some things are causing the illegitimate and legitimate births, just as some things are causing the men to be employed or the men to be unemployed. We do not see any causes in this data. Any causes we infer from the graph are already in our head, as it were, put there by our commonsense.
What the graph is asking us to believe, and which is easy to believe, is that unemployed men tend to father children without bothering to marry at greater rates than employed men. The direction of cause isn’t clear. Some men seek jobs after getting women pregnant, recognizing their responsibility. Are they listed as non-employed or employed? And so on.
Normally I advocate a predictive approach. That is to say, take the data as is, propose a (non-causal) probability model for the uncertainty, and then make predictions. So that, for instance, a male non-employment percent of 30 would predict an 80% chance (or whatever) of illegitimate birth rates anywhere from (say) 22% to 48%. We could do that here, but why would we? There is nothing left to predict! This is another reason not to include the misleading red line.
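For the curious, here is a rough sketch in Python of what a prediction of that kind looks like. It assumes a straight-line normal model and uses a normal approximation for the interval where a fuller treatment would use a Student-t. The data are invented, not the numbers behind the graph, and `predictive_interval` is a name I made up.

```python
import math
from statistics import NormalDist

def predictive_interval(x, y, x0, prob=0.80):
    """Approximate predictive interval for a new observation at x0.

    Assumes y = b0 + b1*x + Normal(0, sigma) noise and uses a normal
    approximation (not Student-t) for the quantile.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sigma = math.sqrt(
        sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    )
    centre = b0 + b1 * x0           # where the red line would sit at x0
    # Scatter of new observations plus uncertainty in the fitted line.
    spread = sigma * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + prob / 2)
    return centre - z * spread, centre + z * spread

# Invented numbers, NOT the data from the graph.
pct_men_non_employed = [20.0, 25.0, 30.0, 35.0, 40.0]
pct_births_unmarried = [30.0, 38.0, 35.0, 45.0, 44.0]

low, high = predictive_interval(pct_men_non_employed, pct_births_unmarried, x0=30.0)
print(f"80% predictive interval at 30% non-employed: {low:.1f}% to {high:.1f}%")
```

The output is a range with a probability attached, stated in terms of the observable, instead of a single line with no width.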
All fifty states are represented. The data is at the state level, which is the only level at which predictions are valid. Now everybody knows states aren’t homogeneous: not racially, not economically, not in most things. A prediction made from Michigan’s green dot would be the same for Detroit as for tiny Charlevoix, hundreds of miles north. The green dots smooth out all these differences. The red line says the differences don’t matter.
“Briggs, what’s the big deal? Everybody already knows shiftless men and loose women have more babies out of wedlock than do employed men and virtuous women. You’re making a big deal out of nothing.”
I agree. Everybody already does know what the graph purports to show. So why show the graph? The graph puts hard numbers to beliefs that are already well known and confirmed by common observation. The numbers are too certain. They aren’t right. The hunger for quantification is too strong.
What the author of the graph should have done is measure individual unemployed and employed men and whether these men fathered illegitimate, legitimate, or no children. The author could then have told stories of these men and women and the reasons (the causes) for their making babies or not.
The predictions we could make would then be at the person-level, which then might have some utility.