No, I don’t think so, but the Census Bureau thought (thinks?) as much.
What follows is one of the more curious emails I’ve received, describing the experiences of Juan (not his real name), who used to work at the Census. Perhaps his story proves what readers have suspected: the more time spent with statistics, the looser your grasp of reality.
I’d really like your help with this one. I’m not sure how to answer.
…I was working at the U.S. Census Bureau doing quality control on the Current Population Survey. The primary way that we checked data quality was by performing re-interviews. We would ask the same set of questions, for the same time period, from a sub-sample of the households in our monthly survey.
One day I got the bright idea that the re-interview data I had looked a lot like a Markov chain. There was a possibility that a different answer was given in the re-interview than in the interview. Questions that had a high frequency of answers changing were considered unreliable. I had a matrix for each question showing the frequency/probability of moving from one state (answer) to another. This looked just like the transition probability matrix that I had been taught about in my first stochastics class. I remember a problem where we had to predict tomorrow’s weather based on today’s. This was exactly the same and I went about taking a limit and calculating the stationary distributions for several of the re-interview questions.
My branch chief had me run the idea by the chief statistician at the Census Bureau and his reaction was not what I was expecting. He said that calculating the stationary distribution was simulating an immoral experiment! His thought process, as best I can remember it, was that taking the limit of that matrix was simulating re-interviewing our sample households an infinite number of times which was immoral.
A couple of years later I asked a friend, who holds a PhD in Biostatistics from Harvard, about this and she agreed with the chief statistician. This seems to me like they are taking the abstract and trying to make it real, which is a huge stretch for me. Is the Bayesian interpretation of this approach different? Would Bayesians have moral qualms about calculating the stationary distribution in such a situation?
I followed up with Juan and he gave me more details confirming the story. An example, for a question on race (which was mutable), heads this post. Terrific evidence that most survey data should not be taken at face value.
Markov chains are the fancy names given to certain probabilities. Juan used weather as an example. Suppose the chance of a wet day following a dry day, given some evidence, is p01, and the chance of a wet day following a wet day, given some evidence, is p11, and say, p11 > p01. Since these are probabilities, they don’t for instance tell us why wet days are more likely to follow wet than dry days; they only characterize the uncertainty.
This is a “two-state” (wet or dry) Markov chain. Of course, you can have as many states as you like (there are 6 above), and the matrix of the probability of going from one state to another, given some evidence, describes the chain. The “stationary distribution” of any chain is the set of calculated probabilities of how likely the system is to be in any state “in the long run”. These probabilities are no longer conditional on the previous state, but they still are (obviously) conditional on whatever evidence was used.
There is no such thing as “the long run”, as in Keynes’s quip and in the director’s odd idea of infinite simulations, but these stationary distributions are useful as approximations. Say we wanted to know the chance of a wet day conditional only on the evidence and not on whether yesterday was wet or dry. We get that from the stationary distribution. If, for example, p11 = 0.5 and p01 = 0.1, then the stationary distribution is π(0) = 0.83 and π(1) = 0.17, taking 0 as dry and 1 as wet (if the back of my envelope isn’t misleading me).
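For anyone who wants to check the envelope, here is a minimal sketch in Python. It assumes state 0 is dry and state 1 is wet, so that p01 is the chance of wet after dry and 1 − p11 is the chance of dry after wet. The function names are mine, invented for illustration. The balance condition π(0)·p01 = π(1)·(1 − p11) gives π(1) = p01 / (p01 + 1 − p11), which is 1/6 for the numbers in the example.

```python
def two_state_stationary(p01, p11):
    """Stationary distribution of a two-state chain (0 = dry, 1 = wet).

    p01: chance of a wet day following a dry day.
    p11: chance of a wet day following a wet day.
    """
    p10 = 1.0 - p11  # chance of a dry day following a wet day
    # Balance condition: pi(0) * p01 = pi(1) * p10, with pi(0) + pi(1) = 1.
    pi_wet = p01 / (p01 + p10)
    return 1.0 - pi_wet, pi_wet


def iterate_chain(p01, p11, start_wet=1.0, steps=50):
    """Apply the transition probabilities repeatedly to a starting chance of wet."""
    wet = start_wet
    for _ in range(steps):
        wet = (1.0 - wet) * p01 + wet * p11
    return 1.0 - wet, wet


pi_dry, pi_wet = two_state_stationary(p01=0.1, p11=0.5)
print(round(pi_dry, 2), round(pi_wet, 2))  # prints: 0.83 0.17
```

The second function is only there to show that "taking the limit" is plain arithmetic on the matrix: fifty multiplications land on the same numbers as the closed form, and nobody is interviewed at any point.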
What Juan did was to use the evidence of questions changing answers in the sample to guess the probability that each of the answers would be given by the population as a whole, e.g. the probability of being white, etc. Understand that this final guess was based on the guess of the transition probabilities. No matter what, guessing, i.e. modeling, was involved.
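A rough sketch of the two guesses, in Python, may help. The counts below are hypothetical numbers made up for illustration, not Census data, and the function names are mine. The first step turns a cross-tabulation of first answers against re-interview answers into transition probabilities; the second deduces the stationary distribution from them by repeated multiplication.

```python
def transition_matrix(counts):
    """Row-normalise counts[i][j]: households answering i first and j on re-interview."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("every first answer needs at least one observation")
        matrix.append([c / total for c in row])
    return matrix


def stationary(matrix, steps=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(matrix)
    dist = [1.0 / n] * n  # any starting distribution works for a well-behaved chain
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical counts for a three-answer question
# (rows: first answer, columns: re-interview answer).
counts = [
    [90, 8, 2],
    [10, 80, 10],
    [5, 15, 80],
]
P = transition_matrix(counts)
pi = stationary(P)
print([round(x, 3) for x in pi])
```

Both steps are models: the row-normalised frequencies are a guess at the transition probabilities, and everything after that follows from the guess.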
Is modeling immoral? Those stationary distributions are deduced, i.e. they follow no matter what, from the transition probabilities. They’re there whether they’re used or not.
One possibility is that the Census is supposed to be an enumeration—no models. And thank the Lord for that. Perhaps the director thought any introduction of models breached their mandate? Doubtful, since the document which gave the table above is filled with statistical models.
There’s even a model for the table above, which attempts to do exactly what Juan did, but using another classical test (“Index of consistency”). So I’m lost. Is this yet another frequentist panic, another instance of the Deadly Sin of Reification?
What do you think?
I, just yesterday, did some Markov expected steps to absorption calculations. Let us hope I do not need to see a confessor!
This is really odd. The use of the word “immoral” is so jarring in the context that it makes me think it is sarcasm/hyperbole for effect. I’ve had professors make similar quips, such as “using this test on non-normal data would be immoral”. In other words, it violates assumptions and rules and you shouldn’t do it. The rest is emphasis. (This is probably obvious, but I thought it worth stating.)
Juan made a transition matrix that contains the probabilities: Pr(Reinterviewed answer | first answer). The transition matrix is a model of ONLY what the second answer might be, given the first. It does not contain any information (as far as I can tell from the description given) about a re-interviewed answer being yet again re-interviewed. In other words, we do NOT have Pr(Answer_(time+1) | Answer_(time)).
What this means is that finding a stationary distribution of that transition matrix doesn’t actually give the estimate of the distribution of re-interviewed answers. It gives a distribution of a very large number of re-interviews for each person in the census under the assumption (implicit) that each re-interview can be treated as if it’s the first interview, etc. This isn’t “immoral”, but it does seem incorrect for the situation.
In my view, we can only move ahead 1 step with the transition matrix. After that point, the state transition probabilities are no longer in the scenario that they were conditioned on (first answers sent in on the first form without the psychological effects of being re-interviewed, etc.).
Short version: The evidence gathered only lets you transition 1 step. No more than that.
However, the results from that 1 step would certainly be interesting, and might tell you something about the likelihood of census answers being ‘correct’ or not. So, if ‘immoral’ is meant as hyperbole, then the above is my take on why Juan’s method is ‘immoral’.
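The one-step calculation described in this comment can be sketched in Python as follows. The first-interview proportions and the transition matrix are hypothetical values chosen for illustration, and `one_step` is a name I made up. The function applies Pr(re-interview answer | first answer) exactly once and goes no further.

```python
def one_step(first_marginal, matrix):
    """Distribution of re-interview answers implied by the first-interview answers."""
    n_from = len(matrix)
    n_to = len(matrix[0])
    return [
        sum(first_marginal[i] * matrix[i][j] for i in range(n_from))
        for j in range(n_to)
    ]


# Hypothetical proportions of first-interview answers.
first_marginal = [0.7, 0.2, 0.1]
# Hypothetical Pr(re-interview answer j | first answer i).
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]
second_marginal = one_step(first_marginal, P)
print([round(x, 3) for x in second_marginal])  # prints: [0.655, 0.231, 0.114]
```

Comparing `second_marginal` with `first_marginal` stays inside what the re-interview evidence was conditioned on; applying `one_step` a second time is where the extra assumption enters.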
L’esprit de l’escalier strikes again.
Juan’s transition matrix is not memoryless. It is explicitly conditioned on a sequence of re-interviewing. You can only take the matrix as far as its state transition conditions allow.
Of course, Juan could not get great estimates of a transition matrix from this one observation, but that does not imply he couldn’t get any estimates. All probability is conditional on the assumed evidence: thus, he could fit his model.
It won’t be a wonderful model, as is, but would using it be immoral? It appears as more than hyperbole on the director’s part, his use of that word.
That simulating infinite trials must come into it. It isn’t right: there is no simulation and there are no infinite trials.
I’m still lost.
I do not believe statistics have morality or that the calculation of a statistic can be immoral. The only thing that can be immoral is the use of the answers. The census worker apparently feared misuse of the statistic’s output (translation: somehow this number will turn out to be not PC and we will be blamed). You see this in global warming science all the time. There is THE allowed statistic and then there’s all the others. At least in global warming they just call the statistic wrong and mock the user. They haven’t gotten to “immoral” yet, though I imagine it’s coming soon.
Currently, there seems to be an increasing use of resampling statistics to generate “new” data sets and more tables and graphs. Perhaps Juan was just years ahead of his time.
Could he be reifying the idea of convergence, saying in effect that we could never reinterview an infinite number of times, so modeling that would be wrong? I don’t know.
I’m somewhat curious as to why the steady state transition matrix is useful at all, when only considering one reinterview. The one tangible thing I finally came up with is that if the observed transition matrix were close to the steady state, then the marginal survey responses would not be far wrong, even if there were a lot of individual subjects whose answers were unreliable. But this isn’t an area I have a lot of expertise in and perhaps that’s too simplistic.
When I worked for M. D. Anderson Cancer Center, we evaluated clinical trial designs by treating untold millions of computer simulated patients. We performed experiments in silico that would be unethical to perform on real patients. Does the Census Bureau believe we owe an apology to the simulated patients we exposed to hypothetical treatments?
The only thing that can be immoral is the use of the answers.
Agreed and to me there’s a question of immorality even in “asking” for a person’s race. Why do they even need to know that anymore?
My wife has some American Indian on her father’s side; and her mother’s grandfather was 100% American Indian. What do you do with that?
I’m reminded of the Longmire episode where a Tribal Council member was killed because he had introduced bloodline calculation changes to “exclude” more people from being members of the tribe (so “white people” like my wife couldn’t horn in on the action – but the calculations excluded others who were longtime and “obvious” indigenous peoples).
Longmire’s antagonist (who was part of the “conspiracy”) complained that the U.S. Government has “official” bloodline recognitions for two “species”, horses and American Indians.
I was so used to being categorized as ‘white’ (even though “people of color” have asked me if I was “mulatto” – are those PC terms?), I was filling out a form at college and “blanked” on the word “Caucasian” and started filling in something else – whoever took my form had me correct it.
I also remember the M*A*S*H episode where Klinger had an adverse reaction to a medicine that everyone else got (I think it was Sulfur). Klinger was accused of being lazy and goldbricking until it turned out people of Eastern Mediterranean blood generally have an adverse reaction to the medicine in question with symptoms matching those of Klinger’s.
Do I watch too much TV?
I wasn’t intending to say that he couldn’t get any estimates, just that trying to get a stationary distribution violated some assumptions and went beyond his available evidence. It’s the act of using evidence in the wrong context/way that might have been the prompt for ‘immoral’.
I do think that, from the available evidence, there are still estimates that can be gathered, and that the gathering and usage of those estimates would not be ‘immoral’.
John B(): Agreed on the question of race. In 1990, my niece was a year old. She is mixed race. My brother actually ran the census taker off because the census taker would not put “mixed race” on the form. My brother never did answer the question.
I generally check the “prefer not to answer” box, when available. More and more, that option exists. It seems many, many people thought race was not relevant to most questionnaires.
We could debate whether asking “what race” is moral in itself, completely apart from the notion of “immoral” statistics applied. That would be off-topic, though, so I shall try not to go further.
I’m not qualified to comment on Markov chain probabilities and I agree with Sheri on the moral implications of statistical misuse so my comment pertains to the Current Population Survey (http://www.bls.gov/cps/). Several years ago my household was chosen as one of the ~60,000 (~0.05% of total households) queried that year. Each month I received a phone call from a nice man named Calvin who dutifully asked me a boatload of demographic questions. It took about twenty minutes the first few months, but later on many questions just asked about changes from the previous month. Topics included number and ages in household, educational and employment status, etc. Nothing particularly invasive or extremely personal. Race/ethnicity was requested and I can see, given the increasing number of mixed-race individuals (the category is “Two or More Races”), how in one month someone might consider himself (or members of his household) to be of one or the other instead of the combination category. Remembering the previous month’s answer isn’t a high priority for most folks. I suppose a few also get annoyed and may start messing with their answers for entertainment purposes. Analysis of response consistency seems a reasonable precaution when national policy is affected by the data.
BTW, all you get is thanks for participating in the CPS. They don’t even give you a dollar like the Arbitron radio ratings surveys do (I got chosen once for that too).
It’s been a long time since I studied Markov processes, so forgive any errors. As I recall, transition probabilities in a Markov process are supposed to depend on the state one is in, but not on historic states. I don’t think that assumption would hold in this example.
Imagine people were interviewed 3 times. We seek, e.g., the probability that someone who identified himself as “black” on the 2nd interview would then identify himself as “black” on the 3rd interview. I think that probability would not be independent of how he identified himself on the 1st interview. Thus, this situation is not properly represented as a Markov process.
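If three interviews were available, the check described here could be sketched in Python like this. The answer sequences are hypothetical, invented for illustration, with the answers labelled "A" and "B"; the function name is mine. The sketch estimates Pr(third answer | first answer, second answer) and compares two histories that share the same second answer.

```python
from collections import Counter


def third_given_history(sequences):
    """Estimate Pr(third = c | first = a, second = b) from (a, b, c) triples."""
    pair_counts = Counter((a, b) for a, b, _ in sequences)
    triple_counts = Counter(sequences)
    return {
        (a, b, c): n / pair_counts[(a, b)]
        for (a, b, c), n in triple_counts.items()
    }


# Hypothetical answer sequences: (first, second, third) interview.
sequences = (
    [("A", "A", "A")] * 95
    + [("A", "A", "B")] * 5
    + [("B", "A", "A")] * 40
    + [("B", "A", "B")] * 60
)
probs = third_given_history(sequences)
stayed = probs[("A", "A", "A")]    # answered A twice, then A again
switched = probs[("B", "A", "A")]  # answered B, then A, then A again
print(stayed, switched)  # prints: 0.95 0.4
```

Under a Markov model the two numbers would agree, up to sampling noise, because both groups gave the same second answer. A gap like the one in these made-up data would be evidence that the first answer still matters.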
Ah, a bright idea? Not at all. Well, after reading Juan’s email, I’d not trust Juan to relay what his boss said. And I’d not conclude the following either.
It sounds more like that Juan needs to spend more time studying statistics instead.
David in Cal,
You’re just operating with different premises than Juan, hence you have a different model. Remember: probability does not tell us the cause. It could well be that these changes are only because people checked boxes on forms too quickly. Who knows?
As long as he didn’t take any classes from a frequentist…
Plus, it’s rude to imply a guest is a liar (based only on a difference of opinion on a statistical model). Dems fightin’ words.
Really? I understand that a lot of the people who frequent this blog are much better statisticians than I am. In fact I wouldn’t have e-mailed Dr. Briggs if I didn’t think he could provide insight that is beyond my reach.
The man who told me that this was a simulation of an immoral experiment was not my boss. In fact my boss liked the idea. My purpose is not to cast aspersions on this man but to try and find out if anyone thinks he had a point. He was well respected at the Census Bureau by everyone including myself.
If he had come back to me and stated that I violated basic assumptions of Markov Chains (which he was competent to do) then I would have used it as a learning experience and moved on. He didn’t and I still find his critique to be unfathomable.
If you’re looking to get something out of it then why give random answers when not answering enters you into a lottery for an extended, all expenses paid vacation at a federally maintained resort?
Say what? Not sure where you’re going with that comment. I offered the comparison to contrast public v. private sector data gathering incentives. One relies on good will or a sense of duty, the other on a nominal quid pro quo.
One relies on good will or a sense of duty
Or force of law. You are legally required to answer census questions or suffer penalties under the law. Why would they pay you?
I’d not trust my children to relay what their statistics professor teaches them,
and to tell me what a stochastic process is, or what a Markov chain stochastic process is or why re-interviewing or repeated sampling however many times usually has nothing to do with a Markov stochastic process.
Am I implying that my children are liars? Not at all.
Not based only on a difference of opinion on a statistical model either. There is no valid model to start with!
You seem to always be able to somehow come up with some sort of negative views about statisticians/scientists. Reps fightin’ words.
One of the first things that ought to have been considered & explored are the questions & how they’re presented — honest interviewees might hear the same question again and perceive a different question, thus providing a different (seemingly contradictory) answer.
–> The implicit assumption that different answers to the same question reflect inconsistent responses [to the “same” question] is very commonly false [the interviewee perceives a different question and answers that accordingly]. Thus, any quantitative analysis based on the premise that the survey questions are valid & consistently heard is fundamentally flawed until that facet of analysis has been addressed.
To observe how easily a seemingly unambiguous statement can be perceived so differently, consider the classic example:
What is the person saying with this statement:
“I didn’t say he beat his wife.”
There are at least seven distinctly different messages conveyed by one making that statement, depending on which word is emphasized, or, if the statement is presented as a monotone. More when multiple points of emphasis & de-emphasis are used.
Surveys are commonly like that too.
When Juan noted: “Questions that had a high frequency of answers changing were considered unreliable.”
The proper thing to do would be to simply delete those questions & answers from the data, and analyze what’s left.
Seems to me that what the senior decision-makers were getting at with the “immoral” remark was a play on similar, & commonly used, words:
Once you have good reason to know that the “integrity” (accuracy) of certain data is “corrupt” (the data contains errors & is inaccurate) any analysis involving that data & any conclusions from that analysis are similarly tainted by the inaccuracies on which the analysis is founded — if the input data lacks “integrity” (is inaccurate), the output must also lack “integrity” (be inaccurate). Garbage In, Garbage Out.
The terms “integrity”& “corrupt” when applied to data are readily understood to mean accuracy (or precision, though these are not quite the same) & inaccuracy (or imprecision). Use of these words, which also have moral meanings in different contexts, is clearly understood in the data context as adjectives regarding the quality of the data, with no moral implications. I.E., when we hear that certain data is “corrupted” nobody with an inkling of familiarity perceives that somehow the data has been made “evil” to some extent, as if the data had its own anima/volition and will act with malice aforethought…we instantly understand the data has errors and is not to be trusted–it should be rejected (and the sources of the error(s) be fixed).
Thus, to say the analysis is “immoral” is a creative backwards extrapolation of a term having moral meaning for use with analytic terms having dual analytic & moral meanings:
It’s a play on words — basing an analysis on “corrupt” data will inevitably lead to “corrupt” results…and “corruption” is “immoral” … so doing an analysis[or “simulation”] that [you ought to know] will yield “corrupt” results [because it is based on “corrupt” data] is “immoral.”
This specific play on words mixes metaphorical concepts–introducing “moral” or [in this case] “immoral” in the context of “corrupt” data as described here is really not that uncommon and that is used in the same basic sense as the other fun expression:
“Torture the data until it confesses.”
(another with an underlying morality theme, the Spanish Inquisition, associated with intentional analytic manipulation to get a particular/desired result regardless of what the truth is [typically associated with marketing & routinely used in the financial services industry]; of course, nobody thinks data has feelings that render it susceptible to “torture” from which a “confession” is elicited — this expression reflects willful analytic misconduct, what some call an “evil” or “malevolent” analysis [Juan’s description is consistent with inadvertent analytic misconduct, so rather than being teased harshly as “evil” the softer “immoral” was used; again, use of that term (and others) in this sense isn’t that uncommon…]).
“He said that calculating the stationary distribution was simulating an immoral experiment!”
“A couple of years later I asked a friend, who holds a PhD in Biostatistics from Harvard, about this and she agreed with the chief statistician.”
My take on the wording is that the chief statistician and the friend with the PhD in Biostatistics considered it to be experimentation on human subjects, which would be considered immoral without explicit consent from the subjects of the experiment.
If indeed these people consider statistical analysis of human subjects to be experimentation, wouldn’t all the reported averages, trends and so forth done by the census bureau fall under the same category? Census statistics are all used in human “experimentation” as it is. The statistics determine money flow, favored treatment based on race or gender, legal arguments, where one votes, how many representatives one gets, and none of this is voluntary.
Juan, yes, really, it is not a bright idea. No, you don’t come across as trying to insult anyone. You don’t seem to trust your own memory! I mainly have an issue with Briggs’s strange conclusion.
Thanks Doc, for providing a more refined way rather than peeing over someone’s idea, a la Napoleon vs Snowball. From now on any idea I do not like is immoral. Oh BTW, your post is downright righteous.
No, the CPS is completely voluntary, unlike the decennial census. You get selected by some process, called and asked to participate, and you can accept or refuse. I was curious about it, having used some CPS data before.
Ethical concerns about surveying are taken more seriously now. Most universities have Institutional Review Boards (IRBs) with established processes for human subject research, which includes surveys. Training in research ethics is mandatory (usually an online course) with recertification every few years. If climate scientists had been put through the process, we probably would have much more integrity in research and publication. I’d demand journalists take the training too.
CITI is one example of how the process works.
Gary: Interesting website—good to know someone thinks about ethics nowadays. The census bureau is still going to produce the statistics that affect everyone’s life no matter whether it is ethical or not because they can. Legally, they can compel you to answer questions on the census forms though not the CPS, at least some of them. However, this works the same way the law and Obama does–you refuse to follow the law and nothing happens. Nothing. No one has been fined or prosecuted for decades for refusing to answer and the Census Bureau appears to have no desire to anger Americans by fining people (it’s a misdemeanor anyway). So required, yes. Compelled, no.
I thought the same thing about climate change science–they need the course on ethics before they write anymore seriously flawed, made up papers. Journalists taking the course would help, too, unless the urge to “change the world at any cost” overwhelms their ethics as we see it doing now.
Is there some chance this is just a translation error? Wrong is often immoral, and Juan’s bright idea is certainly wrong (because he implicitly postulates an infinite regression on the error he is supposed to be measuring), but equally clearly not immoral.
Since calculating statistics can’t be truly immoral, the use of the word must have been hyperbole.
Unless of course “immoral” was used in a politically correct sense, where “immoral” has changed from “violates god’s laws” to mean something more like “violates regulations”.