Reader Query & Assist Request: Health Data In India


Busy Saturday, so a plea from reader Aman Rastogi:


I am a student from Lucknow University, INDIA, pursuing my Masters in Public Health there.

Few days back I have watched two of your videos on youtube about statistical fallacies and crisis of evidence in public health and it has changed my perception of viewing the collected data, thank you for that.

In those videos you were saying that without observing each and every individual we cannot come on a correct conclusion about the causation or even the association of the problem with the disease because it may give a lot of statistical junk that we would believe.

So, what would you prefer for a country like INDIA where even the data collection is a big problem because of so many reasons like the weak health information system, low salaries and huge population covering burden on the local data collector, a not much interest of people itself, etc. here are so many uncountable problems to face for a health professional. So, what can be a one solution to counter this problem and engaging all of the needed population with a reliable statistical data.

Because the data suffers here when it goes from one level to another as either professionals don’t want them to be reported or some other reason. And the policy makers get a very manipulated data that arises the big problem. Because at first the data was not collected with keen observation and then it got manipulated.

Even the big shot organizations at state level or organization like UNICEF have to rely on the collected data, whatever it is, and then they make up the policy for it.

So please give me some sort of solution to counter this problem or please publish the solution in one of your paper or book ASAP.

I put this up for reader discussion since I know little about Indian health care and almost nothing about how the Indian government collects data.

But I do know about “anecdotal” data, which has been given a bad name. “Observational” or “anecdotal” data has two different senses. The first is the daily-living “data” that comes to us unbidden via regular experience, “data” which is responsible for tradition, commonsense, stereotypes, street knowledge, and so on. This is usually great data, and there is little wrong with the judgments we make using it.

Nobody bats 1.000, of course. For instance, Steven Goldberg, in When Wish Replaces Thought and in Fads and Fallacies in the Social Sciences, shows us that our stereotypes are usually correct in their form. But they aren’t always right in their theory, i.e. what caused the stereotypes to be true.

The second sense is what we usually think of as observational data, collected ad hoc, say, from health ministries, and not gathered from controlled experiment. I use control in the same sense an engineer or physicist does, actual material control of a thing, and not in the statistical sense, which isn’t control at all but a way of seeing how uncertainty might change as a thing changes. That people mix these uses up accounts for much over-certainty.

Anyway, there isn’t anything inherently wrong with this second sense of observational data, except that it’s far, far too often input into statistical routines which guarantee over-certainty, like hypothesis testing and parameter estimation. People will claim causation has been found merely because they were able to quantify the analysis of observational data. Quantification is seen universally as superior to the conclusions reached by observational data of the first kind, when usually the reverse is true. This is because observational data of the second kind is often of a much more limited nature than the first kind, which reflects the broad experience of many.

Now WHO is one of those organizations, like all modern bureaucracies, that insist on quantification. This insistence is why so much is wrongheaded in government, because the insistence drives over-certainty. And the same would hold true of the Indian medical system if it were to embrace rapid data collection. Again, it’s not that collecting data is bad per se; the problem is that it’s collected for the sake of collection and then quantified because that’s what turns it into Science™.

Therefore, it would be best to advocate discussions of elders, those who have had the longest experience in medicine as she is actually practiced.

Obviously there is much more that can be said, but you get the idea, I hope.


  1. Briggs


    We figure that you won’t get paid unless you can prove sufficient (monthly? weekly?) activity. That so?

  2. I get paid twinkies! I was just wondering if you have an example of things going awry.


  3. k. Kilty

    Jmj: Toxic oil syndrome is as good an example as exists, I think. In the 1980s some tens of thousands of people in Spain were sickened, and several hundred died, after ingesting tainted cooking oil. However for a very long time, and as the death toll mounted, public health officers were convinced that this was a pneumonia-like disease. The symptoms were like those of pneumonia and autopsies showed a bacterium consistent with pneumonia in the lungs of many victims. This overcertainty led people to discount the value of outliers. One such outlier was an infant that came down with the syndrome. I seem to recall that this infant was unique, or maybe just extremely rare, but in any event the physician who pursued a question of what characteristic this infant share with the rest of the family, versus so many other unaffected infants, lead to identification of cooking oil as the common factor. The tainted rape seed oil had come from other European countries and had been denatured with an aniline. Local refiners tried to remove the aniline and repackage the oil to sell at a discount. The story contains many interesting lessons in trade, tariffs, the morality of intentionally poisoning food stuff to protect local producers and tax receipts, but the important lesson here is this.

    Overcertainty on a model, even one supported by some observations like the bacterium, caused people to reject outliers, like the infact, as spurious noise. In fact the outliers, being rare events, carried far more information than the bulk of observations. The noise was the signal in this instance. I am not sure how one recognizes noise as important signal except to look carefully at unexpected outcomes one by one.

  4. k. Kilty

    Gee whiz. Three typos, at least, in that short post, but I think it is still quite clear.

  5. Bill S.

    Not all that far off topic?
    I was reviewing Michigan versus EPA.
    Noticed that the EPS came up with $30 billion of ancillary savings per annum as part of there after the fact cost analysis. But I can’t seem to find a copy of the actual analysis used to come up with the savings. Does anybody know how to find the EPA’s analysis?

  6. Bill S.

    Typo courtesy of visiting this website too often?

  7. Bill S: Typos are now programmed into cyberspace! 🙂

    This is the best answer I can find to your question and is referenced in most of the legal documents (the amounts are anyway):

    Not sure how on or off topic it is. Looks like a fantasy statistical write-up based on whatever looked good at the moment. I’d like to say that the US sometimes suffers from poor data collection, but that would imply we even collect data. Can’t see it in this case.

    It is an answer to JMJ’s question—there is an incredible amount of over-certainty in the estimates of cases “saved” etc by the EPA.

    Answer to blog post:
    One possibility is internet collection of data, if you can keep track of who is reporting so you don’t get duplicates. It will be lacking, but if you combine it with the government numbers, you might get a truer picture.
    Many years ago, when I got out of college, there was a job opening for someone to visit Native American reservations and log TB cases. This was one person going directly to the source. I don’t know what that would cost, but it might be possible to engage students, etc, in the project. (Think social networking today, maybe.)
    Again, years back, in the USA, in smaller towns there were often one or two people who knew what was going on. (We probably call them nosy now.) This might be a source, because as Briggs indicated, not all anecdotal evidence is equal, and some sources are less biased than others. I guess the trick is to get people to give the information without them deciding you want a certain answer. Sometimes just a conversation can best yield this. You’ll need multiple sources I think to help weed out confirmation bias, political posturing, etc.

    This really is a difficult situation and I think it’s great you see the problems involved and are trying to address them. Good luck with your studies.
