IMS: Citation Indexes Stink

The Institute of Mathematical Statistics (I am a member) has issued a report on the widespread misuse of Citation Statistics.

The full report may be found here.

The non-surprising main findings are:

  • Statistics are not more accurate when they are improperly used; statistics can mislead when they are misused or misunderstood.
  • The objectivity of citations is illusory because the meaning of citations is not well-understood. A citation’s meaning can be very far from “impact”.
  • While having a single number to judge quality is indeed simple, it can lead to a shallow understanding of something as complicated as research. Numbers are not inherently superior to sound judgments.

The last point is not just relevant to citation statistics, but applies equally well to many areas, such as (thanks to Bernie for reminding me of this) trying to quantify “climate sensitivity” with just one number.

More findings from the report:

  • For journals, the impact factor is most often used for ranking. This is a simple average derived from the distribution of citations for a collection of articles in the journal. The average captures only a small amount of information about that distribution, and it is a rather crude statistic. In addition, there are many confounding factors when judging journals by citations, and any comparison of journals requires caution when using impact factors. Using the impact factor alone to judge a journal is like using weight alone to judge a person’s health.
  • For papers, instead of relying on the actual count of citations to compare individual papers, people frequently substitute the impact factor of the journals in which the papers appear. They believe that higher impact factors must mean higher citation counts. But this is often not the case! This is a pervasive misuse of statistics that needs to be challenged whenever and wherever it occurs.
  • For individual scientists, complete citation records can be difficult to compare. As a consequence, there have been attempts to find simple statistics that capture the full complexity of a scientist’s citation record with a single number. The most notable of these is the h-index, which seems to be gaining in popularity. But even a casual inspection of the h-index and its variants shows that these are naive attempts to understand complicated citation records. While they capture a small amount of information about the distribution of a scientist’s citations, they lose crucial information that is essential for the assessment of research.
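To make the report's point about averages concrete, here is a small sketch in Python. The citation counts are invented for illustration and do not come from the report or from any real journal; the only assumption is that citation distributions tend to be skewed, with a few heavily cited articles and many rarely cited ones.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal.
# These numbers are made up purely to illustrate a skewed distribution.
citations = [0, 0, 1, 1, 2, 2, 3, 4, 7, 180]

# An impact-factor-style figure: the simple average over the articles.
average = mean(citations)

# The median describes what a "typical" article in the journal receives.
typical = median(citations)

# Fraction of articles that fall below the journal-wide average.
share_below_average = sum(c < average for c in citations) / len(citations)

print(f"average = {average}, median = {typical}, "
      f"share below average = {share_below_average:.0%}")
```

In this toy journal a single article drives the average, so picking a paper at random and assuming it has "about average" citations would be wrong nine times out of ten. That is the substitution the second bullet warns about.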

I can report that many in medicine fixate on, and are enthralled by, a journal’s “impact factor”, which is, as the report says, a horrible statistic, and one with an awful-sounding name. A scientist’s “h index” is “the largest n for which he/she has published n articles, each with at least n citations.”
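For the curious, the quoted definition translates into a few lines of Python. This is only a sketch of that definition; the function name `h_index` and the two citation records are hypothetical, invented to show how much the single number can hide.

```python
def h_index(citations):
    """Largest n such that n articles each have at least n citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` articles all have >= rank citations
        else:
            break      # counts only decrease from here, so stop
    return h

# Two made-up citation records (not real scientists).
steady = [10, 9, 8, 7, 6, 5, 5, 5, 5, 5]
one_hit = [900, 5, 5, 5, 5, 0, 0, 0, 0, 0]

print(h_index(steady), sum(steady))    # same h, modest total
print(h_index(one_hit), sum(one_hit))  # same h, very different record
```

Both records come out with the same h index even though one contains a paper cited 900 times and the other does not, which is the kind of lost information the report complains about.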

Naturally, now that we statisticians have weighed in on the matter, we can expect a complete stoppage in the usage of citation statistics.


  1. Naturally, now that we statisticians have weighed in on the matter, we can expect a complete stoppage in the usage of citation statistics.

    I always like to start the morning with a good laugh.


  2. Bernie

    This reminds me of discussions around the use of cut-off scores when analyzing the impact of IQ on career success or life success: You need a certain number of citations to qualify but after that it may not mean much. Now how one decides on the cut-off score is another tricky issue.

    But on the more general point of a single measure, such an approach needs to be used very circumspectly and with a very clear understanding of the measure and its relationship to what you are interested in. For example, as a measure of research productivity I would think that such an index would be a very poor substitute for looking at the actual product. On the other hand, if you want to use it for comparing academic departments in the same discipline then there might be some benefit.

  3. PaulM

    I’m afraid I disagree somewhat. I’m reminded of that Churchill quote: ‘democracy is the worst form of government except for all the others that have been tried’. The report doesn’t really have many ‘findings’, more opinions. It completely fails to address the key problem, which is that the current peer review system is at best subjective and at worst a prejudiced and unfair ‘old boy network’. Furthermore it is hugely costly in manpower.

    My research is currently being assessed by an RAE panel. There are two people on the panel who I know fairly well but I don’t work in the same field as them these days; the one panel member who does work in my field takes a rather different approach and so will probably not rate my research highly.
    Of course it is possible to pick holes in h factor statistics by dreaming up hypothetical scenarios of researchers A and B. And of course such indices need to be used carefully, and not across different disciplines, and not to the complete exclusion of peer review. But I would rather be judged on my h factor (which is about 15) than by the subjective opinions of a small panel, only one of whom has looked at only 4 of my papers (that is how the RAE works). The nice thing about citations is that automatically, a large worldwide panel of experts in exactly my field is rating my research by citing it.
    Another concern is that the ‘powers that be’ in the field got to where they are by peer review, and may fear losing their power as the emphasis on peer review is reduced. Would the queen support abolishing the monarchy?!

    Yes, citation indices stink, but so does peer review.

  4. Briggs



    Peer review, at best, tends to enforce conservative middle-of-the-road results, at worst, mediocrity.

    But a citation index is no better.

    In fact, given all of human history, the only answer seems to be time.


  5. Hmmm. Could we construct some explicit bayesian method by which citations over time are considered?

  6. Joe Triscari

    One of the points they mention but don’t pursue too aggressively is why citation is a metric worth tracking at all. I think it’s justifiable that they don’t pursue this since they are talking from a statistician’s point of view and they don’t want to comment on the “sociology of citations.” On the other hand, I do think it is key.

    I have never tried to formally classify citations but I have noticed that an article with >10 citations is frequently only building on the work of a few and very often those few are previous articles by the authors. It seems that most citations are what I call the “Marketing Section.” The author is saying, “I am working on this problem because it is important and here are a bunch of other people working in this general area.” Very few of those articles are actually work that is being built upon in the sense that the author needs results from those papers.

    In fairness, this is something I’ve observed mainly in image and signal processing papers. Other areas may be different.

    Anyway, with citation indices becoming important I can easily see reviewers insisting that citations to their or their students’ work be inserted on very thin bases. There will be publication strategies whereby you insert references to the work of editors or known reviewers. Or important work will be divided into multiple articles at the expense of clarity. Oh wait…

    Even if they were statistically sound, I don’t really see these metrics as a path to objective evaluation of technical work. It will just make it clear what your “number” is. I’m pretty sure that’s not a way to improve science or the careers of its practitioners. It will probably just formalize the “mediocrity.”
