by Ralf Neumann, Labtimes 06/2009

Research is becoming ever more global. Increasingly, huge international research networks – usually referred to as consortia or collaboration groups – perform large, data-intensive projects. As a consequence, instead of hundreds of authors, just the consortium name is given in the article byline. This appears to pose some difficulties for the “quotation counters” working at Thomson Scientific, the key player in citation-based scientometrics.

Part of the blame, however, should also be apportioned to the many researchers, such as a Miller or a Petit, who simply cannot abandon their habit of always starting the references in their publications with the first author’s name. Should a paper then appear like the infamous 2001 Nature article by the International Human Genome Project (HGP), entitled “Initial sequencing and analysis of the human genome”, widespread confusion is inevitable. The reason is that below the title, where the list of authors usually follows, the article simply states the name of the consortium; a “partial list of authors”, naming Eric Lander as the “first author”, only appears in a footnote on page two of the article.

You’ve probably already guessed what happened as a result, haven’t you? The “traditionalists” nevertheless cited this article as “ES Lander et al.” in the reference lists of their papers, but just as many authors cited it under “International Human Genome Sequencing Consortium”, as was intended and, moreover, as was fair.

Of course, this would have been no drama if Thomson Scientific had been more vigilant with its citation databases. However, it wasn’t! Instead, Thomson Scientific incorrectly treated both citation variants of the same article as two separate publications. And that had major consequences. When Thomson Scientific itself later calculated the ten most-cited biology papers of 2001 in its journal Science Watch, the human genome paper by the HGP was missing!

Ironically, the simultaneously published Science paper “The Sequence of the Human Genome” by the sequencing company Celera Genomics, which had been competing with the international consortium for years, reached second place in this “Hot Paper” list under the citation “JC Venter et al.”.

This seemed strange to Nature, which therefore enquired at Thomson Scientific. In the end, Thomson Scientific had to apologise. After adding up the citations for the two items that, in truth, constituted one and the same paper, it had to correct the 2001 list and add the consortium paper as the “real” number one.
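The correction described here, adding up the counts of citation variants that point to one and the same paper, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Thomson Scientific’s actual procedure: the function name `collate_citations`, the record fields and the citation counts are invented for the example, and the volume and page given for the Nature paper (409, 860) are not taken from this article. The sketch assumes that variants can be matched on journal, volume, first page and year, regardless of the author label.

```python
from collections import defaultdict


def collate_citations(records):
    """Sum citation counts for records that refer to the same paper.

    Records are matched on (journal, volume, first_page, year) and NOT on
    the author label, so "Lander ES" and a consortium name end up in one
    entry. All field names are hypothetical.
    """
    totals = defaultdict(int)
    labels = defaultdict(set)
    for rec in records:
        key = (
            rec["journal"].strip().lower(),
            rec["volume"],
            rec["first_page"],
            rec["year"],
        )
        totals[key] += rec["count"]
        labels[key].add(rec["cited_as"])
    return {
        key: {"count": totals[key], "cited_as": sorted(labels[key])}
        for key in totals
    }


# Invented counts, for illustration only.
records = [
    {"cited_as": "Lander ES", "journal": "Nature", "volume": 409,
     "first_page": 860, "year": 2001, "count": 10},
    {"cited_as": "International Human Genome Sequencing Consortium",
     "journal": "Nature", "volume": 409, "first_page": 860,
     "year": 2001, "count": 12},
]

for key, entry in collate_citations(records).items():
    print(key, entry["count"], entry["cited_as"])
```

Matching on the bibliographic coordinates rather than on the name at the start of the reference is the design choice that matters: the author label is precisely the part that citing authors wrote inconsistently.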

Just a single regrettable incident? Not at all. The Lancet also investigated and found, for example, strikingly few citations for a 1997 stroke paper that ran “International Stroke Trial Collaboration Group” in the author line. On closer inspection, this case appears to be even more blatant: the real names behind the Collaboration Group were not mentioned until the acknowledgements at the very end of the paper – three pages of small print, organised by function. The whole list started with “IS study organization – writing committee”, which, in turn, was headed by a certain Peter Sandercock from Edinburgh.

Again, you have probably guessed the outcome. More than half of the subsequent papers that cited the Lancet article listed it in their references as “P Sandercock et al.”, instead of citing it under the consortium name and, yet again, Thomson Scientific failed to realise that both variants referred to the same paper.

Thomson Scientific’s software and its bean counters have obviously made such errors systematically over a long period. Hence, you can be quite sure that there are even more stories with similar patterns.

The moral should, therefore, be clear: Collate consortium citations carefully!

