Thanks for Checking
(February 19th, 2016) Re-analysed data in a paper about an ancient Ethiopian genome reveals that scientists make human errors, too.
He looked toward the entrance of the cave. Another dawn brought daylight to his eyes. The dark-skinned, brown-eyed, lactose-intolerant man stood proudly, watching his kids run towards the sunrise. Four thousand five hundred years into the future, he’ll be long dead and Gallego-Llorente et al. will dig out his skull to extract DNA from the petrous part of his temporal bone. By 2015 AD, the information in his DNA will be published in Science. His name, however, will be lost. Scientists will rename him “Mota”, in honour of his home, the Mota cave in the beautiful highlands of Ethiopia.
The genome from Mota tells the story of a huge reverse migration that brought humans back to Africa. Around 3,000 years ago, Middle Eastern farmers decided to go back to the mother continent and extensively spike the gene pool with their Eurasian genes. Or at least that was claimed in the study by Gallego-Llorente et al. As a result of this reverse migration, sub-Saharan groups, even the Mbuti people, a pygmy group in Central Africa, would theoretically harbour up to six percent Eurasian ancestry. The scientific community, including population geneticists Pontus Skoglund and David Reich (both Harvard), met such a widespread back-flow of genes with the same contempt with which you would meet the testimony of a 4,500-year-old stranger with a made-up name. Thus, the two asked for Mota’s genome, re-analysed it and came to a very different conclusion.
What was the problem? Well, as is so often the case, it was just a communication problem. Here, samtools v0.1.19 can’t talk straight to PLINK. Samtools is a bioinformatics package for handling the mountains of information that result from sequencing Mota’s DNA. PLINK is a whole-genome association tool that analyses, among other things, ancestry by comparing DNA sequences. Both are great tools, designed by Heng Li and Shaun Purcell, respectively.
However, a script must be run to introduce samtools to PLINK, so that they understand each other before crunching DNA data. But “somebody forgot to run the script”, admits co-author Andrea Manica, also a population geneticist (Cambridge, UK). Without that script, PLINK didn’t get 255,922 single nucleotide polymorphisms (SNPs) present in Mota’s genome and probably thought “who cares about calling a quarter of a million SNPs?” When the analysis was re-run, including the harmonising script, the overall picture looked similar. Mota is still Ethiopian. The contribution of Eurasian genes, however, is not extensive across the continent; rather, it is limited to eastern Africa. “It was a clear human error,” reflects Manica.
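For readers who like to see such things spelled out, here is a minimal sketch of the kind of sanity check that can catch this class of mistake before two data sets are merged. It is not the script used by the authors, and it calls neither samtools nor PLINK. The function names, the toy coordinates and genotypes, and the one percent threshold are all assumptions made up for illustration. The idea is simply to count how many sites in a reference panel have no call in the sample, and to refuse to continue if that number is suspiciously large.

```python
def find_dropped_sites(panel_sites, sample_calls):
    """Return the panel sites for which the sample has no call, sorted.

    panel_sites: iterable of (chromosome, position) tuples (hypothetical panel).
    sample_calls: dict mapping (chromosome, position) to a genotype string.
    """
    return sorted(site for site in set(panel_sites) if site not in sample_calls)


def check_before_merge(panel_sites, sample_calls, max_missing_fraction=0.01):
    """Raise ValueError if too many panel sites would silently vanish in a merge.

    The default threshold of 1% is an arbitrary illustrative choice.
    """
    panel = set(panel_sites)
    dropped = find_dropped_sites(panel, sample_calls)
    fraction = len(dropped) / len(panel) if panel else 0.0
    if fraction > max_missing_fraction:
        raise ValueError(
            f"{len(dropped)} of {len(panel)} panel sites have no call in the sample"
        )
    return dropped


# Toy data, invented for this example: four panel sites, one without a call.
panel = [("1", 100), ("1", 200), ("1", 300), ("2", 50)]
calls = {("1", 100): "A/G", ("1", 200): "C/C", ("2", 50): "T/T"}

print(find_dropped_sites(panel, calls))
```

With the toy data above, calling check_before_merge(panel, calls) raises an error, because one in four sites is missing, far above the threshold. A loud failure at this point costs a few minutes; a silent one can cost an erratum.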
Computer programmes, as usual, are innocent. The erratum was echoed in Nature, which means that this study got two high impact factor releases. The guys from Science are still thinking about what to do with the original title, which was “Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent”. We could suggest a couple of alternative titles that, I’m sure, Mota would have approved of: “Ancient Ethiopian genome reveals that, once again, algorithms and computers can’t calculate human error” or “Ancient Ethiopian genome reveals the scientific method still works at its best, nullius in verba: if you can’t reproduce my data, I may be wrong, but let’s find out, and thank you for checking!”
Picture: Armeya asekar (CC BY 3.0)