The Art of Skim-Reading
(May 2nd, 2014) If you ever feel bogged down by an ever-increasing pile of articles you need to read, researchers based in Ireland may have the answer to your problem: a way to “skim-read” a large amount of information and find exactly what you are looking for.
With over 20 million articles already available online, and new ones popping up every minute, it is virtually impossible for researchers to read every single article related to their field. Biomedical scientists, in particular, face the daunting task of keeping up with the latest discoveries across multiple disciplines, from genetics and chemistry to immunology and neuroscience.
In an attempt to deal with this information overload, Vit Novacek, a statistician at the National University of Ireland in Galway, developed a computer-based prototype that mimics the human ability to “skim-read” vast amounts of information and “squeeze” out the juiciest bits. The system, named SKIMMR, automatically processes large numbers of articles and finds the connections between the most important concepts in them. This way, explains Novacek, users “can navigate that information and when they find something interesting they can get to the article that was used for computing the network and get some more details.”
The idea grew out of a collaboration with neuroscientists from the University of San Diego, who were compiling information on spinal muscular atrophy (SMA). However, “in the end, SKIMMR became much more general, it's not only for SMA, but you can basically put in any type of textual data,” says Novacek. In fact, it is not even limited to science: Novacek has been “playing around” with it and has had some curious results in other fields, including records of the Irish famine and James Joyce’s literary work.
The novelty of SKIMMR lies in its “named entity recognition”, which, in layman's terms, means it can recognise certain names, such as drugs, proteins or genes, and establish relationships between them. SKIMMR measures how related two terms are by analysing whether they appear in the same sentence or the same paragraph. It assumes that two terms in the same sentence are highly related, but that they may still be connected if they only share a paragraph. These links are then complemented by a similarity search that uncovers further relationships between the terms.
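To make the sentence-versus-paragraph idea more concrete, here is a minimal Python sketch of co-occurrence scoring. It is not SKIMMR's code, and several parts are assumptions made purely for illustration. A fixed term list stands in for real named entity recognition. The weights `SENTENCE_WEIGHT` and `PARAGRAPH_WEIGHT` are invented. The cosine similarity over co-occurrence profiles is simply one common way to implement a "similarity search" of this kind, not necessarily the measure SKIMMR uses.

```python
import math
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical weights (not SKIMMR's parameters): sharing a sentence counts
# more than merely sharing a paragraph.
SENTENCE_WEIGHT = 1.0
PARAGRAPH_WEIGHT = 0.3


def find_terms(text, vocabulary):
    """Stand-in for named entity recognition: match terms from a fixed list."""
    lowered = text.lower()
    return {t for t in vocabulary if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def cooccurrence(paragraphs, vocabulary):
    """Score each term pair by sentence- and paragraph-level co-occurrence."""
    scores = defaultdict(float)
    for paragraph in paragraphs:
        # Naive sentence splitting on ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        sentence_pairs = set()
        for sentence in sentences:
            terms = sorted(find_terms(sentence, vocabulary))
            sentence_pairs.update(combinations(terms, 2))
        for pair in combinations(sorted(find_terms(paragraph, vocabulary)), 2):
            # Pairs sharing a sentence get the strong weight; other pairs in
            # the same paragraph still receive a weaker link.
            scores[pair] += SENTENCE_WEIGHT if pair in sentence_pairs else PARAGRAPH_WEIGHT
    return dict(scores)


def similarity(scores, a, b):
    """Cosine similarity between two terms' co-occurrence profiles."""
    def profile(term):
        vec = {}
        for (x, y), w in scores.items():
            if x == term:
                vec[y] = w
            elif y == term:
                vec[x] = w
        return vec

    va, vb = profile(a), profile(b)
    dot = sum(w * vb.get(k, 0.0) for k, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy input, made up for illustration.
paragraphs = [
    "SMN1 mutations cause spinal muscular atrophy. Nusinersen increases SMN protein levels.",
    "Nusinersen is an antisense drug. Spinal muscular atrophy affects motor neurons.",
]
vocabulary = {"smn1", "spinal muscular atrophy", "nusinersen", "smn protein", "motor neurons"}

scores = cooccurrence(paragraphs, vocabulary)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
print(round(similarity(scores, "smn1", "nusinersen"), 3))
```

In this toy example, pairs that share a sentence, such as SMN1 and spinal muscular atrophy, end up with stronger links than pairs that only share a paragraph. In a sketch like this, summing such scores over many articles is what would turn individual pairs into a navigable network of concepts.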
Once the prototype was developed, the researchers needed to test SKIMMR’s potential. To do so, they ran a large-scale simulation, using randomly chosen words to reproduce user behaviour. It turns out that SKIMMR can outperform navigation based on PubMed’s state-of-the-art keyword annotations, known as MeSH (Medical Subject Headings). “Basically, on PubMed you have annotations by a standard medical vocabulary, which are supposed to associate keywords with the articles. You can use this to navigate the articles which are related to your keywords,” says Novacek, “but the network computed by SKIMMR is more efficient than the network derived from PubMed.”
SKIMMR is still a very new system but, following such encouraging results, the team has ambitious plans for it! The first task is to make navigation more interactive and user-friendly, so that users can easily zoom in on a particular article and zoom out again. They are also working on enriching the types of relationships that can link two concepts, with the aim of moving from general to more specific ideas. Novacek has dubbed this “taxonomies of concepts”.
Finally, if all goes to plan, SKIMMR’s future may well go beyond data mining: it could give scientists a systematic way to arrive at the next hypothesis to test in the lab. Even if researchers have only a vague notion of where to go, the idea is that SKIMMR could uncover unexpected logical combinations or new connections between two concepts. Combined with the researcher’s own knowledge and insight, “you could actually achieve some sort of semi-automatic discovery process,” says Novacek. “If I had something like that in five years, I would be very, very happy.”