Teaching a Young Dog New Tricks
(February 28th, 2017) US scientists have found a way to map cytosine and adenine methylation with the Oxford Nanopore Technologies MinION, a device originally designed to sequence DNA and RNA.
Chemically modifying residues within DNA alters the expression of genes, and there is mounting evidence that these epigenetic changes play major roles in a wide range of biological processes, including ageing, disease and even behaviour. The main epigenetic changes involve modifying cytosine residues by adding a methyl group, or tagging histones by either methylation or acetylation. Given the growing list of biological processes in which epigenetic modifications have been implicated, there is understandably great interest in being able to record when and where these modifications are happening.
However, measuring epigenetic changes is nowhere near as easy as sequencing DNA. But wouldn't it be great if recording epigenetic changes were as easy as, say, nanopore sequencing? Well, it turns out that it may well be.
This month, Benedict Paten of the University of California Santa Cruz published a paper in Nature Methods showing how you can record DNA methylation events using an unmodified Oxford Nanopore MinION sequencer, without any chemical modification or special preparation of the material.
How is it done? To answer that, let's recall how the MinION works. The idea is to pass unamplified DNA strands through a carefully engineered pore and record the electrical resistance as the DNA strand squeezes through. Some DNA residues are wider than others, so the resistance varies, and this variation can be recorded as tiny changes in an electrical current passing through the pore. With a bit of clever statistical inference you (or the machine, rather) can work back from the electrical pattern to the strand sequence.
Given that different residues cause different electrical patterns, it is not surprising that modifying those residues will also produce electrical effects, although on a very much smaller scale. Paten's approach is to capture these slight variations in the current caused by changes in methylation state. To do so, Paten and his coworkers took thousands of strands of DNA that had been constitutively methylated. Each strand was bar-coded, so that the team could identify which sequence had gone through the pore. Paten then had to do the hard work of making a new set of rules to work out the sequence, including its methylation state, from the signal. For this, he used a combination of two machine learning approaches, called "Hidden Markov Models" and "Hierarchical Dirichlet Processes", which infer the hidden sequence of residues from the observed electrical signal. Once trained on a known set of sequences, the system can then be unleashed on unknown sequences.
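To give a feel for the Hidden Markov Model part, here is a minimal toy sketch in Python. It is not the method from the paper: it assumes just two hidden states (an unmethylated and a methylated cytosine), simple Gaussian current distributions in place of the Hierarchical Dirichlet Process, and invented current values and transition probabilities. It only illustrates how the most likely hidden states can be worked back from a series of current readings (the Viterbi algorithm).

```python
import math

# Hypothetical emission parameters: mean and standard deviation of the
# pore current (in picoamps) for each hidden state. These numbers are
# invented for illustration and are not taken from the paper.
EMISSIONS = {
    "C": (80.0, 2.0),
    "5mC": (83.0, 2.0),
}

# Hypothetical probabilities of moving from one state to the next.
TRANSITIONS = {
    "C": {"C": 0.9, "5mC": 0.1},
    "5mC": {"C": 0.1, "5mC": 0.9},
}

# Hypothetical probabilities for the first state.
START = {"C": 0.5, "5mC": 0.5}


def log_gaussian(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(currents):
    """Return the most likely hidden state for each current reading."""
    if not currents:
        return []
    states = list(EMISSIONS)
    # scores[s] is the best log probability of any path ending in state s.
    scores = {
        s: math.log(START[s]) + log_gaussian(currents[0], *EMISSIONS[s])
        for s in states
    }
    backpointers = []
    for x in currents[1:]:
        new_scores = {}
        pointers = {}
        for s in states:
            # Pick the previous state that gives the best path into s.
            best_prev = max(
                states, key=lambda p: scores[p] + math.log(TRANSITIONS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANSITIONS[best_prev][s])
                + log_gaussian(x, *EMISSIONS[s])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    readings = [79.5, 80.2, 79.8, 84.0, 83.5, 82.9]
    print(viterbi(readings))
```

In this toy model, readings near 80 pA are called unmethylated and readings near 83 pA methylated, while the transition probabilities discourage the call from flipping on a single noisy reading. The real system has to cope with far more states and with much subtler differences in current.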
When put to the test on DNA of known methylation states, Paten's team found that the system called the right methylation state in about 75% of cases. But is the machine really learning from the signal they think it is? There are two reasons for thinking so: first, the accuracy of calls is better for more accurate sequence reads. Second, methylated residues that resemble each other electrically tend to get confused with each other.
Paten's team also confirmed that they were able to detect methylation in plasmid DNA grown in E. coli, with accuracy in the 60-70% range. Clearly, we are nowhere near the accuracy expected in DNA sequencing calls, but that is only to be expected given the smaller electrical signal the method has to go on. However, it is enough to observe, as Paten did, the increasing levels of methylation known to occur during the growth phase in E. coli.
The nice thing about this method, as Paten points out, is that it can be applied to existing data sets because no extra steps are needed to make the epigenetic calls. Given an appropriately trained machine, any raw nanopore sequencing data could be reanalysed.