Bench philosophy: 3D modelling of RNA
RNA Modelling for Dummies
by Steven D. Buckingham, Labtimes 01/2016
As more non-coding RNAs are being implicated in basic and essential biological mechanisms, understanding their 3D structure is becoming a priority. Is 3D modelling of RNA evolving into a technique for the non-expert?
One of the big lessons we have learnt over the last decade or so is that RNA is a whole lot more than the old, dog-eared central dogma lets on. In fact, only a tiny minority of the RNA in a cell is messenger RNA.
3D RNA models may be built with the interactive and user friendly programme, Assemble2. Photo: Fabrice Jossinet
And, as with proteins, the function of an RNA depends not just on its sequence but even more on its structure. And when RNAs start behaving like proteins, we need to start investigating them like proteins, and that includes modelling their 3D structure.
So I would like to ask two questions: how is modelling done (as explained to the masses), and can anyone do it?
There are two general approaches you can take to modelling. Remember, what you are aiming for is a description of the 3D shape of the molecule, preferably with some insight into its flexibility and how it switches between conformations. You can do this the hard way, or the slightly not so hard way.
By doing it the hard way, I mean from the ground up. You work out all the forces acting on each atom and from there figure out where they will get pushed by these forces. That's no easy task. Each atom exerts forces on every other one and there are many different forces: hydrogen bonds, electrostatic forces and so on.
And you have to decide how you are going to do the maths (spoiler alert – computers do it). Of course, we are dealing with continuous force fields here and that means integration will be involved, and doing that with all the summed force fields over a molecule isn't going to be easy. But let's not worry our little biologists' heads about that one: faster computers crunching clever algorithms will take care of that.
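To give a flavour of what "doing the maths" means, here is a minimal sketch of one common integration scheme, velocity Verlet, applied to a toy system of two beads joined by a spring in one dimension. It is an illustration only, not what any particular package does: the function names, the spring constant, the rest length and the time step are all made-up values.

```python
def spring_force(x1, x2, k=1.0, r0=1.0):
    """Force on bead 1 from a harmonic 'bond' of rest length r0 (toy 1D model)."""
    d = x2 - x1
    direction = 1.0 if d > 0 else -1.0
    # A stretched spring pulls bead 1 towards bead 2; a compressed one pushes it away.
    return k * (abs(d) - r0) * direction


def total_energy(x, v, mass=1.0, k=1.0, r0=1.0):
    """Kinetic plus spring energy; it should stay nearly constant during the run."""
    kinetic = 0.5 * mass * sum(vi * vi for vi in v)
    potential = 0.5 * k * (abs(x[1] - x[0]) - r0) ** 2
    return kinetic + potential


def simulate(steps=1000, dt=0.01, mass=1.0):
    """Advance the two beads with the velocity Verlet scheme (hypothetical parameters)."""
    x = [0.0, 1.5]   # start with the spring stretched by 0.5
    v = [0.0, 0.0]
    f1 = spring_force(x[0], x[1])
    f = [f1, -f1]    # Newton's third law: equal and opposite forces
    for _ in range(steps):
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / mass   # half-step velocity update
            x[i] += dt * v[i]                # full-step position update
        f1 = spring_force(x[0], x[1])        # forces at the new positions
        f = [f1, -f1]
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / mass   # second half-step velocity update
    return x, v
```

A real simulation does the same bookkeeping in three dimensions for thousands of atoms and many kinds of force, which is where the computing time goes.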
Modelling from the bottom up (also known as modelling ab initio) has two parts: a Discrete Molecular Dynamics engine that does the sums and a force field that encodes the interatomic forces. You don't need to work out your force fields by yourself, by the way. There are a number of them with names like CHARMM, AMBER and (ominously) MEDUSA. Doing ab initio Discrete Molecular Dynamics simulations properly involves quantum mechanics. Why? Because of hydrogen. Hydrogen is the smallest atom in the periodic table and when it gets ionised, we have definitely left Newton's world. But don't worry if the thought of a biologist grappling with quantum mechanics makes you nauseous: the programmes usually take care of this themselves.
Doing it the hard way can be made slightly easier by increasing the time step of the simulation (and there is a whole lot of literature and algorithms out there to guide you through the many pitfalls in this area) or by making the model more coarse-grained. For instance, instead of working out the forces on every individual atom, you can treat each chemical group, or even each nucleotide, as an indivisible entity. It goes without saying that there is a cost to accuracy but a big saving on time.
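The saving from coarse-graining is easy to see with a little arithmetic. The sketch below collapses each nucleotide into a single bead placed at the centroid of its atoms and counts how many pairwise interactions are left. The centroid is just one simple choice made for illustration (real packages pick their beads differently), and the function names and input layout are made up.

```python
from collections import defaultdict


def coarse_grain(atoms):
    """Collapse atoms into one bead per residue.

    atoms: list of (residue_index, (x, y, z)) tuples (hypothetical input layout).
    Returns {residue_index: centroid of that residue's atoms}.
    """
    groups = defaultdict(list)
    for residue, xyz in atoms:
        groups[residue].append(xyz)
    beads = {}
    for residue, coords in groups.items():
        n = len(coords)
        beads[residue] = tuple(sum(c[axis] for c in coords) / n for axis in range(3))
    return beads


def pair_count(n):
    """Number of distinct pairs among n particles."""
    return n * (n - 1) // 2
```

Taking an illustrative round number of 30 atoms per nucleotide, a 50-nucleotide RNA has 1,500 atoms and `pair_count(1500)` gives 1,124,250 pairs to consider, whereas 50 beads give `pair_count(50)`, a mere 1,225.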
So much for the hard way – what about the slightly less hard way? Rather than work out all the forces on atoms and work it out from the ground up, you can build a database of RNA snippets, whose force fields have already been worked out, or whose 3D structure has already been solved (a limited supply, as yet). You take the RNA you want to model, find what is in the database and cobble the snippets together as best you can. In other words, you make use of existing knowledge.
Whether you do it from the ground up or use more top-down approaches, you will want to make use of experimental constraints. You may have experimental evidence about structure from the effects of base-selective chemical reagents, such as kethoxal, or you may have restraints on flexibility provided by techniques such as “selective 2'-hydroxyl acylation analysed by primer extension” (was the acronym SHAPE a coincidence, I wonder?). These data are incorporated into the modelling process by building them into the scoring function that assigns probabilities to candidate structures.
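How might experimental data end up in a scoring function? One simple possibility, sketched below, is to add a penalty whenever the model pairs up a nucleotide that the experiment says is highly reactive (and therefore probably flexible and unpaired). This is a toy scheme invented for illustration: the linear penalty, the threshold and the weight are assumptions, not the formula used by any of the tools mentioned here.

```python
def restraint_penalty(paired, reactivity, weight=1.0, threshold=0.5):
    """Penalise positions modelled as paired that show high reactivity.

    paired:     list of booleans, True if the model pairs that nucleotide
    reactivity: list of experimental reactivities, one per nucleotide
    weight, threshold: hypothetical tuning parameters
    """
    penalty = 0.0
    for is_paired, r in zip(paired, reactivity):
        if is_paired and r > threshold:
            penalty += weight * (r - threshold)
    return penalty


def total_score(model_energy, paired, reactivity, weight=1.0):
    """Combine the model's own energy with the experimental penalty (lower is better)."""
    return model_energy + restraint_penalty(paired, reactivity, weight)
```

A candidate structure that disagrees with the probing data then scores worse than one that agrees with it, even if the two have the same computed energy.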
So let's get modelling. To start with, what programmes and websites are out there? Let's be brave and look at modelling from the bottom up first.
iFoldRNA is not a Steve Jobs app but a web portal for RNA structure prediction. All you need is the sequence and iFoldRNA does it all (not saying how long it takes to do it, by the way). It is sped along somewhat by its coarse graining: it uses three spheres per nucleotide to represent the phosphate, sugar and base. It is said to be fine for smaller strands (up to 50 nt), but for larger strands it has to use experimental constraints (SHAPE) to overcome inaccuracies in long-range interactions.
Or you could try your hand with Gromacs – the standard, freely distributed Molecular Dynamics simulation software suite. Gromacs is really meant for protein modelling but, with the appropriate force field (AMBER is the most commonly recommended), it can be used for RNA structure prediction ab initio. But beware that using Gromacs for RNA is relatively new, so there is a greater danger of the non-expert getting things seriously wrong.
A popular alternative to Gromacs is CHARMM (Chemistry at HARvard Molecular Mechanics), which is also free to download (www.charmm.org), and if you want a taste of what is involved in modelling the hard way, take a look at a step-by-step tutorial at https://mmtsb.org/workshops/sean-bin_workshop_2012/Tutorials/RNA_Tutorial/RNA_Tutorial.html. Not for the fainthearted.
Neither CHARMM nor Gromacs takes quantum mechanics into account but there are quantum mechanics plugins for both of them.
The RosettaCommons “RNA Redesign” protocol does all this the other way around: you submit a 3D structure and the protocol works out RNA sequences that best stabilise that structure. The protocol is freely available online: http://rosie.rosettacommons.org/rna_redesign.
Several services or programmes require the secondary structure of the RNA as an input. There are several utilities to work this out from a primary sequence. MC-Fold is easy to use and can be accessed via its website, www.major.iric.ca/MC-Fold. All you need is the sequence.
Most 3D RNA modelling packages take the top-down approach of using existing knowledge, especially RNA structures in the Protein Data Bank (PDB). ModeRNA takes a 3D structure of a template RNA molecule along with an alignment with the molecule to be modelled. Where there is a mismatch between the template and the target, ModeRNA will model insertions and deletions up to 17 nucleotides long, using an RNA fragment database. A strong selling point of ModeRNA is that it can handle post-transcriptional modifications.
RNAComposer makes knowledge-based modelling easy. Go to the online server (http://rnacomposer.cs.put.poznan.pl), type in your RNA sequence and its secondary structure (using bracket and dot notation) and hit submit.
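If bracket and dot notation is new to you: each dot is an unpaired nucleotide and each matching pair of brackets is a base pair. A minimal sketch of how a programme might read the notation is given below. It handles only plain round brackets and dots, and the function name is made up for illustration.

```python
def parse_dot_bracket(structure):
    """Return the base pairs encoded in a dot-bracket string.

    Positions are 0-based. Only '(', ')' and '.' are accepted in this sketch.
    """
    stack = []   # positions of opening brackets still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For a ten-nucleotide hairpin such as GGGAAAUCCC with the structure `(((....)))`, this pairs the first nucleotide with the last, the second with the second to last and the third with the third to last, leaving four unpaired nucleotides in the loop.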
NAST – the Nucleic Acid Simulation Tool – combines Molecular Dynamics with an RNA-specific knowledge-based potential. It generates a series of plausible 3D models, which can then be filtered based on experimental data. The package can be downloaded from https://simtk.org/home/nast. NAST is coarse-grained and represents each nucleotide as a single bead placed at its C3' atom. It uses statistical potentials to work out geometric distances, angles and dihedrals for two, three and four consecutive residues, respectively. With the careful use of constraints, NAST can model up to around 150 nt.
The Vfold server is one of those few sites that allow both secondary and tertiary modelling at the same place. It also offers the folding thermodynamics (heat capacity melting curve) from the sequence. Secondary structure is calculated from a self-generated set of candidate structures. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimisation.
Is there anything for the absolute beginner? The computer revolution has given us gift after gift – putting human expertise into computer programmes has given us the power to do things beyond our expertise. Have the programmes and websites we have been talking about done this? Probably not. In the case of ab initio modelling, almost certainly not.
Knowledge-based modelling is not straightforward either, and good scientists will tread carefully, lest unwitting abuse brings the field of modelling into disrepute. RNA modelling today is where sequence alignment was some 20 years ago, when there were lots of good programmes but using them was as easy as nailing jelly to a wall. User friendliness followed a little later. In the RNA modelling world, this is happening with sites like Assemble2 (www.bioinformatics.org/assemble/index.html). With Assemble2 you can build your 3D model interactively. It brings many of the RNA modelling tasks under one roof, including predicting secondary structures, browsing databases and aligning. It devolves many of its actions to web services behind the scenes.
The whole process of 3D model construction is guided by interactivity made possible by UCSF's Chimera visualisation software. Most, if not all, of the modelling approaches we have talked about above can be done in Assemble2. For instance, you can align to an RNA with a known structure, and the system will infer the secondary and tertiary structures. It is implemented as a stand-alone programme in Java, so any operating system can run it.
User friendliness is good but how can I be sure my model is right? “Here be dragons” is scrawled all over the RNA modelling map. So just how good are these modelling tools? This was a question addressed in a review by Christian Laing and Tamar Schlick (Curr Opin Struct Biol 21: 306-318), and their answer is not reassuring. They compared predicted structures with experimental structural data and found that accuracy was only acceptable for short (<50 nt) sequences with relatively simple topologies. They also noted, however, that utilities taking constraints into account (such as NAST and iFoldRNA) gave significantly better results for longer strands. Sadly, the quality of the outputs of the programmes they tested varied widely from one structure to another.
Thus, automatically predicting 3D RNA shapes fast and accurately is still in the future.
Last Changed: 08.02.2016