Taming Demons with MAGeCK

(March 16th, 2016) Harvard University's Xiaole Shirley Liu has released a freely available quality-control, analysis and visualisation platform for high-throughput CRISPR screens: "MAGeCK-VISPR".

CRISPR (Clustered Regularly-Interspaced Short Palindromic Repeats) is the latest and greatest gene-editing approach to have hit the bench. And while we are still adjusting to the shock and awe of such a gift of a technique, some labs were quick to turn it into high-throughput screens. CRISPR may be easier and more reliable than its gene-editing rivals, but interpreting high-throughput CRISPR screens is no easy matter. Fear not: the analysis barrier is being bridged by the emergence of algorithms such as RIGER (RNAi gene enrichment ranking), HitSelect and MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout). Recently, the makers of MAGeCK (great acronym - if only the National Union of Teachers had been so careful) took their algorithm, made it better, and embedded it into a complete pipeline with gene-calling, quality control and visualisation.

CRISPR works by harnessing a bacterial immune system. A DNA-cutting enzyme (such as Cas9) is directed to a target by a guide RNA (gRNA). The gRNA contains a scaffold sequence needed for Cas9 binding and a ~20 nucleotide spacer that defines the target DNA sequence. After the DNA has been cut, you can hijack the cell's homology-directed repair by supplying your own repair template.

Running a high-throughput screen using CRISPR is a matter of generating a library of these gRNAs and transfecting them into host cell lines. The most common screens to date have looked for essential genes: cells are transfected at a low rate (fewer than one gRNA per cell on average) and, after a period of cell culture, the abundance of each gRNA is counted. gRNAs targeting essential genes are, obviously, depleted.
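The counting logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by MAGeCK: the gRNA names, gene names and read counts are made up, and the choice of library-size normalisation, a pseudocount of 1 and a per-gene median are all assumptions made for the example.

```python
import math
from statistics import median

# Hypothetical toy data: gRNA id -> (target gene, reads before culture, reads after culture)
counts = {
    "g1": ("GENE_A", 500, 40),
    "g2": ("GENE_A", 450, 60),
    "g3": ("GENE_B", 480, 510),
    "g4": ("GENE_B", 520, 470),
}

def log2_fold_changes(counts, pseudocount=1.0):
    """Return the log2 fold change of each gRNA after scaling to reads per million."""
    total_before = sum(before for _, before, _ in counts.values())
    total_after = sum(after for _, _, after in counts.values())
    lfc = {}
    for grna, (_, before, after) in counts.items():
        norm_before = before / total_before * 1e6
        norm_after = after / total_after * 1e6
        # The pseudocount avoids taking the log of zero for gRNAs that dropped out
        lfc[grna] = math.log2((norm_after + pseudocount) / (norm_before + pseudocount))
    return lfc

def gene_scores(counts, lfc):
    """Summarise each gene by the median log2 fold change of its gRNAs."""
    per_gene = {}
    for grna, (gene, _, _) in counts.items():
        per_gene.setdefault(gene, []).append(lfc[grna])
    return {gene: median(values) for gene, values in per_gene.items()}

lfc = log2_fold_changes(counts)
scores = gene_scores(counts, lfc)
for gene, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{gene}\t{score:.2f}")
```

In this toy set, the gRNAs against the hypothetical GENE_A lose most of their reads, so the gene gets a negative score and would be a candidate essential gene. A simple median like this ignores the fact that gRNAs differ in how well they cut.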

But doing the stats is a nightmare, partly because it is hard to account for the different knockout efficiencies of different gRNAs. On top of that, quality control is a dark art, and visualisation of both quality control and gene calls is limited.

MAGeCK-VISPR's aim is to tame these three demons (gene calling, quality control and visualisation) by combining them into one pipeline. Quality control is run at the sequence level (e.g. GC content distribution), read count level (e.g. percentage of mapped reads), sample level (comparing consistency between samples) and gene level (measuring how much negative selection has occurred).
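To give a feel for what sequence-level and read-count-level checks involve, here is a minimal Python sketch of three such measures: mean GC content, percentage of mapped reads and the list of gRNAs with no reads. It is an illustration only, not the pipeline's own code; the function names, the four-letter "spacers" and the exact-match mapping rule are assumptions made for the example.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def qc_summary(reads, library):
    """Compute simple QC measures.

    reads:   list of spacer sequences extracted from the sequencing reads
    library: dict mapping each spacer sequence to its gRNA id
    A read counts as mapped only if it matches a library spacer exactly
    (an assumption made to keep the sketch short).
    """
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(library[read] for read in reads if read in library)
    mapped = sum(counts.values())
    return {
        "mean_gc": sum(gc_content(read) for read in reads) / len(reads),
        "percent_mapped": 100.0 * mapped / len(reads),
        "zero_count_grnas": sorted(g for g in library.values() if counts[g] == 0),
    }

# Hypothetical toy library and reads
library = {"ACGT": "g1", "GGCC": "g2", "ATAT": "g3"}
reads = ["ACGT", "ACGT", "GGCC", "TTTT"]

summary = qc_summary(reads, library)
print(summary)
```

A low mapped percentage or a long list of zero-count gRNAs would be an early warning that something went wrong with the library or the sequencing.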

When it comes to calling the genes, MAGeCK-VISPR is also claimed to be a step up from its predecessor, plain ol' MAGeCK, in that it uses Expectation Maximisation rather than Robust Rank Aggregation. Liu reports better gene calling for the new algorithm when run on four publicly available data-sets.

The VISPR bit is the visualisation component. It offers three views. First, there is a quality control view, where you can see easy-to-interpret graphical summaries of such things as the distribution of GC content, the number of zero-read gRNAs, numbers of reads and so on. Secondly, there is a results view, where you can interact with several components including a gene comparison table and a cumulative distribution and histogram of the normalised gRNA counts. Finally, an experiment comparison view provides the user with Euler diagrams representing common and exclusive genes. All views are fully interactive, making it easier to perform additional analysis steps such as looking at interaction networks (through GeneMANIA) and gene function (through GOrilla). Oh, and if you are worried about keeping track of the analysis and all the meta-data, fear not - the whole pipeline is managed by the work-flow management system, Snakemake.
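The "common and exclusive genes" shown in an Euler diagram boil down to set arithmetic. As a rough sketch, with hypothetical gene lists standing in for the hits from two experiments (this is not VISPR's own code):

```python
def compare_hits(hits_a, hits_b):
    """Split two collections of gene hits into shared and experiment-specific sets."""
    a, b = set(hits_a), set(hits_b)
    return {
        "common": a & b,   # called in both experiments
        "only_a": a - b,   # called in experiment A only
        "only_b": b - a,   # called in experiment B only
    }

# Hypothetical gene hits from two screens
experiment_a = ["GENE_A", "GENE_B", "GENE_C"]
experiment_b = ["GENE_B", "GENE_C", "GENE_D"]

regions = compare_hits(experiment_a, experiment_b)
for name, genes in regions.items():
    print(name, sorted(genes))
```

Each of the three sets corresponds to one region of a two-circle diagram.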

MAGeCK-VISPR is free. Lovers of point-and-click may be disappointed, but command-line heroes will be in raptures: running the pipeline is done from the terminal. However, once you have got to the VISPR (visualisation) stage, you can get your mouse out and get clicking once again.

The pipeline is coded in Python 3, and can be downloaded using the "conda" utility. The Liu lab provide a set of YouTube videos to take you through the process of installation and usage, along with written instructions and test data-sets. The command-line instructions are clear and easy to follow, and the commands' options and switches are fairly simple.

Steven Buckingham

Picture: Pixabay

Last Changes: 04.19.2016