Two Tier Prediction
by Vijay Shankar Balakrishnan, Labtimes 03/2013
You have a cocktail of peptides? Wondering whether they are antimicrobial or perhaps more? The predictor programme iAMP-2L may answer all your questions.
When was the last time you thought, “I wish I had a computer programme to predict whether my mixture of peptides is just antimicrobial or has more than one clinical function”? Don’t worry! Here’s a recent one, from Kuo-Chen Chou’s group at the Gordon Life Science Institute in San Diego, California, USA. Their paper describing the predictor programme iAMP-2L (www.jci-bioinfo.cn/iAMP-2L) has recently been accepted for publication in Analytical Biochemistry (Xuan Xiao et al., ePub 2013).
Before dreaming further, have your peptide sequences ready. You can submit up to 500 peptides, preferably in FASTA format. The web server does not say so, but according to the article your sequences may be 5 to 100 amino acids long. But before testing, you’ll probably want to know how iAMP-2L works.
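If you want to check your input before pasting it into the server, a minimal Python sketch could look like this. It assumes the limits mentioned above (at most 500 peptides, 5 to 100 residues each) and the 20 standard amino acids; the function names parse_fasta and check_peptides are hypothetical helpers and not part of iAMP-2L.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_peptides(records, max_count=500, min_len=5, max_len=100):
    """Return a list of problems; an empty list means the input looks acceptable.

    The default limits are the ones quoted in the article, taken here as assumptions.
    """
    problems = []
    if len(records) > max_count:
        problems.append(f"{len(records)} peptides submitted, limit is {max_count}")
    for header, seq in records:
        if not min_len <= len(seq) <= max_len:
            problems.append(f"{header}: length {len(seq)} outside {min_len}-{max_len}")
        bad = sorted(set(seq) - STANDARD_AA)
        if bad:
            problems.append(f"{header}: non-standard residues {''.join(bad)}")
    return problems
```

Calling check_peptides(parse_fasta(open("peptides.fasta").read())) on a file of your own (the file name is just an example) returns one message per offending peptide.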
Studying the functionality of host defense peptides that are commonly, yet paradoxically, called just antimicrobial peptides (AMPs) is important not only for drug designers but also for other academic researchers. Hence, bioinformaticians have created programmes to facilitate the testing of AMP functionality. One example is the webserver-based, updated antimicrobial peptide database (APD2). APD2 offers a functionality prediction tool in addition to its peptide repository, but it can only tell you whether yours are AMPs or not.
How does iAMP-2L differ from the rest? Simply, in terms of (a) the mathematics that is incorporated behind the prediction and (b) the level of prediction – depicted by the 2L – i.e., a two-tier (or two levels of) prediction. When you feed in your peptide sequences, the programme will predict, at the first level, whether your peptide is an AMP. If not, sorry, next please. If yes, then the programme moves on to the next level, to predict to which functional type it belongs.
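The two-tier logic can be sketched in a few lines of Python. This is only an illustration of the control flow described above: predict_two_tier is a hypothetical name, and the two classifiers are passed in as placeholder functions, so nothing here reflects how iAMP-2L actually computes its answers.

```python
def predict_two_tier(sequence, is_amp, functional_types):
    """Level 1: AMP or not. Level 2 (AMPs only): which functional types.

    is_amp and functional_types are caller-supplied placeholder callables.
    """
    if not is_amp(sequence):
        # "If not, sorry, next please": level 2 is never consulted.
        return {"amp": False, "types": []}
    return {"amp": True, "types": sorted(functional_types(sequence))}


# Toy stand-ins, purely for illustration (NOT real classifiers).
def toy_is_amp(sequence):
    return sequence.count("K") + sequence.count("R") >= 3


def toy_types(sequence):
    types = {"antibacterial"}
    if "W" in sequence:
        types.add("antifungal")
    return types
```

Because the second level returns a set of types rather than a single one, a peptide with several functions simply gets several entries.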
The authors claim that any AMP might fit into one or more of ten functional types: antibacterial, antifungal, antiviral, anti-HIV, anti-tumorous/cancerous, anti-parasitic, anti-protist, AMPs with chemotactic property, insecticidal and spermicidal peptides.
When developing their programme, the authors asked themselves three questions: (a) how to identify whether the submitted peptides are AMPs or non-AMPs; (b) if they are AMPs, to which functional types they could belong; and (c) how to handle the prediction of multi-functionality. Many other programmes can answer the first question; iAMP-2L may also solve the latter two. Let’s see how that is achieved.
When you submit the peptides, the programme generates their Pseudo-Amino Acid Composition (PseAAC). The concept of PseAAC was proposed by the Chou group in 2009 (Current Proteomics 6, 262-74). Well, what is it in simple terms? Usually, programmes for structure or function prediction try to match the query sequence with those in reference databases. This is done in one of two ways: with a sequence alignment method, like BLAST, or with a discrete method, in which a set of experimental properties of the 20 standard amino acids is taken into account via various modes. One disadvantage of alignment methods is that gap penalties are introduced if the query sequence has no homologous match in the reference set. As a result, the sequence order is lost, which in turn affects the predicted structure or function.
To avoid this, the discrete method creates PseAACs in various modes, each suited to predicting different functions, for example, physico-chemical properties of the amino acids, amphiphilicity of the peptides, sub-cellular localisation, Fourier transform profiles and so on. iAMP-2L uses five physico-chemical properties of the amino acids for PseAAC generation: hydrophobicity, pK1 (-COOH), pK2 (-NH3), pI (25°C) and molecular weight.
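For the curious, here is a rough Python sketch of how a PseAAC vector can be built. It follows the classic formulation of PseAAC (20 amino acid frequencies plus a handful of sequence-order correlation factors), which is not necessarily the exact variant used in iAMP-2L. The function names, the defaults lam=4 and weight=0.05, and the choice of the Kyte-Doolittle hydropathy index as the only property table are all assumptions for illustration; the programme itself uses the five properties listed above, whose values are not reproduced here.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used only as an illustrative property table.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardise(table):
    """Rescale a property table to zero mean and unit standard deviation."""
    values = [table[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (table[a] - mu) / sigma for a in AMINO_ACIDS}


def pseaac(seq, tables, lam=4, weight=0.05):
    """Return a (20 + lam)-dimensional PseAAC vector for a standard-residue sequence."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardise(t) for t in tables]
    # First 20 components: plain amino acid frequencies.
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    # Next lam components: correlation between residues j positions apart,
    # averaged over all property tables; this is where sequence order enters.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - j):
            a, b = seq[i], seq[i + j]
            total += mean((p[b] - p[a]) ** 2 for p in props)
        thetas.append(total / (len(seq) - j))
    # Normalise so that all components together sum to one.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Passing several property tables in the tables list averages their contributions, which is how more than one physico-chemical property would enter in this formulation.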
Using set theory, the team unified the sets and subsets of peptides. For instance, the grand unified set has AMPs in ten peptide subsets, each attributed to its functional type. AMPs with more than one functionality have been categorised as virtual AMPs. Alongside these is the subset of 2405 non-AMPs. So, in the first level, iAMP-2L compares the sequence with the grand unified set using a fuzzy K-nearest neighbour approach (FKNN) and tells you whether it’s an AMP. If yes, it uses a multi-label FKNN (ML-FKNN) at the second level to show which functionalities it has.
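What a fuzzy K-nearest neighbour vote looks like can be shown with a small Python sketch. It uses the textbook inverse-distance weighting with a fuzzifier m and treats multi-label prediction as "keep every label whose membership passes a threshold". The function names, the Euclidean distance, k=3, m=2 and the 0.5 threshold are assumptions for illustration; the article does not say how iAMP-2L sets these details.

```python
import math


def fuzzy_knn(query, training, k=3, m=2.0):
    """Return {label: membership} for a query vector.

    training is a list of (vector, set_of_labels); m > 1 is the fuzzifier.
    """
    # Keep the k training samples closest to the query.
    nearest = sorted(
        ((math.dist(query, vec), labels) for vec, labels in training),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    # Closer neighbours get larger weights; the tiny epsilon avoids division
    # by zero when the query coincides with a training sample.
    weights = [1.0 / (d ** exponent + 1e-12) for d, _ in nearest]
    total = sum(weights)
    memberships = {}
    for w, (_, labels) in zip(weights, nearest):
        for label in labels:
            memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships


def predict_labels(memberships, threshold=0.5):
    """Multi-label decision: report every label whose membership reaches the threshold."""
    return sorted(label for label, u in memberships.items() if u >= threshold)
```

In a two-level setting, the same function could be run first with the labels "AMP" and "non-AMP", and then, for AMPs only, with the functional types as labels, where one training peptide may carry several labels at once.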
On the face of it, iAMP-2L seems to be useful. Nevertheless, some doubt certainly remains. Since many AMPs are amphiphilic, why haven’t the authors chosen the amphiphilicity mode in PseAAC to predict the functions, in addition to the physico-chemical properties mode? They validate the fidelity of the programme mathematically. But for an end user, it would help if they at least explained why the amphiphilic mode hasn’t been used.
Nevertheless, the Chou group assures us that they get better results with these five physico-chemical properties than with others. In addition, they admit that iAMP-2L can predict only up to five functional types so far and promise the rest for the future. As they state in their article, we can expect either a new paper or a simple message on the server homepage when there is an update. Let’s hope so! But for now... you may wish to test your sequences!
Last Changed: 09.05.2013