Background: Microbial Interaction Networks (MINs) provide important information for understanding bacterial communities. MINs can be inferred by examining microbial abundance profiles. Abundance profiles are often interpreted with the Lotka Volterra model in research. However existing research fails to consider a biologically meaningful underlying mathematical model for MINs or to address the possibility of multiple solutions.
Results: In this paper we present IMPARO, a method for inferring microbial interactions through parameter optimisation. We use biologically meaningful models for both the abundance profile, as well as the MIN. We show how multiple MINs could be inferred with similar reconstructed abundance profile accuracy, and argue that a unique solution is not always satisfactory. Using our method, we successfully inferred clear interactions in the gut microbiome which have been previously observed in in-vitro experiments.
Conclusions: IMPARO was used to successfully infer microbial interactions in human microbiome samples as well as in a varied set of simulated data. The work also highlights the importance of considering multiple solutions for MINs.
Background: The biological process known as post-translational modification (PTM) is a condition whereby proteomes are modified that affects normal cell biology, and hence the pathogenesis. A number of PTMs have been discovered in the recent years and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as cost, time consumption and inefficiency involved in the experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far hold limitations to correctly predict this covalent modification.
Results: We propose a new predictor in this paper called Bigram-PGK which uses evolutionary information of amino acids to try and predict phosphoglycerylated sites. The benchmark dataset which contains experimentally labelled sites is employed for this purpose and profile bigram occurrences is calculated from position specific scoring matrices of amino acids in the protein sequences. The statistical measures of this work, such as sensitivity, specificity, precision, accuracy, Mathews correlation coefficient and area under ROC curve have been reported to be 0.9642, 0.8973, 0.8253, 0.9193, 0.8330, 0.9306, respectively.
Conclusions: The proposed predictor, based on the feature of evolutionary information and support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.
Background: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening for potential TLR9 activity via traditional structure-based virtual screening approaches of CpG ODNs is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of Inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs, to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling.
Results: Using in-house experimental TLR9 activity data we found that random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples was 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed common sequence motifs including 'CC', 'GG','AG', 'CCCG' and 'CGGC' were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity.
Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
Background: Proteins perform their functions by interacting with acid radical ions. Recently, it was a challenging work to precisely predict the binding residues of acid radical ion ligands in the research field of molecular drug design.
Results: In this study, we proposed an improved method to predict the acid radical ion binding residues by using K-nearest Neighbors classifier. Meanwhile, we constructed datasets of four acid radical ion ligand (NO2-, CO32-, SO42-, PO43-) binding residues from BioLip database. Then, based on the optimal window length for each acid radical ion ligand, we refined composition information and position conservative information and extracted them as feature parameters for K-nearest Neighbors classifier. In the results of 5-fold cross-validation, the Matthew's correlation coefficient was higher than 0.45, the values of accuracy, sensitivity and specificity were all higher than 69.2%, and the false positive rate was lower than 30.8%. Further, we also performed an independent test to test the practicability of the proposed method. In the obtained results, the sensitivity was higher than 40.9%, the values of accuracy and specificity were higher than 84.2%, the Matthew's correlation coefficient was higher than 0.116, and the false positive rate was lower than 15.4%. Finally, we identified binding residues of the six metal ion ligands. In the predicted results, the values of accuracy, sensitivity and specificity were all higher than 77.6%, the Matthew's correlation coefficient was higher than 0.6, and the false positive rate was lower than 19.6%.
Conclusions: Taken together, the good results of our prediction method added new insights in the prediction of the binding residues of acid radical ion ligands.
Background: In many important life activities, the execution of protein function depends on the interaction between proteins and ligands. As an important protein binding ligand, the identification of the binding site of the ion ligands plays an important role in the study of the protein function.
Results: In this study, four acid radical ion ligands (NO2-,CO32-,SO42-,PO43-) and ten metal ion ligands (Zn2+,Cu2+,Fe2+,Fe3+,Ca2+,Mg2+,Mn2+,Na+,K+,Co2+) are selected as the research object, and the Sequential minimal optimization (SMO) algorithm based on sequence information was proposed, better prediction results were obtained by 5-fold cross validation.
Conclusions: An efficient method for predicting ion ligand binding sites was presented.