结核分枝杆菌和艰难梭菌相互作用组:细菌相互作用组预测计算系统快速发展的示范。

Microbial Informatics and Experimentation Pub Date : 2012-03-21 DOI:10.1186/2042-5783-2-4

Seshan Ananthasubramanian, Rahul Metri, Ankur Khetan, Aman Gupta, Adam Handen, Nagasuma Chandra, Madhavi Ganapathiraju

{"title":"结核分枝杆菌和艰难梭菌相互作用组:细菌相互作用组预测计算系统快速发展的示范。","authors":"Seshan Ananthasubramanian, Rahul Metri, Ankur Khetan, Aman Gupta, Adam Handen, Nagasuma Chandra, Madhavi Ganapathiraju","doi":"10.1186/2042-5783-2-4","DOIUrl":null,"url":null,"abstract":"Background: Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods including high-throughput techniques are highly resource intensive. Therefore, computational discovery of PPIs can accelerate biological discovery by presenting \"most-promising\" pairs of proteins that are likely to interact. For many bacteria, genome sequence, and thereby genomic context of proteomes, is readily available; additionally, for some of these proteomes, localization and functional annotations are also available, but interactomes are not available. We present here a method for rapid development of computational system to predict interactome of bacterial proteomes. While other studies have presented methods to transfer interologs across species, here, we propose transfer of computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) have been used to demonstrate the work.Results: We developed a random forest classifier over features derived from Gene Ontology annotations and genetic context scores provided by STRING database for predicting Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held out test set. The Mtb model was then run on all the 8 million protein pairs of the Mtb proteome, resulting in 708 new interactions (at 94% expected precision) or 1,595 new interactions at 80% expected precision. The CD classifier gave a precision of 90% and a recall of 16% on a held out test set. The CD model was run on all the 8 million protein pairs of the CD proteome, resulting in 143 new interactions (at 90% expected precision) or 580 new interactions (at 80% expected precision). We also compared the overlap of predictions of our method with STRING database interactions for CD and Mtb and also with interactions identified recently by a bacterial 2-hybrid system for Mtb. To demonstrate the utility of transfer of computational models, we made use of the developed Mtb model and used it to predict CD protein-pairs. The cross species model thus developed yielded a precision of 88% at a recall of 8%. To demonstrate transfer of features from other organisms in the absence of feature-based and interaction-based information, we transferred missing feature values from Mtb orthologs into the CD data. In transferring this data from orthologs (not interologs), we showed that a large number of interactions can be predicted.Conclusions: Rapid discovery of (partial) bacterial interactome can be made by using existing set of GO and STRING features associated with the organisms. We can make use of cross-species interactome development, when there are not even sufficient known interactions to develop a computational prediction system. Computational model of well-studied organism(s) can be employed to make the initial interactome prediction for the target organism. We have also demonstrated successfully, that annotations can be transferred from orthologs in well-studied organisms enabling accurate predictions for organisms with no annotations. These approaches can serve as building blocks to address the challenges associated with feature coverage, missing interactions towards rapid interactome discovery for bacterial organisms.Availability: The predictions for all Mtb and CD proteins are made available at: http://severus.dbmi.pitt.edu/TB and http://severus.dbmi.pitt.edu/CD respectively for browsing as well as for download.","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 ","pages":"4"},"PeriodicalIF":0.0000,"publicationDate":"2012-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-4","citationCount":"10","resultStr":"{\"title\":\"Mycobacterium tuberculosis and Clostridium difficille interactomes: demonstration of rapid development of computational system for bacterial interactome prediction.\",\"authors\":\"Seshan Ananthasubramanian, Rahul Metri, Ankur Khetan, Aman Gupta, Adam Handen, Nagasuma Chandra, Madhavi Ganapathiraju\",\"doi\":\"10.1186/2042-5783-2-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods including high-throughput techniques are highly resource intensive. Therefore, computational discovery of PPIs can accelerate biological discovery by presenting \\\"most-promising\\\" pairs of proteins that are likely to interact. For many bacteria, genome sequence, and thereby genomic context of proteomes, is readily available; additionally, for some of these proteomes, localization and functional annotations are also available, but interactomes are not available. We present here a method for rapid development of computational system to predict interactome of bacterial proteomes. While other studies have presented methods to transfer interologs across species, here, we propose transfer of computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) have been used to demonstrate the work.Results: We developed a random forest classifier over features derived from Gene Ontology annotations and genetic context scores provided by STRING database for predicting Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held out test set. The Mtb model was then run on all the 8 million protein pairs of the Mtb proteome, resulting in 708 new interactions (at 94% expected precision) or 1,595 new interactions at 80% expected precision. The CD classifier gave a precision of 90% and a recall of 16% on a held out test set. The CD model was run on all the 8 million protein pairs of the CD proteome, resulting in 143 new interactions (at 90% expected precision) or 580 new interactions (at 80% expected precision). We also compared the overlap of predictions of our method with STRING database interactions for CD and Mtb and also with interactions identified recently by a bacterial 2-hybrid system for Mtb. To demonstrate the utility of transfer of computational models, we made use of the developed Mtb model and used it to predict CD protein-pairs. The cross species model thus developed yielded a precision of 88% at a recall of 8%. To demonstrate transfer of features from other organisms in the absence of feature-based and interaction-based information, we transferred missing feature values from Mtb orthologs into the CD data. In transferring this data from orthologs (not interologs), we showed that a large number of interactions can be predicted.Conclusions: Rapid discovery of (partial) bacterial interactome can be made by using existing set of GO and STRING features associated with the organisms. We can make use of cross-species interactome development, when there are not even sufficient known interactions to develop a computational prediction system. Computational model of well-studied organism(s) can be employed to make the initial interactome prediction for the target organism. We have also demonstrated successfully, that annotations can be transferred from orthologs in well-studied organisms enabling accurate predictions for organisms with no annotations. These approaches can serve as building blocks to address the challenges associated with feature coverage, missing interactions towards rapid interactome discovery for bacterial organisms.Availability: The predictions for all Mtb and CD proteins are made available at: http://severus.dbmi.pitt.edu/TB and http://severus.dbmi.pitt.edu/CD respectively for browsing as well as for download.\",\"PeriodicalId\":18538,\"journal\":{\"name\":\"Microbial Informatics and Experimentation\",\"volume\":\"2 \",\"pages\":\"4\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1186/2042-5783-2-4\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microbial Informatics and Experimentation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/2042-5783-2-4\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microbial Informatics and Experimentation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/2042-5783-2-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

背景:除了一些模式生物外，大多数生物的蛋白质-蛋白质相互作用(PPI)网络(相互作用组)在很大程度上是未知的。包括高通量技术在内的实验方法是高度资源密集型的。因此，PPIs的计算发现可以通过呈现可能相互作用的“最有希望的”蛋白质对来加速生物学发现。对于许多细菌来说，基因组序列和蛋白质组的基因组背景是很容易获得的;此外，对于其中一些蛋白质组，定位和功能注释也可用，但相互作用组不可用。本文提出了一种快速开发细菌蛋白质组相互作用预测计算系统的方法。虽然其他研究已经提出了跨物种转移interologi的方法，但在这里，我们提出转移计算模型以受益于跨物种注释，从而预测即使在没有interologi的情况下也会有更多新的相互作用。结核分枝杆菌(Mtb)和艰难梭菌(CD)已被用来证明这项工作。结果:我们开发了一个基于基因本体注释和STRING数据库提供的遗传上下文评分的特征的随机森林分类器，用于独立预测Mtb和CD相互作用。Mtb分类器在hold out测试集上的准确率为94%，召回率为23%。然后在Mtb蛋白质组的所有800万个蛋白质对上运行Mtb模型，产生708个新的相互作用(94%的预期精度)或1595个新的相互作用(80%的预期精度)。CD分类器在hold out测试集上的准确率为90%，召回率为16%。CD模型在CD蛋白质组的所有800万个蛋白质对上运行，产生143个新的相互作用(预期精度为90%)或580个新的相互作用(预期精度为80%)。我们还比较了我们的方法与字符串数据库中CD和Mtb相互作用的预测重叠，以及最近由细菌2-杂交系统确定的Mtb相互作用。为了证明计算模型转移的实用性，我们利用开发的Mtb模型并使用它来预测CD蛋白对。由此建立的跨物种模型的准确率为88%，召回率为8%。为了证明在缺乏基于特征和基于交互的信息的情况下从其他生物转移特征，我们将Mtb同源物中缺失的特征值转移到CD数据中。在从同源物(而非同源物)转移这些数据时，我们表明可以预测大量的相互作用。结论:利用现有的一组与微生物相关的GO和STRING特征可以快速发现(部分)细菌相互作用组。我们可以利用跨物种相互作用的发展，当甚至没有足够的已知相互作用来开发计算预测系统。研究充分的生物的计算模型可以用来对目标生物进行初始的相互作用预测。我们还成功地证明，注释可以从经过充分研究的生物体的同源物中转移，从而对没有注释的生物体进行准确的预测。这些方法可以作为构建块来解决与特征覆盖相关的挑战，缺少相互作用，从而快速发现细菌有机体的相互作用组。可用性:所有结核分枝杆菌和乳糜泻蛋白的预测分别可在http://severus.dbmi.pitt.edu/TB和http://severus.dbmi.pitt.edu/CD上浏览和下载。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Mycobacterium tuberculosis and Clostridium difficille interactomes: demonstration of rapid development of computational system for bacterial interactome prediction.

Background: Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods including high-throughput techniques are highly resource intensive. Therefore, computational discovery of PPIs can accelerate biological discovery by presenting "most-promising" pairs of proteins that are likely to interact. For many bacteria, genome sequence, and thereby genomic context of proteomes, is readily available; additionally, for some of these proteomes, localization and functional annotations are also available, but interactomes are not available. We present here a method for rapid development of computational system to predict interactome of bacterial proteomes. While other studies have presented methods to transfer interologs across species, here, we propose transfer of computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) have been used to demonstrate the work.

Results: We developed a random forest classifier over features derived from Gene Ontology annotations and genetic context scores provided by STRING database for predicting Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held out test set. The Mtb model was then run on all the 8 million protein pairs of the Mtb proteome, resulting in 708 new interactions (at 94% expected precision) or 1,595 new interactions at 80% expected precision. The CD classifier gave a precision of 90% and a recall of 16% on a held out test set. The CD model was run on all the 8 million protein pairs of the CD proteome, resulting in 143 new interactions (at 90% expected precision) or 580 new interactions (at 80% expected precision). We also compared the overlap of predictions of our method with STRING database interactions for CD and Mtb and also with interactions identified recently by a bacterial 2-hybrid system for Mtb. To demonstrate the utility of transfer of computational models, we made use of the developed Mtb model and used it to predict CD protein-pairs. The cross species model thus developed yielded a precision of 88% at a recall of 8%. To demonstrate transfer of features from other organisms in the absence of feature-based and interaction-based information, we transferred missing feature values from Mtb orthologs into the CD data. In transferring this data from orthologs (not interologs), we showed that a large number of interactions can be predicted.

Conclusions: Rapid discovery of (partial) bacterial interactome can be made by using existing set of GO and STRING features associated with the organisms. We can make use of cross-species interactome development, when there are not even sufficient known interactions to develop a computational prediction system. Computational model of well-studied organism(s) can be employed to make the initial interactome prediction for the target organism. We have also demonstrated successfully, that annotations can be transferred from orthologs in well-studied organisms enabling accurate predictions for organisms with no annotations. These approaches can serve as building blocks to address the challenges associated with feature coverage, missing interactions towards rapid interactome discovery for bacterial organisms.

Availability: The predictions for all Mtb and CD proteins are made available at: http://severus.dbmi.pitt.edu/TB and http://severus.dbmi.pitt.edu/CD respectively for browsing as well as for download.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Microbial Informatics and Experimentation

自引率

0.00%

发文量