A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.

Jianxing Feng, Rui Jiang, Tao Jiang
Published in: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, vol. 7, pp. 51-62 (2008). Journal Article.

Abstract

The emergence of high-throughput technologies has produced abundant protein-protein interaction (PPI) data and microarray gene expression profiles, creating a great opportunity to identify novel protein complexes computationally. Although the literature has shown that methods using PPI data alone can successfully predict a large number of protein complexes, incorporating gene expression profiles could help refine the putative complexes and thereby improve the accuracy of computational methods. By combining PPI data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraph, GFA first finds large (weighted) dense subgraphs in a PPI network and then iteratively breaks each such subgraph into fragments by weighting its nodes according to their log fold changes in the microarray data, until the fragments are sufficiently small. Extensive tests on three widely used PPI datasets, and comparisons with the latest methods for protein complex identification, demonstrate the superior performance of our method in terms of accuracy, efficiency, and ability to predict novel protein complexes. Given the high specificity (or precision) our method achieves, we conjecture that our predictions point to more than 200 novel protein complexes.
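The abstract says GFA is adapted from a classical max-flow algorithm for finding the (weighted) densest subgraph. A well-known algorithm of that kind is Goldberg's min-cut construction, sketched below in Python as an illustration only, not as the authors' code. It assumes an undirected graph with nonnegative edge weights and no self-loops, plus an optional nonnegative node weight c_v, so that density(S) = (edge weight inside S + sum of c_v over S) / |S|. How GFA actually turns log fold changes into node weights, and how it fragments subgraphs, is not stated in the abstract; the `node_weight` hook, the function names, the Dinkelbach-style update of the density guess g (used instead of a binary search), and the Edmonds-Karp max-flow are all hypothetical choices made for this sketch.

```python
from collections import deque


def max_flow_source_side(cap, s, t, eps=1e-12):
    """Edmonds-Karp max flow on a square capacity matrix (modified in place).

    Returns (flow value, nodes reachable from s in the final residual graph);
    the reachable set is the source side of a minimum s-t cut.
    """
    n = len(cap)
    flow = 0.0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > eps:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left; this BFS explored the whole source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def density(nodes, edges, node_weight):
    """(sum of edge weights inside `nodes` + sum of node weights) / |nodes|."""
    inner = sum(w for (u, v), w in edges.items() if u in nodes and v in nodes)
    return (inner + sum(node_weight.get(x, 0.0) for x in nodes)) / len(nodes)


def densest_subgraph(nodes, edges, node_weight=None, tol=1e-9):
    """Goldberg-style min-cut search for a maximum-density node set.

    edges: {(u, v): w} with w >= 0 and no self-loops.
    node_weight: optional {v: c_v} with c_v >= 0; a hypothetical hook for
    expression-derived weights, not the paper's exact weighting scheme.
    """
    node_weight = node_weight or {}
    labels = list(nodes)
    idx = {x: i for i, x in enumerate(labels)}
    n = len(labels)
    s, t = n, n + 1
    deg = [0.0] * n
    for (u, v), w in edges.items():
        deg[idx[u]] += w
        deg[idx[v]] += w
    c = [node_weight.get(x, 0.0) for x in labels]
    big = max(deg[i] + 2 * c[i] for i in range(n))  # keeps capacities >= 0 when g >= 0
    best = set(labels)
    g = density(best, edges, node_weight)
    while True:
        cap = [[0.0] * (n + 2) for _ in range(n + 2)]
        for (u, v), w in edges.items():
            cap[idx[u]][idx[v]] += w
            cap[idx[v]][idx[u]] += w
        for i in range(n):
            cap[s][i] = big
            cap[i][t] = big + 2 * g - deg[i] - 2 * c[i]
        cut, side = max_flow_source_side(cap, s, t)
        # A cut with source side {s} + S costs big*n + 2*(g*|S| - f(S)), where
        # f(S) = inner edge weight + node weight, so cut < big*n iff density(S) > g.
        if cut >= big * n - tol:
            return best, g
        best = {labels[i] for i in side if i < n}
        g = density(best, edges, node_weight)  # Dinkelbach-style update of g
```

For example, on a 4-clique {a, b, c, d} with one pendant edge d-e and unit edge weights, the search returns the clique with density 1.5; giving e a node weight of 5.0 instead makes the single node {e} the densest set, which shows how strongly node weights can reshape the result. Edmonds-Karp over a dense capacity matrix was chosen for brevity rather than speed; a genome-scale PPI network would call for a faster max-flow routine.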

Latest articles in this journal
Novel Gene Discovery in the Human Malaria Parasite using Nucleosome Positioning Data.
Estimating support for protein-protein interaction data with applications to function prediction.
On the accurate construction of consensus genetic maps.
Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures.
Knowledge representation and data mining for biological imaging.