{"title":"A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.","authors":"Jianxing Feng, Rui Jiang, Tao Jiang","doi":"10.1142/9781848162648_0005","DOIUrl":null,"url":null,"abstract":"The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 1","pages":"51-62"},"PeriodicalIF":0.0000,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"62","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781848162648_0005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 62
Abstract
The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.