Algorithms for selecting breakpoint locations to optimize diversity in protein engineering by site-directed protein recombination.
Wei Zheng, Xiaoduan Ye, Alan M. Friedman, Chris Bailey-Kellogg
Computational Systems Bioinformatics Conference, 2007, pp. 31-40. DOI: 10.1142/9781860948732_0008

Protein engineering by site-directed recombination seeks to develop proteins with new or improved function, by accumulating multiple mutations from a set of homologous parent proteins. A library of hybrid proteins is created by recombining the parent proteins at specified breakpoint locations; subsequent screening/selection identifies hybrids with desirable functional characteristics. In order to improve the frequency of generating novel hybrids, this paper develops the first approach to explicitly plan for diversity in site-directed recombination, including metrics for characterizing the diversity of a planned hybrid library and efficient algorithms for optimizing experiments accordingly. The goal is to choose breakpoint locations to sample sequence space as uniformly as possible (which we argue maximizes diversity), under the constraints imposed by the recombination process and the given set of parents. A dynamic programming approach selects optimal breakpoint locations in polynomial time. Application of our method to optimizing breakpoints for an example biosynthetic enzyme, purE, demonstrates the significance of diversity optimization and the effectiveness of our algorithms.
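The abstract above describes a dynamic program that places breakpoints optimally in polynomial time. The following is a minimal sketch of that style of DP under simplifying assumptions: since the abstract does not give the paper's exact diversity metric, the objective here is a stand-in (spreading the parents' polymorphic columns evenly across segments), and all function names are illustrative, not the authors' own.

```python
def polymorphic(parents):
    """1 for each alignment column where the parents disagree, else 0."""
    return [1 if len(set(col)) > 1 else 0 for col in zip(*parents)]

def select_breakpoints(parents, k):
    """Pick k breakpoint positions over equal-length parent sequences,
    minimizing the squared deviation of each segment's polymorphic-column
    count from an even split. Returns (cost, sorted cut positions).
    Runs in O(k * L^2) time via dynamic programming."""
    poly = polymorphic(parents)
    L = len(poly)
    prefix = [0]
    for p in poly:
        prefix.append(prefix[-1] + p)
    target = prefix[L] / (k + 1)  # ideal polymorphic count per segment

    def seg_cost(i, j):  # segment covers columns i..j-1
        return (prefix[j] - prefix[i] - target) ** 2

    INF = float("inf")
    # best[b][j] = min cost to cover columns 0..j-1 using b segments
    best = [[INF] * (L + 1) for _ in range(k + 2)]
    back = [[-1] * (L + 1) for _ in range(k + 2)]
    best[0][0] = 0.0
    for b in range(1, k + 2):
        for j in range(b, L + 1):
            for i in range(b - 1, j):
                c = best[b - 1][i] + seg_cost(i, j)
                if c < best[b][j]:
                    best[b][j] = c
                    back[b][j] = i
    # walk the backpointers to recover the chosen cut positions
    cuts, j = [], L
    for b in range(k + 1, 0, -1):
        j = back[b][j]
        cuts.append(j)
    cuts.pop()  # drop the leading 0 boundary
    return best[k + 1][L], sorted(cuts)
```

For two parents "AAAA" and "AATT" with one breakpoint, the sketch cuts between the two polymorphic columns, giving one mismatch column per segment.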
Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.
Justin Wilson, Manhong Dai, Elvis Jakupovic, Stanley Watson, Fan Meng
Computational Systems Bioinformatics Conference, 2007, pp. 387-90. DOI: 10.1142/9781860948732_0039

Modern video cards and game consoles typically have much better performance-to-price ratios than general-purpose CPUs. The parallel processing capabilities of game hardware are well suited for high-throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Cancer molecular pattern discovery by subspace consensus kernel classification.
Xiaoxu Han
Computational Systems Bioinformatics Conference, 2007, pp. 55-65.

Efficient discovery of cancer molecular patterns is essential in molecular diagnostics. The characteristics of gene/protein expression data challenge traditional unsupervised classification algorithms. In this work, we describe a subspace consensus kernel clustering algorithm based on projected gradient nonnegative matrix factorization (PG-NMF). The algorithm is a consensus kernel hierarchical clustering (CKHC) method operating in the subspace generated by PG-NMF. It integrates convergence-sound parts-based learning with subspace and kernel-space clustering for microarray and proteomics data classification. We first integrated subspace and kernel methods by following our framework of input-space, subspace, and kernel-space clustering. We demonstrate more effective classification results from our algorithm by comparison with classic NMF and sparse-NMF classification and with supervised classification (KNN and SVM) on four benchmark cancer datasets. Our algorithm can generate a family of classification algorithms by selecting different transforms to generate subspaces and different kernel clustering algorithms to cluster the data.
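The PG-NMF factorization at the heart of the abstract above can be sketched in a few lines of NumPy. This is a simplified stand-in: it uses a fixed step size and random initialization, whereas practical projected-gradient NMF uses a line search to pick the step; the function name and parameters are illustrative.

```python
import numpy as np

def pg_nmf(V, rank, iters=200, step=1e-3, seed=0):
    """Minimal projected-gradient NMF sketch: factor V ~ W @ H with
    W, H >= 0 by gradient descent on ||V - W H||_F^2, projecting each
    update onto the nonnegative orthant with np.maximum."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        R = W @ H - V          # residual
        gW = R @ H.T           # gradient w.r.t. W
        gH = W.T @ R           # gradient w.r.t. H
        W = np.maximum(W - step * gW, 0.0)  # descend, then project
        H = np.maximum(H - step * gH, 0.0)
    return W, H
```

In the paper's setting, the rows of H (or columns of W) would then feed a consensus kernel clustering step; here only the factorization itself is sketched.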
Transcriptional profiling of definitive endoderm derived from human embryonic stem cells.
Huiqing Liu, Stephen Dalton, Ying Xu
Computational Systems Bioinformatics Conference, 2007, pp. 79-82.

Definitive endoderm (DE), the inner germ layer of the trilaminar embryo, forms the gastrointestinal tract and its derivatives, as well as the thyroid, thymus, pancreas, lungs, and liver. Studies of DE formation in Xenopus, zebrafish, and mouse suggest a molecular mechanism conserved among vertebrates. However, the corresponding analysis in human has not been extensively carried out. With the maturity of techniques for monitoring how human embryonic stem cells (hESCs) respond to signals that determine their pluripotency, proliferation, survival, and differentiation status, we are now able to conduct similar research in human. In this paper, we present an analysis of gene expression profiles obtained from two recent experiments to identify genes differentially expressed during hESC differentiation to DE. We have carried out a systematic study of these genes to understand the related transcriptional regulation and signaling pathways, using computational predictions and comparative genome analyses. Our preliminary results suggest a transcriptional profile of hESC-DE formation similar to that of other vertebrates.
Enhanced partial order curve comparison over multiple protein folding trajectories.
Hong Sun, Hakan Ferhatosmanoglu, Motonori Ota, Yusu Wang
Computational Systems Bioinformatics Conference, 2007, pp. 299-310.

Understanding how proteins fold is essential to our quest to discover how life works at the molecular level. Current computational power enables researchers to produce huge amounts of folding simulation data, so there is a pressing need to interpret these data and identify novel folding features from them. In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. Our EPO algorithm addresses several new challenges presented by comparing the high-dimensional curves that arise from folding trajectories. A detailed case study applying our algorithm to the miniprotein Trp-cage(24) demonstrates that our algorithm can detect similarities at a rather low level and extract biologically meaningful folding events.
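The abstract above models each trajectory as a multi-dimensional curve. As a point of reference for what comparing such curves involves, here is a standard pairwise curve-alignment baseline (dynamic time warping); the EPO algorithm itself aligns many curves simultaneously and builds a partial order, which this sketch does not attempt.

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two multi-dimensional
    curves, given as arrays of shape (n, d) and (m, d). Classic
    O(n*m) dynamic program over a cumulative-cost table."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # pointwise distance
            D[i, j] = cost + min(D[i - 1, j],      # skip a point of a
                                 D[i, j - 1],      # skip a point of b
                                 D[i - 1, j - 1])  # match the points
    return D[n, m]
```

Two identical curves have distance zero; the distance grows as the curves (e.g., two folding trajectories in a shared coordinate space) diverge.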
Improvement in protein sequence-structure alignment using insertion/deletion frequency arrays.
Kyle Ellrott, Jun-tao Guo, Victor Olman, Ying Xu
Computational Systems Bioinformatics Conference, 2007, pp. 335-42.

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as an equally probable event. We have analyzed the alignment patterns of homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as Insertion/Deletion (Indel) Frequency Arrays (IFA). By applying IFA to the protein threading problem, we have been able to improve alignment accuracy, especially for proteins with low sequence identity.
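One way to picture an indel frequency array is as a per-position tally of gap events over a set of alignments, converted into position-specific gap penalties. The sketch below is a loose illustration of that idea only: the penalty form (negative log frequency with add-one smoothing) and all names are assumptions, not the paper's actual statistic.

```python
import math
from collections import Counter

def indel_frequency_array(alignments, target_len):
    """Hedged IFA sketch. `alignments` is a list of (target_row,
    template_row) pairs of equal-length aligned strings, with '-'
    marking gaps. Returns (deletion_penalties, insertion_penalties):
    one deletion penalty per target position, one insertion penalty
    per gap slot (before each position and after the last)."""
    del_counts = Counter()   # target residue aligned against a template gap
    ins_counts = Counter()   # gap in the target at this slot
    n = len(alignments)
    for tgt, tpl in alignments:
        pos = 0  # index into the ungapped target sequence
        for t_char, p_char in zip(tgt, tpl):
            if t_char != '-':
                if p_char == '-':
                    del_counts[pos] += 1
                pos += 1
            else:
                ins_counts[pos] += 1  # insertion before target position pos

    def penalty(counts, i):
        # frequent indel sites get a smaller penalty (add-one smoothing)
        return -math.log((counts[i] + 1) / (n + 2))

    return ([penalty(del_counts, i) for i in range(target_len)],
            [penalty(ins_counts, i) for i in range(target_len + 1)])
```

A threading aligner could then charge these position-specific penalties instead of a uniform gap cost, which is the spirit of the improvement the abstract reports.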
Deconvoluting the BAC-gene relationships using a physical map.
Yonghui Wu, Lan Liu, T. Close, S. Lonardi
Computational Systems Bioinformatics Conference, 2007, pp. 203-14. DOI: 10.1142/9781860948732_0023

MOTIVATION: The deconvolution of the relationships between BAC clones and genes is a crucial step in the selective sequencing of the regions of interest in a genome. It usually requires combinatorial pooling of unique probes obtained from the genes (unigenes) and screening of the BAC library using the pools in a hybridization experiment. Since several probes can hybridize to the same BAC, for the deconvolution to be achievable the pooling design has to handle a large number of positives. As a consequence, smaller pools need to be designed, which in turn increases the number of hybridization experiments, possibly making the entire protocol infeasible.
RESULTS: We propose a new algorithm capable of producing high-accuracy deconvolution even in the presence of a weak pooling design, i.e., when pools are rather large. The algorithm compensates for the decreased information in the hybridization data by taking advantage of a physical map of the BAC clones. We show that the right combination of combinatorial pooling and our algorithm not only dramatically reduces the number of pools required, but also successfully deconvolutes the BAC-gene relationships with almost perfect accuracy.
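The deconvolution step the abstract builds on is a group-testing decode: each BAC belongs to a known set of pools, and a probe's positive pools implicate every BAC whose pools all tested positive. A naive decode looks like the sketch below (names are illustrative); with many positives it returns ambiguous candidate sets, which is exactly the situation the paper resolves with the physical map.

```python
def deconvolve(bac_pools, positive_pools):
    """Naive group-testing decode. `bac_pools` maps each BAC id to
    the pools it was placed in; `positive_pools` lists the pools
    that hybridized with a given probe. A BAC is a candidate for
    the probe's gene iff all of its pools are positive."""
    pos = set(positive_pools)
    return [bac for bac, pools in bac_pools.items() if set(pools) <= pos]
```

For example, with BACs b1 in pools {1, 2} and b2 in pools {2, 3}, a probe whose positives are {1, 2} implicates only b1; if pools {1, 2, 3} all came back positive, both BACs would be candidates and extra evidence would be needed to disambiguate.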
Using indirect protein-protein interactions for protein complex prediction.
H. Chua, K. Ning, W. Sung, H. Leong, L. Wong
Computational Systems Bioinformatics Conference, 2007, pp. 97-109. DOI: 10.1142/9781860948732_0014

Protein complexes are fundamental for understanding principles of cellular organization. Accurate and fast protein complex prediction from PPI networks of increasing size can guide biological experiments toward the discovery of novel protein complexes. However, protein complex prediction from PPI networks is a hard problem, especially when the PPI network is noisy. We know from previous work that proteins that do not interact but share interaction partners (level-2 neighbors) often share biological functions. The strength of functional association can be estimated using a topological weight, FS-Weight. Here we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. All direct and indirect interactions are first weighted using the topological weight (FS-Weight). Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into it. Existing clustering algorithms can then be applied to this modified network. We also propose a novel algorithm that searches for cliques in the modified network and merges them to form clusters using a "partial clique merging" method. In this paper, we show that (1) using indirect interactions and topological weights to augment protein-protein interactions improves the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no information other than the original PPI network is used, our approach should be very useful for protein complex prediction, especially for predicting novel protein complexes.
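The network-modification step described above (weight all pairs, drop weak direct edges, add strong level-2 edges) can be sketched as follows. The shared-partner score here is a simple Dice-style overlap standing in for FS-Weight, whose exact formula includes reliability-correction terms not reproduced here; the thresholds are illustrative.

```python
import itertools

def overlap_score(adj, u, v):
    """Dice-style shared-partner score for a pair (u, v);
    a simplified stand-in for FS-Weight."""
    nu, nv = adj[u], adj[v]
    inter = len(nu & nv)
    return 2.0 * inter / (len(nu) + len(nv)) if nu or nv else 0.0

def augment_network(adj, add_thresh=0.5, drop_thresh=0.1):
    """Modify a PPI network in the spirit of the abstract above:
    keep direct edges whose score clears drop_thresh, and add
    level-2 edges (non-adjacent pairs with common neighbors) whose
    score clears add_thresh. `adj` maps node -> set of neighbors."""
    nodes = list(adj)
    new_adj = {n: set() for n in nodes}
    for u, v in itertools.combinations(nodes, 2):
        direct = v in adj[u]
        s = overlap_score(adj, u, v)
        level2 = not direct and (adj[u] & adj[v])
        if (direct and s >= drop_thresh) or (level2 and s >= add_thresh):
            new_adj[u].add(v)
            new_adj[v].add(u)
    return new_adj
```

A clustering algorithm (or the paper's partial-clique-merging method) would then run on the returned network rather than on the raw PPI graph.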