Susan L Havre, Bobbie-Jo Webb-Robertson, Anuj Shah, Christian Posse, Banu Gopalan, Fred J Brockman
Cutting-edge biological and bioinformatics research seeks a systems perspective through the analysis of multiple types of high-throughput and other experimental data for the same sample. Systems-level analysis requires the integration and fusion of such data, typically through advanced statistics and mathematics. Visualization is a complementary computational approach that supports integration and analysis of complex data or its derivatives. We present a bioinformatics visualization prototype, Juxter, which depicts categorical information derived from or assigned to these diverse data for the purpose of comparing patterns across categorizations. The visualization allows users to easily discern correlated and anomalous patterns in the data. These patterns, which might not be detected automatically by algorithms, may reveal valuable information leading to insight and discovery. We describe the visualization and interaction capabilities and demonstrate its utility in a new field, metagenomics, which combines molecular biology and genetics to identify and characterize genetic material from multi-species microbial samples.
{"title":"Bioinformatic insights from metagenomics through visualization.","authors":"Susan L Havre, Bobbie-Jo Webb-Robertson, Anuj Shah, Christian Posse, Banu Gopalan, Fred J Brockman","doi":"10.1109/csb.2005.19","DOIUrl":"https://doi.org/10.1109/csb.2005.19","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"341-50"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.19","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yinglei Song, Chunmei Liu, Russell Malmberg, Fangfang Pan, Liming Cai
Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of searching genomes for RNA secondary structures quickly and effectively have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformational graphs (e.g., t = 2 for stem loops and only a slight increase for pseudoknots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time $O(k^t N^2)$, where k is a small parameter and N is the size of the profiled RNA structure. Experiments show that applying the alignment algorithm to genome searches yields the same search accuracy as methods based on a covariance model, with a significant reduction in computation time. In particular, very accurate searches of tmRNAs in bacterial genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to the months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site http://w.uga.edu/RNA-informatics/software/index.php.
{"title":"Tree decomposition based fast search of RNA structures including pseudoknots in genomes.","authors":"Yinglei Song, Chunmei Liu, Russell Malmberg, Fangfang Pan, Liming Cai","doi":"10.1109/csb.2005.52","DOIUrl":"https://doi.org/10.1109/csb.2005.52","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"223-34"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.52","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chengyong Yang, Erliang Zeng, Tao Li, Giri Narasimhan
Clustering of gene expression data is a standard technique used to identify closely related genes. In this paper, we develop a new clustering algorithm, MSC (Multi-Source Clustering), to perform exploratory analysis using two or more diverse sources of data. In particular, we investigate the problem of improving the clustering by integrating information obtained from gene expression data with knowledge extracted from biomedical text literature. In each iteration of algorithm MSC, an EM-type procedure is employed to bootstrap the model obtained from one data source by starting with the cluster assignments obtained in the previous iteration using the other data sources. Upon convergence, the two individual models are used to construct the final cluster assignment. We compare the results of algorithm MSC for two data sources with the results obtained when the clustering is applied to the two sources of data separately. We also compare them with the results obtained using the feature-level integration method, which performs the clustering after simply concatenating the features obtained from the two data sources. We show that the z-scores of the clustering results from MSC are better than those from the other methods. To evaluate our clusters better, function enrichment results are presented using terms from the Gene Ontology database. Finally, by investigating the success of motif detection programs that use the clusters, we show that our approach integrating gene expression data and text data reveals clusters that are biologically more meaningful than those identified using gene expression data alone.
{"title":"Clustering genes using gene expression and text literature data.","authors":"Chengyong Yang, Erliang Zeng, Tao Li, Giri Narasimhan","doi":"10.1109/csb.2005.23","DOIUrl":"https://doi.org/10.1109/csb.2005.23","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"329-40"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.23","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tobias P Mann, Richard Humbert, John A Stamatoyannopolous, William Stafford Noble
PCR, the polymerase chain reaction, is a fundamental tool of molecular biology. Quantitative PCR is the gold-standard methodology for determination of DNA copy numbers, quantitating transcription, and numerous other applications. A major barrier to large-scale application of PCR for quantitative genomic analyses is the current requirement for manual validation of individual PCR reactions to ensure generation of a single product. This typically requires visual inspection either of gel electrophoreses or temperature dissociation ("melting") curves of individual PCR reactions - a time-consuming and costly process. Here we describe a robust computational solution to this fundamental problem. Using a training set of 10,080 reactions comprising multiple quantitative PCR reactions from each of 1,728 unique human genomic amplicons, we developed a support vector machine classifier capable of discriminating single-product PCR reactions with better than 99% accuracy. This approach has broad utility, and eliminates a major bottleneck to widespread application of PCR for high-throughput genomic applications.
{"title":"Automated validation of polymerase chain reactions using amplicon melting curves.","authors":"Tobias P Mann, Richard Humbert, John A Stamatoyannopolous, William Stafford Noble","doi":"10.1109/csb.2005.17","DOIUrl":"https://doi.org/10.1109/csb.2005.17","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"377-85"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.17","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-throughput methods for detecting protein-protein interactions (PPI) have given researchers an initial global picture of protein interactions on a genomic scale. The huge data sets generated by such experiments pose new challenges in data analysis. Though clustering methods have been successfully applied in many areas in bioinformatics, many clustering algorithms cannot be readily applied on protein interaction data sets. One main problem is that the similarity between two proteins cannot be easily defined. This paper proposes a probabilistic model to define the similarity based on conditional probabilities. We then propose a two-step method for estimating the similarity between two proteins based on protein interaction profile. In the first step, the model is trained with proteins with known annotation. Based on this model, similarities are calculated in the second step. Experiments show that our method improves performance.
{"title":"A Two-Step Approach for Clustering Proteins based on Protein Interaction Profile.","authors":"Pengjun Pei, Aidong Zhang","doi":"10.1109/BIBE.2005.10","DOIUrl":"https://doi.org/10.1109/BIBE.2005.10","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":"2005 1544467","pages":"201-209"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBE.2005.10","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27898789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single-particle tracking provides a powerful technique for measuring dynamic cellular processes on the level of individual molecules. Much recent work has been devoted to using single particle tracking to measure long-range movement of particles on the cell surface, including methods for automated localization and tracking of particles [1-3]. However, most particle tracking studies to date ignore cell surface curvature and dynamic cellular deformation, factors frequently present in physiologically relevant situations. In this report, we perform quantitative evaluation of single-particle tracking on curved and deforming cell surfaces. We also introduce a new hybrid method that uses non-rigid cellular modeling for improved computation of single-particle tracking trajectories on the surfaces of cells undergoing deformation. This method combines single-molecule and bulk fluorescence measurements in an automated manner to enable more accurate and robust characterization of dynamic cell physiology and regulation.
{"title":"Deformable modeling for improved calculation of molecular velocities from single-particle tracking.","authors":"Peter M Kasson, Mark M Davis, Axel T Brunger","doi":"10.1109/csb.2005.28","DOIUrl":"https://doi.org/10.1109/csb.2005.28","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"208-11"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.28","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luay Nakhleh, Guohua Jin, Fengmei Zhao, John Mellor-Crummey
Phylogenies - the evolutionary histories of groups of organisms - are one of the most widely used tools throughout the life sciences, as well as objects of research within systematics, evolutionary biology, epidemiology, etc. Almost every tool devised to date to reconstruct phylogenies produces trees; yet it is widely understood and accepted that trees oversimplify the evolutionary histories of many groups of organisms, most prominently bacteria (because of horizontal gene transfer) and plants (because of hybrid speciation). Various methods and criteria have been introduced for phylogenetic tree reconstruction. Parsimony is one of the most widely used and studied criteria, and various accurate and efficient heuristics for reconstructing trees based on parsimony have been devised. Jotun Hein suggested a straightforward extension of the parsimony criterion to phylogenetic networks. In this paper we formalize this concept, and provide the first experimental study of the quality of parsimony as a criterion for constructing and evaluating phylogenetic networks. Our results show that, when extended to phylogenetic networks, the parsimony criterion produces promising results. In a great majority of the cases in our experiments, the parsimony criterion accurately predicts the numbers and placements of non-tree events.
{"title":"Reconstructing phylogenetic networks using maximum parsimony.","authors":"Luay Nakhleh, Guohua Jin, Fengmei Zhao, John Mellor-Crummey","doi":"10.1109/csb.2005.47","DOIUrl":"https://doi.org/10.1109/csb.2005.47","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"93-102"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.47","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
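The parsimony criterion for networks described in the abstract above can be illustrated with a small sketch. It assumes one common reading of the extension: each site is scored with Fitch's algorithm on every tree displayed by the network, and the network score is the sum over sites of the best per-site score. The function names, the nested-tuple tree encoding, and the toy alignment are illustrative assumptions, not taken from the paper.

```python
def fitch_score(tree, leaf_states):
    """Fitch small-parsimony score of one site on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    leaf_states: dict mapping leaf name -> character state at this site.
    """
    def visit(node):
        if isinstance(node, str):
            # Leaf: the candidate state set is just the observed state.
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children can agree on a state: no additional change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def network_parsimony(displayed_trees, alignment):
    """Sum over sites of the lowest Fitch score among the displayed trees.

    alignment: dict mapping leaf name -> sequence string (equal lengths).
    """
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {leaf: seq[i] for leaf, seq in alignment.items()}
        total += min(fitch_score(tree, site) for tree in displayed_trees)
    return total


# Hypothetical example: two trees displayed by one network.
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
alignment = {"A": "AA", "B": "AC", "C": "CA", "D": "CC"}
print(network_parsimony([tree_1, tree_2], alignment))
print(network_parsimony([tree_1], alignment))
```

In this toy case each site fits a different tree, so the network score is lower than the score of either tree alone, which is the kind of signal a non-tree event would be expected to leave.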
One of the major challenges in cancer diagnosis from microarray data is to develop robust classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose a meta-classification scheme which uses a robust multivariate gene selection procedure and integrates the results of several machine learning tools trained on raw and pattern data. We validate our method by applying it to distinguish diffuse large B-cell lymphoma (DLBCL) from follicular lymphoma (FL) on two independent datasets: the HuGeneFL Affymetrix dataset of Shipp et al. (www.genome.wi.mit.edu/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our meta-classification technique achieves higher predictive accuracies than each of the individual classifiers trained on the same dataset and is robust against various data perturbations. We also find that combinations of p53 responsive genes (e.g., p53, PLK1 and CDK2) are highly predictive of the phenotype.
{"title":"A robust meta-classification strategy for cancer diagnosis from gene expression data.","authors":"Gabriela Alexe, Gyan Bhanot, Babu Venkataraghavan, Ramakrishna Ramaswamy, Jorge Lepre, Arnold J Levine, Gustavo Stolovitzky","doi":"10.1109/csb.2005.7","DOIUrl":"https://doi.org/10.1109/csb.2005.7","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"322-5"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25830777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
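The abstract above does not say how the outputs of the individual classifiers are integrated. As a minimal sketch of the general idea of meta-classification, the snippet below combines per-sample predictions by simple majority vote. Majority voting, the function name, and the toy labels are assumptions made for illustration and may differ from the scheme used in the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers, sample by sample.

    predictions: list of lists, one inner list of labels per classifier,
    all of the same length. Ties go to the tied label that appears first
    in classifier order.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first vote (in classifier order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical outputs of three classifiers on three samples.
classifier_outputs = [
    ["DLBCL", "FL", "FL"],
    ["DLBCL", "DLBCL", "FL"],
    ["FL", "DLBCL", "FL"],
]
print(majority_vote(classifier_outputs))
```

A vote of this kind can only outperform its members when their errors are not strongly correlated, which is one reason to train the base classifiers with different tools or on different representations of the data.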
This paper proposes a tree decomposition of protein structures, which can be used to efficiently solve two key subproblems of protein structure prediction: protein threading for backbone prediction and protein side-chain prediction. To develop a unified tree-decomposition based approach to these two subproblems, we model them as a geometric neighborhood graph labeling problem. Theoretically, we can have a low-degree polynomial time algorithm to decompose a geometric neighborhood graph G = (V, E) into components with size $O(|V|^{2/3}\log|V|)$. The computational complexity of the tree-decomposition based graph labeling algorithms is $O(|V|\Delta^{tw+1})$, where $\Delta$ is the average number of possible labels for each vertex and $tw = O(|V|^{2/3}\log|V|)$ is the tree width of G. Empirically, tw is very small and the tree-decomposition method can solve these two problems very efficiently. This paper also compares the computational efficiency of the tree-decomposition approach with the linear programming approach to these two problems and identifies the condition under which the tree-decomposition approach is more efficient than the linear programming approach. Experimental results indicate that the tree-decomposition approach is more efficient most of the time.
{"title":"A tree-decomposition approach to protein structure prediction.","authors":"Jinbo Xu, Feng Jiao, Bonnie Berger","doi":"10.1109/csb.2005.9","DOIUrl":"https://doi.org/10.1109/csb.2005.9","url":null,"PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"247-56"},"PeriodicalIF":0.0,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
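The complexity bound in the abstract above is easiest to see in the simplest case, where the graph is itself a tree (tree width 1). The sketch below is a hedged illustration of dynamic programming for minimum-cost graph labeling in that special case only; it is not the paper's algorithm for general geometric neighborhood graphs, and the function name, cost functions, and toy instance are hypothetical.

```python
def min_cost_labeling(children, root, labels, node_cost, edge_cost):
    """Minimum total cost of labeling a tree-shaped graph.

    children: dict mapping each node to a list of its child nodes.
    labels: candidate labels, shared by all vertices.
    node_cost(v, a): cost of assigning label a to vertex v.
    edge_cost(u, a, v, b): cost of labels a and b on the edge (u, v).
    """
    def best(v):
        # table[a] = cheapest cost of the subtree rooted at v when v gets label a.
        table = {a: node_cost(v, a) for a in labels}
        for c in children.get(v, []):
            child_table = best(c)
            for a in labels:
                # For each label of v, choose the best compatible label of c.
                table[a] += min(
                    child_table[b] + edge_cost(v, a, c, b) for b in labels
                )
        return table

    return min(best(root).values())


# Hypothetical instance: a path 0 - 1 - 2 with two labels per vertex.
children = {0: [1], 1: [2]}
unary = {0: {"x": 0, "y": 2}, 1: {"x": 1, "y": 1}, 2: {"x": 2, "y": 0}}


def node_cost(v, a):
    return unary[v][a]


def edge_cost(u, a, v, b):
    # Neighbouring vertices pay a penalty of 1 when their labels differ.
    return 0 if a == b else 1


print(min_cost_labeling(children, 0, ["x", "y"], node_cost, edge_cost))
```

Each edge is processed with one pass over all label pairs, so the work is proportional to $|V|\Delta^{2}$, which matches $O(|V|\Delta^{tw+1})$ for $tw = 1$. For larger tree widths the same idea applies with tables indexed by the joint labeling of each decomposition component.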