Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0015
Seung-Jin Sul, T. Williams
Phylogenetic analyses often produce a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological differences that exist among all pairs of trees. We design a new randomized algorithm, called Hash-RF, that computes the all-to-all Robinson-Foulds (RF) distance, the most common distance metric for comparing two phylogenetic trees. Our approach uses a hash table to organize the bipartitions of a tree, and a universal hashing function makes our algorithm randomized. We compare the performance of our Hash-RF algorithm to PAUP*'s implementation of computing the all-to-all RF distance matrix. Our experiments focus on the algorithmic performance of comparing sets of biological trees, where the size of each tree ranged from 500 to 2,000 taxa and the collection of trees varied from 200 to 1,000 trees. Our experimental results clearly show that our Hash-RF algorithm is up to 500 times faster than PAUP*'s approach. Thus, Hash-RF provides an efficient alternative to a single tree summary of a collection of trees and potentially gives researchers the ability to explore their data in new and interesting ways.
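The idea of organizing bipartitions in a hash table to obtain all pairwise RF distances can be illustrated with a minimal sketch. It is not the authors' Hash-RF implementation: it relies on Python's built-in hashing of `frozenset` rather than a universal hash function, it assumes trees are given as nested tuples of taxon names over the same taxon set, and it takes the RF distance to be the size of the symmetric difference of the two bipartition sets (some authors halve this). The function names are hypothetical.

```python
from itertools import combinations


def leaves_and_clades(tree, clades):
    """Return the leaf set below `tree`, appending every internal clade to `clades`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaf_set = frozenset().union(*(leaves_and_clades(child, clades) for child in tree))
    clades.append(leaf_set)
    return leaf_set


def bipartitions(tree):
    """Non-trivial bipartitions, each stored as the side not containing a reference taxon."""
    clades = []
    all_leaves = leaves_and_clades(tree, clades)
    ref = min(all_leaves)  # arbitrary but fixed reference taxon (assumption)
    result = set()
    for clade in clades:
        side = clade if ref not in clade else all_leaves - clade
        if 1 < len(side) < len(all_leaves) - 1:  # both sides need at least two taxa
            result.add(side)
    return result


def rf_matrix(trees):
    """All-to-all RF distances via a table mapping each bipartition to the trees containing it."""
    splits = [bipartitions(t) for t in trees]
    table = {}
    for idx, tree_splits in enumerate(splits):
        for bp in tree_splits:
            table.setdefault(bp, []).append(idx)
    n = len(trees)
    shared = [[0] * n for _ in range(n)]
    for owners in table.values():
        for i, j in combinations(owners, 2):
            shared[i][j] += 1
            shared[j][i] += 1
    return [
        [0 if i == j else len(splits[i]) + len(splits[j]) - 2 * shared[i][j] for j in range(n)]
        for i in range(n)
    ]


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
matrix = rf_matrix([t1, t2, t1])
```

Each bipartition is visited once per tree that contains it, so the pairwise counts come from the table rather than from comparing every pair of trees bipartition by bipartition.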
Title: A Randomized Algorithm for Comparing Sets of Phylogenetic Trees. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 121-130. https://doi.org/10.1142/9781860947995_0015
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0002
J. Nadeau
Rapidly growing evidence suggests that complex and variable interactions between host genetic and systems factors, diet, activity and lifestyle choices, and intestinal microbes control the incidence, severity and complexity of metabolic diseases. The dramatic increase in the world-wide incidence of these diseases, including obesity, diabetes, hypertension, heart disease, and fatty liver disease, raises the need for new ways to maintain health despite inherited and environmental risks. We are pursuing a comprehensive approach based on diet-induced models of metabolic disease. During the course of these studies, new and challenging statistical, analytical and computational problems were discovered. We pioneered a new paradigm for genetic studies based on chromosome substitution strains of laboratory mice. These strains involve systematically substituting each chromosome in a host strain with the corresponding chromosome from a donor strain. A genome survey with these strains therefore involves testing a panel of individual, distinct and non-overlapping genotypes, in contrast to conventional studies of heterogeneous populations. Studies of diet-induced metabolic disease with these strains have already led to striking observations. We discovered that most traits are controlled by many genetic variants, each of which has unexpectedly large phenotypic effects, and that these variants act in a highly non-additive manner. The non-additive nature of these variants challenges conventional models of the architecture of complex traits. At every level of resolution from the entire genome to very small genetic intervals, we discovered comparable levels of genetic complexity, suggesting a fractal property of complex traits.
Another remarkable property of these large-effect variants is their ability to switch complex systems between alternative phenotypic states such as obese to lean and high to low cholesterol, suggesting that biological traits might be organized in a small number of stable states rather than continuous variability. Moreover, by studying correlations between non-genetic variation in pairs of traits (the genetic control of non-genetic variation), we discovered a new way to dissect the functional architecture of biological systems. Finally, a neglected aspect of these studies of metabolic disease involves the intestinal microbes. Early studies suggest that diet and host physiology affect the numbers and kinds of microbes, and that these microbes in turn affect host metabolism. These interactions between 'bugs, guts and fat' extend systems studies from conventional aspects of genetics and biology to population considerations of the functional interactions between hosts, diet and our microbial passengers. With these models of diet-induced metabolic disease in chromosome substitution strains, we are now positioned to find ways to tip complex systems from disease to health.
Title: Bugs, Guts and Fat - A Systems Approach to the Metabolic 'Axis of Evil'. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, p. 3. https://doi.org/10.1142/9781860947995_0002
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0028
M. Hayashida, T. Akutsu, H. Nagamochi
This poster proposes a novel clustering method for analyzing biological networks. In this method, each biological network is treated as an undirected graph and edges are weighted based on similarities of nodes. Then, maximal components, which are defined based on edge connectivity, are computed, and the nodes are partitioned into clusters by selecting disjoint maximal components. The proposed method was applied to clustering of protein sequences and was compared with conventional clustering methods. The obtained clusters were evaluated using P-values for GO (Gene Ontology) terms. The average P-values for the proposed method were better than those for other methods.
Title: A Novel Clustering Method for Analysis of Biological Networks using Maximal Components of Graphs. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 257-266. https://doi.org/10.1142/9781860947995_0028
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0019
T. Akutsu, Daiji Fukagawa
This paper proposes algorithms for inferring a chemical structure from a feature vector based on frequency of labeled paths and small fragments, an inference problem with a potential application to drug design. In this paper, chemical structures are modeled as trees or tree-like structures. It is shown that the inference problems for these kinds of structures can be solved in polynomial time using dynamic programming-based algorithms. Since these algorithms are not practical, a branch-and-bound type algorithm is also proposed. The results of computational experiments suggest that the algorithm can solve the inference problem in a few seconds to a few tens of seconds for moderate-size chemical compounds.
Title: Inferring a Chemical Structure from a Feature Vector Based on Frequency of Labeled Paths and Small Fragments. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 165-174. https://doi.org/10.1142/9781860947995_0019
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0031
K. Ning, K. F. Chong, H. Leong
This paper presents an improved algorithm for de novo sequencing of multi-charge mass spectra. Recent work based on the analysis of multi-charge mass spectra showed that taking advantage of multi-charge information can lead to higher accuracy (sensitivity and specificity) in peptide sequencing. A simple de novo algorithm, called GBST (Greedy algorithm with Best Strong Tag), was proposed and was shown to produce good results for spectra with charge > 2. In this paper, we analyze some of the shortcomings of GBST. We then present a new algorithm, GST-SPC, by extending the GBST algorithm in two directions. First, we use a larger set of multi-charge strong tags and show that this improves the theoretical upper bound on performance. Second, we give an algorithm that computes a peptide sequence that is optimal with respect to shared peaks count from among all sequences that are derived from multi-charge strong tags. Experimental results demonstrate the improvement of GST-SPC over GBST.
Title: De Novo Peptide Sequencing for Mass Spectra Based on Multi-Charge Strong Tags. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 287-296. https://doi.org/10.1142/9781860947995_0031
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0029
J. Supper, H. Fröhlich, A. Zell
Inferring the structure of gene regulatory networks from gene expression data has attracted growing interest in recent years. Several machine-learning-related methods, such as Bayesian networks, have been proposed to deal with this challenging problem. However, in many cases, network reconstructions based purely on gene expression data do not lead to satisfactory results when the obtained topology is compared against a validation network. Therefore, in this paper we propose an "inverse" approach: starting from a priori specified network topologies, we identify those parts of the network which are relevant for the gene expression data at hand. For this purpose, we employ linear ridge regression to predict the expression level of a given gene from its relevant regulators with high reliability. Calculated statistical significances of the resulting network topologies reveal that slight modifications of the pruned regulatory network enable an additional substantial improvement.
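For readers unfamiliar with linear ridge regression, its standard textbook form is given here as background; the notation is an assumption and is not taken from the paper. Let $y \in \mathbb{R}^{n}$ hold the expression levels of the target gene across $n$ samples, let $X \in \mathbb{R}^{n \times p}$ hold the expression levels of its $p$ candidate regulators, and let $\lambda > 0$ be the regularization strength:

```latex
\hat{w}
  = \arg\min_{w \in \mathbb{R}^{p}} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
  = \left( X^{\top} X + \lambda I_p \right)^{-1} X^{\top} y ,
\qquad
\hat{y} = X \hat{w} .
```

The penalty term keeps $X^{\top} X + \lambda I_p$ invertible even when regulators are strongly correlated or when $p > n$, which is a common situation for expression data.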
Title: Gene Regulatory Network Inference via Regression Based Topological Refinement. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 267-276. https://doi.org/10.1142/9781860947995_0029
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0014
Pinghao Wang, B. Zhou, M. Tarawneh, Daniel Chu, Chen Wang, Albert Y. Zomaya, R. Brent
Extending the idea of our previous algorithm [17, 18], we developed a new sequential quartet-based phylogenetic tree construction method. This new algorithm reconstructs the phylogenetic tree iteratively by examining, at each merge step, every possible super-quartet, which is formed by four subtrees instead of the simple quartet used in our previous algorithm. Because our new algorithm evaluates super-quartet trees, each of which may consist of more than four molecular sequences, it can effectively alleviate a traditional but important problem of quartet errors encountered in quartet-based methods. Experimental results show that our newly proposed algorithm is capable of achieving very high accuracy and solid consistency in reconstructing phylogenetic trees on different sets of synthetic DNA data under various evolutionary circumstances.
Title: A Global Maximum Likelihood Super-Quartet Phylogeny Method. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 111-120. https://doi.org/10.1142/9781860947995_0014
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0010
S. Madeira, Arlindo L. Oliveira
Biclustering algorithms have emerged as an important tool for the discovery of local patterns in gene expression data. For the case where the expression data correspond to time series, efficient algorithms that work with a discretized version of the expression matrix are known. However, these algorithms assume that the biclusters to be found are perfect, in the sense that each gene in the bicluster exhibits exactly the same expression pattern along the conditions that belong to it. In this work, we propose an algorithm that identifies genes with similar, but not necessarily equal, expression patterns over a subset of the conditions. The results demonstrate that this approach identifies biclusters that are biologically more significant than those discovered by other algorithms in the literature.
Title: An Efficient Biclustering Algorithm for Finding Genes with Similar Patterns in Time-series Expression Data. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 67-80. https://doi.org/10.1142/9781860947995_0010
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0011
Zhipeng Cai, R. Goebel, M. Salavatipour, Yi Shi, Lizhe Xu, Guohui Lin
them all in classification is largely redundant. Furthermore, these selected genes can prevent the consideration of other individually-less but collectively-more differentially expressed genes. We propose to cluster genes in terms of their class discrimination strength and to limit the number of selected genes per cluster. By combining this idea with several existing single gene scoring methods, we show by experiments on two cancer microarray datasets that our methods identify gene subsets which collectively have significantly higher classification accuracies.
Title: Selecting Genes with Dissimilar Discrimination Strength for Sample Class Prediction. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 81-90. https://doi.org/10.1142/9781860947995_0011
Pub Date: 2007-01-01 DOI: 10.1142/9781860947995_0030
Falk Hüffner, S. Wernicke, T. Zichner
To identify linear signaling pathways, Scott et al. [RECOMB, 2005] recently proposed to extract paths with high interaction probabilities from protein interaction networks. They used an algorithmic technique known as color-coding to solve this NP-hard problem; their implementation is capable of finding biologically meaningful pathways of length up to 10 proteins within hours. In this work, we give various novel algorithmic improvements for color-coding, both from a worst-case perspective and under practical considerations. Experiments on the interaction networks of yeast and fruit fly as well as a testbed of structurally comparable random networks demonstrate a speedup of the algorithm by orders of magnitude. This allows more complex and larger structures to be identified in reasonable time; finding paths of length up to 13 proteins can even be done in seconds and thus allows for an interactive exploration and evaluation of pathway candidates.
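The basic color-coding technique referred to above can be sketched as follows. This is a plain textbook version, not the engineered algorithm of the paper: vertices are colored uniformly at random with k colors, and a dynamic program over (end vertex, set of used colors) finds the minimum-weight path whose k vertices all have distinct colors. The graph format (a symmetric dictionary of edge weights), the function names, and the reading of weights as something like negative log interaction probabilities are assumptions made for illustration.

```python
import random


def best_colorful_path(graph, k, coloring):
    """Minimum-weight path on k vertices with pairwise distinct colors, or None.

    graph: dict vertex -> dict neighbor -> weight; coloring: dict vertex -> color in range(k).
    """
    # table[(v, mask)] = (weight, path): best path ending at v that uses exactly the colors in mask
    table = {(v, 1 << coloring[v]): (0.0, (v,)) for v in graph}
    for _ in range(k - 1):
        extended = {}
        for (v, mask), (weight, path) in table.items():
            for u, edge_weight in graph[v].items():
                bit = 1 << coloring[u]
                if mask & bit:  # color already used, so u cannot extend this path
                    continue
                key = (u, mask | bit)
                candidate = (weight + edge_weight, path + (u,))
                if key not in extended or candidate[0] < extended[key][0]:
                    extended[key] = candidate
        table = extended
    return min(table.values(), key=lambda entry: entry[0], default=None)


def color_coding(graph, k, trials, seed=0):
    """Repeat random colorings and keep the best colorful path found in any trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in graph}
        found = best_colorful_path(graph, k, coloring)
        if found is not None and (best is None or found[0] < best[0]):
            best = found
    return best


# Toy network: a chain a - b - c - d with one expensive edge.
network = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 5.0},
    "d": {"c": 5.0},
}
fixed = best_colorful_path(network, 3, {"a": 0, "b": 1, "c": 2, "d": 0})
```

A fixed path on k vertices becomes colorful with probability k!/k^k in a single trial, so the number of trials needed for a given confidence grows exponentially in k; this is the cost that makes long paths expensive to find.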
Title: Algorithm Engineering for Color-Coding to Facilitate Signaling Pathway Detection. Proceedings of the ... Asia-Pacific bioinformatics conference, 2007, pp. 277-286. https://doi.org/10.1142/9781860947995_0030