Proceedings of the ... Asia-Pacific bioinformatics conference: latest articles

A Randomized Algorithm for Comparing Sets of Phylogenetic Trees
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0015
Seung-Jin Sul, T. Williams
Phylogenetic analysis often produces a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological differences that exist among all pairs of trees. We design a new randomized algorithm, called Hash-RF, that computes the all-to-all Robinson-Foulds (RF) distance, the most common distance metric for comparing two phylogenetic trees. Our approach uses a hash table to organize the bipartitions of a tree, and a universal hashing function makes our algorithm randomized. We compare the performance of our Hash-RF algorithm to PAUP*'s implementation of computing the all-to-all RF distance matrix. Our experiments focus on the algorithmic performance of comparing sets of biological trees, where the size of each tree ranged from 500 to 2,000 taxa and the collection of trees varied from 200 to 1,000 trees. Our experimental results clearly show that our Hash-RF algorithm is up to 500 times faster than PAUP*'s approach. Thus, Hash-RF provides an efficient alternative to a single tree summary of a collection of trees and potentially gives researchers the ability to explore their data in new and interesting ways.
Citations: 23
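To make the Hash-RF idea concrete, here is a minimal Python sketch, not the authors' implementation. Assumptions: trees are nested tuples of taxon names read as unrooted; each bipartition is stored as the side that excludes the first taxon; two seeded tables of random per-taxon weights (with hypothetical moduli `m1` and `m2`) play the role of the universal hash functions, `h1` choosing a table slot and `h2` acting as a compact bipartition ID; and the reported distance is the raw size of the symmetric difference of two trees' bipartition sets (halve it for the other common RF convention).

```python
import random
from itertools import combinations


def _collect_clades(tree, out):
    """Return the leaf set under `tree`; append each internal node's leaf set to `out`."""
    if isinstance(tree, str):  # a leaf is a taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(_collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree, taxa):
    """Non-trivial bipartitions of an unrooted tree, each stored as the side without taxa[0]."""
    clades = []
    _collect_clades(tree, clades)
    everything = frozenset(taxa)
    result = set()
    for side in clades:
        if taxa[0] in side:
            side = everything - side
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial (single-leaf) splits
            result.add(side)
    return result


def all_to_all_rf(trees, taxa, m1=1009, seed=0):
    """Matrix of |B_i symmetric-difference B_j| for every pair of trees."""
    rng = random.Random(seed)
    m2 = (1 << 61) - 1
    # Random per-taxon weights define two hash functions over bipartitions.
    a = {t: rng.randrange(m1) for t in taxa}
    b = {t: rng.randrange(m2) for t in taxa}
    table = [dict() for _ in range(m1)]  # slot h1 -> {h2: ids of trees with that bipartition}
    sizes = []
    for tree_id, tree in enumerate(trees):
        parts = bipartitions(tree, taxa)
        sizes.append(len(parts))
        for side in parts:
            h1 = sum(a[t] for t in side) % m1  # table slot
            h2 = sum(b[t] for t in side) % m2  # compact bipartition ID
            table[h1].setdefault(h2, []).append(tree_id)
    n = len(trees)
    shared = [[0] * n for _ in range(n)]
    for slot in table:
        for ids in slot.values():
            # Every pair of trees listed under one entry shares that bipartition.
            for i, j in combinations(sorted(set(ids)), 2):
                shared[i][j] += 1
                shared[j][i] += 1
    return [[sizes[i] + sizes[j] - 2 * shared[i][j] if i != j else 0
             for j in range(n)] for i in range(n)]
```

Because two different bipartitions can, with small probability, receive the same `(h1, h2)` pair, the result is randomized in the sense the abstract describes: shared counts are exact unless such a collision occurs. The pairwise work is driven by hash-table entries, so a bipartition shared by many trees is stored once rather than compared once per tree pair.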
Bugs, Guts and Fat - A Systems Approach to the Metabolic 'Axis of Evil'
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0002
J. Nadeau
Rapidly growing evidence suggests that complex and variable interactions between host genetic and systems factors, diet, activity and lifestyle choices, and intestinal microbes control the incidence, severity and complexity of metabolic diseases. The dramatic increase in the worldwide incidence of these diseases, including obesity, diabetes, hypertension, heart disease, and fatty liver disease, raises the need for new ways to maintain health despite inherited and environmental risks. We are pursuing a comprehensive approach based on diet-induced models of metabolic disease. During the course of these studies, new and challenging statistical, analytical and computational problems were discovered. We pioneered a new paradigm for genetic studies based on chromosome substitution strains of laboratory mice. These strains are made by systematically substituting each chromosome in a host strain with the corresponding chromosome from a donor strain. A genome survey with these strains therefore involves testing a panel of individual, distinct and non-overlapping genotypes, in contrast to conventional studies of heterogeneous populations. Studies of diet-induced metabolic disease with these strains have already led to striking observations. We discovered that most traits are controlled by many genetic variants, each of which has an unexpectedly large phenotypic effect, and that these variants act in a highly non-additive manner. The non-additive nature of these variants challenges conventional models of the architecture of complex traits. At every level of resolution, from the entire genome to very small genetic intervals, we discovered comparable levels of genetic complexity, suggesting a fractal property of complex traits.
Another remarkable property of these large-effect variants is their ability to switch complex systems between alternative phenotypic states, such as from obese to lean and from high to low cholesterol, suggesting that biological traits might be organized in a small number of stable states rather than as continuous variability. Moreover, by studying correlations between non-genetic variation in pairs of traits (the genetic control of non-genetic variation), we discovered a new way to dissect the functional architecture of biological systems. Finally, a neglected aspect of these studies of metabolic disease involves the intestinal microbes. Early studies suggest that diet and host physiology affect the numbers and kinds of microbes, and that these microbes in turn affect host metabolism. These interactions between 'bugs, guts and fat' extend systems studies from conventional aspects of genetics and biology to population-level considerations of the functional interactions between hosts, diet and our microbial passengers. With these models of diet-induced metabolic disease in chromosome substitution strains, we are now positioned to find ways to tip complex systems from disease to health.
Citations: 0
A Novel Clustering Method for Analysis of Biological Networks using Maximal Components of Graphs
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0028
M. Hayashida, T. Akutsu, H. Nagamochi
This poster proposes a novel clustering method for analyzing biological networks. In this method, each biological network is treated as an undirected graph whose edges are weighted by the similarity of the nodes they connect. Then, maximal components, which are defined based on edge connectivity, are computed, and the nodes are partitioned into clusters by selecting disjoint maximal components. The proposed method was applied to clustering protein sequences and compared with conventional clustering methods. The resulting clusters were evaluated using P-values for GO (Gene Ontology) terms. The average P-values for the proposed method were better (lower) than those for the other methods.
Citations: 1
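The poster does not spell out its exact definition of a maximal component, so the sketch below swaps in a closely related, well-known notion: maximal k-edge-connected subgraphs, found by recursively splitting the graph along a Stoer-Wagner global minimum cut until every piece has min-cut weight at least `k`. The threshold `k`, the dict-of-dicts weighted graph, and the function names are assumptions for illustration. The pieces are disjoint by construction, which mirrors the abstract's step of selecting disjoint components.

```python
def min_cut(nodes, graph):
    """Stoer-Wagner global minimum cut of the subgraph induced by `nodes`.

    `graph` maps each node to {neighbour: weight} and must be symmetric.
    Returns (cut_weight, one_side_of_the_cut).
    """
    adj = {u: {v: w for v, w in graph.get(u, {}).items() if v in nodes} for u in nodes}
    groups = {u: {u} for u in nodes}  # original nodes merged into each super-node
    active = set(nodes)
    best_weight, best_side = float("inf"), set()
    while len(active) > 1:
        # One phase: repeatedly add the super-node most tightly attached to those added so far.
        attach = {u: 0.0 for u in active}
        remaining = set(active)
        order = []
        while remaining:
            v = max(remaining, key=attach.__getitem__)
            remaining.remove(v)
            order.append(v)
            for u, w in adj[v].items():
                if u in remaining:
                    attach[u] += w
        s, t = order[-2], order[-1]
        if attach[t] < best_weight:  # cut of the phase: t's group versus the rest
            best_weight, best_side = attach[t], set(groups[t])
        groups[s] |= groups.pop(t)  # merge t into s
        for u, w in adj.pop(t).items():
            del adj[u][t]
            if u != s:
                adj[s][u] = adj[s].get(u, 0.0) + w
                adj[u][s] = adj[u].get(s, 0.0) + w
        active.remove(t)
    return best_weight, best_side


def maximal_k_edge_connected(nodes, graph, k):
    """Recursively split along min cuts lighter than k; returns disjoint node sets."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return [nodes]
    weight, side = min_cut(nodes, graph)
    if weight >= k:
        return [nodes]
    return (maximal_k_edge_connected(side, graph, k)
            + maximal_k_edge_connected(nodes - side, graph, k))
```

Each split is safe because a k-edge-connected subgraph cannot straddle a cut of weight below `k`, so it always ends up entirely on one side. Raising `k` yields smaller, tighter clusters.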
Inferring a Chemical Structure from a Feature Vector Based on Frequency of Labeled Paths and Small Fragments
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0019
T. Akutsu, Daiji Fukagawa
This paper proposes algorithms for inferring a chemical structure from a feature vector based on the frequency of labeled paths and small fragments, an inference problem with potential applications in drug design. Here, chemical structures are modeled as trees or tree-like structures. It is shown that the inference problems for these kinds of structures can be solved in polynomial time using dynamic programming-based algorithms. Since these algorithms are not practical, a branch-and-bound algorithm is also proposed. Computational experiments suggest that this algorithm can solve the inference problem in a few seconds to a few tens of seconds for moderate-size chemical compounds.
Citations: 12
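The inference problem inverts a forward map from structure to feature vector. A minimal Python sketch of that forward map for tree-shaped molecules follows. It assumes the feature vector counts the label string of every simple path with at most `max_len` edges, with each direction counted separately; the paper's exact feature definition (including its small fragments and tree-like structures) may differ, and the function name is hypothetical.

```python
from collections import Counter


def path_feature_vector(labels, edges, max_len):
    """Count the label string of every simple path with at most `max_len` edges.

    labels: {vertex: atom label}; edges: (u, v) pairs of an undirected tree.
    Each path is counted once per direction, so "C-O" and "O-C" are separate keys.
    """
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    counts = Counter()

    def extend(v, prev, seq):
        counts["-".join(seq)] += 1
        if len(seq) - 1 == max_len:  # path already has max_len edges
            return
        for u in adj[v]:
            if u != prev:  # in a tree, never stepping back keeps the path simple
                extend(u, v, seq + [labels[u]])

    for v in labels:
        extend(v, None, [labels[v]])
    return counts
```

For the heavy-atom skeleton of ethanol (C-C-O) with `max_len=2`, the vector contains `C`: 2, `O`: 1, `C-C`: 2, `C-O`: 1, `O-C`: 1, `C-C-O`: 1 and `O-C-C`: 1. The inference task is to find a tree whose vector matches a given one.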
De Novo Peptide Sequencing for Mass Spectra Based on Multi-Charge Strong Tags
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0031
K. Ning, K. F. Chong, H. Leong
This paper presents an improved algorithm for de novo sequencing of multi-charge mass spectra. Recent work based on the analysis of multi-charge mass spectra showed that taking advantage of multi-charge information can lead to higher accuracy (sensitivity and specificity) in peptide sequencing. A simple de novo algorithm, called GBST (Greedy algorithm with Best Strong Tag), was proposed and shown to produce good results for spectra with charge > 2. In this paper, we analyze some of the shortcomings of GBST. We then present a new algorithm, GST-SPC, which extends GBST in two directions. First, we use a larger set of multi-charge strong tags and show that this improves the theoretical upper bound on performance. Second, we give an algorithm that, among all sequences derived from multi-charge strong tags, computes a peptide sequence that is optimal with respect to the shared peaks count. Experimental results demonstrate the improvement of GST-SPC over GBST.
Citations: 3
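GST-SPC optimizes the shared peaks count between a spectrum and a candidate peptide. The sketch below shows one common way to compute such a score; its assumptions are not taken from the paper: only b- and y-ions are considered, standard monoisotopic residue masses are used, a charge-z ion with neutral fragment mass m appears at (m + z * 1.00728) / z, and an observed peak counts as shared if it lies within `tol` Da of any theoretical ion.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def theoretical_ions(peptide, max_charge=1):
    """m/z values of b- and y-ions for charges 1..max_charge."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = []
    prefix = 0.0
    for m in masses[:-1]:  # every cleavage between two residues
        prefix += m
        b_neutral = prefix                  # N-terminal fragment
        y_neutral = total - prefix + WATER  # C-terminal fragment keeps the water
        for z in range(1, max_charge + 1):
            ions.append((b_neutral + z * PROTON) / z)
            ions.append((y_neutral + z * PROTON) / z)
    return ions


def shared_peaks_count(peaks, peptide, max_charge=1, tol=0.5):
    """Number of observed peaks within `tol` of at least one theoretical ion."""
    ions = theoretical_ions(peptide, max_charge)
    return sum(1 for p in peaks if any(abs(p - ion) <= tol for ion in ions))
```

Raising `max_charge` adds multiply charged ions to the theoretical spectrum, which is how multi-charge information can lift the score for higher-charge spectra.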
Gene Regulatory Network Inference via Regression Based Topological Refinement
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0029
J. Supper, H. Fröhlich, A. Zell
Inferring the structure of gene regulatory networks from gene expression data has attracted growing interest in recent years. Several machine-learning methods, such as Bayesian networks, have been proposed to deal with this challenging problem. However, in many cases, network reconstructions based purely on gene expression data do not lead to satisfactory results when the obtained topology is compared against a validation network. Therefore, in this paper we propose an "inverse" approach: starting from a priori specified network topologies, we identify those parts of the network which are relevant for the gene expression data at hand. For this purpose, we employ linear ridge regression to predict the expression level of a given gene from its relevant regulators with high reliability. The statistical significance calculated for the resulting network topologies reveals that slight modifications of the pruned regulatory network yield a further substantial improvement.
Citations: 0
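Ridge regression, which the authors use to predict a gene's expression from its candidate regulators, has the closed form beta = (X^T X + lambda * I)^-1 X^T y, where the columns of X are regulator expression levels and y is the target gene. The dependency-free Python sketch below solves that system directly. The function names, the absence of an intercept (data assumed centered), and the residual-based score are illustrative assumptions, not the paper's refinement procedure.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Coefficients minimising ||y - X beta||^2 + lam * ||beta||^2 (no intercept)."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve_linear(gram, rhs)


def residual_sum_of_squares(X, y, beta):
    """How well the chosen regulators explain the target gene's expression."""
    return sum((target - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, target in zip(X, y))
```

The penalty `lam` shrinks the coefficients toward zero, which keeps the fit stable when a gene has many correlated candidate regulators and few samples; comparing residuals across candidate regulator sets is one simple way to judge which edges the data support.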
A Global Maximum Likelihood Super-Quartet Phylogeny Method
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0014
Pinghao Wang, B. Zhou, M. Tarawneh, Daniel Chu, Chen Wang, Albert Y. Zomaya, R. Brent
Extending the idea of our previous algorithm [17, 18], we developed a new sequential quartet-based phylogenetic tree construction method. This new algorithm reconstructs the phylogenetic tree iteratively by examining, at each merge step, every possible super-quartet formed by four subtrees, instead of the simple quartets used in our previous algorithm. Because our new algorithm evaluates super-quartet trees, each of which may consist of more than four molecular sequences, it can effectively alleviate a long-standing but important problem of quartet-based methods: quartet errors. Experimental results show that the new algorithm achieves very high accuracy and solid consistency when reconstructing phylogenetic trees from different sets of synthetic DNA data under various evolutionary conditions.
Citations: 3
An Efficient Biclustering Algorithm for Finding Genes with Similar Patterns in Time-series Expression Data
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0010
S. Madeira, Arlindo L. Oliveira
Biclustering algorithms have emerged as an important tool for discovering local patterns in gene expression data. When the expression data are time series, efficient algorithms that work with a discretized version of the expression matrix are known. However, these algorithms assume that the biclusters to be found are perfect, in the sense that each gene in the bicluster exhibits exactly the same expression pattern across the conditions that belong to it. In this work, we propose an algorithm that identifies genes with similar, but not necessarily equal, expression patterns over a subset of the conditions. The results demonstrate that this approach identifies biclusters that are biologically more significant than those discovered by other algorithms in the literature.
Citations: 27
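The key relaxation in this abstract, genes whose discretized time-series patterns are similar rather than identical over a set of conditions, can be illustrated with a brute-force Python sketch. Assumptions: consecutive time points are discretized to U (up), D (down) or N (no change) using a hypothetical `threshold`; the conditions form a contiguous window of those transitions; and "similar" means at most `max_errors` mismatching symbols. This only states the matching criterion; it is not the authors' efficient algorithm.

```python
def discretize(profile, threshold=0.0):
    """Turn a time series into U/D/N symbols, one per pair of consecutive time points."""
    symbols = []
    for before, after in zip(profile, profile[1:]):
        delta = after - before
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def approximate_bicluster(matrix, start, pattern, max_errors, threshold=0.0):
    """Genes whose symbols from column `start` differ from `pattern` in at most max_errors places."""
    members = []
    for gene, profile in matrix.items():
        window = discretize(profile, threshold)[start:start + len(pattern)]
        if len(window) < len(pattern):
            continue  # series too short for this window
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_errors:
            members.append(gene)
    return members
```

With `max_errors=0` this reduces to the perfect biclusters the abstract says earlier methods assume; allowing one or more errors admits genes that follow the same trend with occasional deviations.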
Selecting Genes with Dissimilar Discrimination Strength for Sample Class Prediction
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0011
Zhipeng Cai, R. Goebel, M. Salavatipour, Yi Shi, Lizhe Xu, Guohui Lin
… them all in classification is largely redundant. Furthermore, these selected genes can prevent the consideration of other genes that are individually less, but collectively more, differentially expressed. We propose to cluster genes in terms of their class discrimination strength and to limit the number of selected genes per cluster. By combining this idea with several existing single-gene scoring methods, we show by experiments on two cancer microarray datasets that our methods identify gene subsets which collectively have significantly higher classification accuracies.
Citations: 11
Algorithm Engineering for Color-Coding to Facilitate Signaling Pathway Detection
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0030
Falk Hüffner, S. Wernicke, T. Zichner
To identify linear signaling pathways, Scott et al. [RECOMB, 2005] recently proposed extracting paths with high interaction probabilities from protein interaction networks. They used an algorithmic technique known as color-coding to solve this NP-hard problem; their implementation can find biologically meaningful pathways of up to 10 proteins within hours. In this work, we give various novel algorithmic improvements for color-coding, both from a worst-case perspective and with practical considerations in mind. Experiments on the interaction networks of yeast and fruit fly, as well as on a testbed of structurally comparable random networks, demonstrate a speedup of the algorithm by orders of magnitude. This allows larger and more complex structures to be identified in reasonable time; paths of up to 13 proteins can even be found in seconds, which allows for interactive exploration and evaluation of pathway candidates.
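A minimal sketch of basic color-coding (the technique of Alon, Yuster and Zwick) for this path problem, in Python. It assumes an edge cost of -log(p), so the highest-probability path is the cheapest one, and it does not include any of the improvements described in this paper; `best_colorful_path` and `color_coding_path` are hypothetical names. Each trial colors the vertices randomly with k colors and runs a dynamic program over (color set, end vertex); a path whose vertices all have distinct colors is automatically simple.

```python
import math
import random


def best_colorful_path(adj, k, colors):
    """Cheapest colorful path with k vertices for one fixed coloring.

    adj maps vertex -> {neighbor: interaction probability in (0, 1]}.
    Edge cost is -log(p) (an assumption here), so the cheapest path is the
    one with the highest product of probabilities. Returns (cost, path) or None.
    """
    # table[(color set, end vertex)] = (cost, path) of the best colorful path
    table = {(1 << colors[v], v): (0.0, [v]) for v in adj}
    for _ in range(k - 1):  # extend every path by one vertex per round
        nxt = {}
        for (mask, v), (cost, path) in table.items():
            for u, p in adj[v].items():
                bit = 1 << colors[u]
                if mask & bit:
                    continue  # color already used, so u cannot be appended
                key = (mask | bit, u)
                cand = cost - math.log(p)
                if key not in nxt or cand < nxt[key][0]:
                    nxt[key] = (cand, path + [u])
        table = nxt
    if not table:
        return None
    return min(table.values(), key=lambda item: item[0])


def color_coding_path(adj, k, trials, seed=0):
    """Repeat random k-colorings and keep the best path found.

    Returns (probability, path) or None if no colorful path was found.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        found = best_colorful_path(adj, k, colors)
        if found is not None and (best is None or found[0] < best[0]):
            best = found
    if best is None:
        return None
    cost, path = best
    return math.exp(-cost), path
```

A fixed k-vertex path is colorful with probability k!/k^k, which is at least e^-k, so on the order of e^k trials are needed for a constant success probability. Storing whole paths in the table keeps the sketch short; a practical implementation would store only costs plus back-pointers.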
Citations: 13
Journal
Proceedings of the ... Asia-Pacific bioinformatics conference