Modeling species-genes data for efficient phylogenetic inference.

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2007-01-01

Wenyuan Li, Ying Liu

Pages: 429-40


Abstract

In recent years, biclique methods have been proposed for constructing phylogenetic trees. A key step of these methods is to find complete sub-matrices (sub-matrices without missing entries) in a species-genes data matrix. To enumerate all complete sub-matrices, (17) described an exact algorithm whose running time is exponential. That algorithm also generates a large number of complete sub-matrices, many of which may not be used for tree reconstruction. A better understanding of the characteristics of species-genes data may therefore help in discovering complete sub-matrices. In this paper, we study these characteristics quantitatively so that they can guide the design of new algorithms for efficient phylogenetic inference. We construct a mathematical model to simulate real species-genes data. The results indicate that the sequence-availability probability distributions follow a power law, which leads to the skewness and sparseness of real species-genes data. Moreover, we discover a special structure, called the "ladder structure", in real species-genes data. This ladder structure is used to identify complete sub-matrices and, more importantly, to reveal overlapping relationships among them. To discover the distinct ladder structure in real species-genes data, we propose an efficient evolutionary dynamical system called "generalized replicator dynamics". Two species-genes data sets from green plants are used to illustrate the effectiveness of our model. The empirical study shows that our model is effective and efficient for understanding species-genes data for phylogenetic inference.
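To make the notion of a complete sub-matrix concrete, here is a minimal Python sketch. It is not the exact algorithm of (17) and not the generalized replicator dynamics proposed in the paper; it is a plain brute-force illustration under assumed names. The toy matrix `availability` and the functions `is_complete` and `complete_submatrices` are hypothetical and do not come from the paper.

```python
from itertools import combinations

# Toy availability matrix (hypothetical data, not from the paper).
# Rows are species, columns are genes; 1 = sequence available, 0 = missing.
availability = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]


def is_complete(matrix, species, genes):
    """Return True if every selected species has a sequence for every selected gene."""
    return all(matrix[s][g] == 1 for s in species for g in genes)


def complete_submatrices(matrix, min_species=2, min_genes=2):
    """Brute-force enumeration over gene subsets (assumed illustration only).

    For each gene subset, the largest matching species set is every species
    that has all of those genes. The number of gene subsets grows
    exponentially with the number of genes.
    """
    n_species, n_genes = len(matrix), len(matrix[0])
    found = []
    for k in range(min_genes, n_genes + 1):
        for genes in combinations(range(n_genes), k):
            species = tuple(
                s for s in range(n_species)
                if all(matrix[s][g] == 1 for g in genes)
            )
            if len(species) >= min_species:
                found.append((species, genes))
    return found


result = complete_submatrices(availability)
for species, genes in result:
    print("species", species, "genes", genes)
```

Even on this 4 x 4 toy matrix the enumeration returns several overlapping sub-matrices, some nested inside others, which hints at why exhaustive enumeration yields many results that may not be needed for tree reconstruction.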


