为有效的系统发育推断建立物种-基因数据模型。

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2007-01-01 DOI:10.1142/9781860948732_0043

Wenyuan Li, Y. Liu

{"title":"为有效的系统发育推断建立物种-基因数据模型。","authors":"Wenyuan Li, Y. Liu","doi":"10.1142/9781860948732_0043","DOIUrl":null,"url":null,"abstract":"In recent years, biclique methods have been proposed to construct phylogenetic trees. One of the key steps of these methods is to find complete sub-matrices (without missing entries) from a species-genes data matrix. To enumerate all complete sub-matrices, (17) described an exact algorithm, whose running time is exponential. Furthermore, it generates a large number of complete sub-matrices, many of which may not be used for tree reconstruction. Further investigating and understanding the characteristics of species-genes data may be helpful for discovering complete sub-matrices. Therefore, in this paper, we focus on quantitatively studying and understanding the characteristics of species-genes data, which can be used to guide new algorithm design for efficient phylogenetic inference. In this paper, a mathematical model is constructed to simulate the real species-genes data. The results indicate that sequence-availability probability distributions follow power law, which leads to the skewness and sparseness of the real species-genes data. Moreover, a special structure, called \"ladder structure\", is discovered in the real species-genes data. This ladder structure is used to identify complete sub-matrices, and more importantly, to reveal overlapping relationships among complete sub-matrices. To discover the distinct ladder structure in real species-genes data, we propose an efficient evolutionary dynamical system, called \"generalized replicator dynamics\". Two species-genes data sets from green plants are used to illustrate the effectiveness of our model. Empirical study has shown that our model is effective and efficient in understanding species-genes data for phylogenetic inference.","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"6 1","pages":"429-40"},"PeriodicalIF":0.0000,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Modeling species-genes data for efficient phylogenetic inference.\",\"authors\":\"Wenyuan Li, Y. Liu\",\"doi\":\"10.1142/9781860948732_0043\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, biclique methods have been proposed to construct phylogenetic trees. One of the key steps of these methods is to find complete sub-matrices (without missing entries) from a species-genes data matrix. To enumerate all complete sub-matrices, (17) described an exact algorithm, whose running time is exponential. Furthermore, it generates a large number of complete sub-matrices, many of which may not be used for tree reconstruction. Further investigating and understanding the characteristics of species-genes data may be helpful for discovering complete sub-matrices. Therefore, in this paper, we focus on quantitatively studying and understanding the characteristics of species-genes data, which can be used to guide new algorithm design for efficient phylogenetic inference. In this paper, a mathematical model is constructed to simulate the real species-genes data. The results indicate that sequence-availability probability distributions follow power law, which leads to the skewness and sparseness of the real species-genes data. Moreover, a special structure, called \\\"ladder structure\\\", is discovered in the real species-genes data. This ladder structure is used to identify complete sub-matrices, and more importantly, to reveal overlapping relationships among complete sub-matrices. To discover the distinct ladder structure in real species-genes data, we propose an efficient evolutionary dynamical system, called \\\"generalized replicator dynamics\\\". Two species-genes data sets from green plants are used to illustrate the effectiveness of our model. Empirical study has shown that our model is effective and efficient in understanding species-genes data for phylogenetic inference.\",\"PeriodicalId\":72665,\"journal\":{\"name\":\"Computational systems bioinformatics. Computational Systems Bioinformatics Conference\",\"volume\":\"6 1\",\"pages\":\"429-40\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational systems bioinformatics. Computational Systems Bioinformatics Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/9781860948732_0043\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781860948732_0043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

近年来，人们提出了biclique方法来构建系统发育树。这些方法的关键步骤之一是从物种-基因数据矩阵中找到完整的子矩阵(不缺少条目)。为了枚举所有完整子矩阵，(17)描述了一个精确算法，其运行时间是指数的。此外，它生成了大量完整的子矩阵，其中许多可能无法用于树重建。进一步研究和了解物种基因数据的特征，有助于发现完整的亚矩阵。因此，在本文中，我们着重于定量研究和理解物种-基因数据的特征，这可以用来指导新的算法设计，以实现高效的系统发育推断。本文建立了一个数学模型来模拟真实的物种-基因数据。结果表明，序列可用性概率分布服从幂律，这导致了实际物种基因数据的偏性和稀疏性。此外，在真实的物种基因数据中发现了一种特殊的结构，称为“阶梯结构”。该阶梯结构用于识别完全子矩阵，更重要的是揭示完全子矩阵之间的重叠关系。为了发现真实物种-基因数据中独特的阶梯结构，我们提出了一种有效的进化动力系统，称为“广义复制因子动力学”。两个来自绿色植物的物种基因数据集被用来说明我们模型的有效性。实证研究表明，该模型在理解物种-基因数据进行系统发育推断方面是有效的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Modeling species-genes data for efficient phylogenetic inference.

In recent years, biclique methods have been proposed to construct phylogenetic trees. One of the key steps of these methods is to find complete sub-matrices (without missing entries) from a species-genes data matrix. To enumerate all complete sub-matrices, (17) described an exact algorithm, whose running time is exponential. Furthermore, it generates a large number of complete sub-matrices, many of which may not be used for tree reconstruction. Further investigating and understanding the characteristics of species-genes data may be helpful for discovering complete sub-matrices. Therefore, in this paper, we focus on quantitatively studying and understanding the characteristics of species-genes data, which can be used to guide new algorithm design for efficient phylogenetic inference. In this paper, a mathematical model is constructed to simulate the real species-genes data. The results indicate that sequence-availability probability distributions follow power law, which leads to the skewness and sparseness of the real species-genes data. Moreover, a special structure, called "ladder structure", is discovered in the real species-genes data. This ladder structure is used to identify complete sub-matrices, and more importantly, to reveal overlapping relationships among complete sub-matrices. To discover the distinct ladder structure in real species-genes data, we propose an efficient evolutionary dynamical system, called "generalized replicator dynamics". Two species-genes data sets from green plants are used to illustrate the effectiveness of our model. Empirical study has shown that our model is effective and efficient in understanding species-genes data for phylogenetic inference.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational systems bioinformatics. Computational Systems Bioinformatics Conference

自引率

0.00%

发文量