Modeling species-genes data for efficient phylogenetic inference.

Wenyuan Li, Ying Liu
Computational Systems Bioinformatics. Computational Systems Bioinformatics Conference, pages 429-40. Published 2007-01-01. Journal Article.

Abstract

In recent years, biclique methods have been proposed for constructing phylogenetic trees. A key step in these methods is finding complete sub-matrices (those without missing entries) in a species-genes data matrix. To enumerate all complete sub-matrices, (17) described an exact algorithm whose running time is exponential. It also generates a large number of complete sub-matrices, many of which may not be used for tree reconstruction. A better understanding of the characteristics of species-genes data may help in discovering complete sub-matrices. In this paper, we therefore study these characteristics quantitatively, so that they can guide the design of new algorithms for efficient phylogenetic inference. We construct a mathematical model to simulate real species-genes data. The results indicate that the sequence-availability probability distributions follow a power law, which leads to the skewness and sparseness of real species-genes data. Moreover, we discover a special structure, called the "ladder structure", in real species-genes data. This ladder structure is used to identify complete sub-matrices and, more importantly, to reveal overlapping relationships among them. To discover the distinct ladder structure in real species-genes data, we propose an efficient evolutionary dynamical system called "generalized replicator dynamics". Two species-genes data sets from green plants are used to illustrate the effectiveness of our model. Our empirical study shows that the model is effective and efficient for understanding species-genes data for phylogenetic inference.
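
To make the two central notions of the abstract concrete, the following is a minimal sketch and not the authors' model. It assumes a binary species-genes matrix in which entry (i, j) is 1 when a sequence is available, and it assumes, purely for illustration, that the availability probability of an entry is the product of a per-species and a per-gene probability, each decaying as a power of its rank. The function names, the product form, and the exponent are hypothetical choices; the abstract does not specify them.

```python
import random


def power_law_probs(n, exponent=1.0):
    """Hypothetical availability probabilities: rank r (1-based) gets r ** -exponent."""
    return [(r + 1) ** -exponent for r in range(n)]


def simulate_matrix(n_species, n_genes, exponent=1.0, seed=0):
    """Simulate a binary species-genes matrix (1 = sequence available).

    Assumption for illustration only: an entry is present with probability
    p_species[i] * p_genes[j]. The paper's actual model may differ.
    """
    rng = random.Random(seed)
    p_species = power_law_probs(n_species, exponent)
    p_genes = power_law_probs(n_genes, exponent)
    return [
        [1 if rng.random() < ps * pg else 0 for pg in p_genes]
        for ps in p_species
    ]


def is_complete(matrix, rows, cols):
    """Return True if the sub-matrix on the given rows and columns has no missing entry."""
    return all(matrix[i][j] == 1 for i in rows for j in cols)


def density(matrix):
    """Fraction of entries that are present; low values indicate a sparse matrix."""
    total = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / total


if __name__ == "__main__":
    m = simulate_matrix(20, 30, exponent=1.0, seed=1)
    print("density:", density(m))
    print("row sums:", [sum(row) for row in m])
    print("top-left 2x2 complete?", is_complete(m, [0, 1], [0, 1]))
```

Under these assumptions, most of the available entries concentrate in the highest-ranked species and genes, which gives a rough picture of the skewness and sparseness described above. The completeness check is the basic test that any enumeration of complete sub-matrices has to satisfy.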
