Efficient Memory Management in Likelihood-based Phylogenetic Placement

P. Barbera, A. Stamatakis
Published in: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021-06-01
DOI: 10.1109/IPDPSW52791.2021.00041 (https://doi.org/10.1109/IPDPSW52791.2021.00041)

Abstract

Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces high memory requirements compared to less compute-intensive methods such as parsimony.

The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.

Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of a 23-fold increase in execution time. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times, which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.
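The trade-off described in the abstract, storing fewer intermediate results at the price of recomputing them, can be illustrated with a toy model. The sketch below is not the EPA-NG or libpll-2 implementation and uses none of their APIs. It assumes a balanced binary tree, a single number as a stand-in for each inner node's intermediate result, and a least-recently-used eviction rule. All names (`Node`, `SlotCache`, `count_computations`) are hypothetical.

```python
from collections import OrderedDict


class Node:
    """Node of a rooted binary tree; leaves have no children."""

    def __init__(self, name, left=None, right=None):
        self.name = name
        self.left = left
        self.right = right

    @property
    def is_leaf(self):
        return self.left is None


def build(depth, name="n"):
    """Build a balanced binary tree with 2**depth leaves."""
    if depth == 0:
        return Node(name)
    return Node(name, build(depth - 1, name + "0"), build(depth - 1, name + "1"))


def inner_nodes(node):
    """Return all inner nodes in pre-order."""
    if node.is_leaf:
        return []
    return [node] + inner_nodes(node.left) + inner_nodes(node.right)


class SlotCache:
    """Keeps at most `slots` inner-node results; evicts the least recently used."""

    def __init__(self, slots):
        self.slots = slots
        self.store = OrderedDict()
        self.computations = 0  # number of inner-node (re)computations

    def get(self, node):
        if node.is_leaf:
            return 1.0  # stand-in for a tip vector (assumption)
        if node.name in self.store:
            self.store.move_to_end(node.name)  # mark as recently used
            return self.store[node.name]
        # Cache miss: recompute this node's result from its children.
        # The toy result is the number of leaves below the node.
        value = self.get(node.left) + self.get(node.right)
        self.computations += 1
        self.store[node.name] = value
        while len(self.store) > self.slots:
            self.store.popitem(last=False)  # evict the oldest entry
        return value


def count_computations(depth, slots):
    """Query every inner node once and report how many computations ran."""
    root = build(depth)
    cache = SlotCache(slots)
    for node in inner_nodes(root):
        cache.get(node)
    return cache.computations


print("all 15 slots:", count_computations(4, 15))
print("only 2 slots:", count_computations(4, 2))
```

With one slot per inner node, each result is computed exactly once. With fewer slots, evicted results must be recomputed on demand, so the computation count grows as memory shrinks. A real implementation would also need to pin the slots it is currently reading from, which this toy avoids by passing values directly.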