基于似然系统发育定位的高效内存管理

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI:10.1109/IPDPSW52791.2021.00041

P. Barbera, A. Stamatakis

{"title":"基于似然系统发育定位的高效内存管理","authors":"P. Barbera, A. Stamatakis","doi":"10.1109/IPDPSW52791.2021.00041","DOIUrl":null,"url":null,"abstract":"Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces comparatively high memory requirements compared to less compute-intensive methods such as parsimony, for instance.The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of increasing execution times by 23 times. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Memory Management in Likelihood-based Phylogenetic Placement\",\"authors\":\"P. Barbera, A. Stamatakis\",\"doi\":\"10.1109/IPDPSW52791.2021.00041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces comparatively high memory requirements compared to less compute-intensive methods such as parsimony, for instance.The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of increasing execution times by 23 times. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.\",\"PeriodicalId\":170832,\"journal\":{\"name\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW52791.2021.00041\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW52791.2021.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

基于最大似然的系统发育方法使用进化的统计模型，对系统发育树拓扑结构进行评分，该拓扑结构包括被研究物种的一组分子序列。评分过程依赖于在树遍历期间将中间结果存储在树的内部节点上。与计算密集度较低的方法(例如parsimony)相比，这会导致相对较高的内存需求。对于最大似然系统发育放置，内存需求特别大，因为进一步的中间结果应该存储在树的所有分支中，以最大化运行时性能。这阻碍了我们的系统发育定位工具EPA-NG的许多用户在大型系统发育树上进行定位。在这里，我们提出了一种减少EPA-NG内存占用的方法。此外，我们对我们的实现进行了一般化，并将其集成到我们的系统发生可能性库libpll-2中，以便其他工具可以使用它进行系统发生推断。在一个经验数据集上，我们能够以将执行时间增加23倍为代价，将内存需求减少多达96%。因此，在减少内存需求和增加执行时间之间存在权衡，我们对此进行了研究。当将可用于放置的内存量增加到一定程度时，对于我们测试过的最具挑战性的数据集，执行时间只降低了大约4倍。现在，这允许在合理的时间内对更大的树进行基于最大可能性的放置。最后，我们展示了主动内存管理方法为并行化带来了新的挑战，并概述了可能的解决方案。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Efficient Memory Management in Likelihood-based Phylogenetic Placement

Maximum likelihood based phylogenetic methods score phylogenetic tree topologies comprising a set of molecular sequences of the species under study, using statistical models of evolution. The scoring procedure relies on storing intermediate results at inner nodes of the tree during the tree traversal. This induces comparatively high memory requirements compared to less compute-intensive methods such as parsimony, for instance.The memory requirements are particularly large for maximum likelihood phylogenetic placement, as further intermediate results should be stored at all branches of the tree to maximize runtime performance. This has hindered numerous users of our phylogenetic placement tool EPA-NG from performing placement on large phylogenetic trees.Here, we present an approach to reduce the memory footprint of EPA-NG. Further, we have generalized our implementation and integrated it into our phylogenetic likelihood library, libpll-2, such that it can be used by other tools for phylogenetic inference. On an empirical dataset, we were able to reduce the memory requirements by up to 96% at the cost of increasing execution times by 23 times. Hence, there exists a trade-off between decreasing memory requirements and increasing execution times which we investigate. When increasing the amount of memory available for placement to a certain level, execution times are only approximately 4 times lower for the most challenging dataset we have tested. This now allows for conducting maximum likelihood based placement on substantially larger trees within reasonable times. Finally, we show that the active memory management approach introduces new challenges for parallelization and outline possible solutions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

自引率

0.00%

发文量