AliSim HPC:系统发育学的并行序列模拟器。

IF 4.4 3区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS Bioinformatics Pub Date : 2023-09-02 DOI:10.1093/bioinformatics/btad540
Nhan Ly-Trong, Giuseppe M J Barca, Bui Quang Minh
{"title":"AliSim HPC:系统发育学的并行序列模拟器。","authors":"Nhan Ly-Trong, Giuseppe M J Barca, Bui Quang Minh","doi":"10.1093/bioinformatics/btad540","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation.</p><p><strong>Results: </strong>This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable, which reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.</p><p><strong>Availability and implementation: </strong>AliSim-HPC is open-source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.</p>","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":" ","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2023-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10534053/pdf/","citationCount":"0","resultStr":"{\"title\":\"AliSim-HPC: parallel sequence simulator for phylogenetics.\",\"authors\":\"Nhan Ly-Trong, Giuseppe M J Barca, Bui Quang Minh\",\"doi\":\"10.1093/bioinformatics/btad540\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation.</p><p><strong>Results: </strong>This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable, which reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.</p><p><strong>Availability and implementation: </strong>AliSim-HPC is open-source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.</p>\",\"PeriodicalId\":8903,\"journal\":{\"name\":\"Bioinformatics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2023-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10534053/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btad540\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btad540","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

动机:序列模拟在系统发育学中发挥着至关重要的作用,具有许多应用,如评估系统发育方法、测试假设和生成机器学习应用的训练数据。我们最近推出了一种新的多序列比对模拟器AliSim,其性能优于现有工具。然而,随着模拟大数据集的需求不断增加,AliSim由于其顺序实现而仍然缓慢;例如,为了模拟数百万个序列比对,AliSim花了几天或几周的时间。并行化已被用于许多系统发育推断方法,但尚未用于序列模拟。结果:本文介绍了AliSim HPC,它首次将高性能计算用于系统发育模拟。AliSimHPC分别使用OpenMP和消息传递接口(MPI)库在多核和多CPU级别并行化模拟过程。AliSim HPC是高效和可扩展的,它减少了模拟100个大间隙无对齐的运行时间(30 000个序列的一百万个位点)从一天到11天 最小使用来自具有六个计算节点的集群的256个CPU核,速度提高了153倍。虽然OpenMP版本只能模拟无间隙对齐,但MPI版本支持插入-删除模型,如顺序AliSim。可用性和实现:AliSim HPC是开源的,可作为新IQ-TREE v2.2.3版本的一部分在https://github.com/iqtree/iqtree2/releases用户手册位于http://www.iqtree.org/doc/AliSim.
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
AliSim-HPC: parallel sequence simulator for phylogenetics.

Motivation: Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation.

Results: This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable, which reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.

Availability and implementation: AliSim-HPC is open-source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Bioinformatics
Bioinformatics 生物-生化研究方法
CiteScore
11.20
自引率
5.20%
发文量
753
审稿时长
2.1 months
期刊介绍: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal - Discovery Notes and Application Notes- focus on shorter papers; the former reporting biologically interesting discoveries using computational methods, the latter exploring the applications used for experiments.
期刊最新文献
MEHunter: Transformer-based mobile element variant detection from long reads Metabolic syndrome may be more frequent in treatment-naive sarcoidosis patients. Coracle—A Machine Learning Framework to Identify Bacteria Associated with Continuous Variables CoSIA: an R Bioconductor package for CrOss Species Investigation and Analysis LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1