HPMA: High-performance metagenomic alignment tool, on a large-scale GPU cluster

I. Savran, J. Rose
2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 9 November 2015. DOI: 10.1109/BIBM.2015.7359757

Abstract

In this paper, we present HPMA, a graphics processing unit (GPU) accelerated metagenome sequence alignment algorithm for collections of DNA sequences. The algorithm performs all-to-all pairwise local alignment on NVIDIA GPUs. HPMA extends a GPU alignment algorithm that we developed earlier by adding a filter module, which we designed and implemented as a new kernel function based on the suffix array data structure. The filter improves performance by identifying a subset of sequences that meet a user-defined similarity threshold and should therefore be considered for alignment. HPMA can also balance the workload between the CPU and the GPU. As NGS sequencers grow faster, HPMA allows us to preprocess very large metagenomes in a reasonable amount of time.

We evaluated the performance of HPMA on a cluster of Kepler-based NVIDIA Tesla K20 GPUs in the Stampede supercomputer at the Texas Advanced Computing Center (Austin, TX, USA), using four test datasets of short DNA sequences. The first two test sets comprise 10 simulated datasets in which read length varies from 72 to 750 base pairs. The third test set is designed to allow a comparison with published results for GSWABE, a competing GPU alignment tool. The fourth test set is a real metagenome of over 2 million sequences with an average length of 270 bp. Running on a cluster of 10 NVIDIA K20 GPUs, HPMA aligns 2 million simulated metagenome sequences of length 300 bp in 160 seconds, and 2,038,516 real metagenomic sequences with an average length of 270 bp in 60 seconds.
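The abstract states only that the filter is a suffix-array-based kernel that keeps sequences meeting a user-defined similarity threshold. Its value is clear from the scale: aligning all 2,038,516 reads of the real metagenome against each other would mean n(n-1)/2 ≈ 2.08 × 10^12 pairwise alignments. The Python sketch below is one plausible filter of this kind, not HPMA's kernel: it builds a naive generalized suffix array over the reads, treats every run of sorted suffixes sharing the same first k characters as one shared k-mer, and keeps read pairs that share at least `min_shared` distinct k-mers. The names `candidate_pairs`, `k` and `min_shared`, and the use of shared k-mer counts as the similarity measure, are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def candidate_pairs(reads: list[str], k: int, min_shared: int) -> set[tuple[int, int]]:
    """Hypothetical filter: return pairs (i, j), i < j, of reads that share at
    least `min_shared` distinct k-mers, found with a naive generalized suffix array."""
    # Generalized suffix array: every suffix of every read, sorted lexicographically.
    # Suffixes are kept per read, so a common prefix never spans two reads.
    sa = sorted((read[pos:], rid) for rid, read in enumerate(reads) for pos in range(len(read)))

    shared = defaultdict(int)  # (i, j) -> number of distinct shared k-mers

    def flush(run_ids: set[int]) -> None:
        # Every read in a run contains the run's k-mer; credit each pair once.
        for i, j in combinations(sorted(run_ids), 2):
            shared[(i, j)] += 1

    run: set[int] = set()
    for idx in range(1, len(sa)):
        prev_suffix, prev_id = sa[idx - 1]
        suffix, rid = sa[idx]
        if lcp(prev_suffix, suffix) >= k:
            # Both suffixes start with the same k-mer: extend the current run.
            run.update((prev_id, rid))
        else:
            # The k-mer changed: credit the finished run and start a new one.
            flush(run)
            run = set()
    flush(run)

    return {pair for pair, count in shared.items() if count >= min_shared}
```

For example, `candidate_pairs(["ACGTACGT", "TTACGTAA", "GGGGCCCC"], k=4, min_shared=3)` returns `{(0, 1)}`: the first two reads share ACGT, CGTA and TACG, while the third shares no 4-mer with either. Only the surviving pairs would go on to full local alignment. Sorting full suffix strings costs quadratic memory and is only suitable for illustration; a practical filter would use a dedicated suffix array construction algorithm.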
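The abstract says HPMA performs all-to-all pairwise local alignment but does not give the scoring scheme. Smith-Waterman is the standard dynamic-programming algorithm for local alignment, and the sketch below computes a Smith-Waterman score with a linear gap penalty, then applies it to every pair of reads. The function names and the match, mismatch and gap values are hypothetical and are not HPMA's parameters.

```python
from itertools import combinations


def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).
    Keeps only two rows of the dynamic-programming matrix."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_to_all_scores(reads: list[str]) -> dict[tuple[int, int], int]:
    """Score every unordered pair of reads (i < j) exhaustively."""
    return {
        (i, j): local_alignment_score(reads[i], reads[j])
        for i, j in combinations(range(len(reads)), 2)
    }
```

The exhaustive `all_to_all_scores` loop is exactly the quadratic pair count a similarity filter is meant to shrink; in a filtered pipeline, it would iterate over candidate pairs instead of all `combinations`.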
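The abstract says HPMA can balance the workload between CPU and GPU but does not describe how. A simple static scheme, shown here purely as an illustration, splits the pairs to be aligned in proportion to each device's measured throughput so that both finish at about the same time. The function name `split_workload` and the throughput figures in the test are hypothetical.

```python
def split_workload(num_pairs: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Hypothetical static balancer: split num_pairs between CPU and GPU in
    proportion to measured throughput (pairs per second) and return
    (cpu_pairs, gpu_pairs), which always sum to num_pairs."""
    if cpu_rate < 0 or gpu_rate < 0 or cpu_rate + gpu_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    gpu_pairs = round(num_pairs * gpu_rate / (cpu_rate + gpu_rate))
    return num_pairs - gpu_pairs, gpu_pairs
```

Because alignment cost grows with the product of the two read lengths, weighting the split by estimated matrix cells rather than by pair count would be a natural refinement for datasets whose read lengths vary, such as the 72 to 750 bp simulated sets.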