HPMA: High-performance metagenomic alignment tool, on a large-scale GPU cluster
Authors: I. Savran, J. Rose
Published in: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015-11-09
DOI: 10.1109/BIBM.2015.7359757 (https://doi.org/10.1109/BIBM.2015.7359757)
Citations: 0
Abstract
In this paper, we present HPMA, a graphics processing unit (GPU) accelerated metagenome sequence alignment algorithm for a collection of DNA sequences. The algorithm supports all-to-all pairwise local alignment on NVIDIA GPUs. HPMA builds on a GPU alignment algorithm that we developed earlier and adds a filter module, a new kernel function that we designed and developed around the suffix array data structure. The filter module improves performance by identifying the subset of sequences that meet a user-defined similarity threshold and should therefore be considered for alignment. HPMA can also balance the workload between CPU and GPU. HPMA allows us to preprocess very large metagenomes in a reasonable amount of time, in response to the increasing speed of NGS sequencers. We evaluated HPMA on a cluster of Kepler-based NVIDIA Tesla K20 GPUs in the Stampede supercomputer at the Texas Advanced Computing Center (Austin, TX, USA), using four test sets of short DNA sequences. The first two test sets comprise 10 simulated datasets in which read length varies from 72 to 750 base pairs. The third test set is designed to allow a comparison with published results for GSWABE, a competing GPU alignment tool. The fourth test set is an actual metagenome of over 2 million sequences with an average length of 270 bp. Running on a cluster of 10 NVIDIA K20 GPUs, HPMA aligns 2 million simulated metagenome sequences of length 300 bp in 160 seconds. For the real metagenomic data, HPMA aligns 2,038,516 sequences with an average length of 270 bp in 60 seconds.
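The filter-then-align idea described above can be illustrated with a small CPU-only sketch. This is not HPMA's kernel: the abstract does not say how the suffix array is queried or how similarity is measured, so the sketch assumes that two reads are "similar enough" when they share an exact substring of at least `min_shared` bases, and it uses a plain Smith-Waterman score as a stand-in for the local alignment step. The function names `build_suffix_array`, `candidate_pairs`, and `smith_waterman_score`, the parameter `min_shared`, and the scoring values are all hypothetical.

```python
from itertools import combinations


def build_suffix_array(text):
    """Return the suffix array of text (naive sort; adequate for a sketch)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def candidate_pairs(reads, min_shared):
    """Return pairs of read indices that share an exact substring of
    at least min_shared bases (assumed similarity criterion)."""
    text = ""
    owner = []      # owner[i]: index of the read that position i belongs to
    remaining = []  # remaining[i]: bases left in that read starting at i
    for idx, read in enumerate(reads):
        for offset in range(len(read)):
            owner.append(idx)
            remaining.append(len(read) - offset)
        owner.append(-1)     # separator position belongs to no read
        remaining.append(0)
        text += read + "$"   # "$" never occurs in DNA, so seeds cannot span reads

    pairs = set()
    group = set()
    prev_seed = None
    # Suffixes that share their first min_shared characters are adjacent
    # in the suffix array, so one linear scan groups them.
    for pos in build_suffix_array(text):
        if remaining[pos] < min_shared:
            continue  # too close to the end of its read to hold a full seed
        seed = text[pos:pos + min_shared]
        if seed != prev_seed:
            pairs.update(combinations(sorted(group), 2))
            group = set()
            prev_seed = seed
        group.add(owner[pos])
    pairs.update(combinations(sorted(group), 2))
    return pairs


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "TTACGTACGG", "GGGGGCCCCC"]
    # Only pairs that pass the filter are aligned, instead of all pairs.
    for i, j in sorted(candidate_pairs(reads, min_shared=6)):
        print(i, j, smith_waterman_score(reads[i], reads[j]))
```

The point of the sketch is the order of operations: the cheap suffix-array scan runs once over all reads, and the quadratic-time alignment runs only on the pairs that survive it. A production filter would use a linear-time suffix array construction and a GPU kernel rather than Python sorting.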