A Parallel Pairwise Alignment with Pruning for Large Genomic Sequences
Xiangyuan Zhu, Bing Li, Kenli Li, Ping Shao, Yi Pan
2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), published 2017-12-01
DOI: 10.1109/PDCAT.2017.00047 (https://doi.org/10.1109/PDCAT.2017.00047)
Abstract
Pairwise sequence alignment is a fundamental task in computational biology and underlies many bioinformatics applications. In the post-genomic era, there is growing demand to align long DNA sequences in order to discover their functions. In this paper, we propose a parallel pairwise alignment algorithm for large genomic sequences that recursively divides the sequences into small pieces and applies a pruning strategy to reduce the search and computation space. We ran rigorous tests on a 4-core computer using both real genomic sequences and artificially generated sequences. The results show that our implementation achieves a speedup of 10.64 with 99.75% accuracy compared to the sequential algorithm. To our knowledge, this is the first time that sequences of mega-base-pair (Mbp) length have been globally aligned with an affine gap penalty.
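For readers unfamiliar with the term, the following is a minimal sketch of global alignment scoring with an affine gap penalty, using the standard three-matrix (Gotoh-style) dynamic program. It is not the parallel, pruned algorithm proposed in the paper; the function name `affine_global_score` and the scoring values are assumptions chosen for illustration. The quadratic memory use of this baseline is one reason it does not scale to Mbp-length sequences without further techniques.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Return the optimal global alignment score of a and b.

    Illustrative scoring (an assumption, not taken from the paper):
    a gap of length k costs gap_open + k * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] is aligned to b[j-1]
    # X: a[i-1] is aligned to a gap (gap in b)
    # Y: b[j-1] is aligned to a gap (gap in a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend  # leading gap in a

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Open a new gap (from M or Y) or extend an existing one (from X).
            X[i][j] = max(
                M[i - 1][j] + gap_open + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
                X[i - 1][j] + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
                Y[i][j - 1] + gap_extend,
            )

    return max(M[n][m], X[n][m], Y[n][m])
```

With these illustrative parameters, aligning a sequence against itself scores one point per base, while a single deleted base costs `gap_open + gap_extend`. Separating the opening cost from the extension cost is what makes the penalty affine: one long gap is cheaper than several short gaps of the same total length.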