Libgapmis: An ultrafast library for short-read single-gap alignment

Nikolaos S. Alachiotis, S. Berger, T. Flouri, S. Pissis, A. Stamatakis
2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops. Published 2012-10-04.
DOI: 10.1109/BIBMW.2012.6470221 (https://doi.org/10.1109/BIBMW.2012.6470221)

Abstract

A broad variety of short-read alignment programmes has been released recently to address the task of mapping tens of millions of short reads to a reference genome, each placing emphasis on different aspects of the problem. Although all programmes allow for a small number of alignment mismatches, some either perform poorly when allowing gap insertions or do not allow gap insertions at all. The seed-and-extend strategy is applied in most of these programmes: after a fast alignment between a fragment of the reference sequence and a high-quality fragment of a short read (the seed), an important problem is to extend the alignment between a relatively short succeeding fragment of the reference sequence and the remaining low-quality fragment of the read, allowing a number of mismatches and the insertion of gaps in the alignment. However, the length of the short reads, in combination with the gap occurrence frequency observed in various applications, suggests that the single-gap alignment of (parts of) those reads is desirable. In this article, we present libgapmis, an ultrafast library for pairwise short-read single-gap alignment, including accelerated SSE-based and GPU-based versions. It implements an algorithm that computes a modified version of the traditional dynamic programming matrix for sequence alignment to solve the above alignment problem. We show that the library functions of the CPU-based version are up to 20x faster than competing programmes, while the SSE-based and GPU-based versions are up to 6x and 11x faster than our CPU-based implementation, respectively. The functions made available via our library can be seamlessly integrated into any short-read alignment pipeline.
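
To make the notion of single-gap alignment concrete, the following is a minimal Python sketch. It is not libgapmis's algorithm or API: the function name `single_gap_align` is hypothetical, and the sketch makes simplifying assumptions that the abstract does not state. It performs a global alignment of two strings, places at most one gap in the shorter string, fixes the gap length to the difference in lengths, and scores only mismatches (the gap itself carries no penalty). It uses prefix and suffix mismatch counts rather than a dynamic programming matrix.

```python
from typing import Tuple


def single_gap_align(a: str, b: str) -> Tuple[int, int, int]:
    """Illustrative only; not the libgapmis implementation.

    Returns (mismatches, gap_start, gap_length) for the best global
    alignment of a and b that uses at most one gap. The gap is placed
    in the shorter string, and gap_start is an index into that string.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    m = len(short)
    d = len(long_) - m  # assumed gap length: the difference in lengths

    # prefix[k]: mismatches between short[:k] and long_[:k]
    prefix = [0] * (m + 1)
    for k in range(m):
        prefix[k + 1] = prefix[k] + (short[k] != long_[k])

    # suffix[k]: mismatches between short[k:] and long_[k + d:]
    suffix = [0] * (m + 1)
    for k in range(m - 1, -1, -1):
        suffix[k] = suffix[k + 1] + (short[k] != long_[k + d])

    # Try every gap start; ties resolve to the leftmost position.
    best = min(range(m + 1), key=lambda k: prefix[k] + suffix[k])
    return prefix[best] + suffix[best], best, d


if __name__ == "__main__":
    # The read lacks two bases that are present in the reference fragment.
    print(single_gap_align("ACGTACGT", "ACGTTTACGT"))
```

With equal-length inputs the gap length is zero and the result reduces to the Hamming distance. A production aligner would typically also penalise the gap and bound its length, which this sketch leaves out.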