基因组信息学的加速并行动态规划

S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy
{"title":"基因组信息学的加速并行动态规划","authors":"S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy","doi":"10.1109/CONECCT.2018.8482378","DOIUrl":null,"url":null,"abstract":"Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.","PeriodicalId":430389,"journal":{"name":"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics\",\"authors\":\"S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy\",\"doi\":\"10.1109/CONECCT.2018.8482378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.\",\"PeriodicalId\":430389,\"journal\":{\"name\":\"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONECCT.2018.8482378\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONECCT.2018.8482378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

解析一个非常长的基因组字符串(人类基因组通常有30亿个字符长)抽象了整个生物计算的复杂性。近似字符串匹配(ASM)是捕获基因组生物复杂性的最合适的计算范式,将各种来源的生物信息集成到可处理的概率模型中。尽管计算复杂,动态规划(DP)方法被证明是非常有效的ASM,在严重的遗传数据的进化噪声中区分大量的相似性。虽然DP算法中的大量计算在多个平台上得到了加速,但不太复杂的回溯步骤仍然在主机上执行,存在显著的内存和输入/输出瓶颈。由于分析下一代测序(NGS)平台的基因组大数据需要数十亿次这样的比对,这一瓶颈可能严重影响系统性能。本文介绍了ReneGENE-DP,我们在硬件加速器上实现的DP计算,其新颖之处在于在FPGA和GPU上实现硬件回溯与分析时的前向扫描并行。最快的FPGA实现比最快的GPU实现ReneGENE-DP快43.63倍左右,而后者又比参考设计快380.85倍,参考设计是基于GPU的DP算法,在主机上进行回溯。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics
Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Strain Dependent Carrier Mobility in 8 − Pmmn Borophene: ab-initio study Diameter Scaling in III-V Gate-All-Around Transistor for Different Cross-Sections Atomistic Study of Acoustic Phonon Limited Mobility in Extremely Scaled Si and Ge Films Optimal Token Bucket Refilling for Tor network Traffic Pattern Analysis from GPS Data: A Case Study of Dhaka City
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1