An FPGA-Based Data Pre-Processing Architecture to Accelerate De-Novo Genome Assembly

Georgios Galanos, Pavlos Malakonakis, A. Dollas
DOI: 10.1109/BIBE52308.2021.9635499 (https://doi.org/10.1109/BIBE52308.2021.9635499)
Published in: 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)
Publication date: 2021-10-25
Citations: 0

Abstract

Genome assembly is the bioinformatics task of taking small fragments of genetic material and putting them back together to reconstruct the original DNA sequence from which they originated. Because input datasets for genome assembly are in most cases very large, custom architectures are important for speeding up these processes and significantly reducing execution time. In this paper we present the Reads Matching Filter (RMF), a pre-filtering process for the input dataset that is based on string matching and implemented on Field Programmable Gate Array (FPGA) technology, in order to reduce genome assembly execution time. The outputs of the RMF running on the FPGA, together with the original input dataset, are given as input to the Velvet genome assembler, which produces the assembly of the input sequences. Velvet is based on the manipulation of de Bruijn graphs and produces its output by removing errors and simplifying repeated regions. The FPGA-based RMF pre-filtering process speeds up the entire genome assembly process, including I/O, by up to 6 times, while maintaining the quality of the output contigs (i.e. the series of overlapping DNA sequences).
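For readers unfamiliar with the de Bruijn graphs mentioned in the abstract, the following minimal Python sketch shows one common textbook construction: each k-mer in a read becomes an edge from its (k-1)-character prefix to its (k-1)-character suffix. This is an illustration only. It is not the authors' code, it does not reproduce Velvet's internal data structures or its error-removal steps, and it says nothing about how the RMF works. The function name `build_de_bruijn` and the toy reads are assumptions made for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a de Bruijn graph as {(k-1)-mer: set of successor (k-1)-mers}."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]  # hypothetical reads, not taken from the paper
    for node, successors in sorted(build_de_bruijn(toy_reads, 3).items()):
        print(node, "->", sorted(successors))
```

Because successors are stored in a set, a k-mer that appears in several reads contributes only one edge in this simplified form. Real assemblers also track coverage per k-mer, which this sketch omits.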