An acceleration method of short read mapping using FPGA
Authors: Y. Sogabe, T. Maruyama
Published in: 2013 International Conference on Field-Programmable Technology (FPT)
Publication date: 2013-12-01
DOI: 10.1109/FPT.2013.6718385 (https://doi.org/10.1109/FPT.2013.6718385)
Citation count: 17
Abstract
The rapid development of Next Generation Sequencing (NGS) has made it possible to generate more than 100 G base pairs per day from a single machine. The data produced are randomly fragmented DNA base-pair strings, called short reads. Millions of short reads are mapped onto the reference genomes, which are complete genetic sequences, to reconstruct the sequence of the sample DNA. This short read mapping is becoming the bottleneck of NGS systems. In this paper, we propose an FPGA system for the mapping based on a hash-index method. In our system, short reads are divided into seeds, which are fixed-length substrings used for the mapping, and the seeds are sorted using buckets. The seeds in each bucket are then compared in parallel with their candidate locations. With this approach, many seeds can be compared with their candidate locations in a massively parallel manner, and processing speed improves because the number of random accesses to the DRAM banks that store the candidate locations is reduced. Furthermore, substitutions of nucleotides in a seed can be allowed in this parallel comparison, which makes it possible to achieve higher matching rates than previous work.
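
The flow described in the abstract (split reads into seeds, group the seeds into buckets, then compare each bucket against its candidate locations while tolerating substitutions) can be illustrated with a small software sketch. This is not the authors' FPGA design: the abstract does not specify the hash function, seed length, or bucket layout, so every name and parameter here (`build_index`, `bucket_seeds`, `map_seeds`, `seed_len`, `key_len`, `max_subs`) is hypothetical. The sketch assumes that the index key is the first `key_len` bases of a seed and that substitutions are only tolerated in the comparison over the full seed, not in the key itself.

```python
from collections import defaultdict


def build_index(reference, key_len):
    """Hash index: map each key_len-mer of the reference to its positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - key_len + 1):
        index[reference[pos:pos + key_len]].append(pos)
    return index


def split_into_seeds(read, seed_len):
    """Cut a read into non-overlapping fixed-length seeds with their offsets."""
    return [(off, read[off:off + seed_len])
            for off in range(0, len(read) - seed_len + 1, seed_len)]


def bucket_seeds(reads, seed_len, key_len):
    """Group seeds by index key so each bucket shares one candidate list."""
    buckets = defaultdict(list)
    for read_id, read in enumerate(reads):
        for off, seed in split_into_seeds(read, seed_len):
            buckets[seed[:key_len]].append((read_id, off, seed))
    return buckets


def hamming(a, b):
    """Number of substituted positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_seeds(reference, reads, seed_len=8, key_len=4, max_subs=1):
    """Return (read_id, seed_offset, reference_pos) for every accepted match."""
    index = build_index(reference, key_len)
    buckets = bucket_seeds(reads, seed_len, key_len)
    hits = []
    for key in sorted(buckets):
        # One candidate-list fetch per bucket instead of one per seed.
        candidates = index.get(key, [])
        for read_id, off, seed in buckets[key]:
            for pos in candidates:
                window = reference[pos:pos + seed_len]
                # Skip candidates too close to the end of the reference.
                if len(window) == seed_len and hamming(seed, window) <= max_subs:
                    hits.append((read_id, off, pos))
    return hits


if __name__ == "__main__":
    reference = "TTGACCAGTACGGATCCA"
    reads = ["GACCAGTA", "CGGATCGA"]  # second read has one substitution
    print(sorted(map_seeds(reference, reads)))
```

In this sketch the benefit of bucketing is that the candidate list for a key is fetched once and reused by every seed in the bucket, which loosely mirrors the reduction in random DRAM accesses that the abstract describes. The inner comparison loop is sequential here, whereas the proposed system performs those comparisons in parallel in hardware.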