Subho Sankar Banerjee, Mohamed El-Hadedy, Jong Bin Lim, Daniel Chen, Z. Kalbarczyk, Deming Chen, Ravishankar K. Iyer
{"title":"ASAP:可编程硬件上的加速短读对齐(仅摘要)","authors":"Subho Sankar Banerjee, Mohamed El-Hadedy, Jong Bin Lim, Daniel Chen, Z. Kalbarczyk, Deming Chen, Ravishankar K. Iyer","doi":"10.1145/3020078.3021796","DOIUrl":null,"url":null,"abstract":"The proliferation of high-throughput sequencing machines allows for the rapid generation of billions of short nucleotide fragments in a short period. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This poster explores the use of hardware acceleration to significantly improve the runtime of short-read alignment (SRA), a crucial step in pre-processing sequenced genomes. It presents the design and implementation of ASAP, an accelerator for computing Levenshtein distance (LD) in the context of the SRA problem. LD computation is a prominent underlying mathematical kernel that is common to a large number of SRA tools (e.g., BLAST, BWA, SNAP) and is responsible for 50-70% of their runtime. These algorithms mentioned above calculate the exact value of LD between nucleotide strings but only use them to build a total ordering (an ordered list) of the most likely point of origin in the genome. ASAP computes an approximation of LD by encoding computation in propagation delay of circuit elements. This approximation is calculated in an accelerated fashion in hardware and preserves the original total ordering of LDs produced by the traditional algorithms. This computation is performed by constructing circuits that comprise the recursive definition of the LD computation and measuring propagation delay of a signal entering and leaving the circuit. Additionally, ASAP can explore large portions of the search space (substrings of the strings being compared) within one clock cycle, and ignore parts of the search space that does not contribute to an answer. Our design is implemented on an Altera Stratix V FPGA in an IBM POWER8 system using the CAPI interface for cache coherence across the CPU and FPGA. Our design is 200x faster (median measurement) than the equivalent C implementation of the kernel running on the host processor and 2.2x faster for an end-to-end alignment tool for 120-150bp short-read sequences.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"97 3-4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"ASAP: Accelerated Short Read Alignment on Programmable Hardware (Abstract Only)\",\"authors\":\"Subho Sankar Banerjee, Mohamed El-Hadedy, Jong Bin Lim, Daniel Chen, Z. Kalbarczyk, Deming Chen, Ravishankar K. Iyer\",\"doi\":\"10.1145/3020078.3021796\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The proliferation of high-throughput sequencing machines allows for the rapid generation of billions of short nucleotide fragments in a short period. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This poster explores the use of hardware acceleration to significantly improve the runtime of short-read alignment (SRA), a crucial step in pre-processing sequenced genomes. It presents the design and implementation of ASAP, an accelerator for computing Levenshtein distance (LD) in the context of the SRA problem. LD computation is a prominent underlying mathematical kernel that is common to a large number of SRA tools (e.g., BLAST, BWA, SNAP) and is responsible for 50-70% of their runtime. These algorithms mentioned above calculate the exact value of LD between nucleotide strings but only use them to build a total ordering (an ordered list) of the most likely point of origin in the genome. ASAP computes an approximation of LD by encoding computation in propagation delay of circuit elements. This approximation is calculated in an accelerated fashion in hardware and preserves the original total ordering of LDs produced by the traditional algorithms. This computation is performed by constructing circuits that comprise the recursive definition of the LD computation and measuring propagation delay of a signal entering and leaving the circuit. Additionally, ASAP can explore large portions of the search space (substrings of the strings being compared) within one clock cycle, and ignore parts of the search space that does not contribute to an answer. Our design is implemented on an Altera Stratix V FPGA in an IBM POWER8 system using the CAPI interface for cache coherence across the CPU and FPGA. Our design is 200x faster (median measurement) than the equivalent C implementation of the kernel running on the host processor and 2.2x faster for an end-to-end alignment tool for 120-150bp short-read sequences.\",\"PeriodicalId\":252039,\"journal\":{\"name\":\"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"97 3-4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3020078.3021796\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020078.3021796","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
摘要
高通量测序仪的普及使得在短时间内快速生成数十亿个短核苷酸片段成为可能。如此大量的序列数据可能会很快淹没当前的存储和计算基础设施。这张海报探讨了使用硬件加速来显著改善短读比对(SRA)的运行时间,这是预处理测序基因组的关键步骤。在SRA问题的背景下,给出了计算Levenshtein距离(LD)的加速器ASAP的设计和实现。LD计算是大量SRA工具(例如BLAST、BWA、SNAP)中常见的重要底层数学内核,占其运行时的50-70%。上面提到的这些算法计算核苷酸串之间的LD的精确值,但只使用它们来构建基因组中最可能的起源点的总排序(有序列表)。ASAP通过对电路元件传播延迟的编码计算来计算LD的近似值。这种近似是在硬件上以加速的方式计算的,并保留了传统算法产生的ld的原始总顺序。这种计算是通过构造包含LD计算的递归定义和测量进入和离开电路的信号的传播延迟的电路来执行的。此外,ASAP可以在一个时钟周期内探索大部分搜索空间(正在比较的字符串的子字符串),并忽略与答案无关的部分搜索空间。我们的设计是在IBM POWER8系统中的Altera Stratix V FPGA上实现的,使用CAPI接口实现CPU和FPGA之间的缓存一致性。我们的设计比运行在主机处理器上的等效C内核实现快200倍(测量中值),对120-150bp短读序列的端到端比对工具快2.2倍。
ASAP: Accelerated Short Read Alignment on Programmable Hardware (Abstract Only)
The proliferation of high-throughput sequencing machines allows for the rapid generation of billions of short nucleotide fragments in a short period. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This poster explores the use of hardware acceleration to significantly improve the runtime of short-read alignment (SRA), a crucial step in pre-processing sequenced genomes. It presents the design and implementation of ASAP, an accelerator for computing Levenshtein distance (LD) in the context of the SRA problem. LD computation is a prominent underlying mathematical kernel that is common to a large number of SRA tools (e.g., BLAST, BWA, SNAP) and is responsible for 50-70% of their runtime. These algorithms mentioned above calculate the exact value of LD between nucleotide strings but only use them to build a total ordering (an ordered list) of the most likely point of origin in the genome. ASAP computes an approximation of LD by encoding computation in propagation delay of circuit elements. This approximation is calculated in an accelerated fashion in hardware and preserves the original total ordering of LDs produced by the traditional algorithms. This computation is performed by constructing circuits that comprise the recursive definition of the LD computation and measuring propagation delay of a signal entering and leaving the circuit. Additionally, ASAP can explore large portions of the search space (substrings of the strings being compared) within one clock cycle, and ignore parts of the search space that does not contribute to an answer. Our design is implemented on an Altera Stratix V FPGA in an IBM POWER8 system using the CAPI interface for cache coherence across the CPU and FPGA. Our design is 200x faster (median measurement) than the equivalent C implementation of the kernel running on the host processor and 2.2x faster for an end-to-end alignment tool for 120-150bp short-read sequences.