{"title":"High throughput edit distance computation on FPGA-based accelerators using HLS","authors":"Sebastiano Fabio Schifano , Marco Reggiani , Enrico Calore , Rino Micheloni , Alessia Marelli , Cristian Zambelli","doi":"10.1016/j.future.2024.107591","DOIUrl":null,"url":null,"abstract":"<div><div>Edit distance is a computational grand challenge problem to quantify the minimum number of editing operations required to modify one string of characters to the other, finding many applications of natural language processing. In recent years, relevant and increasing interest has also emerged from deoxyribonucleic acid (DNA) applications, like Next Generation Sequencing and DNA storage technologies. Both applications share two crucial features: i) the information is coded into the four bases of DNA and ii) the level of operational noise is still high causing errors in the data, requiring inclusion in the workflow of the computation of algorithms such as the edit distance for finding similarities between sequences. To boost this computation many solutions are available in the literature. Among them, the FPGAs are largely used since the data domain of those applications is strings of 4 characters represented as two-bit values, inconveniently fitting the basic data types of ordinary CPUs and GPUs, with additional benefits of providing a high level of parallelism and low processing latency. This contribution presents a computing- and energy-efficient design implementing the edit distance algorithm combining metaprogramming and High-Level Synthesis. We also assess the performance of our design targeting recent FPGA-based accelerators. Our solution uses nearly 90% of FPGA basic-block hardware resources achieving about 90% of computing efficiency delivering a maximum throughput of 16.8 TCUPS and an energy efficiency of 46 Mpair/Joule, enabling the use of FPGAs as a new class of accelerators for High Performance Computing in DNA applications.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107591"},"PeriodicalIF":6.2000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24005557","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Edit distance is a computational grand challenge problem to quantify the minimum number of editing operations required to modify one string of characters to the other, finding many applications of natural language processing. In recent years, relevant and increasing interest has also emerged from deoxyribonucleic acid (DNA) applications, like Next Generation Sequencing and DNA storage technologies. Both applications share two crucial features: i) the information is coded into the four bases of DNA and ii) the level of operational noise is still high causing errors in the data, requiring inclusion in the workflow of the computation of algorithms such as the edit distance for finding similarities between sequences. To boost this computation many solutions are available in the literature. Among them, the FPGAs are largely used since the data domain of those applications is strings of 4 characters represented as two-bit values, inconveniently fitting the basic data types of ordinary CPUs and GPUs, with additional benefits of providing a high level of parallelism and low processing latency. This contribution presents a computing- and energy-efficient design implementing the edit distance algorithm combining metaprogramming and High-Level Synthesis. We also assess the performance of our design targeting recent FPGA-based accelerators. Our solution uses nearly 90% of FPGA basic-block hardware resources achieving about 90% of computing efficiency delivering a maximum throughput of 16.8 TCUPS and an energy efficiency of 46 Mpair/Joule, enabling the use of FPGAs as a new class of accelerators for High Performance Computing in DNA applications.
期刊介绍:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.