High throughput edit distance computation on FPGA-based accelerators using HLS

IF 6.2 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Future Generation Computer Systems-The International Journal of Escience | Pub Date: 2025-03-01 | Epub Date: 2024-11-12 | DOI: 10.1016/j.future.2024.107591

Sebastiano Fabio Schifano, Marco Reggiani, Enrico Calore, Rino Micheloni, Alessia Marelli, Cristian Zambelli

Volume 164, Article 107591. Full text: https://www.sciencedirect.com/science/article/pii/S0167739X24005557


Abstract

Edit distance is a computational grand-challenge problem: it quantifies the minimum number of editing operations required to transform one string of characters into another, and it has many applications in natural language processing. In recent years, growing interest has also emerged from deoxyribonucleic acid (DNA) applications such as Next Generation Sequencing and DNA storage technologies. Both applications share two crucial features: i) the information is coded into the four bases of DNA, and ii) the level of operational noise is still high, causing errors in the data, so the workflow must include algorithms such as the edit distance to find similarities between sequences. Many solutions for accelerating this computation are available in the literature. Among them, FPGAs are widely used because the data domain of these applications is strings over a 4-character alphabet, with each character represented as a two-bit value, which fits poorly into the basic data types of ordinary CPUs and GPUs; FPGAs additionally provide a high level of parallelism and low processing latency. This contribution presents a computing- and energy-efficient design that implements the edit distance algorithm by combining metaprogramming and High-Level Synthesis (HLS). We also assess the performance of our design on recent FPGA-based accelerators. Our solution uses nearly 90% of the FPGA basic-block hardware resources and achieves about 90% computing efficiency, delivering a maximum throughput of 16.8 TCUPS and an energy efficiency of 46 Mpair/Joule, enabling the use of FPGAs as a new class of accelerators for High Performance Computing in DNA applications.
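To make the two ideas in the abstract concrete, the following is a minimal software sketch, not the authors' HLS design. It assumes the edit distance is the Levenshtein distance with unit costs for insertion, deletion, and substitution, and it uses a hypothetical base-to-code mapping (`BASE_CODE`) to illustrate the two-bit representation; neither detail is specified in the abstract. The inner loop performs one cell update per pair of characters, which is the unit counted by cell-updates-per-second (CUPS) throughput figures.

```python
# Hypothetical 2-bit encoding of the four DNA bases (an assumption for
# illustration; the paper's actual mapping is not given in the abstract).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_bases(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base.

    The first base occupies the two least significant bits.
    """
    packed = 0
    for i, base in enumerate(seq):
        packed |= BASE_CODE[base] << (2 * i)
    return packed


def base_at(packed: int, i: int) -> int:
    """Return the 2-bit code of the i-th base of a packed sequence."""
    return (packed >> (2 * i)) & 0b11


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit costs) computed on 2-bit packed inputs.

    Uses the classic dynamic-programming recurrence with two rows, so
    memory is O(len(b)) and the work is len(a) * len(b) cell updates.
    """
    pa, pb = pack_bases(a), pack_bases(b)
    prev = list(range(len(b) + 1))  # row 0: distance from the empty prefix
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        ai = base_at(pa, i - 1)
        for j in range(1, len(b) + 1):
            cost = 0 if ai == base_at(pb, j - 1) else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(b)]
```

For example, `edit_distance("ACGT", "AGT")` returns 1, since deleting the `C` is enough. In hardware, cells on the same anti-diagonal of the table are independent of one another, which is what makes the recurrence attractive for parallel designs; this sequential sketch does not attempt to model that.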


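Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 19.90

Self-citation rate: 2.70%

Articles published: 376

Review time: 10.6 months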

Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.