Integrating FPGA-based hardware acceleration with relational databases

Ke Liu, Haonan Tong, Zhongxiang Sun, Zhixin Ren, Guangkui Huang, Hongyin Zhu, Luyang Liu, Qunyang Lin, Chuang Zhang

Parallel Computing, Volume 119, Article 103064, February 2024. DOI: 10.1016/j.parco.2024.103064

Abstract
The explosion of data over the past decades has put significant strain on the computational capacity of the central processing unit (CPU), challenging online analytical processing (OLAP). While previous studies have shown the potential of using Field Programmable Gate Arrays (FPGAs) in database systems, integrating FPGA-based hardware acceleration with relational databases remains challenging because of the complex nature of relational database operations and the need for specialized FPGA programming skills. Additionally, there are significant challenges in optimizing FPGA-based acceleration for specific database workloads, ensuring data consistency and reliability, and integrating FPGA-based hardware acceleration with existing database infrastructure. In this study, we propose a novel end-to-end FPGA-based acceleration system that supports native SQL statements and the native storage engine. We define a callback process that overrides the database query logic and customizes the scanning method for database queries. Through the development of a middleware process, we optimize offloading efficiency on the PCIe bus by scheduling data transmission and computation in a pipelined workflow. Additionally, we design a novel five-stage FPGA microarchitecture module that achieves an optimal clock frequency, further enhancing offloading efficiency. Results from systematic evaluations indicate that our solution allows a single FPGA card to perform as well as 8 CPU query processes while reducing CPU load by 34%. Compared to using 4 CPU cores, our FPGA-based acceleration system reduces query latency by a factor of 1.7 without increasing CPU load. Furthermore, our proposed solution achieves a 2.1× computation speedup for data filtering compared with the software baseline in a single-core environment. Overall, our work presents a valuable end-to-end hardware acceleration system for OLAP databases.
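The callback-based integration described in the abstract (overriding the database's scan path so that eligible scans can be dispatched to the accelerator) is only outlined at a high level. The following is a minimal, hypothetical C++ sketch of that idea, assuming a pluggable scan provider whose default CPU scan is swapped for an FPGA-backed batch scan at query time; all names (ScanProvider, fpga_batch_scan) are invented for illustration and are not the authors' API or any real database interface.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical sketch: a pluggable scan provider. A real integration (e.g. a
// custom-scan hook in a production database) is far more involved; this only
// models the "override the scan method via a callback" idea in the abstract.
struct ScanProvider {
    // Returns the column values that satisfy the predicate.
    std::function<std::vector<int64_t>(const std::vector<int64_t>&,
                                       std::function<bool(int64_t)>)> scan;
};

// Default software path: evaluate the predicate row by row on the CPU.
std::vector<int64_t> cpu_scan(const std::vector<int64_t>& column,
                              std::function<bool(int64_t)> pred) {
    std::vector<int64_t> out;
    for (int64_t v : column)
        if (pred(v)) out.push_back(v);
    return out;
}

// Stand-in for an FPGA-offloaded batch scan. Here it simply reuses the CPU
// path; in the real system this is where data would be shipped over PCIe.
std::vector<int64_t> fpga_batch_scan(const std::vector<int64_t>& column,
                                     std::function<bool(int64_t)> pred) {
    return cpu_scan(column, pred);  // placeholder for the hardware offload
}

int main() {
    ScanProvider provider{cpu_scan};   // planner default
    provider.scan = fpga_batch_scan;   // override the scan callback

    std::vector<int64_t> column{3, 42, 7, 99, 15};
    auto hits = provider.scan(column, [](int64_t v) { return v > 10; });
    std::cout << "matching rows: " << hits.size() << "\n";  // prints 3
    return 0;
}
```

The benefit of this indirection is that the planner and executor never need to know whether a scan runs in software or on the card; only the registered callback changes.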
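The abstract also credits a middleware pipeline that overlaps PCIe data transfer with on-card computation. A common way to obtain that overlap is double buffering: while the card processes batch i, the host already transfers batch i+1. The sketch below models this schedule with std::async and simulated delays; transfer_to_fpga and fpga_filter are stand-ins invented for illustration, not the authors' middleware or a real FPGA driver API.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

using Batch = std::vector<int>;

// Simulated DMA transfer of one batch over PCIe (illustrative delay only).
Batch transfer_to_fpga(Batch b) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return b;
}

// Simulated on-card filtering of a batch (illustrative delay only).
int fpga_filter(const Batch& b) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    int hits = 0;
    for (int v : b)
        if (v > 50) ++hits;
    return hits;
}

int main() {
    std::vector<Batch> batches(8, Batch{10, 60, 70, 20, 90});

    // Pipelined schedule: start transferring batch i+1 while batch i is being
    // filtered, so PCIe traffic and computation overlap in time.
    int total = 0;
    auto pending = std::async(std::launch::async, transfer_to_fpga, batches[0]);
    for (size_t i = 0; i < batches.size(); ++i) {
        Batch on_card = pending.get();                  // batch i is now on the card
        if (i + 1 < batches.size())                     // prefetch batch i+1
            pending = std::async(std::launch::async, transfer_to_fpga,
                                 batches[i + 1]);
        total += fpga_filter(on_card);                  // compute overlaps the transfer
    }
    std::cout << "total matches: " << total << "\n";
    return 0;
}
```

With roughly equal transfer and compute times per batch, this schedule hides most of the PCIe latency behind computation; the same structure generalizes to more buffers when the two costs are unbalanced.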
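Finally, the five-stage FPGA microarchitecture is described only by its effect (a higher achievable clock frequency); the actual RTL is not shown. The sketch below is purely a software model of the general pipelining idea, assuming an invented stage breakdown (fetch, decode, lower-bound compare, upper-bound compare, emit) for a range filter; it is not the authors' design.

```cpp
#include <array>
#include <cstdint>
#include <iostream>
#include <optional>
#include <vector>

// Software model of a 5-stage filter pipeline (WHERE lo <= key <= hi).
// Each array slot stands for the register between two hardware stages;
// one loop iteration corresponds to one clock cycle.
struct Item { int64_t key; bool ge_lo; bool le_hi; };

int main() {
    const int64_t lo = 10, hi = 50;
    std::vector<int64_t> input{5, 12, 49, 77, 30};
    std::vector<int64_t> output;

    std::array<std::optional<Item>, 5> stage{};   // pipeline registers
    size_t next = 0;
    size_t cycles = input.size() + stage.size();  // extra cycles drain the pipe

    for (size_t c = 0; c < cycles; ++c) {
        // Stage 5: emit keys whose two predicate bits are both set.
        if (stage[4] && stage[4]->ge_lo && stage[4]->le_hi)
            output.push_back(stage[4]->key);
        // Shift the pipeline one step; each stage does only a small piece of
        // work per cycle, which is what keeps the critical path short.
        stage[4] = stage[3];
        stage[3] = stage[2];
        if (stage[3]) stage[3]->le_hi = stage[3]->key <= hi;  // stage 4: upper-bound compare
        stage[2] = stage[1];
        if (stage[2]) stage[2]->ge_lo = stage[2]->key >= lo;  // stage 3: lower-bound compare
        stage[1] = stage[0];                                  // stage 2: decode (modeled as pass-through)
        // Stage 1: fetch the next key from the input stream.
        if (next < input.size())
            stage[0] = Item{input[next++], false, false};
        else
            stage[0] = std::nullopt;
    }
    for (int64_t k : output) std::cout << k << " ";  // prints 12 49 30
    std::cout << "\n";
    return 0;
}
```

Splitting the predicate across several short stages reduces the combinational delay per stage, which is the usual reason a deeper pipeline can close timing at a higher clock rate.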
About the journal:
Parallel Computing is an international journal presenting the practical use of parallel computer systems, including high-performance architecture, system software, programming systems and tools, and applications. Within this context the journal covers all aspects of high-end parallel computing, from single homogeneous or heterogeneous computing nodes to large-scale multi-node systems.
Parallel Computing features original research work and review articles, as well as novel or illustrative accounts of application experience with (and techniques for) the use of parallel computers. We also welcome studies that reproduce prior publications and either confirm or refute the previously published results.
Particular technical areas of interest include, but are not limited to:
- System software for parallel computer systems, including programming languages (new languages as well as compilation techniques), operating systems (including middleware), and resource management (scheduling and load balancing).
- Enabling software, including debuggers, performance tools, and system and numeric libraries.
- General hardware (architecture) concepts, new technologies enabling the realization of such new concepts, and details of commercially available systems.
- Software engineering and productivity as it relates to parallel computing.
- Applications (including scientific computing, deep learning, and machine learning) or tool case studies demonstrating novel ways to achieve parallelism.
- Performance measurement results on state-of-the-art systems.
- Approaches to effectively utilize large-scale parallel computing, including new algorithms or algorithm analysis with demonstrated relevance to real applications using existing or next-generation parallel computer architectures.
- Parallel I/O systems, both hardware and software.
- Networking technology for the support of high-speed computing, demonstrating the impact of high-speed computation on parallel applications.