{"title":"分布式存储加速器的开源DeLiBA2硬件/软件框架","authors":"Babar Khan, Carsten Heinz, Andreas Koch","doi":"10.1145/3624482","DOIUrl":null,"url":null,"abstract":"With the trend towards ever larger “big data” applications, many of the gains achievable by using specialized compute accelerators become diminished due to the growing I/O overheads. While there have been several research efforts into computational storage and FPGA implementations of the NVMe interface, to our knowledge there have been only very limited efforts to move larger parts of the Linux block I/O stack into FPGA-based hardware accelerators. Our hardware/software framework DeLiBA initially addressed this deficiency by allowing high-productivity development of software components of the I/O stack in user instead of kernel space and leverages a proven FPGA SoC framework to quickly compose and deploy the actual FPGA-based I/O accelerators. In its initial form, it achieves 10% higher throughput and up to 2.3× the I/Os per second (IOPS) for a proof-of-concept Ceph accelerator running in a real multi-node Ceph cluster. In DeLiBA2, we have extended the framework further to better support distributed storage systems, specifically by directly integrating the block I/O accelerators with a hardware-accelerated network stack, as well as by accelerating more storage functions. With these improvements, performance grows significantly: The cluster-level speed-ups now reach up to 2.8× for both throughput and IOPS relative to Ceph in software in synthetic benchmarks, and achieve end-to-end wall clock speed-ups of 20% for the real workload of building a large software package.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"49 1","pages":"0"},"PeriodicalIF":3.1000,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Open-Source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators\",\"authors\":\"Babar Khan, Carsten Heinz, Andreas Koch\",\"doi\":\"10.1145/3624482\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the trend towards ever larger “big data” applications, many of the gains achievable by using specialized compute accelerators become diminished due to the growing I/O overheads. While there have been several research efforts into computational storage and FPGA implementations of the NVMe interface, to our knowledge there have been only very limited efforts to move larger parts of the Linux block I/O stack into FPGA-based hardware accelerators. Our hardware/software framework DeLiBA initially addressed this deficiency by allowing high-productivity development of software components of the I/O stack in user instead of kernel space and leverages a proven FPGA SoC framework to quickly compose and deploy the actual FPGA-based I/O accelerators. In its initial form, it achieves 10% higher throughput and up to 2.3× the I/Os per second (IOPS) for a proof-of-concept Ceph accelerator running in a real multi-node Ceph cluster. In DeLiBA2, we have extended the framework further to better support distributed storage systems, specifically by directly integrating the block I/O accelerators with a hardware-accelerated network stack, as well as by accelerating more storage functions. With these improvements, performance grows significantly: The cluster-level speed-ups now reach up to 2.8× for both throughput and IOPS relative to Ceph in software in synthetic benchmarks, and achieve end-to-end wall clock speed-ups of 20% for the real workload of building a large software package.\",\"PeriodicalId\":49248,\"journal\":{\"name\":\"ACM Transactions on Reconfigurable Technology and Systems\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2023-09-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Reconfigurable Technology and Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3624482\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Reconfigurable Technology and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3624482","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
摘要
随着越来越大的“大数据”应用程序的趋势,由于I/O开销的增加,使用专门的计算加速器可以获得的许多收益会减少。虽然已经有一些关于NVMe接口的计算存储和FPGA实现的研究工作,但据我们所知,将Linux块I/O堆栈的大部分移动到基于FPGA的硬件加速器上的努力非常有限。我们的硬件/软件框架DeLiBA最初通过允许在用户空间而不是内核空间中高生产率地开发I/O堆栈的软件组件来解决这一缺陷,并利用经过验证的FPGA SoC框架来快速组成和部署实际的基于FPGA的I/O加速器。在其初始形式中,对于在真实的多节点Ceph集群中运行的概念验证Ceph加速器,它实现了10%的高吞吐量和高达2.3倍的每秒I/ o (IOPS)。在DeLiBA2中,我们进一步扩展了框架,以更好地支持分布式存储系统,特别是通过直接将块I/O加速器与硬件加速的网络堆栈集成,以及通过加速更多的存储功能。有了这些改进,性能显著提高:在综合基准测试中,相对于软件中的Ceph,集群级的吞吐量和IOPS加速现在达到2.8倍,对于构建大型软件包的实际工作负载,端到端时钟加速达到20%。
The Open-Source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators
With the trend towards ever larger “big data” applications, many of the gains achievable by using specialized compute accelerators become diminished due to the growing I/O overheads. While there have been several research efforts into computational storage and FPGA implementations of the NVMe interface, to our knowledge there have been only very limited efforts to move larger parts of the Linux block I/O stack into FPGA-based hardware accelerators. Our hardware/software framework DeLiBA initially addressed this deficiency by allowing high-productivity development of software components of the I/O stack in user instead of kernel space and leverages a proven FPGA SoC framework to quickly compose and deploy the actual FPGA-based I/O accelerators. In its initial form, it achieves 10% higher throughput and up to 2.3× the I/Os per second (IOPS) for a proof-of-concept Ceph accelerator running in a real multi-node Ceph cluster. In DeLiBA2, we have extended the framework further to better support distributed storage systems, specifically by directly integrating the block I/O accelerators with a hardware-accelerated network stack, as well as by accelerating more storage functions. With these improvements, performance grows significantly: The cluster-level speed-ups now reach up to 2.8× for both throughput and IOPS relative to Ceph in software in synthetic benchmarks, and achieve end-to-end wall clock speed-ups of 20% for the real workload of building a large software package.
期刊介绍:
TRETS is the top journal focusing on research in, on, and with reconfigurable systems and on their underlying technology. The scope, rationale, and coverage by other journals are often limited to particular aspects of reconfigurable technology or reconfigurable systems. TRETS is a journal that covers reconfigurability in its own right.
Topics that would be appropriate for TRETS would include all levels of reconfigurable system abstractions and all aspects of reconfigurable technology including platforms, programming environments and application successes that support these systems for computing or other applications.
-The board and systems architectures of a reconfigurable platform.
-Programming environments of reconfigurable systems, especially those designed for use with reconfigurable systems that will lead to increased programmer productivity.
-Languages and compilers for reconfigurable systems.
-Logic synthesis and related tools, as they relate to reconfigurable systems.
-Applications on which success can be demonstrated.
The underlying technology from which reconfigurable systems are developed. (Currently this technology is that of FPGAs, but research on the nature and use of follow-on technologies is appropriate for TRETS.)
In considering whether a paper is suitable for TRETS, the foremost question should be whether reconfigurability has been essential to success. Topics such as architecture, programming languages, compilers, and environments, logic synthesis, and high performance applications are all suitable if the context is appropriate. For example, an architecture for an embedded application that happens to use FPGAs is not necessarily suitable for TRETS, but an architecture using FPGAs for which the reconfigurability of the FPGAs is an inherent part of the specifications (perhaps due to a need for re-use on multiple applications) would be appropriate for TRETS.