Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-Based Heterogeneous SoCs

IF 6 2区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-12-09 DOI:10.1109/TPDS.2024.3513416

Giacomo Valente;Gianluca Brilli;Tania Di Mascio;Alessandro Capotondi;Paolo Burgio;Paolo Valente;Andrea Marongiu

{"title":"Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-Based Heterogeneous SoCs","authors":"Giacomo Valente;Gianluca Brilli;Tania Di Mascio;Alessandro Capotondi;Paolo Burgio;Paolo Valente;Andrea Marongiu","doi":"10.1109/TPDS.2024.3513416","DOIUrl":null,"url":null,"abstract":"Commercial embedded systems increasingly rely on heterogeneous architectures that integrate general-purpose, multi-core processors, and various hardware accelerators on the same chip. This provides the high performance required by modern applications at a low cost and low power consumption, but at the same time poses new challenges. Hardware resource sharing at various levels, and in particular at the main memory controller level, results in slower execution time for the application tasks, ultimately making the system unpredictable from the point of view of timing. To enable the adoption of heterogeneous systems-on-chip (System on Chips (SoCs)) in the domain of timing-critical applications several hardware and software approaches have been proposed, bandwidth regulation based on monitoring and throttling being one of the most widely adopted. Existing solutions, however, are either too coarse-grained, limiting the control over computing engines activities, or strongly platform-dependent, addressing the problem only for specific SoCs. This article proposes an innovative approach that can accurately control main memory bandwidth usage in FPGA-based heterogeneous SoCs. In particular, it controls system bandwidth by connecting a runtime bandwidth regulation component to FPGA-based accelerators. Our solution offers dynamically configurable, fine-grained bandwidth regulation – to adapt to the varying requirements of the application over time – at a very low overhead. Furthermore, it is entirely platform-independent, capable of integration with any FPGA-based accelerator. Developed at the register-transfer level using a reference SoC platform, it is designed for easy compatibility with any FPGA-based SoC. Experimental results conducted on the Xilinx Zynq UltraScale+ platform demonstrate that our approach (i) is more than \n<inline-formula><tex-math>$100\\times$</tex-math></inline-formula>\n faster than loosely-coupled, software controlled regulators; (ii) is capable of exploiting the system bandwidth 28.7% more efficiently than tightly-coupled hardware regulators (e.g., ARM CoreLink QoS-400, where available); (iii) enables task co-scheduling solutions not feasible with state-of-the-art bandwidth regulation methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"326-340"},"PeriodicalIF":6.0000,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Parallel and Distributed Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10786308/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Commercial embedded systems increasingly rely on heterogeneous architectures that integrate general-purpose, multi-core processors, and various hardware accelerators on the same chip. This provides the high performance required by modern applications at a low cost and low power consumption, but at the same time poses new challenges. Hardware resource sharing at various levels, and in particular at the main memory controller level, results in slower execution time for the application tasks, ultimately making the system unpredictable from the point of view of timing. To enable the adoption of heterogeneous systems-on-chip (System on Chips (SoCs)) in the domain of timing-critical applications several hardware and software approaches have been proposed, bandwidth regulation based on monitoring and throttling being one of the most widely adopted. Existing solutions, however, are either too coarse-grained, limiting the control over computing engines activities, or strongly platform-dependent, addressing the problem only for specific SoCs. This article proposes an innovative approach that can accurately control main memory bandwidth usage in FPGA-based heterogeneous SoCs. In particular, it controls system bandwidth by connecting a runtime bandwidth regulation component to FPGA-based accelerators. Our solution offers dynamically configurable, fine-grained bandwidth regulation – to adapt to the varying requirements of the application over time – at a very low overhead. Furthermore, it is entirely platform-independent, capable of integration with any FPGA-based accelerator. Developed at the register-transfer level using a reference SoC platform, it is designed for easy compatibility with any FPGA-based SoC. Experimental results conducted on the Xilinx Zynq UltraScale+ platform demonstrate that our approach (i) is more than

$100\times$

faster than loosely-coupled, software controlled regulators; (ii) is capable of exploiting the system bandwidth 28.7% more efficiently than tightly-coupled hardware regulators (e.g., ARM CoreLink QoS-400, where available); (iii) enables task co-scheduling solutions not feasible with state-of-the-art bandwidth regulation methods.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于fpga的异构soc紧耦合带宽监控与调节的细粒度QoS控制

商业嵌入式系统越来越依赖于异构体系结构，这些体系结构在同一芯片上集成了通用的多核处理器和各种硬件加速器。这以低成本和低功耗提供了现代应用所需的高性能，但同时也提出了新的挑战。硬件资源在不同级别上的共享，特别是在主内存控制器级别上的共享，会导致应用程序任务的执行时间变慢，最终使系统从计时的角度来看不可预测。为了在时间关键应用领域采用异构片上系统（soc），已经提出了几种硬件和软件方法，基于监控和节流的带宽调节是最广泛采用的方法之一。然而，现有的解决方案要么过于粗粒度，限制了对计算引擎活动的控制，要么高度依赖于平台，仅针对特定的soc解决问题。本文提出了一种创新的方法，可以精确地控制基于fpga的异构soc中的主存储器带宽使用。特别是，它通过将运行时带宽调节组件连接到基于fpga的加速器来控制系统带宽。我们的解决方案以非常低的开销提供动态可配置的、细粒度的带宽调节——以适应应用程序随时间变化的需求。此外，它完全独立于平台，能够与任何基于fpga的加速器集成。它使用参考SoC平台在寄存器传输级别开发，旨在与任何基于fpga的SoC轻松兼容。在赛灵思Zynq UltraScale+平台上进行的实验结果表明，我们的方法(i)比松耦合、软件控制的调节器快100倍以上；（ii）能够比紧耦合硬件调节器（如ARM CoreLink QoS-400）更有效地利用系统带宽28.7%；（iii）使任务协同调度解决方案在最先进的带宽调节方法中是不可行的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Parallel and Distributed Systems 工程技术-工程：电子与电气

CiteScore

11.00

自引率

9.40%

发文量

281

审稿时长

5.6 months

期刊介绍： IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to: a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing. b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems. c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation. d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.