SC-CGRA: An Energy-Efficient CGRA Using Stochastic Computing

IF: 6.0 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Theory & Methods) | IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2024-09-03 | DOI: 10.1109/TPDS.2024.3453310

Di Mou;Bo Wang;Dajiang Liu

IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 11, pp. 2023-2038. https://ieeexplore.ieee.org/document/10663960/

Citations: 0

Abstract

Stochastic Computing (SC) is a promising computing paradigm for low-power and cost-effective applications, with the added advantage of high error tolerance. In parallel, Coarse-Grained Reconfigurable Arrays (CGRAs) have proven to be a highly promising platform for domain-specific applications because they combine energy efficiency with flexibility. Intuitively, introducing SC into CGRAs would reinforce the strengths of both paradigms. However, existing SC-based architectures suffer from inherent computation errors, and the stochastic number generators used in SC incur exponentially growing latency, which is unacceptable in a CGRA. In this work, we propose an SC-based CGRA that replaces the exact multiplication of a traditional CGRA with an SC-based multiplication. To improve the accuracy of SC and shorten the latency of Stochastic Number Generators (SNGs), we introduce leading zero shifting and comparator truncation while keeping the bitstream length fixed. In addition, exploiting the flexible interconnections among processing elements (PEs), we propose a quality scaling strategy that combines neighboring PEs to achieve high-accuracy operations without switching costs such as those of power gating. Compared with the state-of-the-art approximate-computing CGRA design, the proposed CGRA achieves, on average, a 65.3% reduction in output error, a 21.2% reduction in energy consumption, and a 28.37% reduction in area.
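For readers unfamiliar with SC, the following is a minimal Python sketch of conventional unipolar SC multiplication, the baseline operation that the abstract says replaces exact multiplication. It is not the paper's design: it does not model leading zero shifting, comparator truncation, or PE combination, and the names `sng` and `sc_multiply`, the software random source, and the default parameters are all assumptions made for illustration. It shows how a comparator-based SNG turns a fixed-point value into a bitstream, how a single AND gate multiplies two such streams, and why a full-precision stream for an n-bit operand needs on the order of 2**n cycles.

```python
import random


def sng(value, n_bits, length, rng):
    """Comparator-based stochastic number generator (unipolar encoding).

    Emits `length` bits. Each bit is 1 when a fresh n_bits-wide random
    number is below `value`, so P(bit == 1) is about value / 2**n_bits.
    """
    scale = 1 << n_bits
    return [1 if rng.randrange(scale) < value else 0 for _ in range(length)]


def sc_multiply(a, b, n_bits=8, length=None, seed=0):
    """Approximate (a / 2**n_bits) * (b / 2**n_bits) using a bitwise AND.

    Hypothetical helper for illustration only; a hardware SNG would
    typically use an LFSR rather than a software random generator.
    """
    if length is None:
        # A full-precision stream needs 2**n_bits cycles, which is the
        # source of the exponential latency growth with operand width.
        length = 1 << n_bits
    rng = random.Random(seed)
    stream_a = sng(a, n_bits, length, rng)
    stream_b = sng(b, n_bits, length, rng)  # separate draws keep the streams uncorrelated
    # For independent streams, P(x AND y) = P(x) * P(y).
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length


if __name__ == "__main__":
    exact = (128 / 256) * (64 / 256)
    approx = sc_multiply(128, 64, n_bits=8, length=4096)
    print(f"exact={exact:.4f} approx={approx:.4f}")
```

The result is an estimate whose error shrinks as the stream gets longer, which illustrates the accuracy and latency trade-off that motivates keeping the bitstream length fixed while improving the SNG.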

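Source journal

IEEE Transactions on Parallel and Distributed Systems (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 11.00

Self-citation rate: 9.40%

Articles published: 281

Review time: 5.6 months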

About the journal: IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to:

a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing.

b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems.

c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation.

d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.
