MCM-SR: Multiple Constant Multiplication-Based CNN Streaming Hardware Architecture for Super-Resolution

IF 3.1 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-04 DOI:10.1109/TVLSI.2024.3504513

Seung-Hwan Bae;Hyuk-Jae Lee;Hyun Kim

{"title":"MCM-SR: Multiple Constant Multiplication-Based CNN Streaming Hardware Architecture for Super-Resolution","authors":"Seung-Hwan Bae;Hyuk-Jae Lee;Hyun Kim","doi":"10.1109/TVLSI.2024.3504513","DOIUrl":null,"url":null,"abstract":"Convolutional neural network (CNN)-based super-resolution (SR) methods have become prevalent in display devices due to their superior image quality. However, the significant computational demands of CNN-based SR require hardware accelerators for real-time processing. Among the hardware architectures, the streaming architecture can significantly reduce latency and power consumption by minimizing external dynamic random access memory (DRAM) access. Nevertheless, this architecture necessitates a considerable hardware area, as each layer needs a dedicated processing engine. Furthermore, achieving high hardware utilization in this architecture requires substantial design expertise. In this article, we propose methods to reduce the hardware resources of CNN-based SR accelerators by applying the multiple constant multiplication (MCM) algorithm. We propose a loop interchange method for the convolution (CONV) operation to reduce the logic area by 23% and an adaptive loop interchange method for each layer that considers both the static random access memory (SRAM) and logic area simultaneously to reduce the SRAM size by 15%. In addition, we improve the MCM graph exploration speed by \n<inline-formula> <tex-math>$5.4\\times $ </tex-math></inline-formula>\n while maintaining the SR quality through beam search when CONV weights are approximated to reduce the hardware resources.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"75-87"},"PeriodicalIF":3.1000,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10777852/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Convolutional neural network (CNN)-based super-resolution (SR) methods have become prevalent in display devices due to their superior image quality. However, the significant computational demands of CNN-based SR require hardware accelerators for real-time processing. Among the hardware architectures, the streaming architecture can significantly reduce latency and power consumption by minimizing external dynamic random access memory (DRAM) access. Nevertheless, this architecture necessitates a considerable hardware area, as each layer needs a dedicated processing engine. Furthermore, achieving high hardware utilization in this architecture requires substantial design expertise. In this article, we propose methods to reduce the hardware resources of CNN-based SR accelerators by applying the multiple constant multiplication (MCM) algorithm. We propose a loop interchange method for the convolution (CONV) operation to reduce the logic area by 23% and an adaptive loop interchange method for each layer that considers both the static random access memory (SRAM) and logic area simultaneously to reduce the SRAM size by 15%. In addition, we improve the MCM graph exploration speed by

$5.4\times $

while maintaining the SR quality through beam search when CONV weights are approximated to reduce the hardware resources.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

MCM-SR：基于多常数乘法的CNN超分辨率流媒体硬件架构

基于卷积神经网络（CNN）的超分辨率（SR）方法由于其优越的图像质量而在显示设备中变得普遍。然而，基于cnn的SR的巨大计算需求需要硬件加速器来进行实时处理。在硬件架构中，流架构可以通过最小化外部动态随机存取存储器（DRAM）访问来显著降低延迟和功耗。然而，这种体系结构需要相当大的硬件区域，因为每一层都需要一个专用的处理引擎。此外，在这种体系结构中实现高硬件利用率需要大量的设计专业知识。在本文中，我们提出了利用多重常数乘法（multiple constant multiplication， MCM）算法来减少基于cnn的SR加速器硬件资源的方法。我们提出了一种用于卷积（CONV）操作的环路交换方法，以减少23%的逻辑面积，以及一种用于每层的自适应环路交换方法，该方法同时考虑静态随机存取存储器（SRAM）和逻辑面积，以减少15%的SRAM大小。此外，我们将MCM图的搜索速度提高了5.4倍，同时在近似CONV权值时通过波束搜索保持SR质量，以减少硬件资源。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Very Large Scale Integration (VLSI) Systems 工程技术-工程：电子与电气

CiteScore

6.40

自引率

7.10%

发文量

187

审稿时长

3.6 months

期刊介绍： The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.