MCM-SR: Multiple Constant Multiplication-Based CNN Streaming Hardware Architecture for Super-Resolution
Authors: Seung-Hwan Bae; Hyuk-Jae Lee; Hyun Kim
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 1, pp. 75-87
Published: 2024-12-04
DOI: 10.1109/TVLSI.2024.3504513
URL: https://ieeexplore.ieee.org/document/10777852/
Impact factor: 2.8 (JCR Q2, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Convolutional neural network (CNN)-based super-resolution (SR) methods have become prevalent in display devices due to their superior image quality. However, the significant computational demands of CNN-based SR require hardware accelerators for real-time processing. Among the hardware architectures, the streaming architecture can significantly reduce latency and power consumption by minimizing external dynamic random access memory (DRAM) access. Nevertheless, this architecture necessitates a considerable hardware area, as each layer needs a dedicated processing engine. Furthermore, achieving high hardware utilization in this architecture requires substantial design expertise. In this article, we propose methods to reduce the hardware resources of CNN-based SR accelerators by applying the multiple constant multiplication (MCM) algorithm. We propose a loop interchange method for the convolution (CONV) operation to reduce the logic area by 23% and an adaptive loop interchange method for each layer that considers both the static random access memory (SRAM) and logic area simultaneously to reduce the SRAM size by 15%. In addition, we improve the MCM graph exploration speed by $5.4\times$ while maintaining the SR quality through beam search when CONV weights are approximated to reduce the hardware resources.
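As a rough illustration of why MCM can shrink multiplier logic, the sketch below multiplies one input by fixed constants using only shifts and adds, then shows how two constants can share an intermediate term. It is not the authors' graph-exploration or beam-search method; the function names (`csd_digits`, `mul_shift_add`, `adders`, `shared_5_and_45`) and the constants 5 and 45 are assumptions chosen for the example.

```python
def csd_digits(c):
    """Canonical signed-digit form of a positive integer as (shift, sign) pairs."""
    if c <= 0:
        raise ValueError("constant must be positive")
    digits, shift = [], 0
    while c:
        if c & 1:
            sign = 2 - (c & 3)  # c % 4 == 1 -> +1, c % 4 == 3 -> -1
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def mul_shift_add(x, c):
    """Multiply x by the constant c using only shifts, adds, and subtracts."""
    return sum(sign * (x << shift) for shift, sign in csd_digits(c))


def adders(c):
    """Adders/subtractors needed when c is built on its own from its CSD form."""
    return len(csd_digits(c)) - 1


def shared_5_and_45(x):
    """Hand-built shared graph for the constants {5, 45}: two adders in total."""
    x5 = (x << 2) + x      # 5x  = 4x + x
    x45 = (x5 << 3) + x5   # 45x = 8 * (5x) + 5x, reusing the 5x term
    return x5, x45


if __name__ == "__main__":
    print(mul_shift_add(7, 45), shared_5_and_45(7))
    print("separate adders:", adders(5) + adders(45), "shared adders: 2")
```

Built independently from their signed-digit forms, 5 and 45 need one and three adders respectively; the shared graph produces both products with two, which is the kind of saving an MCM graph search looks for across the weights applied to the same input.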
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.