Pub Date : 2025-01-08 | DOI: 10.1109/LCA.2025.3525970
Vardhana M;Rohan Pinto
Convolutional Neural Networks are mostly deployed on GPUs or CPUs. However, due to the increasing complexity of network architectures and growing performance requirements, these platforms may not be suitable for deploying inference engines. ASIC and FPGA implementations are emerging as superior alternatives to software-based solutions for achieving the required performance. In this article, an efficient architecture for accelerating convolution using the Winograd transform is proposed and implemented on an FPGA. The proposed accelerator consumes 38% fewer resources than a conventional GEMM-based implementation. Analysis results indicate that our accelerator can achieve 3.5 TOP/s, 1.28 TOP/s, and 1.42 TOP/s for the VGG16, ResNet18, and MobileNetV2 CNNs, respectively, at 250 MHz. The proposed accelerator demonstrates the best energy efficiency compared with prior art.
{"title":"High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network","authors":"Vardhana M;Rohan Pinto","doi":"10.1109/LCA.2025.3525970","DOIUrl":"https://doi.org/10.1109/LCA.2025.3525970","url":null,"abstract":"Convolutional Neural Networks are deployed mostly on GPUs or CPUs. However, due to the increasing complexity of architecture and growing performance requirements, these platforms may not be suitable for deploying inference engines. ASIC and FPGA implementations are appearing as superior alternatives to software-based solutions for achieving the required performance. In this article, an efficient architecture for accelerating convolution using the Winograd transform is proposed and implemented on FPGA. The proposed accelerator consumes 38% less resources as compared with conventional GEMM-based implementation. Analysis results indicate that our accelerator can achieve 3.5 TOP/s, 1.28 TOP/s, and 1.42 TOP/s for VGG16, ResNet18, and MobileNetV2 CNNs, respectively, at 250 MHz. The proposed accelerator demonstrates the best energy efficiency as compared with prior arts.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 1","pages":"21-24"},"PeriodicalIF":1.4,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25 | DOI: 10.1109/LCA.2024.3522777
Sepehr Tabrizchi;Mehrdad Morsali;David Pan;Shaahin Angizi;Arman Roohi
This letter introduces PINSim, a user-friendly and flexible framework for simulating emerging smart vision sensors in the early design stages. PINSim enables integrated sensing and processing near and in the sensor, effectively addressing challenges such as data movement and power-hungry analog-to-digital converters. The framework offers a flexible interface and a wide range of design options for customizing the efficiency and accuracy of processing-near/in-sensor accelerators using a hierarchical structure that spans from the device level up to the algorithm level. PINSim provides instruction-accurate evaluation of circuit-level performance metrics and achieves over a 25,000× speed-up compared to SPICE simulation with less than a 4.1% error rate on average. Furthermore, it supports both multilayer perceptron (MLP) and convolutional neural network (CNN) models, with limitations determined by IoT budget constraints. By facilitating the exploration and optimization of various design parameters, PINSim empowers researchers and engineers to develop energy-efficient and high-performance smart vision sensors for a wide range of applications.
"PINSim: A Processing In- and Near-Sensor Simulator to Model Intelligent Vision Sensors," IEEE Computer Architecture Letters, vol. 24, no. 1, pp. 17-20.
Pub Date : 2024-12-16 | DOI: 10.1109/LCA.2024.3498103
Hongtao Wang;Peiquan Jin
The introduction of Zoned Namespace SSDs (ZNS SSDs) presents new challenges for existing buffer management schemes. In addition to traditional SSD characteristics such as read/write asymmetry and limited write endurance, ZNS SSDs impose unique constraints, such as requiring sequential writes within each zone. These features make conventional buffering policies incompatible with ZNS SSDs. This paper introduces ZoneBuffer, a novel buffering scheme designed specifically for ZNS SSDs. ZoneBuffer's innovation lies in two key aspects. First, it introduces a new buffer structure comprising a Work Region and a Priority Region, where the Priority Region is further divided into a clean-page queue and zone-based clusters of dirty pages. By confining buffer replacement to the Priority Region, ZoneBuffer ensures that replacement is optimized for ZNS SSDs. Second, ZoneBuffer incorporates a lifetime-based clustering algorithm to group dirty pages within the Priority Region, optimizing write operations. Preliminary experiments on a real ZNS SSD demonstrate the effectiveness of ZoneBuffer: the results indicate that it significantly outperforms conventional schemes such as LRU and CFLRU.
{"title":"ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs","authors":"Hongtao Wang;Peiquan Jin","doi":"10.1109/LCA.2024.3498103","DOIUrl":"https://doi.org/10.1109/LCA.2024.3498103","url":null,"abstract":"The introduction of Zoned Namespace SSDs (ZNS SSDs) presents new challenges for existing buffer management schemes. In addition to traditional SSD characteristics such as read/write asymmetry and limited write endurance, ZNS SSDs possess unique constraints, such as requiring sequential writes within each zone. These features make conventional buffering policies incompatible with ZNS SSDs. This paper introduces ZoneBuffer, a novel buffering scheme designed specifically for ZNS SSDs. ZoneBuffer's innovation lies in two key aspects. First, it introduces a new buffer structure comprising a Work Region and a Priority Region. The Priority Region is further divided into a clean page queue and a zone cluster of dirty pages. By confining buffer replacement to the Priority Region, ZoneBuffer ensures optimization for ZNS SSDs. Second, ZoneBuffer incorporates a lifetime-based clustering algorithm to group dirty pages within the Priority Region, optimizing write operations. Preliminary experiments conducted on a real ZNS SSD demonstrate the effectiveness of ZoneBuffer. Compared with conventional schemes like LRU and CFLRU, the results indicate that ZoneBuffer significantly improves performance.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"23 2","pages":"239-242"},"PeriodicalIF":1.4,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-12 | DOI: 10.1109/LCA.2024.3516205
Myoungjun Chun;Jaeyong Lee;Inhyuk Choi;Jisung Park;Myungsuk Kim;Jihong Kim
Although read disturbance has emerged as a major reliability concern, managing read disturbance in modern NAND flash memory has not been thoroughly investigated yet. From a device characterization study using real modern NAND flash memory, we observe that reading a page incurs heterogeneous reliability impacts on each wordline (WL), which makes existing block-level read reclaim extremely inefficient. We propose a new WL-level read-reclaim technique, called Straw