Title: Benchmarking DNN Mapping Methods for the in-Memory Computing Accelerators
Authors: Yimin Wang; Xuanyao Fong
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (impact factor 3.7; JCR Q2, Engineering, Electrical & Electronic)
Publication date: 2023-10-31
Publication type: Journal Article
DOI: 10.1109/JETCAS.2023.3328864
URL: https://ieeexplore.ieee.org/document/10302283/
Citations: 0
Abstract
This paper presents a study of methods for mapping the convolutional workloads in deep neural networks (DNNs) onto the computing hardware in the in-memory computing (IMC) architecture. Specifically, we focus on categorizing and benchmarking the processing element (PE)-level mapping methods, which have not been investigated in detail for IMC-based architectures. First, we categorize the PE-level mapping methods from the loop unrolling perspective and discuss the corresponding implications on input data reuse and output data reduction. Then, a mapping-oriented architecture is proposed by considering the input and output datapaths under various mapping methods. The architecture is evaluated on the 45 nm technology showing good area-efficiency and scalability, providing a hardware substrate for further performance improvements via PE-level mappings. Furthermore, we present an evaluation framework that captures the architecture behaviors and enables extensive benchmarking of mapping methods under various neural network workloads, main memory bandwidth, and digital computing throughput. The benchmarking results demonstrate significant tradeoffs in the design space and unlock new design possibilities. We present case studies to showcase preferred mapping methods for best energy consumption and/or execution time and demonstrate that a hybrid-mapping scheme enhances minimum execution time by up to 30% for the publicly-available DNN benchmarks.
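To make the loop-unrolling view concrete, the following is a minimal sketch of one commonly used mapping, not the authors' categorization or architecture. It assumes a stride-1, unpadded convolution and a hypothetical crossbar with `rows` wordlines: the input-channel and kernel loops are unrolled onto crossbar rows, the output-channel loop onto columns, and partial sums from different row tiles are added together (the output reduction). The names `conv_direct`, `conv_unrolled`, and `rows` are illustrative assumptions.

```python
import math

def conv_direct(inp, w):
    """Reference convolution loop nest (stride 1, no padding)."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(w), len(w[0][0])
    E, F = H - K + 1, W - K + 1
    return [[[sum(inp[c][e + r][f + s] * w[m][c][r][s]
                  for c in range(C) for r in range(K) for s in range(K))
              for f in range(F)] for e in range(E)] for m in range(M)]

def conv_unrolled(inp, w, rows):
    """Same convolution with the (c, r, s) loops unrolled onto crossbar rows.

    `rows` is an assumed crossbar height; when C*K*K exceeds it, the weight
    matrix is split into several row tiles whose partial sums are reduced.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(w), len(w[0][0])
    E, F = H - K + 1, W - K + 1
    n = C * K * K
    # Weight matrix: one row per (c, r, s), one column per output channel m.
    wm = [[w[m][c][r][s] for m in range(M)]
          for c in range(C) for r in range(K) for s in range(K)]
    tiles = math.ceil(n / rows)
    out = [[[0] * F for _ in range(E)] for _ in range(M)]
    for e in range(E):
        for f in range(F):
            # Input vector for this output pixel, in the same (c, r, s) order.
            vec = [inp[c][e + r][f + s]
                   for c in range(C) for r in range(K) for s in range(K)]
            for t in range(tiles):
                lo, hi = t * rows, min((t + 1) * rows, n)
                for m in range(M):
                    partial = sum(vec[i] * wm[i][m] for i in range(lo, hi))
                    out[m][e][f] += partial  # output reduction across tiles
    return out, tiles
```

In this sketch, one input vector is shared by all output-channel columns (input reuse), while the number of row tiles sets how many partial sums must be accumulated per output. Unrolling different loops would shift that balance, which is the kind of tradeoff the abstract refers to.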
About the journal:
The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.