{"title":"一种面向边缘智能的节能时复用内存计算架构","authors":"Rui Xiao;Wenyu Jiang;Piew Yoong Chee","doi":"10.1109/JXCDC.2022.3206879","DOIUrl":null,"url":null,"abstract":"The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a 256*256 crossbar array with a typical setting, TM-CIM saves \n<inline-formula> <tex-math>$18.4\\times $ </tex-math></inline-formula>\n in energy with 0.136 pJ/MAC efficiency, and \n<inline-formula> <tex-math>$19.9\\times $ </tex-math></inline-formula>\n area for 1T1R case and \n<inline-formula> <tex-math>$15.9\\times $ </tex-math></inline-formula>\n for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over \n<inline-formula> <tex-math>$16\\times $ </tex-math></inline-formula>\n area. 
A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"111-118"},"PeriodicalIF":2.0000,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf","citationCount":"1","resultStr":"{\"title\":\"An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence\",\"authors\":\"Rui Xiao;Wenyu Jiang;Piew Yoong Chee\",\"doi\":\"10.1109/JXCDC.2022.3206879\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. 
For a 256*256 crossbar array with a typical setting, TM-CIM saves \\n<inline-formula> <tex-math>$18.4\\\\times $ </tex-math></inline-formula>\\n in energy with 0.136 pJ/MAC efficiency, and \\n<inline-formula> <tex-math>$19.9\\\\times $ </tex-math></inline-formula>\\n area for 1T1R case and \\n<inline-formula> <tex-math>$15.9\\\\times $ </tex-math></inline-formula>\\n for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over \\n<inline-formula> <tex-math>$16\\\\times $ </tex-math></inline-formula>\\n area. A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.\",\"PeriodicalId\":54149,\"journal\":{\"name\":\"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits\",\"volume\":\"8 2\",\"pages\":\"111-118\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2022-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/9893208/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Exploratory Solid-State Computational Devices and 
Circuits","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9893208/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence
The growing data volume and complexity of deep neural networks (DNNs) require new architectures that surpass the von Neumann bottleneck, and computing-in-memory (CIM) is a promising direction for implementing energy-efficient neural networks. However, CIM's peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing that shares the peripheral circuits and processes one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, the power and energy efficiency of the digital-to-analog converters (DACs), which turn out to be an even greater overhead than the analog-to-digital converters (ADCs), can be fine-tuned in TM-CIM for significant improvement. For a $256\times256$ crossbar array with a typical setting, TM-CIM saves $18.4\times$ in energy at 0.136 pJ/MAC efficiency, and saves $19.9\times$ in area for the 1T1R case and $15.9\times$ for the 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over $16\times$ in area. A tradeoff between chip area, peak power, and latency is also presented, along with a scheme that further reduces latency on VGG-16 without significantly increasing chip area or peak power.
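The central idea in the abstract — a memristor crossbar computes analog dot products, and TM-CIM time-multiplexes one shared set of peripheral converters over the columns instead of sensing all columns in parallel — can be illustrated with a toy numerical model. Everything below (the conductance range, voltage range, and the one-ADC-versus-N-ADCs bookkeeping) is an illustrative assumption for the sketch, not a figure or circuit from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256                                     # 256x256 crossbar, as in the abstract
G = rng.uniform(1e-6, 1e-4, size=(N, N))    # memristor conductances in siemens (assumed range)
V = rng.uniform(0.0, 0.2, size=N)           # row input voltages in volts (assumed range)

def mac_time_multiplexed(V, G):
    """TM-CIM-style readout sketch: select one column per time slot, so a
    single shared sense chain is reused and unselected columns carry no
    read current. Each column current is I_j = sum_i V_i * G_ij."""
    out = np.empty(G.shape[1])
    for col in range(G.shape[1]):           # one column per cycle -> N cycles of latency
        out[col] = V @ G[:, col]            # analog dot product for the selected column
    return out

I_tm = mac_time_multiplexed(V, G)
I_parallel = V @ G                          # conventional fully parallel readout, same math
assert np.allclose(I_tm, I_parallel)        # identical MAC results, different hardware cost

# Rough bookkeeping of the tradeoff the abstract describes: both schemes
# perform N column conversions in total, but the parallel design needs N
# ADC/sense chains active at once, while TM-CIM needs one.
cycles_tm = N                               # latency cost of time multiplexing
peripheral_area_ratio = N / 1               # N sense chains vs. 1 shared chain (first-order)
```

The sketch makes the tradeoff concrete: the outputs are bit-identical, so time multiplexing trades an $N$-cycle latency penalty for a roughly $N$-fold reduction in peripheral area and peak power, which is the area/latency balance the paper tunes further for VGG-16.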