Shailja Pandey, Lokesh Siddhu, Preeti Ranjan Panda
ACM Transactions on Design Automation of Electronic Systems. Journal Article, published 2023-10-23. DOI: 10.1145/3630012 (https://doi.org/10.1145/3630012)
NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching
Deep neural network (DNN) implementations are typically characterized by huge data sets and concurrent computation, resulting in a demand for high memory bandwidth due to intensive data movement between processors and off-chip memory. Performing DNN inference on general-purpose cores/edge devices is gaining traction as a way to enhance user experience and reduce latency. The mismatch between CPU and conventional DRAM speeds leads to underutilization of compute capabilities, increasing inference time. 3D DRAM is a promising solution for meeting the bandwidth requirements of high-throughput DNNs. However, due to the high power density of stacked architectures, 3D DRAMs need dynamic thermal management (DTM), which incurs performance overhead through memory-induced CPU throttling. We study the thermal impact of DNN applications running on a 3D DRAM system, and make a case for a memory temperature-aware customized prefetch mechanism to reduce DTM overheads and significantly improve performance. In our proposed NeuroCool DTM policy, we intelligently place either DRAM ranks or tiers in a low-power state, using the DNN layer characteristics and access rate. We establish the generalization of our approach through training and test data sets comprising diverse data points from widely used DNN applications. Experimental results on popular DNNs show that NeuroCool achieves an average performance gain of 44% (as high as 52%) and a memory energy improvement of 43% (as high as 69%) over general-purpose DTM policies.
About the journal:
TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.