3DNN-Xplorer: A Machine Learning Framework for Design Space Exploration of Heterogeneous 3-D DNN Accelerators

IF 2.8 | CAS Tier 2 (Engineering & Technology) | JCR Q2 (Computer Science, Hardware & Architecture) | IEEE Transactions on Very Large Scale Integration (VLSI) Systems | Pub Date: 2024-10-14 | DOI: 10.1109/TVLSI.2024.3471496
Gauthaman Murali; Min Gyu Park; Sung Kyu Lim
{"title":"3DNN-Xplorer:一个用于异构三维DNN加速器设计空间探索的机器学习框架","authors":"Gauthaman Murali;Min Gyu Park;Sung Kyu Lim","doi":"10.1109/TVLSI.2024.3471496","DOIUrl":null,"url":null,"abstract":"This article presents 3DNN-Xplorer, the first machine learning (ML)-based framework for predicting the performance of heterogeneous 3-D deep neural network (DNN) accelerators. Our ML framework facilitates the design space exploration (DSE) of heterogeneous 3-D accelerators with a two-tier compute-on-memory (CoM) configuration, considering 3-D physical design factors. Our design space encompasses four distinct heterogeneous 3-D integration styles, combining 28- and 16-nm technology nodes for both compute and memory tiers. Using extrapolation techniques with ML models trained on 10-to-256 processing element (PE) accelerator configurations, we estimate the performance of systems featuring 75–16384 PEs, achieving a maximum absolute error of 13.9% (the number of PEs is not continuous and varies based on the accelerator architecture). To ensure balanced tier areas in the design, our framework assumes the same number of PEs or on-chip memory capacity across the four integration styles, accounting for area imbalance resulting from different technology nodes. Our analysis reveals that the heterogeneous 3-D style with 28-nm compute and 16-nm memory is energy-efficient and offers notable energy savings of up to 50% and an 8.8% reduction in runtime compared to other 3-D integration styles with the same number of PEs. Similarly, the heterogeneous 3-D style with 16-nm compute and 28-nm memory is area-efficient and shows up to 8.3% runtime reduction compared to other 3-D styles with the same on-chip memory capacity.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"358-370"},"PeriodicalIF":2.8000,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"3DNN-Xplorer: A Machine Learning Framework for Design Space Exploration of Heterogeneous 3-D DNN Accelerators\",\"authors\":\"Gauthaman Murali;Min Gyu Park;Sung Kyu Lim\",\"doi\":\"10.1109/TVLSI.2024.3471496\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article presents 3DNN-Xplorer, the first machine learning (ML)-based framework for predicting the performance of heterogeneous 3-D deep neural network (DNN) accelerators. Our ML framework facilitates the design space exploration (DSE) of heterogeneous 3-D accelerators with a two-tier compute-on-memory (CoM) configuration, considering 3-D physical design factors. Our design space encompasses four distinct heterogeneous 3-D integration styles, combining 28- and 16-nm technology nodes for both compute and memory tiers. Using extrapolation techniques with ML models trained on 10-to-256 processing element (PE) accelerator configurations, we estimate the performance of systems featuring 75–16384 PEs, achieving a maximum absolute error of 13.9% (the number of PEs is not continuous and varies based on the accelerator architecture). To ensure balanced tier areas in the design, our framework assumes the same number of PEs or on-chip memory capacity across the four integration styles, accounting for area imbalance resulting from different technology nodes. 
Our analysis reveals that the heterogeneous 3-D style with 28-nm compute and 16-nm memory is energy-efficient and offers notable energy savings of up to 50% and an 8.8% reduction in runtime compared to other 3-D integration styles with the same number of PEs. Similarly, the heterogeneous 3-D style with 16-nm compute and 28-nm memory is area-efficient and shows up to 8.3% runtime reduction compared to other 3-D styles with the same on-chip memory capacity.\",\"PeriodicalId\":13425,\"journal\":{\"name\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"volume\":\"33 2\",\"pages\":\"358-370\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10715720/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10715720/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

This article presents 3DNN-Xplorer, the first machine learning (ML)-based framework for predicting the performance of heterogeneous 3-D deep neural network (DNN) accelerators. Our ML framework facilitates the design space exploration (DSE) of heterogeneous 3-D accelerators with a two-tier compute-on-memory (CoM) configuration, considering 3-D physical design factors. Our design space encompasses four distinct heterogeneous 3-D integration styles, combining 28- and 16-nm technology nodes for both compute and memory tiers. Using extrapolation techniques with ML models trained on 10-to-256 processing element (PE) accelerator configurations, we estimate the performance of systems featuring 75–16384 PEs, achieving a maximum absolute error of 13.9% (the number of PEs is not continuous and varies based on the accelerator architecture). To ensure balanced tier areas in the design, our framework assumes the same number of PEs or on-chip memory capacity across the four integration styles, accounting for area imbalance resulting from different technology nodes. Our analysis reveals that the heterogeneous 3-D style with 28-nm compute and 16-nm memory is energy-efficient and offers notable energy savings of up to 50% and an 8.8% reduction in runtime compared to other 3-D integration styles with the same number of PEs. Similarly, the heterogeneous 3-D style with 16-nm compute and 28-nm memory is area-efficient and shows up to 8.3% runtime reduction compared to other 3-D styles with the same on-chip memory capacity.
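The abstract does not include code, so the following is only a minimal sketch of the train-on-small, extrapolate-to-large workflow it describes: fit a regression model on small accelerator configurations (the paper's 10-256-PE training range) and predict performance for much larger ones (up to 16384 PEs). The feature set (PE count, compute node, memory node), the linear-in-log-space model, and the synthetic runtime generator are illustrative assumptions, not the paper's actual models or data.

# Hypothetical sketch of ML-based extrapolation for 3-D DNN accelerator DSE.
# Assumptions (not from the paper): features = [log2(PEs), compute nm, memory nm],
# a linear model in log space, and a toy synthetic runtime generator.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def synth_runtime_ms(num_pes, compute_nm, memory_nm):
    """Toy stand-in for a measured runtime: more PEs -> faster,
    smaller technology nodes -> faster, plus mild noise."""
    node_factor = 0.7 * (compute_nm / 16.0) + 0.3 * (memory_nm / 16.0)
    return 5.0e4 / num_pes * node_factor * (1.0 + 0.03 * rng.standard_normal())

# Training set: the four heterogeneous styles (compute nm, memory nm)
# evaluated at PE counts inside the paper's stated 10-256-PE training range.
styles = [(28, 28), (28, 16), (16, 28), (16, 16)]
train_pes = [10, 16, 32, 64, 100, 144, 196, 256]
X_train = np.array([[np.log2(p), c, m] for p in train_pes for c, m in styles])
y_train = np.log([synth_runtime_ms(p, c, m) for p in train_pes for c, m in styles])

# Working in log space keeps the PE-count/runtime relationship near-linear,
# so the fitted model can extrapolate beyond the 256-PE training boundary.
model = LinearRegression().fit(X_train, y_train)

# Extrapolate to configurations far outside the training range and report
# the absolute percentage error against the toy "ground truth" generator.
test_pes = [1024, 4096, 16384]
X_test = np.array([[np.log2(p), 28, 16] for p in test_pes])
pred = np.exp(model.predict(X_test))
truth = np.array([synth_runtime_ms(p, 28, 16) for p in test_pes])
for p, r, t in zip(test_pes, pred, truth):
    err = abs(r - t) / t * 100.0
    print(f"{p:6d} PEs (28-nm compute / 16-nm memory): "
          f"predicted {r:.2f} ms, toy truth {t:.2f} ms, abs error {err:.1f}%")

In the real framework the feature set would presumably also encode the 3-D physical design factors mentioned in the abstract, with separate models for runtime and energy; the sketch only illustrates the shape of the extrapolation flow and how a maximum absolute error such as the reported 13.9% could be computed.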
Source journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CiteScore: 6.40
Self-citation rate: 7.10%
Articles per year: 187
Review time: 3.6 months
Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems were founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.