在存在缺陷片段的情况下为 CNN 推理设计数据流感知网络对接器

IF 2.7 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI:10.1109/TCAD.2024.3447210
Harsh Sharma;Umit Ogras;Ananth Kalyanraman;Partha Pratim Pande
{"title":"在存在缺陷片段的情况下为 CNN 推理设计数据流感知网络对接器","authors":"Harsh Sharma;Umit Ogras;Ananth Kalyanraman;Partha Pratim Pande","doi":"10.1109/TCAD.2024.3447210","DOIUrl":null,"url":null,"abstract":"The emergence of 2.5D chiplet platforms provides a new avenue for compact scale-out implementations of deep learning (DL) workloads (WLs). Integrating multiple small chiplets using a network-on-interposer (NoI) offers not only significant cost reduction and higher manufacturing yield than 2-D ICs but also better energy efficiency and performance. However, defects in chiplets may compromise performance since they restrict the computing capability. Therefore, carefully designed chiplet and NoI link placement, and task mapping schemes, in presence of defects, are necessary. In this article, we propose a defect-aware NoI design approach using a custom-defined space-filling curve (SFC) for efficient execution of mixed WLs of convolutional neural network (CNN) inference tasks. We demonstrate that the k-ary n-cube-based NoI topologies can be degenerated into SFC-based counterparts, which we refer to as SFCed NoI topologies. They enable high performance and energy efficiency with lower fabrication costs over their parent k-ary n-cube counterparts. The SFCed approach helps us to extract high performance from an inherently defective system. We demonstrate that SFCed design achieves up to \n<inline-formula> <tex-math>$2.3\\times $ </tex-math></inline-formula>\n and \n<inline-formula> <tex-math>$3.5\\times $ </tex-math></inline-formula>\n reduction in latency and energy, respectively, compared to parent NoI architectures while executing diverse DL WLs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4190-4201"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745841","citationCount":"0","resultStr":"{\"title\":\"A Dataflow-Aware Network-on-Interposer for CNN Inferencing in the Presence of Defective Chiplets\",\"authors\":\"Harsh Sharma;Umit Ogras;Ananth Kalyanraman;Partha Pratim Pande\",\"doi\":\"10.1109/TCAD.2024.3447210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The emergence of 2.5D chiplet platforms provides a new avenue for compact scale-out implementations of deep learning (DL) workloads (WLs). Integrating multiple small chiplets using a network-on-interposer (NoI) offers not only significant cost reduction and higher manufacturing yield than 2-D ICs but also better energy efficiency and performance. However, defects in chiplets may compromise performance since they restrict the computing capability. Therefore, carefully designed chiplet and NoI link placement, and task mapping schemes, in presence of defects, are necessary. In this article, we propose a defect-aware NoI design approach using a custom-defined space-filling curve (SFC) for efficient execution of mixed WLs of convolutional neural network (CNN) inference tasks. We demonstrate that the k-ary n-cube-based NoI topologies can be degenerated into SFC-based counterparts, which we refer to as SFCed NoI topologies. They enable high performance and energy efficiency with lower fabrication costs over their parent k-ary n-cube counterparts. The SFCed approach helps us to extract high performance from an inherently defective system. We demonstrate that SFCed design achieves up to \\n<inline-formula> <tex-math>$2.3\\\\times $ </tex-math></inline-formula>\\n and \\n<inline-formula> <tex-math>$3.5\\\\times $ </tex-math></inline-formula>\\n reduction in latency and energy, respectively, compared to parent NoI architectures while executing diverse DL WLs.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"4190-4201\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745841\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745841/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745841/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

2.5D 芯片平台的出现为深度学习(DL)工作负载(WL)的紧凑型扩展实施提供了一条新途径。与二维集成电路相比,利用网络集成器(NoI)集成多个小型芯片不仅能显著降低成本、提高制造良率,还能提高能效和性能。然而,芯片中的缺陷可能会影响性能,因为它们限制了计算能力。因此,在存在缺陷的情况下,有必要精心设计芯片和 NoI 链路布局以及任务映射方案。在本文中,我们提出了一种缺陷感知 NoI 设计方法,使用自定义空间填充曲线(SFC)高效执行卷积神经网络(CNN)推理任务的混合 WL。我们证明,基于 k-ary n 立方体的 NoI 拓扑可以退化为基于 SFC 的对应拓扑,我们称之为 SFCed NoI 拓扑。与kary n立方体拓扑相比,SFCed NoI拓扑能以更低的制造成本实现更高的性能和能效。SFCed 方法有助于我们从固有缺陷的系统中提取高性能。我们证明,与父 NoI 架构相比,SFCed 设计在执行多样化的 DL WL 时,延迟和能耗分别降低了 2.3 倍和 3.5 倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Dataflow-Aware Network-on-Interposer for CNN Inferencing in the Presence of Defective Chiplets
The emergence of 2.5D chiplet platforms provides a new avenue for compact scale-out implementations of deep learning (DL) workloads (WLs). Integrating multiple small chiplets using a network-on-interposer (NoI) offers not only significant cost reduction and higher manufacturing yield than 2-D ICs but also better energy efficiency and performance. However, defects in chiplets may compromise performance since they restrict the computing capability. Therefore, carefully designed chiplet and NoI link placement, and task mapping schemes, in presence of defects, are necessary. In this article, we propose a defect-aware NoI design approach using a custom-defined space-filling curve (SFC) for efficient execution of mixed WLs of convolutional neural network (CNN) inference tasks. We demonstrate that the k-ary n-cube-based NoI topologies can be degenerated into SFC-based counterparts, which we refer to as SFCed NoI topologies. They enable high performance and energy efficiency with lower fabrication costs over their parent k-ary n-cube counterparts. The SFCed approach helps us to extract high performance from an inherently defective system. We demonstrate that SFCed design achieves up to $2.3\times $ and $3.5\times $ reduction in latency and energy, respectively, compared to parent NoI architectures while executing diverse DL WLs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
5.60
自引率
13.80%
发文量
500
审稿时长
7 months
期刊介绍: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.
期刊最新文献
Table of Contents NOVELLA: Nonvolatile Last-Level Cache Bypass for Optimizing Off-Chip Memory Energy FreePrune: An Automatic Pruning Framework Across Various Granularities Based on Training-Free Evaluation CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance MaskedHLS: Domain-Specific High-Level Synthesis of Masked Cryptographic Designs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1