System-Level Design and Integration of a Prototype AR/VR Hardware Featuring a Custom Low-Power DNN Accelerator Chip in 7nm Technology for Codec Avatars

H. Sumbul, Tony F. Wu, Yuecheng Li, Syed Shakib Sarwar, W. Koven, Eli Murphy-Trotzky, Xingxing Cai, E. Ansari, D. Morris, Huichu Liu, Doyun Kim, E. Beigné
{"title":"System-Level Design and Integration of a Prototype AR/VR Hardware Featuring a Custom Low-Power DNN Accelerator Chip in 7nm Technology for Codec Avatars","authors":"H. Sumbul, Tony F. Wu, Yuecheng Li, Syed Shakib Sarwar, W. Koven, Eli Murphy-Trotzky, Xingxing Cai, E. Ansari, D. Morris, Huichu Liu, Doyun Kim, E. Beigné","doi":"10.1109/CICC53496.2022.9772810","DOIUrl":null,"url":null,"abstract":"Augmented Reality / Virtual Reality (AR/VR) devices aim to connect people in the Metaverse with photorealistic virtual avatars, referred to as “Codec Avatars”. Delivering a high visual performance for Codec Avatar workloads, however, is a challenging task for mobile SoCs as AR/VR devices have limited power and form factor constraints. On-device, local, near-sensor processing provides the best system-level energy-efficiency and enables strong security and privacy features in the long run. In this work, we present a custom-built, prototype small-scale mobile SoC that achieves energy-efficient performance for running eye gaze extraction of the Codec Avatar model. The test-chip, fabricated in 7nm technology node, features a Neural Network (NN) accelerator consisting of a 1024 Multiply-Accumulate (MAC) array, 2MB on-chip SRAM, and a 32bit RISC-V CPU. The featured test-chip is integrated on a prototype mobile VR headset to run the Codec Avatar application. This work aims to show the full stack design considerations of system-level integration, hardware-aware model customization, and circuit-level acceleration to meet the challenging mobile AR/VR SoC specifications for a Codec Avatar demonstration. By re-architecting the Convolutional NN (CNN) based eye gaze extraction model and tailoring it for the hardware, the entire model fits on the chip to mitigate system-level energy and latency cost of off-chip memory accesses. By efficiently accelerating the convolution operation at the circuit-level, the presented prototype SoC achieves 30 frames per second performance with low-power consumption at low form factors. With the full-stack design considerations presented in this work, the featured test-chip consumes 22.7mW power to run inference on the entire CNN model in 16.5ms from input to output for a single sensor image. As a result, the test-chip achieves 375 µJ/frame/eye energy-efficiency within a 2.56 mm2 silicon area.","PeriodicalId":415990,"journal":{"name":"2022 IEEE Custom Integrated Circuits Conference (CICC)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Custom Integrated Circuits Conference (CICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICC53496.2022.9772810","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Augmented Reality / Virtual Reality (AR/VR) devices aim to connect people in the Metaverse with photorealistic virtual avatars, referred to as “Codec Avatars”. Delivering a high visual performance for Codec Avatar workloads, however, is a challenging task for mobile SoCs as AR/VR devices have limited power and form factor constraints. On-device, local, near-sensor processing provides the best system-level energy-efficiency and enables strong security and privacy features in the long run. In this work, we present a custom-built, prototype small-scale mobile SoC that achieves energy-efficient performance for running eye gaze extraction of the Codec Avatar model. The test-chip, fabricated in 7nm technology node, features a Neural Network (NN) accelerator consisting of a 1024 Multiply-Accumulate (MAC) array, 2MB on-chip SRAM, and a 32bit RISC-V CPU. The featured test-chip is integrated on a prototype mobile VR headset to run the Codec Avatar application. This work aims to show the full stack design considerations of system-level integration, hardware-aware model customization, and circuit-level acceleration to meet the challenging mobile AR/VR SoC specifications for a Codec Avatar demonstration. By re-architecting the Convolutional NN (CNN) based eye gaze extraction model and tailoring it for the hardware, the entire model fits on the chip to mitigate system-level energy and latency cost of off-chip memory accesses. By efficiently accelerating the convolution operation at the circuit-level, the presented prototype SoC achieves 30 frames per second performance with low-power consumption at low form factors. With the full-stack design considerations presented in this work, the featured test-chip consumes 22.7mW power to run inference on the entire CNN model in 16.5ms from input to output for a single sensor image. As a result, the test-chip achieves 375 µJ/frame/eye energy-efficiency within a 2.56 mm2 silicon area.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
系统级设计和集成AR/VR原型硬件,该硬件采用7nm技术,用于编解码器虚拟化身
增强现实/虚拟现实(AR/VR)设备旨在将虚拟世界中的人们与逼真的虚拟化身(称为“编解码器化身”)联系起来。然而,对于移动soc来说,为Codec Avatar工作负载提供高视觉性能是一项具有挑战性的任务,因为AR/VR设备具有有限的功率和外形因素限制。设备上、本地、近传感器处理提供了最佳的系统级能源效率,并在长期内实现了强大的安全和隐私功能。在这项工作中,我们提出了一个定制的原型小规模移动SoC,该SoC实现了运行Codec Avatar模型的眼睛凝视提取的节能性能。该测试芯片采用7nm工艺节点制造,具有由1024个MAC阵列、2MB片上SRAM和32位RISC-V CPU组成的神经网络(NN)加速器。该特色测试芯片集成在原型移动VR耳机上,以运行Codec Avatar应用程序。这项工作旨在展示系统级集成、硬件感知模型定制和电路级加速的全栈设计考虑,以满足编解码器Avatar演示中具有挑战性的移动AR/VR SoC规范。通过重新构建基于卷积神经网络(CNN)的眼睛注视提取模型并针对硬件进行裁剪,整个模型适合于芯片,以降低芯片外存储器访问的系统级能量和延迟成本。通过有效地加速电路级的卷积运算,所提出的原型SoC在低尺寸下实现了每秒30帧的低功耗性能。考虑到本工作中提出的全堆栈设计考虑,该特色测试芯片在16.5ms内从输入到输出对整个CNN模型运行推理,功耗为22.7mW。因此,测试芯片在2.56 mm2的硅区域内实现了375µJ/帧/眼的能量效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
All Rivers Flow to the Sea: A High Power Density Wireless Power Receiver with Split-Dual-Path Rectification and Hybrid-Quad-Path Step-Down Conversion A 400-to-12 V Fully Integrated Switched-Capacitor DC-DC Converter Achieving 119 mW/mm2 at 63.6 % Efficiency A 0.14nJ/b 200Mb/s Quasi-Balanced FSK Transceiver with Closed-Loop Modulation and Sideband Energy Detection A 2GHz voltage mode power scalable RF-Front-End with 2.5dB-NF and 0.5dBm-1dBCP High-Speed Digital-to-Analog Converter Design Towards High Dynamic Range
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1