
2022 IEEE Custom Integrated Circuits Conference (CICC): Latest Publications

Recent Advances in High-Performance Frequency Synthesizer Design
Pub Date : 2022-04-01 DOI: 10.1109/CICC53496.2022.9772842
S. Levantino
Whether employed as local oscillators in wireless communications or radar systems, or as clock generators in data converters, high-performance frequency synthesizers are essential elements of any advanced electronic system. In wireless applications, highly spectrally efficient modulations, such as quadrature amplitude modulation with a large number of symbols (256 or more), enable a higher bit rate for the same occupied bandwidth, at the price of tighter constraints on the error-vector magnitude. This demands an ultra-low-jitter local oscillator (LO). For instance, 5G New Radio for cellular communications at frequencies around 28 GHz calls for an integrated phase noise well below -36 dBc, which translates into an absolute rms jitter well below 90 fs, over all operating conditions. Similar performance is also required of the clock of high-speed analog-to-digital converters, so as not to degrade their signal-to-noise ratio (SNR). An SNR of 62 dB (i.e., 10 equivalent bits) at 1 GHz bandwidth requires a clock jitter well below 130 fs.
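The jitter figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch under two stated assumptions: the -36 dBc integrated phase noise is taken as the total rms phase-error power (σ_φ² = 10^(IPN/10) rad²), which is the convention that reproduces the 90 fs figure, and the ADC bound uses the standard jitter-limited SNR of a full-scale sine, SNR = -20·log10(2π·f_in·σ_t), with f_in = 1 GHz at the top of the band. The function names are illustrative.

```python
import math

def ipn_to_rms_jitter(ipn_dbc: float, f0_hz: float) -> float:
    """Absolute rms jitter (s) of a carrier at f0_hz with integrated phase noise ipn_dbc.

    Assumption: ipn_dbc is the total rms phase-error power, so
    sigma_phi**2 = 10**(ipn_dbc / 10) rad^2.
    """
    sigma_phi_rad = math.sqrt(10 ** (ipn_dbc / 10))
    return sigma_phi_rad / (2 * math.pi * f0_hz)

def max_jitter_for_snr(snr_db: float, f_in_hz: float) -> float:
    """Largest clock jitter (s) for which the jitter-limited SNR of a full-scale
    sine at f_in_hz still reaches snr_db, from SNR = -20*log10(2*pi*f_in*sigma_t)."""
    return 10 ** (-snr_db / 20) / (2 * math.pi * f_in_hz)

def enob(snr_db: float) -> float:
    """Effective number of bits of an ideal quantizer: (SNR - 1.76) / 6.02."""
    return (snr_db - 1.76) / 6.02

lo_jitter = ipn_to_rms_jitter(-36.0, 28e9)
adc_jitter = max_jitter_for_snr(62.0, 1e9)
print(f"LO jitter bound:  {lo_jitter * 1e15:.0f} fs")   # 90 fs
print(f"ADC jitter bound: {adc_jitter * 1e15:.0f} fs")  # 126 fs
print(f"ENOB at 62 dB:    {enob(62.0):.1f} bits")       # 10.0 bits
```

The ADC result (about 126 fs) sits just under the 130 fs quoted in the abstract, and 62 dB indeed corresponds to about 10 effective bits.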
Citations: 5
A Review of Silicon Photonics LiDAR
Pub Date : 2022-04-01 DOI: 10.1109/CICC53496.2022.9772845
H. Hashemi
In the early 20th century, the reflection of electromagnetic waves emitted by ships off the metal frames of other nearby ships was used as an early collision-warning system in poor-visibility weather. In subsequent decades, the importance of detecting incoming bombers motivated the advancement of radio detection and ranging (radar) systems, in which the round-trip time of a short pulse was used to find the target distance, while the pointing direction of the antenna was used to localize the target. Later advances in phased-array technology replaced the mechanically scanned antenna with electronically scanned versions, enabling faster target localization and, ultimately, the tracking of multiple concurrent targets. Light detection and ranging (lidar), initially referred to as laser radar, was developed shortly after the invention of the laser in the 1960s. The primary early applications were in metrology, the military, and scientific research. More recently, lidar has attracted commercial interest for its potential application in advanced driver-assistance systems (ADAS) and self-driving cars.
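The pulsed round-trip ranging described above reduces to d = c·t/2, because the pulse covers the distance twice. The sketch below applies to any direct pulsed time-of-flight system, radar or lidar; it illustrates the principle only and does not describe the ranging scheme of any particular silicon photonics lidar. The function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_range_m(round_trip_s: float) -> float:
    """Distance to a target from the round-trip time of a pulse.

    The pulse travels to the target and back, so the one-way distance is c * t / 2.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2

def range_resolution_m(pulse_width_s: float) -> float:
    """Echoes from two targets closer than c * tau / 2 overlap for a pulse of width tau."""
    return SPEED_OF_LIGHT_M_PER_S * pulse_width_s / 2

print(f"{tof_range_m(1e-6):.1f} m")          # a 1 us round trip is about 149.9 m
print(f"{range_resolution_m(1e-9):.3f} m")   # a 1 ns pulse resolves about 0.150 m
```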
Citations: 3
A 177 TOPS/W, Capacitor-based In-Memory Computing SRAM Macro with Stepwise-Charging/Discharging DACs and Sparsity-Optimized Bitcells for 4-Bit Deep Convolutional Neural Networks
Pub Date : 2022-04-01 DOI: 10.1109/CICC53496.2022.9772781
Bo Zhang, Jyotishman Saikia, Jian Meng, Dewei Wang, Soon-Chan Kwon, Sungmeen Myung, Hyunsoo Kim, Sang Joon Kim, Jae-sun Seo, Mingoo Seok
Capacitor-based in-memory computing (IMC) SRAM has recently gained significant attention because it achieves high energy efficiency for deep convolutional neural networks (DCNNs) as well as robustness against PVT variations [1], [3], [7], [8]. To further improve energy efficiency and robustness, we identify two bottlenecks in prior capacitive IMC works: (i) the input drivers (or digital-to-analog converters, DACs), which charge and discharge various capacitors, and (ii) the analog-to-digital converters (ADCs), which convert analog voltage/current signals into digital values.
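To show where the two bottleneck blocks sit, here is an idealized behavioral model of a generic charge-domain dot product. This is a textbook-style sketch, not the authors' macro: it assumes 4-bit unsigned activations fed through an ideal DAC, binary weights, equal unit capacitors that are shorted together for charge sharing, and an ideal ADC. It does not model stepwise charging, parasitics, mismatch, or noise. The function name and parameters are hypothetical.

```python
def charge_sharing_mac(inputs, weights, in_bits=4, adc_bits=4, v_ref=1.0):
    """Idealized charge-domain dot product (hypothetical behavioral model).

    inputs  : unsigned in_bits-bit activations; an ideal DAC maps x to v_ref * x / (2**in_bits - 1).
    weights : binary (0/1) weights stored in the bitcells.
    Each equal unit capacitor is charged to (DAC voltage * weight); shorting all of them
    together averages the stored charge, so V_out = mean(DAC_i * w_i).
    An ideal ADC then quantizes V_out to adc_bits.
    """
    if not inputs or len(inputs) != len(weights):
        raise ValueError("inputs and weights must be non-empty and of equal length")
    x_max = (1 << in_bits) - 1
    # DAC stage: the input drivers that charge the capacitors.
    cap_voltages = [v_ref * x / x_max * w for x, w in zip(inputs, weights)]
    # Charge sharing across equal capacitors yields the average voltage.
    v_out = sum(cap_voltages) / len(cap_voltages)
    # ADC stage: convert the analog result back to a digital code.
    code_max = (1 << adc_bits) - 1
    code = round(v_out / v_ref * code_max)
    return v_out, code

print(charge_sharing_mac([15, 15, 15, 15], [1, 1, 1, 0]))  # (0.75, 11)
```

Every MAC result passes through both a DAC (one per input) and an ADC (one per column readout), which is why those two blocks dominate the energy and robustness budget in this kind of design.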
Citations: 13
System-Level Design and Integration of a Prototype AR/VR Hardware Featuring a Custom Low-Power DNN Accelerator Chip in 7nm Technology for Codec Avatars
Pub Date : 2022-04-01 DOI: 10.1109/CICC53496.2022.9772810
H. Sumbul, Tony F. Wu, Yuecheng Li, Syed Shakib Sarwar, W. Koven, Eli Murphy-Trotzky, Xingxing Cai, E. Ansari, D. Morris, Huichu Liu, Doyun Kim, E. Beigné
Augmented Reality / Virtual Reality (AR/VR) devices aim to connect people in the Metaverse through photorealistic virtual avatars, referred to as “Codec Avatars”. Delivering high visual performance for Codec Avatar workloads, however, is a challenging task for mobile SoCs, because AR/VR devices are tightly constrained in power and form factor. On-device, local, near-sensor processing provides the best system-level energy efficiency and, in the long run, enables strong security and privacy features. In this work, we present a custom-built, small-scale prototype mobile SoC that runs the eye-gaze extraction of the Codec Avatar model energy-efficiently. The test chip, fabricated in a 7nm technology node, features a neural network (NN) accelerator consisting of a 1024-unit multiply-accumulate (MAC) array, 2MB of on-chip SRAM, and a 32-bit RISC-V CPU. The test chip is integrated into a prototype mobile VR headset to run the Codec Avatar application. This work aims to show the full-stack design considerations (system-level integration, hardware-aware model customization, and circuit-level acceleration) needed to meet the challenging mobile AR/VR SoC specifications of a Codec Avatar demonstration. By re-architecting the convolutional NN (CNN) based eye-gaze extraction model and tailoring it to the hardware, the entire model fits on the chip, mitigating the system-level energy and latency cost of off-chip memory accesses. By efficiently accelerating the convolution operation at the circuit level, the prototype SoC achieves 30 frames per second with low power consumption in a small form factor. With these full-stack design considerations, the test chip consumes 22.7mW while running inference on the entire CNN model in 16.5ms, from input to output, for a single sensor image. As a result, the test chip achieves an energy efficiency of 375 µJ/frame/eye within a 2.56 mm² silicon area.
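The reported figures are mutually consistent: energy per inference is power times latency. The sketch below redoes that arithmetic. The check against the 30 frames/s budget rests on an assumption the abstract does not state, namely that the two eyes are processed one after another on the single accelerator; the variable names are illustrative.

```python
POWER_W = 22.7e-3      # reported power while running inference
LATENCY_S = 16.5e-3    # reported input-to-output latency for one sensor image
TARGET_FPS = 30        # reported frame rate

# Energy per inference = power * time, expressed in microjoules.
energy_per_inference_uj = POWER_W * LATENCY_S * 1e6   # about 374.55 uJ, i.e. the ~375 uJ/frame/eye figure

# Time available per frame at the target frame rate.
frame_budget_ms = 1000 / TARGET_FPS                    # about 33.3 ms

# Assumption (not stated in the abstract): both eyes are handled sequentially.
two_eye_latency_ms = 2 * LATENCY_S * 1e3               # 33.0 ms
fits_in_frame = two_eye_latency_ms <= frame_budget_ms

print(f"{energy_per_inference_uj:.1f} uJ per eye, "
      f"{two_eye_latency_ms:.1f} ms of {frame_budget_ms:.1f} ms budget, fits: {fits_in_frame}")
```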
Citations: 11
An 0.92 mJ/frame High-quality FHD Super-resolution Mobile Accelerator SoC with Hybrid-precision and Energy-efficient Cache
Pub Date : 2022-04-01 DOI: 10.1109/CICC53496.2022.9772778
Zhiyong Li, Sangjin Kim, Dongseok Im, Donghyeon Han, H. Yoo
With the rise of contactless communication and streaming services, super-resolution (SR) on mobile devices has become one of the most important image-processing technologies. In addition, the popularity of high-end application processors (APs) and high-resolution displays in mobile devices drives the development of lightweight mobile SR-CNNs [1], [2], which achieve high reconstruction quality. However, the large size and wide dynamic range of both the images and the intermediate feature maps in CNN hidden layers pose challenges for mobile platforms. Given the limited power ($<100\,\text{mW}$) and shared bandwidth ($<2\,\text{GB/s}$) of mobile platforms, a low-power, energy-efficient architecture is required.
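A quick budget calculation shows why both constraints bite. This is a sketch under stated assumptions: the reported 0.92 mJ/frame is taken to cover the whole per-frame workload, and the feature-map dimensions (a 960×540 low-resolution input for ×2 upscaling to FHD, 32 channels, 8-bit elements) are hypothetical values chosen only for illustration, not figures from the paper.

```python
ENERGY_PER_FRAME_J = 0.92e-3   # reported 0.92 mJ/frame
POWER_BUDGET_W = 100e-3        # mobile power budget quoted above (< 100 mW)
BANDWIDTH_BUDGET_BPS = 2e9     # shared bandwidth quoted above (< 2 GB/s)

def average_power_w(fps: float) -> float:
    """Average power if every frame costs ENERGY_PER_FRAME_J."""
    return ENERGY_PER_FRAME_J * fps

def max_fps_within_power() -> float:
    """Highest frame rate that stays inside the power budget."""
    return POWER_BUDGET_W / ENERGY_PER_FRAME_J

def feature_map_bytes(height: int, width: int, channels: int, bytes_per_elem: int) -> int:
    """Size of one intermediate feature map."""
    return height * width * channels * bytes_per_elem

def spill_traffic_bps(fmap_bytes: int, fps: float, passes: int = 2) -> float:
    """Off-chip traffic if one feature map is written out and read back (passes=2) every frame."""
    return fmap_bytes * passes * fps

# Hypothetical layer: 960x540 input (x2 upscaling to 1920x1080), 32 channels, 8-bit.
fmap = feature_map_bytes(960, 540, 32, 1)
print(f"power at 60 fps: {average_power_w(60) * 1e3:.1f} mW")
print(f"max fps within budget: {max_fps_within_power():.1f}")
print(f"one spilled layer at 30 fps: {spill_traffic_bps(fmap, 30) / 1e9:.2f} GB/s "
      f"of a {BANDWIDTH_BUDGET_BPS / 1e9:.0f} GB/s budget")
```

Under these assumptions, spilling a single such feature map off chip already uses about half of the shared bandwidth, which is the kind of pressure that on-chip caching is meant to relieve.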
Citations: 2
Spiking Neural Network Integrated Circuits: A Review of Trends and Future Directions
Pub Date : 2022-03-14 DOI: 10.48550/arXiv.2203.07006
A. Basu, C. Frenkel, Lei Deng, Xueyong Zhang
The rapid growth of deep learning, spurred by its successes in fields ranging from face recognition [1] to game playing [2], has also triggered growing interest in the design of specialized hardware accelerators to support these algorithms. Such hardware targets one of two categories: operation in datacenters or on mobile devices at the network edge. While energy efficiency is important in both cases, the requirement is far more stringent in the latter class of applications because of limited battery life. Several techniques have been used to improve the energy efficiency of these accelerators [3], including reducing off-chip DRAM accesses, managing dataflow across processing elements, and in-memory computing (IMC), which exploits analog processing of data within digital memory arrays [4].
Citations: 30
A 915–1220 TOPS/W Hybrid In-Memory Computing based Image Restoration and Region Proposal Integrated Circuit for Neuromorphic Vision Sensors in 65nm CMOS
Pub Date : 2022-02-24 DOI: 10.48550/arXiv.2203.01413
Xueyong Zhang, A. Basu
Bio-inspired, asynchronous, event-based neuromorphic vision sensors (NVS) are introducing a paradigm shift in visual information sensing and processing [1]. Their event-driven operation makes them ideal for low-power operation in Internet-of-Things scenarios such as traffic monitoring. However, the inherent noise of the sensor causes redundant wake-up operations and degrades tracking performance [2]. An energy-efficient, in-memory computing (IMC) based denoising operation enables blank-frame detection, yielding 2× energy savings. Further energy savings can be obtained by exploiting spatial redundancy: in traffic monitoring, objects usually occupy only a small part (~5%) of the frame [3]. Hence, region proposal (RP) is required to detect the regions of interest (ROIs) in a valid frame along with their bounding-box coordinates, as shown in Fig. 1. For binary images, the conventional connected-component labeling (CCL) algorithm [4] can propose ROIs by raster-scanning the whole frame, but its von Neumann operation leads to longer search time and higher computing energy. The promising IMC approach of [3] has high energy efficiency but limited accuracy, both because its simple algorithm is constrained by in-memory operations and because smooth surfaces that do not generate events (e.g., car windows) fragment objects. In this work, we present a hybrid 11-transistor memory bitcell, collocated SRAM and DRAM (CRAM), for IMC-based image restoration (IR) and RP. The proposed CRAM supports image storage in SRAM and DRAM modes, denoising and region filling in diffusion mode, and the RP algorithm in projection mode.
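For reference, the conventional software baseline mentioned above can be sketched as a raster scan over a binary frame in which each unvisited foreground pixel seeds a flood fill that collects one connected component and its bounding box. This is a simple illustrative variant of CCL (4-connectivity, breadth-first fill), not necessarily the exact algorithm of [4], and it has nothing to do with the CRAM circuit itself. The function name is hypothetical.

```python
from collections import deque

def ccl_bounding_boxes(frame):
    """Label 4-connected foreground regions of a binary frame (list of lists of 0/1).

    Returns one bounding box per region as (row_min, col_min, row_max, col_max),
    in the order the raster scan first meets each region.
    """
    if not frame:
        return []
    rows, cols = len(frame), len(frame[0])
    visited = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):              # raster scan: top to bottom...
        for c in range(cols):          # ...left to right
            if not frame[r][c] or visited[r][c]:
                continue
            # New component: flood-fill it and track its extent.
            r0 = r1 = r
            c0 = c1 = c
            visited[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and frame[ny][nx] and not visited[ny][nx]:
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes

frame = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
print(ccl_bounding_boxes(frame))  # [(0, 1, 1, 2), (1, 3, 2, 4)]
```

Every pixel is visited by the scan regardless of how few contain objects, which illustrates the search-time and energy cost the abstract attributes to running CCL on a von Neumann processor.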
Citations: 1
An Energy-Efficient and Runtime-Reconfigurable FPGA-Based Accelerator for Robotic Localization Systems
Pub Date : 2022-02-18 DOI: 10.1109/CICC53496.2022.9772870
Qiang Liu, Zishen Wan, Bo Yu, Weizhuang Liu, Shaoshan Liu, A. Raychowdhury
A robot usually localizes itself in an environment by estimating its position and rotation states while constructing a map of the unknown surroundings, which gives rise to the notion of Simultaneous Localization and Mapping (SLAM). SLAM is a fundamental kernel in autonomous machines at all computing scales, from drones and AR/VR devices to self-driving cars. Principled mathematical solutions for SLAM are either filtering-based or nonlinear-optimization-based (Fig. 1a); the latter has recently shown higher robustness but requires intensive computation. Prior ASICs [1], [2] and FPGAs [3], [4], [5] have accelerated SLAM in hardware, but they usually target one specific design. In this work, we present a runtime-reconfigurable FPGA accelerator for robotic localization tasks. We exploit SLAM-specific data locality, sparsity, reuse, and parallelism, and achieve a >5x performance improvement over the state of the art. In particular, our design can be reconfigured at runtime according to the environment and platform, saving power while sustaining accuracy and performance.
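To make "optimization-based" concrete, here is a toy Gauss-Newton solve of a one-dimensional pose graph: three odometry edges and one loop-closure edge, with pose 0 held fixed. This is a hedged illustration of the general least-squares formulation only. Real SLAM back-ends optimize nonlinear 3D poses and landmarks with repeated relinearization, and this sketch is not the accelerator's algorithm. The function names and measurement values are hypothetical. Note that the normal-equation matrix `h` only has nonzero entries between poses linked by a measurement, which is the kind of structure that sparsity-aware hardware can exploit.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def optimize_pose_graph_1d(initial, edges, iterations=5):
    """Gauss-Newton on a 1-D pose graph; pose 0 is fixed to remove the gauge freedom.

    edges: list of (i, j, z), each meaning the measurement x_j - x_i ~= z.
    """
    x = list(initial)
    n = len(x) - 1                      # free variables x_1 .. x_n
    for _ in range(iterations):
        h = [[0.0] * n for _ in range(n)]   # J^T J (sparse in structure)
        g = [0.0] * n                       # J^T r
        for i, j, z in edges:
            r = x[j] - x[i] - z             # residual of this measurement
            jac = {}                        # dr/dx_j = +1, dr/dx_i = -1 (pose 0 skipped)
            if j > 0:
                jac[j - 1] = 1.0
            if i > 0:
                jac[i - 1] = -1.0
            for a, ja in jac.items():
                g[a] += ja * r
                for b, jb in jac.items():
                    h[a][b] += ja * jb
        dx = solve_linear(h, [-v for v in g])
        for k in range(n):
            x[k + 1] += dx[k]
    return x

# Dead-reckoned start, three unit odometry steps, and a loop closure saying x3 - x0 = 2.7.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.7)]
print(optimize_pose_graph_1d([0.0, 1.0, 2.0, 3.0], edges))
```

The 0.3 disagreement between odometry and the loop closure is spread evenly over the four edges, giving poses of about 0, 0.925, 1.85 and 2.775.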
Citations: 10
A 334uW 0.158mm2 Saber Learning with Rounding based Post-Quantum Crypto Accelerator
Pub Date : 2022-01-19 DOI: 10.1109/CICC53496.2022.9772859
A. Ghosh, J. M. B. Mera, A. Karmakar, D. Das, Santosh K. Ghosh, I. Verbauwhede, Shreyas Sen
The arrival of large-scale quantum computers will break the security assurances of our current public-key cryptography. The National Institute of Standards and Technology (NIST) is currently running a multi-year standardization procedure to select quantum-safe, or post-quantum, cryptographic schemes for future use. Energy efficiency is an important criterion in the selection process. This paper presents the first silicon-verified ASIC implementation of Saber (an LWR-based algorithm, as proposed in [1], [2]), a NIST PQC Round 3 finalist in the key-encapsulation mechanism (KEM) category. Fig. 1 briefly describes the learning with rounding (LWR) problem, which is hard to solve even with large quantum computers because of the noise introduced by rounding. IC features are tabulated in Fig. 1, which also shows a simplified version of the Saber KEM scheme for establishing a secret key between two communicating parties, Alice and Bob. Because of the rounding in LWR, the secret $s$ is hard to guess from the publicly available data, as shown in Fig. 1.
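The role of rounding in LWR can be seen in a toy example. With power-of-two moduli, rounding from Z_q to Z_p is just "add half of the dropped range, then shift right", and the deterministic rounding error (at most q/(2p) in magnitude) takes the place of the explicit noise used in LWE. This sketch assumes plain integer vectors with Saber's moduli q = 2^13 and p = 2^10. Saber itself uses module-LWR over the polynomial ring Z_q[X]/(X^256+1), and nothing here is secure or a KEM. The function names are hypothetical.

```python
import random

# Power-of-two moduli as in Saber: q = 2^13, p = 2^10 (toy use only).
Q_BITS, P_BITS = 13, 10
Q, P = 1 << Q_BITS, 1 << P_BITS
SHIFT = Q_BITS - P_BITS  # rounding drops the low 3 bits

def lwr_round(v: int) -> int:
    """Round (p/q)*v to the nearest integer mod p: add half of the dropped range, then shift."""
    return ((v + (1 << (SHIFT - 1))) % Q) >> SHIFT

def lwr_samples(a_rows, s):
    """b_i = round((p/q) * <a_i, s>) mod p.

    No explicit noise term is added: the deterministic rounding error,
    at most q/(2p) in magnitude, is what hides <a_i, s>.
    """
    return [lwr_round(sum(x * y for x, y in zip(row, s)) % Q) for row in a_rows]

rng = random.Random(0)
n = 8
A = [[rng.randrange(Q) for _ in range(n)] for _ in range(n)]  # public, uniform mod q
s = [rng.randrange(-4, 5) for _ in range(n)]                    # small secret
b = lwr_samples(A, s)                                            # public, values mod p
print(b)
```

Given only `A` and `b`, recovering `s` means solving a noisy linear system, where the noise comes entirely from the discarded low-order bits.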
大规模量子计算机的到来将打破我们目前的公钥加密的安全保证。美国国家标准与技术研究所(NIST)目前正在进行一项长达数年的标准化程序,以选择未来使用的量子安全或后量子加密方案。能源效率是选择过程中的一个重要标准。本文介绍了Saber (LWR算法在[1],[2]中提出)的第一个经过硅验证的ASIC实现,Saber是NIST PQC第三轮入围密钥封装机制(KEM)类别的候选对象。图1简要描述了舍入学习(LWR)问题,由于舍入产生的噪声,即使在大型量子计算机存在的情况下也难以解决。IC特征列于图1中。它还展示了Saber KEM方案的简化版本,以在通信双方Alice和Bob之间建立密钥。由于采用四舍五入的学习方法,根据公开数据很难猜测secret $s$,如图1所示。
Citations: 6