
Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design: Latest Publications

A Unified Forward Error Correction Accelerator for Multi-Mode Turbo, LDPC, and Polar Decoding
Y. Yue, T. Ajayi, Xueyang Liu, Peiwen Xing, Zihan Wang, D. Blaauw, R. Dreslinski, Hun-Seok Kim
Forward error correction (FEC) is a critical component in communication systems as the errors induced by noisy channels can be corrected using the redundancy in the coded message. This paper introduces a novel multi-mode FEC decoder accelerator that can decode Turbo, LDPC, and Polar codes using a unified architecture. The proposed design exploits the similarities among these codes to enable energy-efficient decoding with minimal overhead in the total area of the unified architecture. Moreover, the proposed design is highly reconfigurable to support various existing and future FEC standards, including 3GPP LTE/5G and IEEE 802.11n WiFi. Implemented in GF 12 nm FinFET technology, the design occupies 8.47 mm² of chip area, attaining 25% logic and 49% memory area savings compared to a collection of single-mode designs. Running at 250 MHz and 0.8 V, the decoder achieves per-iteration throughput and energy efficiency of 690 Mb/s and 44 pJ/b for Turbo; 740 Mb/s and 27.4 pJ/b for LDPC; and 950 Mb/s and 45.8 pJ/b for Polar.
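A quick back-of-the-envelope conversion puts the three modes on a common scale: decoder power is roughly throughput multiplied by energy per bit. The sketch below uses only the figures quoted in the abstract; the derived milliwatt numbers are an illustration, not results reported by the authors.

```python
# Rough check: decoder power ~= per-iteration throughput * energy per bit.
# Throughput/energy pairs are taken from the abstract; the derived power
# figures are illustrative only.
modes = {
    # mode: (per-iteration throughput in Mb/s, energy efficiency in pJ/b)
    "Turbo": (690, 44.0),
    "LDPC":  (740, 27.4),
    "Polar": (950, 45.8),
}

for mode, (mbps, pj_per_bit) in modes.items():
    power_mw = mbps * 1e6 * pj_per_bit * 1e-12 * 1e3   # W -> mW
    print(f"{mode:5s}: ~{power_mw:.1f} mW per decoding iteration")
```

This works out to roughly 30 mW (Turbo), 20 mW (LDPC), and 44 mW (Polar) per decoding iteration at the stated 250 MHz, 0.8 V operating point.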
Citations: 1
A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC Operation for 4-bit Input Processing
Joonhyung Kim, Kyeongho Lee, Jongsun Park
This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory (CIM) architecture that efficiently performs the multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights. First, a bit-line (BL) charge-sharing technique is employed to design the low-cost and reliable digital-to-analog conversion of 4-bit input activations in the proposed SRAM CIM, where the charge-domain analog computing provides variation-tolerant and linear MAC outputs. The 16 local arrays are also effectively exploited to implement the analog multiplication unit (AMU) that simultaneously produces 16 multiplication results between 4-bit input activations and 1-bit weights. To reduce the hardware cost of the analog-to-digital converter (ADC) without sacrificing DNN accuracy, hardware-aware system simulations are performed to decide the ADC bit-resolutions and the number of activated rows in the proposed CIM macro. In addition, for the ADC operation, the AMU-based reference columns are utilized for generating ADC reference voltages, with which a low-cost 4-bit coarse-fine flash ADC has been designed. The 256×80 P-8T SRAM CIM macro implemented in a 28 nm CMOS process achieves accuracies of 91.46% and 66.67% on the CIFAR-10 and CIFAR-100 datasets, respectively, with an energy efficiency of 50.07 TOPS/W.
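The 4-bit-by-8-bit MAC described above is assembled from the 1-bit-weight products that the AMU produces. Below is a minimal sketch of that decomposition, assuming generic bit-serial CIM arithmetic (binary-weighted recombination of bit planes with a two's-complement sign bit), not the paper's exact circuit.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.integers(0, 16, size=16)          # 4-bit unsigned input activations
weights = rng.integers(-128, 128, size=16)   # 8-bit signed weights

ref = int(np.dot(acts, weights))             # direct digital reference MAC

# Slice the two's-complement weights into 8 bit planes, form one
# 1-bit-weight dot product per plane (the kind of result the AMU produces),
# then recombine digitally with binary weighting; the MSB plane is negative.
w_bytes = weights.astype(np.int64) & 0xFF
acc = 0
for b in range(8):
    plane = (w_bytes >> b) & 1               # 1-bit weights for this plane
    plane_sum = int(np.dot(acts, plane))     # per-plane MAC result
    scale = -(1 << b) if b == 7 else (1 << b)
    acc += scale * plane_sum

assert acc == ref
print("bit-plane MAC:", acc, "reference:", ref)
```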
Citations: 0
Visible Light Synchronization for Time-Slotted Energy-Aware Transiently-Powered Communication
A. Torrisi, Maria Doglioni, K. Yıldırım, D. Brunelli
Energy-harvesting IoT devices that operate without batteries have paved the way for sustainable sensing applications. These devices force applications to run intermittently since the ambient energy is sporadic, leading to frequent power failures. Unexpected power failures introduce several challenges to wireless communication since nodes are not synchronized and stop operating during data transmission. This paper presents a novel self-powered autonomous circuit design to remedy this problem. This circuit uses visible-light communication (VLC) to enable synchronization for time-slotted energy-aware transiently powered communication. Therefore, it aligns the activity phases of the batteryless sensors so that energy-status communication occurs when these nodes are active simultaneously. Evaluations showed that our circuit has ultra-low power consumption, can operate at zero energy cost by relying only on the harvested energy, and supports efficient intermittent communication over intermittently powered nodes.
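The value of aligning activity phases can be seen with a toy timing model: two intermittently powered nodes can exchange data only when their active windows overlap. The numbers below (wake-up period, slot length, residual jitter) are made-up parameters for illustration; the sketch models the rendezvous problem in general, not the proposed VLC circuit itself.

```python
import random

random.seed(1)
SLOT = 10      # ms of activity per wake-up (illustrative)
PERIOD = 100   # ms between wake-ups, dictated by the energy harvester
TRIALS = 10_000

def rendezvous(start_a, start_b):
    """True if two activity windows of length SLOT overlap at all."""
    return abs(start_a - start_b) < SLOT

# Unsynchronized: each node wakes at an independent random phase.
unsync = sum(rendezvous(random.uniform(0, PERIOD), random.uniform(0, PERIOD))
             for _ in range(TRIALS))

# Synchronized: a shared reference (here, the visible-light pulse) aligns the
# phases up to a small residual jitter.
sync = sum(rendezvous(random.gauss(0, 0.5), random.gauss(0, 0.5))
           for _ in range(TRIALS))

print(f"unsynchronized rendezvous rate: {unsync / TRIALS:.1%}")
print(f"synchronized rendezvous rate:   {sync / TRIALS:.1%}")
```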
Citations: 0
RACE: RISC-V SoC for En/decryption Acceleration on the Edge for Homomorphic Computation
Zahra Azad, Guowei Yang, R. Agrawal, Daniel Petrisko, Michael B. Taylor, A. Joshi
As more and more edge devices connect to the cloud to use its storage and compute capabilities, they bring in security and data privacy concerns. Homomorphic Encryption (HE) is a promising solution to maintain data privacy by enabling computations on the encrypted user data in the cloud. While there has been a lot of work on accelerating HE computation in the cloud, little attention has been paid to optimizing the en/decryption on the edge. Therefore, in this paper, we present RACE, a custom-designed area- and energy-efficient SoC for en/decryption of data for HE. Owing to the similar operations in en/decryption, RACE unifies the en/decryption datapath to save area. RACE efficiently exploits techniques like memory reuse and data reordering to utilize a minimal amount of on-chip memory. We evaluate RACE using a complete RTL design containing a RISC-V processor and our unified accelerator. Our analysis shows that, for the end-to-end en/decryption, using RACE leads to, on average, a 48× to 39729× more energy-efficient solution (across a wide range of security parameters) than purely using a processor.
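The unified datapath rests on the fact that, in RLWE-based HE schemes, encryption and decryption are both dominated by the same polynomial multiply-accumulate modulo q. The toy symmetric-key example below (schoolbook negacyclic multiplication, BFV-style plaintext scaling, made-up toy parameters) is only meant to make that shared kernel visible; it is not the scheme configuration or datapath used by RACE.

```python
import numpy as np

rng = np.random.default_rng(42)
n, q, t = 16, 2**20, 16            # toy ring degree, ciphertext and plaintext moduli
delta = q // t

def polymul(a, b):
    """Schoolbook multiplication in Z_q[x]/(x^n + 1): the kernel that both
    encryption and decryption spend their time in."""
    res = np.zeros(n, dtype=np.int64)
    for i in range(n):
        for j in range(n):
            if i + j < n:
                res[i + j] = (res[i + j] + a[i] * b[j]) % q
            else:
                res[i + j - n] = (res[i + j - n] - a[i] * b[j]) % q
    return res

def encrypt(m, s):
    a = rng.integers(0, q, n)                   # uniform ring element
    e = rng.integers(-3, 4, n)                  # small noise
    c0 = (-polymul(a, s) + e + delta * m) % q   # multiply-accumulate mod q
    return c0, a

def decrypt(c0, c1, s):
    v = (c0 + polymul(c1, s)) % q               # the same multiply-accumulate
    return np.rint(v * t / q).astype(np.int64) % t

secret = rng.integers(-1, 2, n)                 # small ternary secret key
msg = rng.integers(0, t, n)
c0, c1 = encrypt(msg, secret)
assert np.array_equal(decrypt(c0, c1, secret), msg)
print("toy RLWE round trip succeeded; enc and dec share one polymul kernel")
```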
Citations: 3
FlexiDRAM: A Flexible in-DRAM Framework to Enable Parallel General-Purpose Computation
Ranyang Zhou, A. Roohi, Durga Misra, Shaahin Angizi
In this paper, we propose a Flexible processing-in-DRAM framework named FlexiDRAM that supports the efficient implementation of complex bulk bitwise operations. This framework is developed on top of a new reconfigurable in-DRAM accelerator that leverages the analog operation of DRAM sub-arrays and elevates it to implement XOR2-MAJ3 operations between operands stored in the same bit-line. FlexiDRAM first generates an efficient XOR-MAJ representation of the desired logic and then appropriately allocates DRAM rows to the operands to execute any in-DRAM computation. We develop the ISA and software support required to compute in-DRAM operations. FlexiDRAM transforms the current memory architecture into a massively parallel computational unit and can be leveraged to significantly reduce the latency and energy consumption of complex workloads. Our extensive circuit-to-architecture simulation results show that, averaged across two well-known deep learning workloads, FlexiDRAM achieves ~15× energy savings and a 13× speedup over the GPU, outperforming recent processing-in-DRAM platforms.
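A quick way to see why XOR2 and MAJ3 are a workable basis for bulk bitwise computation is the full adder: the carry is a 3-input majority and the sum is a chain of 2-input XORs. The sketch below is a generic illustration of that representation, not FlexiDRAM's actual XOR-MAJ synthesis flow or row-allocation step.

```python
import numpy as np

def xor2(a, b):
    return a ^ b

def maj3(a, b, c):
    return (a & b) | (a & c) | (b & c)

def full_adder(a, b, cin):
    """Bit-parallel full adder built only from the XOR2/MAJ3 primitives."""
    s = xor2(xor2(a, b), cin)   # sum bit
    cout = maj3(a, b, cin)      # carry bit
    return s, cout

# Verify against ordinary integer addition for 32 random bit positions,
# mimicking a row-wide bulk bitwise operation.
rng = np.random.default_rng(7)
a, b, cin = (rng.integers(0, 2, 32, dtype=np.uint8) for _ in range(3))
s, cout = full_adder(a, b, cin)
assert np.array_equal(s + 2 * cout, a + b + cin)
print("XOR2/MAJ3 full adder verified across all 32 bit positions")
```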
Citations: 3
HOGEye: Neural Approximation of HOG Feature Extraction in RRAM-Based 3D-Stacked Image Sensors
T. Ma, Weidong Cao, Fei Qiao, Ayan Chakrabarti, Xuan Zhang
Many computer vision tasks, ranging from recognition to multi-view registration, operate on feature representations of images rather than raw pixel intensities. However, conventional pipelines for obtaining these representations incur significant energy consumption due to pixel-wise analog-to-digital (A/D) conversions and costly storage and computations. In this paper, we propose HOGEye, an efficient near-pixel implementation of a widely used feature extraction algorithm, Histograms of Oriented Gradients (HOG). HOGEye moves the key but computation-intensive derivative extraction (DE) and histogram generation (HG) steps into the analog domain by applying a novel neural approximation method in a resistive random-access memory (RRAM)-based 3D-stacked image sensor. The co-location of perception (sensor) and computation (DE and HG) and the alleviation of A/D conversions allow the HOGEye design to achieve significant energy savings. With negligible detection-rate degradation, the entire HOGEye sensor system consumes less than 48μW@30fps for an image resolution of 256 × 256 (equivalent to 24.3pJ/pixel) while the processing part consumes only 14.1pJ/pixel, achieving more than 2.5× energy-efficiency improvement over state-of-the-art designs.
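For reference, the two steps that HOGEye approximates in the analog domain, derivative extraction (DE) and histogram generation (HG), compute per-pixel gradients and magnitude-weighted orientation histograms per cell. The minimal NumPy version below shows what those steps produce for a 256×256 frame; it says nothing about how the RRAM-based neural approximation realizes them, and the cell size and bin count are common textbook choices, not necessarily the paper's.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Digital reference for the HOG front end: DE followed by HG."""
    img = img.astype(np.float32)

    # Derivative extraction (DE): central-difference gradients.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned orientation

    # Histogram generation (HG): magnitude-weighted orientation bins per cell.
    h, w = img.shape
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hists = np.zeros((h // cell, w // cell, bins), dtype=np.float32)
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hists[i, j] = np.bincount(b, weights=m, minlength=bins)
    return hists

frame = np.random.default_rng(0).integers(0, 256, (256, 256))
print(hog_cells(frame).shape)   # (32, 32, 9): one 9-bin histogram per 8x8 cell
```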
Citations: 3
A Study on Optimizing Pin Accessibility of Standard Cells in the Post-3 nm Node
J. Jeong, Jonghyun Ko, Taigon Song
Nanosheet FETs (NSFETs) are expected to be the post-FinFET device in the technology nodes of 5 nm and beyond. However, despite the high potential of NSFETs, few studies report the impact of NSFETs from a digital VLSI perspective. In this paper, we present a study of NSFETs for optimal standard cell (SDC) library design and pin-accessibility-aware layout for less routing congestion and low power consumption. For this objective, we present five novel methodologies to tackle the pin accessibility issues that arise in SDC designs in extremely-low routing resource environments (4 tracks) and emphasize the importance of local trench contact (LTC) in them. Using our methodology, we improve design metrics such as power consumption, total area, and wirelength by -11.0%, -13.2%, and 16.0%, respectively. Based on our study, we expect the routing congestion issues that additionally occur in advanced technology nodes to be handled and better full-chip designs to be achieved at 3 nm and beyond.
Citations: 1
Drift-tolerant Coding to Enhance the Energy Efficiency of Multi-Level-Cell Phase-Change Memory
Yi-Shen Chen, Yuan-Hao Chang, Tei-Wei Kuo
Phase-Change Memory (PCM) has emerged as a promising memory and storage technology in recent years, and Multi-Level-Cell (MLC) PCM further reduces the per-bit cost to improve its competitiveness by storing multiple bits in each PCM cell. However, MLC PCM suffers from high energy consumption in its write operations. In contrast to existing works that try to enhance the energy efficiency of the physical program&verify strategy for MLC PCM, this work proposes a drift-tolerant coding scheme to enable fast write operations on MLC PCM without sacrificing any data accuracy. By exploiting the resistance drift and the asymmetric write characteristic of PCM cells, the proposed scheme can reduce the write energy consumption of MLC PCM significantly. Meanwhile, a segmentation strategy is proposed to further improve the write performance of our coding scheme. A series of analyses and experiments was conducted to evaluate the capability of the proposed scheme. The results show that the proposed scheme can reduce energy consumption by 6.2–17.1% and write latency by 3.2–11.3% across six representative benchmarks, compared with existing well-known schemes.
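Resistance drift, the effect the coding scheme is built around, can be illustrated with the standard empirical model R(t) = R0 · (t/t0)^ν, where amorphous (high-resistance) states have the largest drift exponents. In the toy 2-bit MLC model below, an intermediate level eventually drifts across a read threshold while the extreme levels stay put; all resistances, exponents, and thresholds are made-up illustrative values, and the paper's actual coding and segmentation schemes are not reproduced here.

```python
# Toy 2-bit MLC PCM drift model: R(t) = R0 * (t / t0) ** nu.
levels = {            # stored value: (initial resistance in ohms, drift exponent)
    0b00: (1e4, 0.005),   # fully crystalline, barely drifts
    0b01: (5e4, 0.04),
    0b10: (2e5, 0.07),
    0b11: (1e6, 0.10),    # fully amorphous, drifts fastest
}
thresholds = [3e4, 1.2e5, 5e5]   # read boundaries between adjacent levels

def resistance(level, seconds, t0=1.0):
    r0, nu = levels[level]
    return r0 * (seconds / t0) ** nu

def read_level(r):
    return sum(r > th for th in thresholds)

for level in levels:
    for t in (1.0, 1e3, 1e6):    # 1 s, ~17 min, ~12 days after programming
        sensed = read_level(resistance(level, t))
        note = "" if sensed == level else "   <-- misread caused by drift"
        print(f"stored {level:02b}, after {t:8.0e} s reads back as {sensed:02b}{note}")
```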
Citations: 5
Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices
Ya-Hui Yang, Shuo-Han Chen, Yuan-Hao Chang
Skyrmion racetrack memory (SK-RM) has been regarded as a promising alternative to replace static random-access memory (SRAM) as a large-size on-chip cache device with high memory density. Different from other nonvolatile random-access memories (NVRAMs), data bits of SK-RM can only be altered or detected at access ports, and shift operations are required to move data bits across access ports along the racetrack. Owing to these special characteristics, word-based mapping and bit-interleaved mapping architectures have been proposed to facilitate reading and writing on SK-RM with different data layouts. Nevertheless, when SK-RM is used as an on-chip cache device, existing mapping architectures raise concerns of unpredictable access performance or excessive energy consumption during both data reads and writes. To resolve such concerns, this paper proposes extracting the merits of existing mapping architectures to allow SK-RM to seamlessly switch its data-update policy according to the write-latency requirement of cache accesses. Promising results have been demonstrated through a series of benchmark-driven experiments.
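The unpredictability targeted here comes directly from the shift operations: the cycles needed to read a bit depend on how far it currently sits from the nearest access port. The toy model below only counts shifts on a single track with made-up geometry; it does not model the word-based or bit-interleaved mappings, nor the proposed policy-switching scheme.

```python
# Toy shift-count model for one skyrmion racetrack: a read must first shift
# the target bit to the nearest access port. Track length and port positions
# are illustrative parameters, not values from the paper.
TRACK_LEN = 64
PORTS = [0, 16, 32, 48]

def shifts_needed(bit_pos):
    """Shift operations required to align bit_pos with its nearest port."""
    return min(abs(bit_pos - p) for p in PORTS)

costs = [shifts_needed(pos) for pos in range(TRACK_LEN)]
print(f"shifts per access: min={min(costs)}, max={max(costs)}, "
      f"mean={sum(costs) / len(costs):.2f}")
# The gap between min and max is the source of unpredictable access latency
# that different mapping architectures trade off against energy.
```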
Citations: 1
A Domain-Specific System-On-Chip Design for Energy Efficient Wearable Edge AI Applications
Yigit Tuncel, A. Krishnakumar, Aishwarya Lekshmi Chithra, Younghyun Kim, Ümit Y. Ogras
Artificial intelligence (AI) based wearable applications collect and process a significant amount of streaming sensor data. Transmitting the raw data to cloud processors wastes scarce energy and threatens user privacy. Wearable edge AI devices should ideally balance two competing requirements: (1) maximizing energy efficiency using targeted hardware accelerators and (2) providing versatility using general-purpose cores to support arbitrary applications. To this end, we present an open-source domain-specific programmable system-on-chip (SoC) that combines a RISC-V core with a meticulously determined set of accelerators targeting wearable applications. We apply the proposed design method to design an FPGA prototype and six real-life use cases to demonstrate the efficacy of the proposed SoC. Thorough experimental evaluations show that the proposed SoC provides up to 9.1× faster execution and up to 8.9× higher energy efficiency than software implementations on the FPGA while maintaining programmability.
Citations: 0