IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A 28 nm 16-kb Sign-Extension-Less Digital-Compute-in-Memory Macro With Extension-Friendly Compute Units and Accuracy-Adjustable Adder-Tree
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-03 | DOI: 10.1109/TVLSI.2024.3418888
Xin Si;Fangyuan Dong;Shengnan He;Yuhui Shi;Anran Yin;Hui Gao;Xiang Li
Conventional digital-domain SRAM compute-in-memory (CIM) struggles with multiply-and-accumulate (MAC) operations on signed values, requiring either a serial data-feeding mode or extra sign-bit processing. The proposed CIM macro has the following features: 1) a sign-extension-less array multiplication circuit structure that eliminates the need to convert partial sums into 2's complement, which removes the constraints related to handling specific sign bits; 2) a circuit that avoids sign-bit-extended shift-and-accumulate, reducing area cost; and 3) an adder structure with adjustable accuracy, which improves network adaptability compared with traditional approximation techniques. A fabricated 28 nm 16-kb sign-extension-less digital CIM (DCIM) macro was tested at a MAC delay of 5.6 ns (signed 8 b input and weight, 23 b output) and an energy efficiency of 40.15 TOPS/W, which the authors report as the highest MAC speed and the best energy efficiency over a wide range of network adaptability.
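The abstract gives no circuit details, so the following is only a minimal arithmetic sketch of the general idea behind signed bit-serial MAC: in two's complement, the most significant weight bit carries a negative weight, so the sign can be applied per bit-plane instead of sign-extending each partial sum. The function names are hypothetical, and the 128-term accumulation length is an assumption (the abstract does not state it); it is simply the largest power-of-two length whose worst case fits the reported 23-bit output.

```python
def to_bits(value, width=8):
    """Two's-complement bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def bit_serial_signed_mac(inputs, weights, width=8):
    """Accumulate sum(x * w) one weight bit-plane at a time.

    Bit i of a two's-complement weight contributes +2**i, except the MSB,
    which contributes -2**(width - 1). Applying that sign per bit-plane
    means no partial sum has to be sign-extended beforehand.
    """
    acc = 0
    for i in range(width):
        # Sum of all inputs whose weight has bit i set.
        plane = sum(x * to_bits(w, width)[i] for x, w in zip(inputs, weights))
        scale = -(1 << i) if i == width - 1 else (1 << i)
        acc += scale * plane
    return acc


def bits_needed_signed(lo, hi):
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


if __name__ == "__main__":
    terms = 128  # assumed accumulation length, not stated in the abstract
    worst_hi = terms * (-128) * (-128)   # largest possible sum
    worst_lo = terms * (-128) * 127      # most negative possible sum
    print(bit_serial_signed_mac([-128] * terms, [-128] * terms))  # 2097152
    print(bits_needed_signed(worst_lo, worst_hi))                 # 23
```

Under that assumed length, the worst-case sum of signed 8 b by 8 b products needs exactly 23 bits, which is consistent with the output width quoted in the abstract.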
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 11, pp. 2164-2168.
Citations: 0
A μ-GA Oriented ANN-Driven: Parameter Extraction of 5G CMOS Power Amplifier
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-02 | DOI: 10.1109/TVLSI.2024.3414584
Tahesin Samira Delwar;Abrar Siddique;Unal Aras;Yangwon Lee;Jee Youl Ryu
This article introduces a novel method for extracting crucial parameters from a fifth-generation (5G) CMOS power amplifier (PA) operating at 24 GHz. The proposed method, the micro-genetic algorithm artificial neural network (μ-GAANN), combines μ-GA and ANN to determine crucial PA (circuit component) parameters accurately. The μ-GAANN model has a fixed and robust stimulation function ($F_{\text{SF}}$ and $R_{\text{SF}}$). ANNs are trained to approximate the parameter extraction process based on input-output data generated by the μ-GA. The proposed μ-GA incorporates arithmetic crossover and nonuniform mutation; thus, several parameters of the ANN are tuned. Moreover, the ANN parameters are improved by the μ-GA to reach an optimal PA design in a shorter time. To verify the proposed μ-GAANN, its training time is also compared with that of an ANN trained by particle swarm optimization (PSO), i.e., PSOANN. In addition, a derivative superposition (DS) linearization technique is used in the PA circuit, along with input load splits (I-LSs), to solve the low input impedance problem of conventional DS. For PA design, the proposed μ-GAANN outperforms the traditional feedforward artificial neural network (TFFANN). Using μ-GAANN, the PA's simulated S21 is 25 dB, while the measured S21 is 21.2 dB. With TFFANN, the simulated gain of the PA is 24.1 dB; without any ANN, the simulated gain is 23.2 dB. The measured $P_{\text{sat}}$ and PAE of the PA with μ-GAANN are 9.8 dBm and 32.1%, respectively. The measured PA also achieves a high third-order input intercept point (IIP3) of 14.1 dBm. The core chip area of the PA is 0.35 mm².
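The abstract names arithmetic crossover and nonuniform mutation but gives no formulas. As a rough illustration, the sketch below uses the common textbook forms of both operators for real-valued genes; whether the authors use exactly these forms is an assumption, and the function names, the `shape` exponent, and the bounds are hypothetical.

```python
import random


def arithmetic_crossover(parent_a, parent_b, rng):
    """Blend two real-valued parents with one random weight in [0, 1)."""
    alpha = rng.random()
    child_a = [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]
    child_b = [(1.0 - alpha) * a + alpha * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b


def nonuniform_mutation(x, lower, upper, generation, max_generations, rng, shape=2.0):
    """Perturb x by a step that shrinks to zero as the run approaches its end."""
    progress = 1.0 - generation / max_generations
    # fraction is in [0, 1]; it collapses to 0 when generation == max_generations.
    fraction = 1.0 - rng.random() ** (progress ** shape)
    if rng.random() < 0.5:
        mutated = x + (upper - x) * fraction   # move toward the upper bound
    else:
        mutated = x - (x - lower) * fraction   # move toward the lower bound
    return min(max(mutated, lower), upper)     # guard against rounding overshoot


if __name__ == "__main__":
    rng = random.Random(0)
    kids = arithmetic_crossover([1.0, 4.0], [3.0, 0.0], rng)
    print(kids)
    print(nonuniform_mutation(0.3, 0.0, 1.0, generation=5, max_generations=50, rng=rng))
```

Early generations allow large steps for exploration, while late generations make only small adjustments, which suits the small populations a micro-GA works with.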
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1569-1577.
Citations: 0
Toward Efficient Asynchronous Circuits Design Flow Using Backward Delay Propagation Constraint
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-01 | DOI: 10.1109/TVLSI.2024.3418769
Lingfeng Zhou;Shanlin Xiao;Huiyao Wang;Jinghai Wang;Zeyang Xu;Bohan Wang;Zhiyi Yu
In recent years, asynchronous circuits have gained attention in neural network chips and the Internet of Things (IoT) due to their potential advantages of low power and high performance. However, the design efficiency of asynchronous circuits remains low, and large-scale applications are difficult because electronic design automation (EDA) support is lacking. This article presents a new bundled-data (BD) asynchronous circuit design flow using traditional EDA tools, including a new backward delay propagation constraint (BDPC) method. In this method, control paths and data paths are analyzed together in a tightly coupled approach to improve the accuracy of static timing analysis (STA). Compared with other design flows, the proposed design flow and constraint method show significant advantages in STA accuracy, design efficiency, and design applicability, and they solve the field-programmable gate array (FPGA) congestion issues of a previous work. An asynchronous RISC-V processor was implemented to verify the method, with selective handshake technology to further reduce power. Compared with the synchronous processor, the asynchronous processor achieves a 17.4% power reduction on the TSMC 65-nm process and a 48.3% dynamic power saving on the FPGA while maintaining the same frequency and resource utilization.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1852-1863.
Citations: 0
Dynamic Neural Fields Accelerator Design for a Millimeter-Scale Tracking System
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-01 | DOI: 10.1109/TVLSI.2024.3416725
Yuyang Li;Vijay Shankaran Vivekanand;Rajkumar Kubendran;Inhee Lee
This brief introduces a compact-size hardware accelerator for dynamic neural fields (DNF) used in object tracking. To address the substantial computational workload and memory occupancy associated with conventional DNFs, three key approaches are implemented: kernel size reduction and abstraction, the replacement of sigmoidal functions with comparison operations, and the approximation of rectangular-shaped objects. The design is realized in a 28-nm CMOS process, resulting in a layout with an area of 0.53 mm². Simulation results demonstrate that the accelerator processes $256 \times 256$ dynamic vision sensor (DVS) frames at 211 frames per second (fps), with a power consumption of 1.68 mW under such conditions.
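One of the three approaches, replacing sigmoidal functions with comparison operations, can be illustrated with a small sketch. The abstract does not say how the comparison is parameterized, so the threshold and the sigmoid steepness `beta` below are hypothetical; the point is only that a steep sigmoid and a single comparator agree away from the threshold, while the comparator needs no exponential.

```python
import math


def sigmoid_output(u, threshold=0.0, beta=10.0):
    """Conventional DNF output nonlinearity: a sigmoid centered on `threshold`."""
    return 1.0 / (1.0 + math.exp(-beta * (u - threshold)))


def comparator_output(u, threshold=0.0):
    """Hardware-friendly stand-in: one comparison, no exponential or divide."""
    return 1 if u > threshold else 0


if __name__ == "__main__":
    for u in (-2.0, -0.5, 0.5, 2.0):
        print(u, sigmoid_output(u), comparator_output(u))
```

The two functions differ only in a narrow band around the threshold, so the substitution trades a small amount of smoothness for a much cheaper datapath.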
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1940-1944.
Citations: 0
An Adaptive Zero-Current Detector for Single-Inductor Multiple-Output DC-DC Converter With Full-Wave Current Sensor
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-01 | DOI: 10.1109/TVLSI.2024.3415475
Yichen Zhang;Chaowei Tang;Yanqi Zheng;Xian Tang;Ka Nang Leung
This brief presents an adaptive zero-current detector (ZCD) for a single-inductor multiple-output (SIMO) DC-DC converter with a full-wave current sensor. The adaptive ZCD, which can be applied to the order power distribution control (OPDC) SIMO DC-DC converter, accurately turns off the low-side power switch when the converter operates in discontinuous conduction mode. In addition, a new full-wave current sensor containing only one sensing transistor is presented; it precisely senses the inductor current with a small delay. The SIMO DC-DC converter is designed and fabricated in a standard 65 nm CMOS process, with output power ranging from 3.7 to 925 mW. The measured reverse current is reduced by up to 78.2%, and the measured light-load power efficiency is improved by up to 10%.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1764-1768.
Citations: 0
Protecting Parallel Data Encryption in Multi-Tenant FPGAs by Exploring Simple but Effective Clocking Methodologies
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-01 | DOI: 10.1109/TVLSI.2024.3418961
Yankun Zhu;Pingqiang Zhou
Capitalizing on their versatility and high performance within heterogeneous designs, cloud service providers (CSPs) are integrating an increasing number of field-programmable gate arrays (FPGAs) into cloud data centers. While CSPs intend to reduce cost by sharing one board among multiple users (called multi-tenant FPGA), hardware security problems such as side-channel attacks keep this model from spreading commercially. Existing research has demonstrated the feasibility of remote side-channel attacks targeting a single advanced encryption standard (AES) module on multi-tenant FPGAs, but it has not examined parallel data encryption on multiple AES modules for a single tenant, which is possible because one AES module consumes few resources. In this work, we scrutinize correlation power analysis (CPA)-based side-channel attacks on parallel data encryption modules and develop two simple yet effective protective methods based on clocking methodologies: clocking phase shift and small frequency shift. The former drives the parallel encryption modules with an identical clock frequency but distinct clock phases, while the latter drives them with slightly different clock frequencies. Experimental results show that both methods effectively increase the minimum number of power traces required for a successful CPA, thus forming a natural protective barrier for parallel data encryption.
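For readers unfamiliar with CPA, the sketch below shows the baseline attack in its simplest form, under several assumptions that are not from the paper: a 4-bit toy target that uses the PRESENT cipher S-box as a small stand-in for the AES S-box, a Hamming-weight leakage model, and simulated rather than measured traces. It does not model the clock phase or frequency shifts the authors propose; it only shows why the number of traces needed for the correct key to stand out is a natural security metric.

```python
import random

# PRESENT cipher 4-bit S-box, used here only as a compact stand-in for AES.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(key, plaintexts, noise_sigma=0.0, seed=0):
    """One simulated power sample per encryption: HW of the S-box output plus noise."""
    rng = random.Random(seed)
    return [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise_sigma)
            for p in plaintexts]


def cpa_recover_key(plaintexts, traces):
    """Correlate the leakage model for every key guess against the traces."""
    scores = {}
    for guess in range(16):
        model = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores[guess] = pearson(model, traces)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    plaintexts = list(range(16)) * 4
    traces = simulate_traces(key=0xA, plaintexts=plaintexts)
    best, _ = cpa_recover_key(plaintexts, traces)
    print(hex(best))  # 0xa
```

Any countermeasure that misaligns or decorrelates the leakage of parallel modules lowers the correct-key correlation, so more traces are needed before that key separates from the wrong guesses.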
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1919-1929.
Citations: 0
Thermally Constrained Codesign of Heterogeneous 3-D Integration of Compute-in-Memory, Digital ML Accelerator, and RISC-V Cores for Mixed ML and Non-ML Workloads
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-06-28 | DOI: 10.1109/TVLSI.2024.3415481
Yuan-Chun Luo;Anni Lu;Janak Sharda;Moritz Scherer;Jorge Tomas Gomez;Syed Shakib Sarwar;Ziyun Li;Reid Frederick Pinkham;Barbara De Salvo;Shimeng Yu
Heterogeneous 3-D (H3D) integration not only reduces the chip form factor and fabrication cost but also allows the merging of diverse compute paradigms that suit different applications. This is especially attractive when modern algorithms, such as the augmented reality/virtual reality (AR/VR) workloads, consist of mixed machine learning (ML) and non-ML workloads. To date, codesign that considers the thermal, latency, and power constraints of H3D hardware is largely unexplored. In this work, a thermally aware framework for H3D hardware design is developed to evaluate the thermal, latency, and power trade-offs for a heterogeneous system with compute-in-memory (CIM), digital ML cores, and RISC-V cores. The framework solves for runtime tunable operating points described as the optimal speedup factor, the number of activated RISC-V cores, the cooling coefficient, and the activity rate based on user-defined criteria, achieving up to 135 TOPS and 215 TOPS/W under 74 °C for the AR/VR workloads.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1718-1725.
Citations: 0
Area Efficient 0.009-mm² 28.1-ppm/°C 11.3-MHz ALL-MOS Relaxation Oscillator
IF 2.8 | Region 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-06-27 | DOI: 10.1109/TVLSI.2024.3416992
Joshua Adiel Wijaya;Poki Chen;Lucky Kumar Pradhan;Ahmad Shahid Bhatti;Seiji Kajihara
This article presents an ultrasmall-area on-chip relaxation oscillator with low temperature sensitivity. In this design, a virtual resistor mainly composed of a complementary to absolute temperature (CTAT) voltage reference circuit replaces the real resistor for efficient temperature compensation, counterbalancing the inherent proportional to absolute temperature (PTAT) property of the oscillator's original relaxation circuit. The conventional capacitor is also replaced with a MOS capacitor to complete the ALL-MOS oscillator circuit, which brings two main advantages: a higher capacitance per unit area and better matching with critical MOSFETs. Implemented in a 0.18-μm TSMC standard CMOS process, the proposed relaxation oscillator achieves a temperature coefficient of 28.17 ppm/°C over the temperature range from -25 °C to +125 °C at an oscillation frequency of 11.39 MHz. The circuit consumes 243.1 μW from a 1.3-V supply. In addition, the oscillator occupies an ultrasmall core chip area of 0.009 mm², almost one order of magnitude smaller than most prior art in the same process.
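To put the temperature coefficient in perspective, the sketch below converts it into an absolute frequency spread. It assumes the common box-method definition, TC = (f_max - f_min) / (f_nominal × ΔT) × 10⁶; the abstract does not state which definition the authors use, and the function names are hypothetical. Under that assumption, 28.17 ppm/°C over the 150 °C range corresponds to a total drift of roughly 48.1 kHz, about 0.42% of 11.39 MHz.

```python
def frequency_spread_hz(tc_ppm_per_c, f_nominal_hz, t_min_c, t_max_c):
    """Total frequency drift implied by a box-method temperature coefficient."""
    return tc_ppm_per_c * 1e-6 * f_nominal_hz * (t_max_c - t_min_c)


def temperature_coefficient_ppm_per_c(f_max_hz, f_min_hz, f_nominal_hz, t_min_c, t_max_c):
    """Box-method temperature coefficient from measured frequency extremes."""
    return (f_max_hz - f_min_hz) / (f_nominal_hz * (t_max_c - t_min_c)) * 1e6


if __name__ == "__main__":
    spread = frequency_spread_hz(28.17, 11.39e6, -25.0, 125.0)
    print(f"{spread / 1e3:.1f} kHz")  # 48.1 kHz
```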
本文介绍了一种具有低温灵敏度的超小面积片上弛豫振荡器。在这一设计中,实现了一个主要由绝对温度互补(CTAT)电压基准电路组成的虚拟电阻器,以取代实际电阻器,从而实现有效的温度补偿,抵消了振荡器原始弛豫电路固有的绝对温度成正比(PTAT)特性。此外,还用 MOS 电容器取代了传统的电容器,使全 MOS 振荡器电路具备了两个主要优势,其一是电容与面积密度更大,其二是与临界 MOSFET 的匹配性更好。在 0.18- $mu $ m TSMC 标准 CMOS 工艺中实现的弛豫振荡器,在 11.39-MHz 振荡频率下,温度系数在 $- 25~^{circ }$ C 到 $+ 125~^{circ }$ C 的温度范围内达到了 28.17 ppm/°C。该电路在 1.3 V 电源下的功耗为 243.1~mu $ W。除上述卓越性能外,该振荡器还实现了 0.009 mm2 的超小型核心芯片面积,比相同工艺下的大多数现有技术少了近一个数量级。
Joshua Adiel Wijaya; Poki Chen; Lucky Kumar Pradhan; Ahmad Shahid Bhatti; Seiji Kajihara. "Area Efficient 0.009-mm² 28.1-ppm/°C 11.3-MHz ALL-MOS Relaxation Oscillator." IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1900-1907. Pub Date: 2024-06-27. DOI: 10.1109/TVLSI.2024.3416992
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 Computer Science, Hardware & Architecture · Pub Date: 2024-06-27 · DOI: 10.1109/TVLSI.2024.3410460
Vol. 32, no. 7, p. C2. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10576046
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 Computer Science, Hardware & Architecture · Pub Date: 2024-06-27 · DOI: 10.1109/TVLSI.2024.3410462
Vol. 32, no. 7, p. C3. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10576058
Citations: 0