
IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Publications

A 578-TOPS/W RRAM-Based Binary Convolutional Neural Network Macro for Tiny AI Edge Devices
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-08 | DOI: 10.1109/TVLSI.2024.3469217
Lixun Wang;Yuejun Zhang;Pengjun Wang;Jianguo Yang;Huihong Zhang;Gang Li;Qikang Li
The novel nonvolatile computing-in-memory (nvCIM) technology enables data to be stored and processed in situ, providing a feasible solution for the widespread deployment of machine learning algorithms in edge AI devices. However, current nvCIM approaches based on weighted current summation face challenges such as device nonidealities and substantial time, storage, and energy overheads when handling high-precision analog signals. To address these issues, we propose a resistive random access memory (RRAM)-based binary convolution macro for constructing a complete binary convolutional neural network (BCNN) hardware circuit, accelerating edge AI applications with low weight precision. This macro performs error compensation at the circuit level and provides stable rail-to-rail output, eliminating the need for ADCs or a processor to perform auxiliary computations. Experimental results demonstrate that the proposed BCNN full-hardware computing system achieves on-chip recognition accuracy of 90.7% (98.64%) on the CIFAR10 (MNIST) dataset, a decrease of 0.98% (0.59%) relative to software recognition accuracy. In addition, this binary convolution macro achieves a maximum throughput of 320 GOPS and a peak energy efficiency of 578 TOPS/W at 136 MHz.
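As background for readers unfamiliar with binary networks: a BCNN replaces the multiply-accumulate with XNOR-popcount arithmetic over ±1 values. A minimal software sketch of that computation (illustrative only; the paper realizes it in RRAM circuitry, not software):

```python
import numpy as np

def binary_mac(activations, weights):
    """XNOR-popcount MAC over {-1,+1} values encoded as {0,1} bits."""
    a = np.asarray(activations, dtype=np.uint8)
    w = np.asarray(weights, dtype=np.uint8)
    xnor = np.logical_not(np.logical_xor(a, w))  # 1 where bits agree
    pop = int(np.count_nonzero(xnor))
    # map popcount back to a signed dot product: match = +1, mismatch = -1
    return 2 * pop - a.size

# dot product of two bipolar vectors
acts    = [1, 0, 1, 1]   # encodes +1, -1, +1, +1
weights = [1, 1, 0, 1]   # encodes +1, +1, -1, +1
print(binary_mac(acts, weights))  # -> 0
```

Because the per-element product collapses to a single XNOR gate, the entire dot product reduces to a popcount, which is what makes rail-to-rail digital readout without ADCs feasible.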
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 371–383.
Citations: 0
Hardware-Accelerator Design by Composition: Dataflow Component Interfaces With Tydi-Chisel
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-04 | DOI: 10.1109/TVLSI.2024.3461330
Casper Cromjongh;Yongding Tian;H. Peter Hofstee;Zaid Al-Ars
As dedicated hardware becomes more prevalent in accelerating complex applications, methods are needed to enable easy integration of multiple hardware components into a single accelerator system. However, this vision of composable hardware is hindered by the lack of interface standards that allow such components to communicate. To address this challenge, the Tydi standard was proposed to facilitate the representation of streaming data in digital circuits, notably providing interface specifications for composite and variable-length data structures. At the same time, constructing hardware in a Scala embedded language (Chisel) provides a suitable environment for deploying Tydi-centric components due to its abstraction level and customizability. This article introduces Tydi-Chisel, a library that integrates the Tydi standard within Chisel, along with a toolchain and methodology for designing data-streaming accelerators. This toolchain reduces the effort needed to design streaming hardware accelerators by raising the abstraction level for streams and module interfaces, thereby avoiding boilerplate code, and allows for easy integration of accelerator components from different designers. This is demonstrated through an example project incorporating various scenarios in which the interface-related declarations are reduced by 6–14 times. The Tydi-Chisel project repository is available at https://github.com/abs-tudelft/Tydi-Chisel.
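For intuition about what a streaming-interface standard for variable-length data provides, the sketch below models sequence boundaries with a `last` flag on each transfer. This is a deliberately tiny toy model, not the Tydi specification itself (which also covers dimensionality, element lanes, and throughput):

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    """One beat on a fixed-width channel carrying variable-length sequences."""
    data: int
    last: bool  # True on the final element of a sequence

def to_transfers(seq):
    """Serialize one sequence into a list of transfers."""
    return [Transfer(v, i == len(seq) - 1) for i, v in enumerate(seq)]

def from_transfers(transfers):
    """Recover the original sequences from a flat transfer stream."""
    seqs, cur = [], []
    for t in transfers:
        cur.append(t.data)
        if t.last:
            seqs.append(cur)
            cur = []
    return seqs

# two sequences of different lengths share one channel
stream = to_transfers([10, 20]) + to_transfers([30])
print(from_transfers(stream))  # -> [[10, 20], [30]]
```

Standardizing this kind of framing at the interface is what lets independently designed components be composed without per-pair glue logic.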
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 12, pp. 2281–2292.
Citations: 0
SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multiprecision DNN Inference
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-04 | DOI: 10.1109/TVLSI.2024.3466224
Chuanning Wang;Chao Fang;Xiao Wu;Zhongfeng Wang;Jun Lin
Deploying deep neural networks (DNNs) on resource-constrained edge platforms is hindered by their substantial computation and storage demands. Quantized multiprecision DNNs (MP-DNNs) offer a promising solution to these limitations but pose challenges for existing RISC-V processors due to complex instructions, suboptimal parallel processing, and inefficient dataflow mapping. To tackle these challenges, SPEED, a scalable RISC-V vector (RVV) processor, is proposed to enable efficient MP-DNN inference, incorporating innovations in customized instructions, hardware architecture, and dataflow mapping. First, dedicated customized RISC-V instructions are introduced based on RVV extensions to reduce instruction complexity, allowing SPEED to support processing precision ranging from 4- to 16-bit with minimal hardware overhead. Second, a parameterized multiprecision tensor unit (MPTU) is developed and integrated within the scalable module to enhance parallel processing capability by providing reconfigurable parallelism that matches the computation patterns of diverse MP-DNNs. Finally, a flexible mixed dataflow method is adopted to improve computational and energy efficiency according to the computing patterns of different DNN operators. SPEED is synthesized in TSMC 28-nm technology. Experimental results show that SPEED achieves a peak throughput of 737.9 GOPS and an energy efficiency of 1383.4 GOPS/W for 4-bit operators. Furthermore, SPEED exhibits superior area efficiency compared with prior RVV processors, with enhancements of 5.9×–26.9× for 8-bit operators and 8.2×–18.5× for best integer performance, respectively, which highlights SPEED's significant potential for efficient MP-DNN inference.
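To illustrate why configurable operand precision matters for quantized inference, here is a toy fixed-point model (an illustrative sketch under simple saturation assumptions, not SPEED's actual datapath): the same dot product evaluated at 4-, 8-, and 16-bit signed precision diverges once operands saturate.

```python
def quantize_dot(x, w, bits):
    """Dot product with both operand vectors saturated to signed `bits` precision."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clamp = lambda v: max(lo, min(hi, v))
    return sum(clamp(a) * clamp(b) for a, b in zip(x, w))

# same vectors evaluated at three operand precisions
x, w = [120, -7, 3], [5, 9, -100]
for bits in (4, 8, 16):
    print(bits, quantize_dot(x, w, bits))
# 4-bit saturates 120 -> 7, 9 -> 7, -100 -> -8, changing the result;
# 8-bit and 16-bit agree because all values already fit in 8 bits
```

A multiprecision tensor unit lets the hardware spend only as many bit-lanes as a layer's quantization actually needs, which is where the throughput and energy gains come from.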
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 1, pp. 207–220.
Citations: 0
Low-Power and High-Speed SRAM Cells With Double-Node Upset Self-Recovery for Reliable Applications
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-03 | DOI: 10.1109/TVLSI.2024.3466897
Shuo Cai;Xinjie Liang;Zhu Huang;Weizheng Wang;Fei Yu
Transistor sizing and spacing are constantly decreasing due to the continuous advancement of CMOS technology. The charge of the sensitive nodes in the static random access memory (SRAM) cell gradually decreases, making the SRAM cell increasingly sensitive to soft errors such as single-node upsets (SNUs) and double-node upsets (DNUs). Therefore, two types of radiation-hardened SRAM cells are proposed in this article. First, a low-power DNU self-recovery S6P8N cell is proposed. This cell achieves SNU self-recovery at all sensitive nodes as well as partial DNU self-recovery, with low power overhead. Second, we propose a high-speed DNU self-recovery S8P6N cell, which has a soft-error tolerance level similar to the S6P8N while reducing the read access time (RAT) and write access time (WAT). Simulation results show that the proposed cells self-recover from all SNUs and most DNUs. Compared with RHD12, QCCM12T, QUCCE12T, RHMD10T, SEA14T, RHM-12T, S4P8N, S8P4N, RH-14T, HRLP16T, CC18T, and RHM, the average power consumption of S6P8N is reduced by 48.78%, and the average WAT is reduced by 6.62%, while the average power consumption of S8P6N is reduced by 23.64%, and the average WAT and RAT by 9.07% and 36.84%, respectively.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 475–487.
Citations: 0
A 9.6-nW Wake-Up Timer With RC-Referenced Subharmonic Locking Using Dual Leakage-Based Oscillators
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-03 | DOI: 10.1109/TVLSI.2024.3466850
Jahyun Koo;Hyunwoo Son;Jae-Yoon Sim
This brief presents a nano-watt wake-up timer implemented mainly through digital synthesis. By performing successive subharmonic frequency locks between two leakage-based digitally controlled oscillators (DCOs) and repeatedly switching their roles, the period of the timer can be locked to a scaled RC time, enabling low-frequency generation without the need for large RC values. The proposed frequency-lock scheme is applied to design a 360-Hz timer. Implemented in a 0.18-µm CMOS process, the timer consumes 9.6 nW and shows a standard deviation of 1.36% without extensive external trimming, mainly due to intra-wafer process variation. The measured supply and temperature sensitivities are 0.32%/V and 395 ppm/°C, respectively.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 598–602.
Citations: 0
PIPECIM: Energy-Efficient Pipelined Computing-in-Memory Computation Engine With Sparsity-Aware Technique
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-01 | DOI: 10.1109/TVLSI.2024.3462507
Yuanbo Wang;Liang Chang;Jingke Wang;Pan Zhao;Jiahao Zeng;Xin Zhao;Wuyang Hao;Liang Zhou;Haining Tan;Yinhe Han;Jun Zhou
Computing-in-memory (CIM) architecture has become a promising solution to improve the parallelism of the multiply-and-accumulate (MAC) operation for artificial intelligence (AI) processors. The recently revived CIM engine partly relieves the memory-wall issue by integrating computation in or with the memory. However, current CIM solutions still require large data movements as practical neural network models and input data grow. Previous CIM works considered only computation, without regard to the memory attribute, leading to a low memory-computing ratio. This article presents a static random access memory (SRAM)-based digital CIM macro supporting a pipeline mode and a computation-memory-aware technique to improve the memory-computing ratio. We develop a novel weight driver with fine-grained ping-pong operation, avoiding the computation stalls caused by weight updates. Based on our evaluation, the peak energy efficiency is 19.78 TOPS/W at the 22-nm technology node, 8-bit width, and 50% sparsity of the input feature map.
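The sparsity-aware idea can be sketched in software: skip the multiply whenever an input activation is zero. This toy model (not the PIPECIM circuit) shows how 50% input sparsity halves the issued operations:

```python
def sparse_mac(inputs, weights):
    """MAC that skips zero inputs, as a sparsity-aware engine would gate them.

    Returns (accumulated result, number of multiplies actually issued).
    """
    acc, ops = 0, 0
    for a, w in zip(inputs, weights):
        if a == 0:
            continue  # zero activation: no multiply issued, no energy spent
        acc += a * w
        ops += 1
    return acc, ops

# 50%-sparse input vector: only 2 of 4 multiplies are performed
acc, ops = sparse_mac([3, 0, 0, 2], [1, 5, 7, -4])
print(acc, ops)  # -> -5 2
```

In hardware the skipped operations translate directly into gated energy, which is why the reported efficiency figure is quoted at a given input sparsity.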
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 525–536.
Citations: 0
Highly Defect Detectable and SEU-Resilient Robust Scan-Test-Aware Latch Design
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-01 | DOI: 10.1109/TVLSI.2024.3467089
Ruijun Ma;Stefan Holst;Hui Xu;Xiaoqing Wen;Senling Wang;Jiuqi Li;Aibin Yan
Soft errors have become a severe threat to the reliability of modern integrated circuits (ICs), making hardened latch designs indispensable for masking soft errors with redundancy. However, the added redundancy also masks production defects as soft errors; this makes it hard to detect defects in hardened latches, thus significantly reducing their reliability. Our previous work proposed the scan-test-aware hardened latch (STAHL) design, the first to address the low defect detectability of hardened latch designs. However, STAHL still suffers from two problems: 1) it is not self-resilient to soft errors and 2) a STAHL-based scan design requires one additional control signal. This article proposes a highly defect detectable and single-event-upset (SEU)-resilient robust (HIDER) latch to address the low defect detectability of existing hardened latches and STAHL's lack of SEU resilience. Two scan designs [HIDER-based scan-cell-S (HIDER-SC-S) and HIDER-based scan-cell-F (HIDER-SC-F)], along with two corresponding test procedures, are proposed to fully test the HIDER latch with only one control signal. Simulation results show that the HIDER latch achieves the highest defect coverage (DC) in both single-latch-cell detection and scan tests among all existing hardened latch designs. In addition, the HIDER latch has much lower power and a smaller delay than STAHL.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 449–461.
Citations: 0
An End-to-End Bundled-Data Asynchronous Circuits Design Flow: From RTL to GDS
IF 2.8 | CAS Tier 2 (Engineering & Technology) | Q2 Computer Science, Hardware & Architecture | Pub Date: 2024-10-01 | DOI: 10.1109/TVLSI.2024.3464870
Jinghai Wang;Shanlin Xiao;Jilong Luo;Bo Li;Lingfeng Zhou;Zhiyi Yu
Asynchronous circuits with low power and robustness are being revived in emerging applications such as the Internet of Things (IoT) and neuromorphic chips, thanks to clock-less and event-driven mechanisms. However, the lack of mature computer-aided design (CAD) tools for designing large-scale asynchronous circuits results in low design efficiency and high cost. This article proposes an end-to-end bundled-data (BD) asynchronous circuit design flow that facilitates building asynchronous circuits even if the designer has little or no asynchronous-circuit background. Three features enable this: 1) a lightweight circuit converter developed in Python converts circuits from synchronous descriptions to corresponding asynchronous ones at the register transfer level (RTL); this desynchronization flow helps designers maintain a "synchronization mentality" when constructing asynchronous circuits; 2) a synchronous-style verification method is proposed for asynchronous circuits so that they can be functionally verified before synthesis. This avoids the risk of rework when logic defects are discovered during synthesis and implementation, since asynchronous circuits often cannot be simulated until gate-level (GL) netlist generation; and 3) the whole implementation flow from RTL to graphic data system (GDS) is based on commercial electronic design automation (EDA) tools. Similar to the design flow of synchronous circuits, it helps designers implement asynchronous circuits with "synchronization habits." Furthermore, to validate this methodology, two asynchronous processors were implemented and evaluated in the TSMC 28-nm CMOS process. Compared to their synchronous counterparts, the general-purpose asynchronous RISC-V processor achieves 20.5% power savings, and the domain-specific asynchronous spiking neural network (SNN) accelerator achieves 58.46% power savings and 2.41× energy-efficiency improvement at 70% input spike sparsity.
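The bundled-data handshake underlying such flows can be modeled in a few lines: data travels alongside a request toggle, and the receiver toggles an acknowledge once it has latched the bundle. This is a two-phase toy model for intuition, not the article's toolchain:

```python
class BundledDataLink:
    """Two-phase bundled-data handshake: req/ack toggles frame each transfer."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None
        self.received = []

    def send(self, value):
        assert self.req == self.ack, "previous transfer not yet acknowledged"
        self.data = value
        self.req ^= 1          # toggle request: the data bundle is now valid

    def receive(self):
        assert self.req != self.ack, "no pending transfer"
        self.received.append(self.data)
        self.ack ^= 1          # toggle acknowledge: sender may reuse the bus

link = BundledDataLink()
for v in (1, 2, 3):
    link.send(v)
    link.receive()
print(link.received)  # -> [1, 2, 3]
```

Because correctness depends only on the req/ack ordering (plus a matched delay on the data bundle in real silicon), no global clock is needed, which is the source of the event-driven power savings.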
“An End-to-End Bundled-Data Asynchronous Circuits Design Flow: From RTL to GDS,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 1, pp. 154–167, 2024. DOI: 10.1109/TVLSI.2024.3464870
Analysis and Design of Wideband GaAs Digital Step Attenuators
IF 2.8 · CAS Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-09-30 · DOI: 10.1109/TVLSI.2024.3461715
Quanzhen Liang;Xiao Wang;Kuisong Wang;Yuepeng Yan;Xiaoxin Liang
This brief analyzes the causes of amplitude and phase errors in digital step attenuators (DSAs) and proposes two novel structures, namely, the series inductive compensation structure (SICS) and the small-bit compensation structure, to reduce these two kinds of errors. A 6-bit DSA with ultrawideband coverage, low insertion loss, and high accuracy is presented; it occupies an area of only 0.51 mm² and provides an attenuation range of 31.5 dB in 0.5-dB steps. Measurements reveal that the root-mean-square (rms) amplitude and phase errors over the 64 attenuation states are within 0.18 dB and 8°, respectively. The insertion loss is better than 2.54 dB, and the input 1-dB compression point (IP1 dB) is better than 29 dBm. To the best of our knowledge, this chip presents the highest attenuation accuracy, the lowest insertion loss, the best IP1 dB, and good matching performance over the 2–22-GHz range in the 0.25-μm GaAs p-HEMT process.
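The figures quoted above follow from the 6-bit binary weighting of a 0.5-dB LSB: 64 states spanning 0 to 31.5 dB, with accuracy reported as the rms deviation from that nominal ladder. A small sketch of the arithmetic — the "measured" values used in the usage example below are made-up illustrative numbers, not data from the paper:

```python
import math

LSB_DB = 0.5   # attenuation of the least-significant bit, in dB
BITS = 6       # number of attenuator bits

def nominal_states():
    """All 2**6 = 64 nominal attenuation settings, in dB."""
    return [code * LSB_DB for code in range(2 ** BITS)]

def rms_amplitude_error(measured_db):
    """RMS deviation of measured attenuation from the nominal ladder."""
    nominal = nominal_states()
    return math.sqrt(
        sum((m - n) ** 2 for m, n in zip(measured_db, nominal)) / len(nominal)
    )

states = nominal_states()
print(len(states), states[-1])   # 64 states, maximum 31.5 dB
# A hypothetical measurement with a uniform +0.05 dB offset:
print(rms_amplitude_error([s + 0.05 for s in states]))
```

The paper's 0.18-dB rms spec is a bound on exactly this quantity, evaluated over all 64 codes across the 0.5–22-GHz band.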
“Analysis and Design of Wideband GaAs Digital Step Attenuators,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 583–587, 2024. DOI: 10.1109/TVLSI.2024.3461715
A Compact 0.9-μW Direct-Conversion Frequency Analyzer for Speech Recognition With Wide-Range Q-Controllable Bandpass Rectifier
IF 2.8 · CAS Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-09-26 · DOI: 10.1109/TVLSI.2024.3453314
Shiro Dosho;Ludovico Minati;Kazuki Maari;Shungo Ohkubo;Hiroyuki Ito
The development of ultralow-power analog front ends for edge artificial intelligence (AI) is actively pursued; however, these front ends suffer from poor frequency-selection accuracy, leading to increased training loads for the AI components and higher testing costs. In this article, we propose a novel circuit that fundamentally addresses these issues through direct conversion. By re-evaluating the circuit configurations of the multiplier, harmonic-removal filter, and full-wave rectifier (FWR) from scratch, we have miniaturized and integrated an ultralow-power converter that transforms frequency components into pulse sequences. The frequency to be analyzed is determined by the local frequency input to the multiplier, which can be digitally controlled with high precision. In our system, the Q value is adaptively adjusted by the local frequency of the direct conversion, allowing the same circuit configuration to be applied to all frequency nodes, eliminating the need for a separate filter design for each node and yielding a highly design-friendly and scalable frequency-analysis system. The test chip was fabricated in a 0.18-μm process, operates from a 1.2-V supply, and outputs power pulse streams corresponding to 11 different frequencies ranging from 500 Hz to 5 kHz. The total operating power was 0.9 μW, with an achieved equivalent Q factor ranging from 3.6 to 36. In a training experiment using a convolutional neural network (CNN) speech-recognition model built on a functional model equivalent to this front end, a recognition rate exceeding 80% was achieved, demonstrating the practicality of the front end.
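The direct-conversion channel described above (multiply the input by a local oscillator, filter out the high-frequency product, rectify into a magnitude-like output) can be sketched behaviorally. Assumptions in this sketch: quadrature I/Q mixing with a magnitude combine stands in for the chip's full-wave rectifier, and the sample rate, one-pole filter coefficient, and test tone are illustrative choices, not the paper's parameters.

```python
import math

FS = 40_000.0  # sample rate in Hz (assumed for the simulation)

def channel_magnitude(signal, f_lo, alpha=0.01):
    """Estimate the signal magnitude at f_lo via I/Q mixing + one-pole LPF.

    Mixing shifts the f_lo component of the input down to DC; the one-pole
    low-pass (coefficient alpha) removes the 2*f_lo harmonic, and the
    magnitude combine plays the role of the rectifier.
    """
    i_acc = q_acc = 0.0
    for n, x in enumerate(signal):
        t = n / FS
        # direct conversion: multiply by the digitally set local oscillator
        i_acc += alpha * (x * math.cos(2 * math.pi * f_lo * t) - i_acc)
        q_acc += alpha * (x * math.sin(2 * math.pi * f_lo * t) - q_acc)
    # factor 2 undoes the mixer's 1/2 conversion gain
    return 2.0 * math.hypot(i_acc, q_acc)

tone = [math.sin(2 * math.pi * 1000.0 * n / FS) for n in range(20_000)]
print(channel_magnitude(tone, 1000.0))  # near 1.0 at the tone frequency
print(channel_magnitude(tone, 4000.0))  # near 0 away from it
```

Because selectivity comes from the low-pass corner relative to the local frequency, one channel design serves every analysis frequency — the same property the abstract exploits to reuse one circuit for all 11 nodes.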
“A Compact 0.9-μW Direct-Conversion Frequency Analyzer for Speech Recognition With Wide-Range Q-Controllable Bandpass Rectifier,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 2, pp. 315–325, 2024. DOI: 10.1109/TVLSI.2024.3453314