
Latest articles from IEEE Transactions on Circuits and Systems I: Regular Papers

IEEE Circuits and Systems Society Information
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3460265
Vol. 71, No. 10, p. C3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10702445
Citations: 0
A Dynamic Capacitance Matching (DCM)-Based Current Response Algorithm for Signal Line RC Network
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3463708
Zhoujie Wu;Cai Luo;Zhong Guan
This paper proposes a dynamic capacitance matching (DCM)-based RC current response algorithm for calculating the current waveform of a signal line without performing transistor-level SPICE simulation. Unlike previous methods such as the current source model, driver linear representation, waveform functional fitting, or equivalent load capacitance, our algorithm does not rely on a fixed reduced model of the standard-cell driver or the RC load. Instead, it approaches the current waveform dynamically by computing the current responses of the target driver for various load scenarios. In addition, we use symbolic expressions to combine the y-parameters of the RC network with the pre-characterized driver library in order to perform capacitance matching and to simulate the current waveform while accounting for the Miller and over/undershoot effects. Our algorithm is experimentally verified on a 40 nm CMOS technology and has been adopted by the latest commercial tool for different nodes (from 180 nm to 3 nm). Experimental results show that our algorithm has only about 1% error compared with SPICE golden results while the runtime is improved by 50 to 200 times, making it well suited to calculating the timing, power, and electromigration of signal lines.
Vol. 71, No. 12, pp. 5804-5813
Citations: 0
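To make the phrase "y-parameter of the RC network" in the DCM abstract above more concrete, here is a minimal sketch of the driving-point admittance of a simple RC ladder. It is not the paper's algorithm. The ladder topology, the function name `ladder_input_admittance`, and the component values are assumptions chosen for illustration only.

```python
import math

def ladder_input_admittance(resistances, capacitances, omega):
    """Admittance seen by a driver looking into an RC ladder.

    Assumed topology: segment k is a series resistor R_k followed by a
    shunt capacitor C_k to ground. omega must be nonzero.
    """
    y = 0j
    # Walk from the far end of the line back toward the driver.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        y = y + 1j * omega * c     # shunt capacitor adds in parallel
        y = 1.0 / (r + 1.0 / y)    # series resistor adds in series
    return y

# Hypothetical three-segment line: 100 ohm and 1 fF per segment.
R = [100.0, 100.0, 100.0]
C = [1e-15, 1e-15, 1e-15]
for freq in (1e6, 1e12):
    w = 2 * math.pi * freq
    y_in = ladder_input_admittance(R, C, w)
    # Im(Y)/omega is the capacitance the driver effectively sees.
    print(f"{freq:.0e} Hz: C_seen = {y_in.imag / w:.3e} F")
```

At low frequency the driver sees essentially the total wire capacitance. At high frequency the series resistance shields part of it, which is one reason a single fixed load capacitance can be a poor stand-in for the whole network.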
IEEE Transactions on Circuits and Systems--I: Regular Papers Information for Authors
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3460263
Vol. 71, No. 10, p. 4898. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10702479
Citations: 0
Compact On-Chip mm-wave Reconfigurable Wideband Filtering Switch in 28-nm Bulk CMOS for Integrated Sensing and Communication System Applications
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3464734
Hui-Yang Li;Jin-Xu Xu;Xiu Yin Zhang
In this paper, we propose a compact on-chip millimeter-wave (mm-wave) reconfigurable wideband filtering switch in 28-nm bulk CMOS technology. A dual-mode LC resonator loaded with transistors is used to achieve wideband filtering responses with a transmission zero at the lower frequency band. The resonant frequency of the resonator and the location of the transmission zero can be conveniently tuned to reconfigure the passband and stopband frequencies by turning the transistor on and off. Moreover, the passband can also be switched on and off, enabling the single-pole single-throw filtering switch circuit function. In this way, the proposed mm-wave reconfigurable filtering switch is applicable to the integrated sensing and communication (ISAC) system, where image rejection in communication operation and a wide bandwidth (or high resolution) in sensing operation are both required. Furthermore, to meet the applications in ISAC systems with different architectures, extension designs of the proposed reconfigurable filtering switch with the impedance conversion function, high-order responses, balanced-to-unbalanced transition, and differential input/output ports are presented in detail. For demonstration, the wideband reconfigurable filtering switch has been fabricated. The core circuit has a very compact size of $0.205 \times 0.140$ mm$^2$. Experimental results show that the passband can be reconfigured between 20-55 GHz and 37-44 GHz, with a rejection >17 dB for sensing operation and >12 dB image-band rejection for communication operation, respectively. High off-state isolation of better than 24.8 dB is also achieved.
Vol. 72, No. 1, pp. 125-134
Citations: 0
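The abstract above says the resonant frequency is tuned by turning a transistor on and off. As a rough illustration of why switching a reactive load moves the passband, the sketch below evaluates the textbook ideal-LC formula f0 = 1/(2π√(LC)) for two states. The component values and the assumption that the transistor simply adds a parallel capacitance are hypothetical. The paper's dual-mode resonator is more involved than this single tank.

```python
import math

def lc_resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency f0 = 1 / (2*pi*sqrt(L*C)) of an ideal LC tank, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

# Hypothetical values, chosen only to land in the mm-wave range.
L_TANK = 100e-12     # 100 pH
C_FIXED = 100e-15    # 100 fF always present
C_SWITCHED = 50e-15  # extra 50 fF, assumed connected when the transistor is on

f_off = lc_resonant_frequency(L_TANK, C_FIXED)
f_on = lc_resonant_frequency(L_TANK, C_FIXED + C_SWITCHED)
print(f"transistor off: {f_off / 1e9:.1f} GHz, on: {f_on / 1e9:.1f} GHz")
```

Adding capacitance lowers the resonance, so under this assumption the two switch states correspond to two different center frequencies.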
IEEE Transactions on Circuits and Systems--I: Regular Papers Publication Information
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3460261
Vol. 71, No. 10, p. C2. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10702443
Citations: 0
CASCADE: A Framework for CNN Accelerator Synthesis With Concatenation and Refreshing Dataflow
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3452954
Qingyu Guo;Haoyang Luo;Meng Li;Xiyuan Tang;Yuan Wang
Layer Pipeline (LP) represents an innovative architecture for neural network accelerators, which implements task-level pipelining at the granularity of layers. Despite improvements in throughput, LP architectures face challenges due to complicated dataflow design, intricate design space and high resource requirements. In this paper, we introduce an accelerator synthesis framework, CASCADE. CASCADE leverages a novel dataflow, CARD, to efficiently manage convolutional operations’ irregular memory access patterns using simplified logic and minimal buffers. It also employs advanced design space exploration methods to optimize unrolling parallelism and FIFO depth settings automatically for each layer. Finally, to further enhance resource efficiency, CASCADE leverages Lookup Table-based multiplication and accumulation units. With extensive experimental results, we demonstrate that CASCADE significantly outperforms existing works, achieving a $3\times$ improvement in resource efficiency and a $4\times$ improvement in power efficiency. It achieves over $1.5\times 10^{4}$ frames per second throughput and 71.9% accuracy on ImageNet.
Vol. 71, No. 11, pp. 5235-5248
Citations: 0
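In a layer-pipelined accelerator, throughput is bounded by the slowest layer, which is why the CASCADE abstract above mentions optimizing unrolling parallelism per layer. The sketch below shows one simple way to think about that balancing problem: a greedy allocation that always gives the next compute unit to the current bottleneck layer. This is a generic heuristic under a simplified cost model (cycles = ceil(MACs / units)). It is not CASCADE's design space exploration method, and `allocate_parallelism` and the workload numbers are hypothetical.

```python
import math

def allocate_parallelism(macs_per_layer, total_units):
    """Greedily split total_units multipliers across pipeline layers.

    Returns (units_per_layer, bottleneck_cycles) under the assumed cost
    model cycles = ceil(macs / units) for each layer.
    """
    if total_units < len(macs_per_layer):
        raise ValueError("need at least one unit per layer")
    units = [1] * len(macs_per_layer)
    for _ in range(total_units - len(units)):
        cycles = [math.ceil(m / u) for m, u in zip(macs_per_layer, units)]
        # The slowest layer limits pipeline throughput, so speed it up first.
        units[cycles.index(max(cycles))] += 1
    cycles = [math.ceil(m / u) for m, u in zip(macs_per_layer, units)]
    return units, max(cycles)

# Hypothetical two-layer workload with a budget of five multipliers.
units, bottleneck = allocate_parallelism([100, 400], 5)
print(units, bottleneck)
```

With this toy workload the heavier layer receives most of the units, until both layers take about the same number of cycles per frame.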
TechRxiv: Share Your Preprint Research with the World!
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3462329
Vol. 71, No. 10, p. 4897. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10702446
Citations: 0
On the Efficacy and Vulnerabilities of Logic Locking in Tree-Based Machine Learning
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-01 DOI: 10.1109/TCSI.2024.3457541
Brunno Alves de Abreu;Guilherme Paim;Lilas Alrahis;Paulo Flores;Ozgur Sinanoglu;Sergio Bampi;Hussam Amrouch
The popularity and widespread usage of machine learning (ML) hardware have created challenges for its intellectual property (IP) protection. Logic locking is a widely used technique for IP protection but has received little attention in error-resilient applications such as ML hardware modules. This work investigates the effectiveness of logic locking when applied to tree-based ML circuits and reveals a critical vulnerability that undermines its effectiveness for single-label ML classifiers. We propose a logic locking scheme to eliminate the vulnerabilities in decision tree (DT) and random forest (RF) circuits. In our extensive simulations involving 16 DTs and 16 RFs, our solution consistently thwarts the vulnerability. We further evaluated the security of our approach by considering different obfuscation percentages and launching state-of-the-art oracle-less attacks on logic locking. Our method proves resilient, indicating that by fixing the identified vulnerability, we did not introduce new attack vectors. Further, our investigation indicates that DT/RF accelerators are significantly less vulnerable to oracle-less attacks compared to exact circuits. Overall, our work lays the foundation for future investigations into the effectiveness of logic locking for ML circuits.
Vol. 72, No. 1, pp. 180-191
Citations: 0
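For readers unfamiliar with logic locking, the sketch below shows the basic XOR/XNOR key-gate idea on a tiny decision tree: each comparator output passes through a key gate, and only the correct key restores the original function. The tree, its thresholds, the key value, and the function names are all hypothetical. The sketch shows neither the vulnerability nor the locking scheme described in the abstract above.

```python
from itertools import product

def original_tree(x0, x1):
    """Hypothetical two-comparator decision tree: class 1 iff x0 > 5 and x1 > 3."""
    return 1 if (x0 > 5 and x1 > 3) else 0

CORRECT_KEY = (1, 0)  # hypothetical secret key, one bit per key gate

def locked_tree(x0, x1, key):
    """Same tree with a key gate after each comparator."""
    a = int(x0 > 5)
    b = int(x1 > 3)
    c0 = 1 - (a ^ key[0])  # XNOR key gate: transparent when key bit is 1
    c1 = b ^ key[1]        # XOR key gate: transparent when key bit is 0
    return c0 & c1

inputs = list(product(range(10), repeat=2))
matches_correct = all(locked_tree(x0, x1, CORRECT_KEY) == original_tree(x0, x1)
                      for x0, x1 in inputs)
mismatches_wrong = sum(locked_tree(x0, x1, (0, 0)) != original_tree(x0, x1)
                       for x0, x1 in inputs)
print(matches_correct, mismatches_wrong)
```

With the correct key the locked circuit is functionally identical to the original. With a wrong key some inputs are misclassified, which is what makes the netlist alone less useful to an untrusted party.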
HRCIM-NTT: An Efficient Compute-in-Memory NTT Accelerator With Hybrid-Redundant Numbers
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-30 DOI: 10.1109/TCSI.2024.3463184
Xu Zhang;Yaodong Wei;Minghao Li;Jing Tian;Zhongfeng Wang
Recently, four NIST-approved Post-Quantum Cryptography (PQC) algorithms were selected for standardization. Three of them are lattice-based cryptographic schemes and feature the number-theoretic transform (NTT) as the computing bottleneck, which calls for fast and low-power hardware implementations. In this work, a high-speed and power-efficient NTT accelerator is presented, leveraging the compute-in-memory (CIM) technique with bottom-up optimizations. Firstly, a carry-free modular multiplication (CFMM) algorithm is proposed, which utilizes on-the-fly reduction and hybrid-redundant representation to optimize the butterfly unit operation, the cornerstone of NTT. Based on the optimized algorithm, an efficient butterfly unit in memory (BUIM) is developed through co-design with the SRAM circuit, which saves memory access energy, decreases operation cycles, and achieves an ultra-short critical path. Additionally, the data pattern of the CIM array is also improved to avoid redundant memory read/write operations, which further reduces memory access overhead. Finally, a combination of pipelined operation flow and a constant interstage data mapping strategy is employed to give the proposed hybrid-redundant CIM NTT (HRCIM-NTT) architecture minimized computing cycles and reduced routing overhead. The implementation in 45 nm CMOS technology demonstrates that HRCIM-NTT achieves the highest throughput and lowest latency among the existing CIM-based NTT accelerators.
Vol. 72, No. 1, pp. 214-227
Citations: 0
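The abstract above calls the butterfly unit "the cornerstone of NTT". As a reference point, here is a minimal textbook radix-2 NTT in software, showing where the modular multiplication and the add/subtract butterfly occur. It uses plain integer arithmetic, not the paper's carry-free or hybrid-redundant representation. The toy parameters (q = 17, n = 8, ω = 2) are assumptions picked so the example stays small, and real PQC schemes use much larger moduli.

```python
def ntt(a, omega, q):
    """Recursive radix-2 NTT of a (length a power of two) modulo q.

    omega must be a primitive len(a)-th root of unity modulo q.
    """
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                   # modular multiply by the twiddle factor
        out[k] = (even[k] + t) % q           # butterfly: sum branch
        out[k + n // 2] = (even[k] - t) % q  # butterfly: difference branch
        w = w * omega % q
    return out

def intt(a, omega, q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n = len(a)
    n_inv = pow(n, -1, q)  # modular inverse, Python 3.8+
    return [x * n_inv % q for x in ntt(a, pow(omega, -1, q), q)]

Q, OMEGA = 17, 2  # toy parameters: 2 has multiplicative order 8 modulo 17
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = ntt(coeffs, OMEGA, Q)
print(spectrum, intt(spectrum, OMEGA, Q))
```

Each butterfly costs one modular multiplication plus one modular addition and subtraction, so speeding up the modular multiplier directly shortens the inner loop.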
Implementation of Linear Differential Equations Using Pulse-Coupled Oscillators With an Ultra-Low Power Neuromorphic Realization
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-27 DOI: 10.1109/TCSI.2024.3463536
Jafar Shamsi;Wilten Nicola
Pulse-coupled oscillators (PCOs) are used as models for oscillatory systems in diverse fields such as biology, physics, and engineering. When correctly coupled, PCOs can display sophisticated emergent dynamics for a large number of oscillators. Here, we propose an algorithm and hardware implementation of PCOs to emulate arbitrary systems of linear differential equations (DEs) with inputs, which are similar to the equations used in feedback control laws or linearizations of nonlinear systems. We show that $m$ populations of oscillators can solve a set of $m$-dimensional linear DEs with simple coupling schemes, and crucially, without the matrix multiplications required in Euler integration. The emergence of linear dynamical systems in networks of PCOs occurs when the number of oscillators within a population becomes large, as demonstrated through an analytically exact mean-field derivation. In addition, a hardware architecture of PCOs for digital implementation is proposed and realized on an ultra-low-power FPGA as a proof of concept. These results show that there are simple coupling schemes for pulse-coupled oscillator networks that collectively compute complex dynamical systems. These PCO networks also have an immediate implementation as low-power neuromorphic edge devices.
Vol. 72, No. 1, pp. 14-24
Citations: 0
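The abstract above contrasts the oscillator approach with "the matrix multiplications required in Euler integration". For reference, this is what that conventional baseline looks like: a forward-Euler integrator for x' = Ax + Bu, where every time step needs a matrix-vector product. The example system (an undamped harmonic oscillator), the step size, and the function names are assumptions for illustration. The sketch does not implement the PCO method.

```python
import math

def euler_step(A, B, x, u, dt):
    """One forward-Euler step of x' = A x + B u (plain Python lists)."""
    n = len(x)
    # This matrix-vector product is the per-step cost the abstract refers to.
    dx = [sum(A[i][j] * x[j] for j in range(n)) +
          sum(B[i][k] * u[k] for k in range(len(u)))
          for i in range(n)]
    return [x[i] + dt * dx[i] for i in range(n)]

def simulate(A, B, x0, u, dt, steps):
    """Integrate from x0 with a constant input u for the given number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = euler_step(A, B, x, u, dt)
    return x

# Hypothetical test system: x0' = x1, x1' = -x0, with zero input.
A = [[0.0, 1.0], [-1.0, 0.0]]
B = [[0.0], [0.0]]
x_end = simulate(A, B, [1.0, 0.0], [0.0], dt=1e-3, steps=1000)
print(x_end, [math.cos(1.0), -math.sin(1.0)])
```

For this system the exact solution at t = 1 is (cos 1, -sin 1), so the printed pair gives a quick sanity check on the integrator.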