
IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

RCU-$2^{m}$: A VLSI Radix-$2^{m}$ Cubic Unit
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-08 | DOI: 10.1109/TVLSI.2024.3486237
Eduardo Antonio Ceśar da Costa;Morgana Macedo Azevedo da Rosa
Cubing is among the most frequently used arithmetic operations in applications that demand higher-order operand computation, such as cryptography and bicubic polynomial interpolation. This article proposes a novel VLSI radix-$2^{m}$ cubic unit (RCU-$2^{m}$) that processes m bits of the operand simultaneously, with m = 2 (RCU-4), 3 (RCU-8), and 4 (RCU-16). RCU-16 emerges as the most area-efficient configuration, surpassing RCU-8 and notably outperforming RCU-4. For 8-bit operands, RCU-16 achieves an $11.58\times$ area saving over the cubic unit previously proposed in the literature. Across all configurations, RCU-$2^{m}$ consistently outperforms the automatically selected cube unit, with energy savings ranging from $1.04\times$ to $2\times$. In both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) analyses, RCU-16 consistently achieves greater area and energy savings than RCU-4, RCU-8, and solutions from the literature. These findings highlight the value of radix-$2^{m}$ configurations, particularly RCU-16, for energy-constrained VLSI applications.
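The abstract does not detail the RCU-$2^{m}$ datapath, but the general idea of cubing an operand m bits per step can be illustrated with the binomial expansion $(a \cdot 2^{m} + d)^3 = a^3 2^{3m} + 3a^2 d\, 2^{2m} + 3 a d^2\, 2^{m} + d^3$, where a is the prefix processed so far and d the next radix-$2^{m}$ digit. The following is a minimal software sketch under that assumption; the function name `cube_radix_2m` is hypothetical and this is not the authors' hardware architecture.

```python
def cube_radix_2m(x: int, m: int) -> int:
    """Return x**3, consuming the unsigned integer x m bits per step (MSB first).

    With prefix value a, appending digit d gives a' = a * 2**m + d and
    a'**3 = a**3 * 2**(3m) + 3*a**2*d * 2**(2m) + 3*a*d**2 * 2**m + d**3,
    so a, a**2 and a**3 can be updated with shifts, small products, and adds.
    """
    if x < 0 or m < 1:
        raise ValueError("x must be non-negative and m must be >= 1")
    # Split x into radix-2**m digits, most significant digit first.
    digits = []
    while x:
        digits.append(x & ((1 << m) - 1))
        x >>= m
    digits.reverse()
    a = a2 = a3 = 0  # running prefix value, its square, and its cube
    for d in digits:
        # Update the cube and square from the old a and a2 before updating a.
        a3 = (a3 << (3 * m)) + ((3 * a2 * d) << (2 * m)) + ((3 * a * d * d) << m) + d ** 3
        a2 = (a2 << (2 * m)) + ((2 * a * d) << m) + d * d
        a = (a << m) + d
    return a3


# Usage: an 8-bit operand processed 4 bits per step (the radix-16 case).
print(cube_radix_2m(200, 4))  # 8000000
```

Larger m means fewer iterations but wider digit products ($d^2$, $d^3$) per step, which mirrors the radix trade-off the abstract evaluates.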
Vol. 33, No. 3, pp. 733-745
Citations: 0
Agile-X: A Structured-ASIC Created With a Mask-Less Lithography System Enabling Low-Cost and Agile Chip Fabrication
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-06 | DOI: 10.1109/TVLSI.2024.3486239
Atsutake Kosuge;Hirofumi Sumi;Naonobu Shimamoto;Yukinori Ochiai;Yurie Inoue;Hideharu Amano;Tohru Mogami;Yoshio Mita;Makoto Ikeda;Tadahiro Kuroda
Scaling to finer CMOS process nodes requires more masks, resulting in higher costs and longer turnaround times (TATs). High costs and long TATs have prevented researchers outside the integrated-circuit field, including those in medicine, physics, and other sciences, from prototyping their own chips, which has limited opportunities for diverse innovation in integrated circuits and for talent development. We have developed the Agile-X platform for low-cost, rapid manufacturing of systems-on-chip. Users implement their own dedicated circuits with gate-array circuits on a base chip that contains common intellectual property (IP) blocks such as RISC-V CPUs, various I/Os, and ADCs. The base chip is manufactured in a foundry up to the intermediate metal layers and shipped with metal deposited on its surface. By directly drawing wiring patterns on this base chip with a mask-less lithography system, custom chips can be manufactured on-site without masks. Because this process requires only wiring and no masks, production time is drastically reduced compared with traditional full-mask wafer processes and multiproject wafer (MPW) shuttles. Development and manufacturing costs for the base chip, including the preintegrated IPs, are shared among all Agile-X users, which reduces both IP and base-chip wafer costs per user. We prototyped wafers in a 0.18-μm CMOS process and tested the proposed structured-ASIC platform and manufacturing process using mask-less lithography systems. The results indicate that the process from GDS data input to lithography and dry etching can be completed within 30 min, and custom application-specific integrated circuits (ASICs) can be manufactured within a day. Compared with full-mask wafer design and manufacturing, the manufacturing cost per chip, including IP costs, is reduced from 271000 USD to 22 USD (to 1/12252), and the manufacturing period is reduced from 20 days to 30 min (to 1/960).
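The two reduction factors quoted for Agile-X can be checked directly. The time ratio follows exactly from the stated figures; for the cost ratio, dividing the full-mask cost by 12252 gives about 22.12 USD per chip, which is consistent with the rounded 22 USD figure (the unrounded per-chip cost is not given in the abstract).

```latex
\frac{20\ \text{days}}{30\ \text{min}}
  = \frac{20 \times 24 \times 60\ \text{min}}{30\ \text{min}}
  = \frac{28\,800}{30} = 960,
\qquad
\frac{271\,000\ \text{USD}}{12\,252} \approx 22.12\ \text{USD}
```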
Vol. 33, No. 3, pp. 746-756
Citations: 0
Falcon: A Fused-Layer Accelerator With Layer-Wise Hybrid Inference Flow for Computational Imaging CNNs
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-06 | DOI: 10.1109/TVLSI.2024.3488042
Yong-Tai Chen;Yen-Ting Chiu;Hao-Jiun Tu;Chao-Tsung Huang
Computational imaging (CI) has advanced significantly through the use of convolutional neural networks (CNNs). Its edge deployment relies on layer fusion to avoid the enormous external memory access (EMA) of feature maps, which requires handling overlapped features either by reusing or by recomputing them. Depending on how this boundary-handling strategy is organized, the induced computational complexity and EMA can be optimized. However, state-of-the-art CI accelerators primarily apply homogeneous inference flows, which employ a single overlap-handling strategy throughout the fused layers, limiting their ability to balance computation and data access. In this article, we explore layer-wise optimization in fused-layer CNNs by exploiting hybrid-strategy inference flows and devising a corresponding computing architecture. We categorize layer-wise strategies, put forward a layer-wise hybrid inference flow (LHIF) that integrates their advantages, and propose an optimization procedure that explicitly analyzes essential figures of merit (FoMs), including throughput, EMA, and energy efficiency. Furthermore, we develop a high-throughput accelerator, Falcon, that efficiently supports LHIF under massive parallelism, notably through a time-division-multiplexing (TDM) buffer interface that enables seamless access to feature maps stored in an interleaved manner. Layout results show that the accelerator, delivering 41 TOPS with 1.5 MB of feature-map buffers, supports LHIF while increasing the die area by only 1.4% and power consumption by only 0.7%. Extensive simulations demonstrate the versatility of LHIF in working scenarios at the operational, design, and system levels. Compared with homogeneous inference flows, the proposed LHIF achieves Pareto optimality with up to $2.28\times$ higher throughput and $3.5\times$ lower EMA.
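The reuse-versus-recompute trade-off behind a layer-wise hybrid flow can be illustrated with a toy per-tile cost model. This sketch assumes square tiles, stride-1 k×k convolutions, and ignores channels, EMA, and buffer organization; the function `fused_layer_costs` and its cost metrics are hypothetical and are not the paper's LHIF optimization procedure.

```python
def fused_layer_costs(tile, kernels, reuse):
    """Toy per-tile cost model for a stack of fused stride-1 k x k conv layers.

    tile:    side of the square output tile produced by the last fused layer.
    kernels: kernel size k of each layer, listed first to last.
    reuse:   per-layer flag; True = read this layer's input halo from an
             on-chip buffer, False = have the previous layer recompute it.
    Returns (computed_output_pixels, buffered_halo_pixels) per layer, first to last.
    """
    if len(kernels) != len(reuse):
        raise ValueError("kernels and reuse must have the same length")
    costs = []
    need = tile  # side of the region the current layer must compute for this tile
    # Walk backward from the last layer: each layer's needs set its producer's workload.
    for k, use_buffer in zip(reversed(kernels), reversed(reuse)):
        in_side = need + (k - 1)  # input region this layer reads (halo of k - 1)
        if use_buffer:
            # Halo comes from buffers; the producer only computes the new region.
            costs.append((need * need, in_side * in_side - need * need))
        else:
            # Halo is recomputed, so the producer must output the whole input region.
            costs.append((need * need, 0))
            need = in_side
    costs.reverse()
    return costs


for name, flags in [("recompute", [False] * 3), ("reuse", [True] * 3),
                    ("hybrid", [False, False, True])]:
    costs = fused_layer_costs(16, [3, 3, 3], flags)
    print(name, costs, "total computed:", sum(c for c, _ in costs))
```

For a 16×16 tile and three 3×3 layers, this model gives 980 computed pixels with pure recomputation, 768 with pure reuse (plus 68 buffered halo pixels per layer), and 836 when only the last layer reuses, showing how a per-layer choice lands between the two homogeneous extremes.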
Vol. 33, No. 3, pp. 720-732
Citations: 0
A Cost-Effective Per-Pin ALPG for High-Speed Memory Testing
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-05 | DOI: 10.1109/TVLSI.2024.3486332
Juyong Lee;Hayoung Lee;Sooryeong Lee;Sungho Kang
Because memory testing requires an extensive number of test patterns, an algorithmic pattern generator (ALPG) has been developed within automatic test equipment (ATE). Since a shared-resource ALPG generates the test pattern using the same arithmetic instruction and timing across multiple input/output (I/O) pins, its maximum operating frequency is limited by the delay of the arithmetic operation. A per-pin ALPG, on the other hand, can achieve high-speed operation by generating one bit of the test pattern for each I/O pin. However, its hardware cost is significantly higher because each I/O pin needs its own instructions and pattern generator (PG). To address these limitations, a cost-effective per-pin ALPG for high-speed memory testing is proposed. The proposed per-pin ALPG achieves high-speed operation, while the hardware resources for storing and decoding the instructions are shared among multiple I/O pins to reduce hardware cost. Experimental results indicate that the proposed ALPG achieves a higher speed than the conventional per-pin ALPG at a hardware cost comparable to that of the conventional shared-resource ALPG.
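The split between a shared instruction stage and per-pin bit generation can be illustrated in software. This sketch assumes a March C- test and a fixed data background, neither of which is specified in the abstract; all names are hypothetical, and it models only the data flow, not the proposed hardware or its timing.

```python
from typing import Iterator, List, Tuple

# March C- as (address order, operations). "w0"/"w1" write the data background or
# its inverse; "r0"/"r1" read and expect the background or its inverse.
MARCH_C_MINUS = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


def decode_march(n_addr: int, march=MARCH_C_MINUS) -> Iterator[Tuple[str, int, int]]:
    """Shared stage: expand each march element once into (op, address, polarity)."""
    for order, ops in march:
        addrs = range(n_addr - 1, -1, -1) if order == "down" else range(n_addr)
        for addr in addrs:
            for op in ops:
                yield op[0], addr, int(op[1])


def pin_bit(background: int, pin: int, polarity: int) -> int:
    """Per-pin stage: each I/O pin derives only its own data bit."""
    return ((background >> pin) & 1) ^ polarity


def generate_vectors(n_addr: int, width: int, background: int) -> List[Tuple[str, int, List[int]]]:
    """Combine the shared decode with one bit generator per I/O pin."""
    vectors = []
    for op, addr, polarity in decode_march(n_addr):
        bits = [pin_bit(background, pin, polarity) for pin in range(width)]
        vectors.append((op, addr, bits))
    return vectors


vecs = generate_vectors(n_addr=4, width=8, background=0b01010101)
print(len(vecs))  # 40: March C- performs 10 operations per address
print(vecs[5])    # ('w', 0, [0, 1, 0, 1, 0, 1, 0, 1])
```

Each per-pin function here needs only one background bit and the decoded polarity, which is the kind of state that stays small when decoding is shared.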
Vol. 33, No. 3, pp. 867-871
Citations: 0
High-Performance Elliptic Curve Scalar Multiplication Architecture Based on Interleaved Mechanism
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-05 | DOI: 10.1109/TVLSI.2024.3486312
Jingqi Zhang;Zhiming Chen;Mingzhi Ma;Rongkun Jiang;An Wang;Weijiang Wang;Hua Dang
High-performance (HP) elliptic curve scalar multiplication (ECSM) hardware is important for securing communication in high-capacity, high-concurrency application scenarios. By analyzing the inherent priorities and parallelism in ECSM, we propose a novel HP ECSM algorithm and a partially parallel inversion algorithm based on an interleaved mechanism. Using two dedicated multipliers and one interleaved multiplier, we introduce a compact hardware scheduling scheme that completes each loop iteration of ECSM in four clock cycles. The proposed HP ECSM architecture consists of two Karatsuba-Ofman multipliers (KOMs) and one classical multiplier (CM). The multiplexers and pipeline stages are carefully designed to optimize the critical path (CP). The proposed architecture is implemented on a Virtex-7 field-programmable gate array (FPGA), and its throughput reaches 158.03, 138.23, and 117.50 Mbps over $\text{GF}(2^{163})$, $\text{GF}(2^{283})$, and $\text{GF}(2^{571})$ using 8762, 20451, and 41974 slices, respectively. Comparisons with recent works show that the performance and throughput of our design are among the best reported.
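The two multiplier styles named in the abstract, Karatsuba-Ofman (KOM) and classical (CM), can be modeled in software for $\text{GF}(2^{163})$. This sketch assumes the NIST-recommended reduction polynomial $x^{163} + x^7 + x^6 + x^3 + 1$ (the abstract does not state the authors' field polynomial) and an arbitrary 32-bit recursion threshold; it does not model the interleaved scheduling, the inversion algorithm, or the ECSM loop itself.

```python
# Reduction polynomial of the NIST binary field GF(2^163): x^163 + x^7 + x^6 + x^3 + 1
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul(a: int, b: int) -> int:
    """Classical (shift-and-XOR) multiplication of binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def kom(a: int, b: int, n: int) -> int:
    """Karatsuba-Ofman multiplication of binary polynomials with fewer than n bits."""
    if n <= 32:  # small operands: fall back to the classical multiplier
        return clmul(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = kom(a0, b0, h)
    hi = kom(a1, b1, n - h)
    # In GF(2), addition and subtraction are both XOR.
    mid = kom(a0 ^ a1, b0 ^ b1, n - h) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))


def reduce_mod(c: int, f: int = F163) -> int:
    """Reduce a product polynomial modulo the field polynomial f."""
    m = f.bit_length() - 1
    while c.bit_length() > m:
        c ^= f << (c.bit_length() - 1 - m)  # cancel the current leading term
    return c


def gf_mul(a: int, b: int, f: int = F163) -> int:
    """Field multiplication: Karatsuba-Ofman product followed by reduction."""
    return reduce_mod(kom(a, b, f.bit_length() - 1), f)


print(hex(gf_mul(2, 1 << 162)))  # x * x^162 = x^7 + x^6 + x^3 + 1 -> 0xc9
```

Karatsuba-Ofman trades one of the four half-size products for extra XORs, which is why it is a common choice for wide binary-field multipliers.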
Vol. 33, No. 3, pp. 757-770
Citations: 0
A 0.2–2.6 GHz Reconfigurable Receiver Using RF-Gain-Adapted Impedance Matching and Gm-Separated IQ-Leakage Suppression Structure in 40-nm CMOS
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-30 | DOI: 10.1109/TVLSI.2024.3477731
Zhaolin Yang;Jing Jin;Xiaoming Liu;Jianjun Zhou
A 0.2–2.6 GHz reconfigurable direct-conversion receiver is proposed in this article. The receiver's high-linearity and high-gain modes are configured by either bypassing or including the low-noise amplifier (LNA) stage, and an agile-switching module is designed to facilitate mode transitions. For the high-gain mode, a variable-gain current-reused shunt-feedback (VGCRSF) LNA with a radio frequency (RF) gain-adapted impedance matching technique is proposed. Instead of a transconductance (Gm) stage shared by the I- and Q-paths, a Gm-separated IQ-leakage suppression (GSIQLS) structure is employed in the mixer stage to reduce the complex, frequency-dependent IQ mismatch caused by nonideal local oscillator (LO) signal overlap. In the baseband, both gain and bandwidth are configurable through a biquad low-pass filter (LPF) and a programmable gain amplifier (PGA). The proposed receiver is fabricated in a 40-nm CMOS technology. Measurement results show a maximum conversion gain of 78.5 dB and a minimum noise figure (NF) of 2.5 dB. The input 1-dB compression point (IP1dB), in-band (IB) third-order input-referred intercept point (IIP3), and out-of-band (OOB) IIP3 are larger than 0, 9.7, and 13.1 dBm, respectively. The gain and phase mismatches of the quadrature receiver are lower than 0.3 dB and 1°, respectively, over baseband bandwidths ranging from 410 kHz to 24 MHz. The receiver occupies an area of 0.605 mm² and consumes 75.4 mW.
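To relate the reported I/Q mismatch bounds to a familiar figure, the textbook image-rejection ratio (IRR) formula can be evaluated at the worst-case corner. This is an illustration under the assumption that gain and phase mismatch are the only impairments; the resulting value is implied by the bounds, not a figure reported in the abstract, and the function name is hypothetical.

```python
import math


def image_rejection_ratio_db(gain_mismatch_db: float, phase_mismatch_deg: float) -> float:
    """Textbook image-rejection ratio of a quadrature receiver with I/Q mismatch.

    IRR = (1 + 2*g*cos(phi) + g**2) / (1 - 2*g*cos(phi) + g**2),
    where g is the I/Q amplitude ratio and phi the phase error.
    """
    g = 10 ** (gain_mismatch_db / 20)  # dB gain mismatch -> amplitude ratio
    phi = math.radians(phase_mismatch_deg)
    num = 1 + 2 * g * math.cos(phi) + g * g
    den = 1 - 2 * g * math.cos(phi) + g * g
    if den <= 0:  # perfectly matched paths: infinite rejection
        return math.inf
    return 10 * math.log10(num / den)


# Worst case allowed by the reported bounds (0.3 dB, 1 degree).
print(f"{image_rejection_ratio_db(0.3, 1.0):.1f} dB")  # 34.3 dB
```

Because the mismatch bounds are upper limits, the IRR implied by them is a lower bound of roughly 34 dB under this simple model.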
Vol. 33, No. 1, pp. 234-247
Citations: 0
A Programmable and Reconfigurable CMOS Analog Hopfield Network for NP-Hard Problems
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-29 | DOI: 10.1109/TVLSI.2024.3480958
Pranav O. Mathews;Jennifer O. Hasler
Analog Hopfield networks perform continuous energy minimization, leading to efficient, near-optimal solutions to nondeterministic polynomial-time (NP)-hard problems. However, practical implementations suffer from scaling and connectivity issues. This article presents a programmable and reconfigurable analog Hopfield network that addresses these challenges through a reconfigurable Manhattan architecture with a high-precision 14-bit floating-gate (FG) compute-in-memory (CiM) fabric. The network is implemented on a field-programmable analog array (FPAA) and experimentally tested on three NP-hard problems with different scaling challenges: weighted Max-Cut (high connectivity and weight precision), the traveling salesman problem (TSP) (high connectivity and medium weight precision), and Boolean satisfiability/3SAT (low connectivity and weight precision). The network solved each problem optimally in microseconds.
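The energy-minimization principle can be illustrated with a discrete, digital Hopfield network for weighted Max-Cut, mapping edge weights to couplings $J = -W$. This is a minimal sketch of that standard mapping, not the analog FG/FPAA implementation; asynchronous updates only reach a local energy minimum, so optimality is not guaranteed in general. The function names are hypothetical.

```python
def hopfield_max_cut(w, s=None, max_sweeps=100):
    """Asynchronous discrete Hopfield descent for weighted Max-Cut.

    w: symmetric edge-weight matrix (list of lists) with a zero diagonal.
    With couplings J = -w, the energy E = -1/2 * sum_ij J_ij s_i s_j equals
    1/2 * sum_ij w_ij s_i s_j, which decreases as the cut weight grows.
    """
    n = len(w)
    s = list(s) if s is not None else [1] * n  # spins in {-1, +1}
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = -sum(w[i][j] * s[j] for j in range(n))  # h_i = sum_j J_ij s_j
            new = 1 if field > 0 else -1 if field < 0 else s[i]  # keep spin on ties
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # no spin flipped: a local energy minimum was reached
            break
    return s


def cut_weight(w, s):
    """Total weight of edges whose endpoints lie on opposite sides of the cut."""
    n = len(w)
    return sum(w[i][j] for i in range(n) for j in range(i + 1, n) if s[i] != s[j])


square = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # 4-cycle
spins = hopfield_max_cut(square)
print(spins, cut_weight(square, spins))  # [-1, 1, -1, 1] 4
```

An analog network performs the same descent in continuous time, which is where the microsecond solve times come from; weight precision matters because it sets how faithfully J encodes the problem.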
Vol. 33, No. 3, pp. 821-830
Citations: 0
A Histogram-Based Calibration Algorithm of Capacitor Mismatch for SAR ADCs
IF 2.8 | Zone 2 (Engineering & Technology) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-29 | DOI: 10.1109/TVLSI.2024.3481993
Hui Hu;Bingbing Yao;Yi Shan;Lei Qiu
The conversion accuracy of a successive approximation register (SAR) analog-to-digital converter (ADC) is limited mainly by capacitor mismatch. In this brief, a histogram-based calibration technique is proposed that requires no additional analog circuitry. Partial fitting is used to detect irregular code densities, and a cost function is constructed to update the weights recursively. The calibration is verified on a prototype 12-bit SAR ADC fabricated in a standard 28-nm CMOS process. At a sampling rate of 50 MS/s with a low-frequency input signal, the measurement results show that the maximum spurious-free dynamic range (SFDR) improves from 77.26 to 88.26 dB, at 10.6 fJ/conversion-step including the reference voltage buffer.
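How capacitor mismatch shows up as irregular code densities can be seen with the standard code-density (histogram) test. This sketch assumes an ideal comparator, a 6-bit converter, and a uniform ramp input (a low-frequency sine, as used in the measurements, would need the usual arcsine correction). It only computes differential nonlinearity (DNL) and does not implement the paper's partial-fitting cost function or recursive weight update; the function names are hypothetical.

```python
def sar_convert(vin: float, weights: list) -> int:
    """Ideal-comparator SAR conversion; weights[0] is the MSB capacitor weight (in LSB units)."""
    code, dac = 0, 0.0
    for w in weights:
        code <<= 1
        if vin >= dac + w:  # keep the bit if the input is above the trial DAC level
            dac += w
            code |= 1
    return code


def histogram_dnl(weights: list, n_samples: int) -> list:
    """Code-density test with a uniform ramp input; returns DNL (in LSB) of the inner codes."""
    n_codes = 1 << len(weights)
    full_scale = sum(weights) + weights[-1]
    hist = [0] * n_codes
    for i in range(n_samples):
        vin = (i + 0.5) * full_scale / n_samples  # evenly spaced ramp samples
        hist[sar_convert(vin, weights)] += 1
    inner = hist[1:-1]  # end codes also collect over-range input, so skip them
    avg = sum(inner) / len(inner)
    return [h / avg - 1 for h in inner]


# MSB capacitor 0.5 LSB too large: the code just below the MSB transition widens.
dnl = histogram_dnl([32.5, 16, 8, 4, 2, 1], 64500)
worst = dnl.index(max(dnl))
print("widest code:", worst + 1, "DNL:", round(max(dnl), 3))  # code 31, about 0.49 LSB
```

A calibration in the spirit of the abstract would then adjust the digital bit weights until such irregular densities disappear.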
{"title":"A Histogram-Based Calibration Algorithm of Capacitor Mismatch for SAR ADCs","authors":"Hui Hu;Bingbing Yao;Yi Shan;Lei Qiu","doi":"10.1109/TVLSI.2024.3481993","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3481993","url":null,"abstract":"The conversion accuracy of successive approximation register (SAR) analog-to-digital converter (ADC) is mainly affected by the capacitor mismatch. In this brief, a histogram-based calibration technique is proposed, which does not require any additional analog circuitry. In this work, the method of partial fitting is used to detect irregular code densities, and construct a cost function to update the weight recursively. The prototype of the calibration is verified with a 12-bit SAR ADC manufactured in 28-nm standard CMOS process. At the sampling rate 50 MS/s, the measurement results indicate that the maximum spurious-free dynamic range (SFDR) can be improved from 77.26 to 88.26 dB, using 10.6 fJ/conversion-step, including reference voltage buffer, with a low-frequency input signal.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"872-876"},"PeriodicalIF":2.8,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
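The abstract names the ingredients of the histogram-based calibration (irregular code densities, recursively updated weights) but not the partial-fitting cost function itself. As a hedged illustration of why a code histogram exposes capacitor mismatch, the Python sketch below uses a simpler, textbook-style idea rather than the authors' method: it drives a behavioral binary-weighted SAR model with a slow linear ramp, computes DNL from the code densities, and reads each actual bit weight off the histogram. The mismatched weights, the ramp step, and all function names are hypothetical; the weight estimate assumes the weights decrease strictly from MSB to LSB and that the ramp starts at zero.

```python
from typing import List


def sar_convert(vin: float, weights: List[float]) -> List[int]:
    """Behavioral SAR conversion, MSB first, against the *actual* (mismatched) weights."""
    bits, acc = [], 0.0
    for w in weights:
        if vin >= acc + w:  # comparator keeps this bit
            acc += w
            bits.append(1)
        else:
            bits.append(0)
    return bits


def bits_to_code(bits: List[int]) -> int:
    """Raw output code, i.e. the bits interpreted with nominal power-of-two weights."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def ramp_histogram(weights: List[float], step: float) -> List[int]:
    """Code-density test: count hits per code for a linear ramp from 0 to full scale."""
    hist = [0] * (1 << len(weights))
    for s in range(int(sum(weights) / step)):
        hist[bits_to_code(sar_convert(s * step, weights))] += 1
    return hist


def dnl_from_histogram(hist: List[int]) -> List[float]:
    """DNL of the inner codes (end codes excluded): code width / average width - 1."""
    inner = hist[1:-1]
    avg = sum(inner) / len(inner)
    return [h / avg - 1.0 for h in inner]


def estimate_weights(hist: List[int], step: float) -> List[float]:
    """Code 2**k first appears exactly at vin = w_k (weights strictly decreasing),
    so w_k is roughly step * (number of ramp samples whose code is below 2**k)."""
    nbits = (len(hist) - 1).bit_length()
    return [step * sum(hist[: 1 << k]) for k in reversed(range(nbits))]


def calibrated_output(bits: List[int], w_hat: List[float]) -> float:
    """Reconstruct the input with the estimated weights instead of powers of two."""
    return sum(b * w for b, w in zip(bits, w_hat))


if __name__ == "__main__":
    actual = [32.6, 15.7, 8.1, 3.9, 2.05, 1.0]  # hypothetical 6-bit mismatched weights (LSB units)
    step = 0.001
    hist = ramp_histogram(actual, step)
    print("worst |DNL| =", max(abs(d) for d in dnl_from_histogram(hist)))
    print("estimated weights =", [round(w, 2) for w in estimate_weights(hist, step)])
```

With these made-up weights the MSB exceeds the sum of all lower weights, so raw code 31 spans about 1.85 LSB instead of 1 LSB. A wide or narrow code like this is the kind of irregular code density that a histogram-based calibration would flag. The estimate resolves each weight to within one ramp step.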
An Adaptive Maintain Power Signature (MPS) Scheme With Reusable Current Generator for Powered Device (PD)
IF 2.8 | Zone 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-28 | DOI: 10.1109/TVLSI.2024.3480955
Yongyuan Li;Xuhong Yin;Wei Guo;Qiang Wu;Yongbo Zhang;Yong You;Zhangming Zhu
Power over Ethernet (PoE) technology has gained considerable attention in the networking market owing to its compactness, flexibility, and low cost. The automatic maintain power signature (MPS) function specified by the IEEE standard draws a periodic pulsed current so that power remains applied to applications that use low-power modes. However, because the MPS current exceeds 10 mA, a large driver is required, which costs silicon area. This brief proposes an adaptive MPS scheme that reuses the existing class regulator and delay timer to source the pulsed MPS current, meeting the MPS requirements while saving 0.0104 mm² of area. The proposed MPS scheme has been fabricated in a 0.18-μm 120-V BCD process, with a chip area of 1.37 × 1.00 mm². Experimental results show that the proposed PoE interface draws a pulsed current with a period of 312 ms and a 25.6% duty cycle, addressing the loss of MPS in very low-power standby modes.
Vol. 33, No. 3, pp. 877–881 (published 2024-10-28)
Citations: 0
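As a worked check of the reported waveform, the short Python sketch below models the MPS current as an ideal square wave with the period (312 ms) and duty cycle (25.6%) quoted in the abstract. The 10 mA pulse amplitude is a hypothetical placeholder (the abstract only says the MPS current exceeds 10 mA), and the class and property names are illustrative, not taken from the brief.

```python
from dataclasses import dataclass


@dataclass
class PulsedMps:
    """Idealized square-wave MPS current drawn by the PD."""

    period_ms: float     # 312 ms reported in the brief
    duty: float          # 0.256 reported in the brief
    amplitude_ma: float  # hypothetical pulse amplitude

    @property
    def on_ms(self) -> float:
        return self.period_ms * self.duty

    @property
    def off_ms(self) -> float:
        return self.period_ms - self.on_ms

    @property
    def average_ma(self) -> float:
        # Average of a rectangular pulse train = amplitude x duty cycle.
        return self.amplitude_ma * self.duty

    def current_at(self, t_ms: float) -> float:
        """Instantaneous current (mA) at time t, with the pulse at the start of each period."""
        return self.amplitude_ma if (t_ms % self.period_ms) < self.on_ms else 0.0


if __name__ == "__main__":
    mps = PulsedMps(period_ms=312.0, duty=0.256, amplitude_ma=10.0)
    print(f"on {mps.on_ms:.3f} ms, off {mps.off_ms:.3f} ms, average {mps.average_ma:.2f} mA")
```

The reported figures give an on-time of about 79.9 ms and an off-time of about 232.1 ms per period. For the assumed 10 mA amplitude, the average standby current is 2.56 mA, a quarter of the peak current the driver must still be able to supply.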
A Self-Calibrated Unified Voltage-and-Frequency Regulator System Design Based on Universal Logic Line Circuit
IF 2.8 | Zone 2, Engineering & Technology | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-28 | DOI: 10.1109/TVLSI.2024.3466132
Jiliang Liu;Huidong Zhao;Zhi Li;Kangning Wang;Shushan Qiao
In this brief, a unified voltage-frequency regulator (UVFR) system is designed to eliminate the voltage margin induced by process, voltage, and temperature (PVT) variations. A universal logic line oscillator (ULLO) adjusts the clock frequency along with the supply voltage, protecting the system from timing violations. The length of the ULLO is self-calibrated by a ULL-based time-to-digital converter (ULL-TDC) and an in situ half-critical-path timing detector, where the ULL is designed to track the critical path delay. A fully synthesizable digital low-dropout regulator (DLDO) is built from the ULL-TDC and a proportional differential (PD) circuit for voltage regulation. The proposed system is implemented with an ARM Cortex-M0 microcontroller in 22-nm technology. Simulation results show that the ULL accurately tracks the critical path delay, with a maximum variation of 3% at 0.6 V and 11.5% at 0.45 V. The UVFR system incurs 13.2–112 μW of power overhead and reduces the voltage margin by 22.3%–28%, lowering power consumption by 35%–42.3%.
Vol. 33, No. 2, pp. 593–597 (published 2024-10-28)
Citations: 0
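The oscillator-based frequency tracking in the UVFR abstract can be illustrated with a toy Python model. The model makes four assumptions for illustration:

- The stage delay follows an alpha-power law with made-up parameters.
- The ULLO behaves as a ring oscillator whose period is twice the delay of its logic line.
- The "half-critical-path" detector is read as sizing the line to half the critical-path delay, so that one oscillator period covers the full path.
- The critical-path delay used is a hypothetical value.

The abstract does not describe the actual ULL-TDC or calibration logic.

```python
import math

# Hypothetical alpha-power-law parameters for one ULL stage (not from the brief).
K_PS, VTH, ALPHA = 20.0, 0.3, 1.5


def stage_delay_ps(v: float) -> float:
    """Delay of one logic-line stage (ps) at supply v (V); valid only for v > VTH."""
    return K_PS * v / (v - VTH) ** ALPHA


def calibrate_length(cp_delay_ps: float, v: float) -> int:
    """Smallest stage count whose line delay covers half the critical path at voltage v."""
    return math.ceil(cp_delay_ps / (2 * stage_delay_ps(v)))


def ullo_period_ps(length: int, v: float) -> float:
    """Ring-oscillator period: the edge traverses the line twice per cycle."""
    return 2 * length * stage_delay_ps(v)


def ullo_freq_mhz(length: int, v: float) -> float:
    return 1e6 / ullo_period_ps(length, v)  # a 1 ps period corresponds to 1e6 MHz


if __name__ == "__main__":
    cp_06 = 1000.0  # hypothetical critical-path delay at 0.6 V (ps)
    n = calibrate_length(cp_06, 0.6)
    for v in (0.6, 0.5, 0.45):
        print(f"{v:.2f} V: {n} stages, {ullo_freq_mhz(n, v):.0f} MHz")
```

In this toy model the line and the critical path slow down by the same factor as the supply drops, so the oscillator frequency follows the voltage automatically. In silicon the tracking is imperfect (the brief reports up to 3% variation at 0.6 V and 11.5% at 0.45 V), which is presumably why the line length is calibrated in situ.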