Pub Date : 2024-09-17 DOI: 10.1109/TVLSI.2024.3454350
Kimi Jokiniemi;Kaisa Ryynänen;Joni Vähä;Elmo Kankkunen;Kari Stadius;Jussi Ryynänen
This article presents a wideband active millimeter wave (mmWave) CMOS downconversion mixer preceded by thorough analysis. This article aims to provide solid reasoning for the proper choice of mixer topology and present methods to achieve high mixer performance, guiding mmWave mixer design. The article first analyses passive and active mixer input impedance and switching performance with a weak sinusoidal local oscillator (LO) signal, demonstrating that passive mixer switching performance is far more dependent on the LO signal. The article then introduces different active mixer design enhancement techniques, namely, peaking inductances and individual mixer stage biasing. The article proposes an enhanced Gilbert cell mixer that uses transformer coupling between the transconductance and switching stages. The complete mixer structure with an LO buffer and an IF amplifier consumes an area of only 0.13 mm² fabricated in a 22-nm FDSOI process. The design achieves a measured peak voltage conversion gain (CG) of 3.5 dB, an exceptionally wide 55–100-GHz RF bandwidth, and a 10-GHz IF bandwidth. The complete mixer consumes 33 mW of power from a low 0.8-V supply voltage and demonstrates an input 1-dB gain compression point of −6 dBm.
{"title":"55–100-GHz Enhanced Gilbert Cell Mixer Design in 22-nm FDSOI CMOS","authors":"Kimi Jokiniemi;Kaisa Ryynänen;Joni Vähä;Elmo Kankkunen;Kari Stadius;Jussi Ryynänen","doi":"10.1109/TVLSI.2024.3454350","DOIUrl":"10.1109/TVLSI.2024.3454350","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2186-2197"},"PeriodicalIF":2.8,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A nanopower, buffered CMOS voltage reference designed to operate across the entire industrial temperature range (from $-40~^{\circ}$C to $125~^{\circ}$C) and with an input voltage range from 1.2 to 5 V (automotive applications) is presented. The solution provides 390 mV at $20~^{\circ}$C and is implemented in a standard BCD technology featuring 160-nm CMOS devices. It is characterized by an average temperature coefficient of 200 ppm/°C, a line sensitivity (LS) of 0.138%/V, and a power supply rejection of −83 dB at 100 Hz. In addition, the circuit occupies a die area of 0.146 mm² (with the reference circuit alone covering 0.043 mm²) and maintains a highly stable current consumption of around 45 nA across various process and input voltage conditions (25 nA for the reference circuit alone) while providing a maximum output current of $630~\mu$A with a load regulation of 0.016 mV/$\mu$A.
{"title":"46-nA High-PSR CMOS Buffered Voltage Reference With 1.2–5 V and −40 °C to 125 °C Operating Range","authors":"Chiara Venezia;Andrea Ballo;Alfio Dario Grasso;Alessandro Rizzo;Calogero Ribellino;Salvatore Pennisi","doi":"10.1109/TVLSI.2024.3455428","DOIUrl":"10.1109/TVLSI.2024.3455428","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"326-336"},"PeriodicalIF":2.8,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10682102","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
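As a rough sanity check on the voltage-reference figures quoted in this entry, the sketch below converts them into absolute output deviations. It assumes the temperature coefficient is a box-method average over the full −40 °C to 125 °C span and that line sensitivity and load regulation act linearly over their full ranges. These assumptions and all variable names are illustrative and are not taken from the paper.

```python
# Back-of-envelope deviations implied by the quoted specifications.
# Assumptions (not from the paper): box-method TC, linear LS and load regulation.

V_REF_MV = 390.0                   # nominal output at 20 °C, in mV
TC_PPM_PER_C = 200.0               # average temperature coefficient, ppm/°C
T_MIN_C, T_MAX_C = -40.0, 125.0    # operating temperature range, °C
LS_PCT_PER_V = 0.138               # line sensitivity, %/V
VIN_MIN_V, VIN_MAX_V = 1.2, 5.0    # input voltage range, V
LOAD_REG_MV_PER_UA = 0.016         # load regulation, mV/µA
I_OUT_MAX_UA = 630.0               # maximum output current, µA
PSR_DB = -83.0                     # power supply rejection at 100 Hz, dB

# Output change across the whole temperature range.
temp_drift_mv = V_REF_MV * TC_PPM_PER_C * 1e-6 * (T_MAX_C - T_MIN_C)

# Output change across the whole input-voltage range.
line_drift_mv = V_REF_MV * (LS_PCT_PER_V / 100.0) * (VIN_MAX_V - VIN_MIN_V)

# Output drop when sourcing the maximum load current.
load_drop_mv = LOAD_REG_MV_PER_UA * I_OUT_MAX_UA

# Supply ripple attenuation expressed as a linear voltage ratio.
psr_ratio = 10 ** (PSR_DB / 20.0)

print(f"temperature drift ~ {temp_drift_mv:.2f} mV")
print(f"line drift        ~ {line_drift_mv:.2f} mV")
print(f"load drop         ~ {load_drop_mv:.2f} mV")
print(f"PSR ratio         ~ {psr_ratio:.2e} V/V")
```

Under these assumptions the temperature term and the full-load drop are the largest contributions, each on the order of 10 mV against the 390-mV nominal output.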
With the end of Moore’s law and Dennard scaling, waferscale systems or processors that integrate multiple pre-tested known good dies (KGDs) on a waferscale interposer are new approaches to further improve the performance of chiplet-based systems. This article explores the network on wafer (NoW) architecture of a waferscale switching system under several physical constraints. A software-based approach is proposed to redefine the topological property. A five-level butterfly fat-tree (BFT)-like logical topology with 8.96-Tb/s (896 ports $\times 10$ Gb/s/port) switching bandwidth is achieved based on a 2-D-mesh-like physical topology. We show that the proposed BFT-like topology with a breadth-first-search (BFS)-based traffic-balanced routing algorithm reduces hop count by 55.6% and transmission delay by 41.4%, and improves throughput by 24.2%, compared to the 2-D-mesh-like topology under different traffic distributions. This BFT-like waferscale switching system is suitable for high-performance computing and data centers. In addition, the numerical analysis shows that the waferscale package can provide significant power efficiency and latency advantages compared to the typical single-chip package, which mainly benefits from the short-reach IO requirements. Note that the proposed waferscale switching system is compatible with high-switch-capacity dies with advanced process technology, which can further improve system performance. Finally, we present the physical implementations for the waferscale system with heterogeneous dies.
{"title":"Architectural Exploration for Waferscale Switching System","authors":"Zhiquan Wan;Zhipeng Cao;Shunbin Li;Peijie Li;Qingwen Deng;Weihao Wang;Kun Zhang;Guandong Liu;Ruyun Zhang;Qinrang Liu","doi":"10.1109/TVLSI.2024.3455332","DOIUrl":"10.1109/TVLSI.2024.3455332","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"512-524"},"PeriodicalIF":2.8,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10682064","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
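The switching-bandwidth figure and the notion of hop count in a 2-D mesh can be illustrated with a short sketch. The abstract does not describe the authors' traffic-balanced routing algorithm, so the function below is only a plain breadth-first search for the shortest hop count on an idealized rectangular mesh. The function name `mesh_hops` and the 4 x 4 mesh size are hypothetical choices made for illustration.

```python
from collections import deque

# Aggregate switching bandwidth from the figures in the abstract.
PORTS = 896
GBPS_PER_PORT = 10
aggregate_tbps = PORTS * GBPS_PER_PORT / 1000  # Tb/s


def mesh_hops(rows, cols, src, dst):
    """Shortest hop count between two nodes of a rows x cols 2-D mesh (plain BFS)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        (r, c), hops = queue.popleft()
        if (r, c) == dst:
            return hops
        # Visit the four mesh neighbours that lie inside the grid.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), hops + 1))
    return None  # unreachable (cannot happen in a full mesh)


corner_to_corner = mesh_hops(4, 4, (0, 0), (3, 3))
print(f"aggregate bandwidth: {aggregate_tbps} Tb/s")
print(f"corner-to-corner hops in a 4 x 4 mesh: {corner_to_corner}")
```

In a full mesh the BFS result equals the Manhattan distance between the two nodes, which is why corner-to-corner traffic is the worst case that a tree-like logical topology tries to shorten.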
Pub Date : 2024-09-13 DOI: 10.1109/TVLSI.2024.3454286
Seung Ho Shin;Hayoung Lee;Sungho Kang
The rapid increase in memory density leads to more frequent faults in memory cells. To improve memory yield, effective memory test and repair methodologies for automatic test equipment (ATE) have been studied. Multiple memory chips are tested simultaneously by the ATE to improve throughput and reduce costs. In general, redundancy analysis (RA) is used for memory repair. However, because conventional RA methods store fault information in their respective failure bitmaps and operate sequentially, they are limited by high area overhead and long analysis time. To address these problems, a novel graphics processing unit (GPU)-based RA method is proposed that significantly enhances the efficiency of searching for repair solutions for multiple memories. The proposed RA method strategically focuses on the pivot line to efficiently utilize parallel processing and reduce the solution search space. Moreover, the proposed method does not require extensive use of failure bitmaps, since the entire process is conducted on the GPU. Fault collection, analysis, spare allocation, and solution decision are performed dynamically in real time during the memory test. Experimental results demonstrate that the proposed RA method achieves an optimal repair rate and high analysis speed for multiple memories.
{"title":"Effective Parallel Redundancy Analysis Using GPU for Memory Repair","authors":"Seung Ho Shin;Hayoung Lee;Sungho Kang","doi":"10.1109/TVLSI.2024.3454286","DOIUrl":"10.1109/TVLSI.2024.3454286","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"462-474"},"PeriodicalIF":2.8,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
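For readers unfamiliar with redundancy analysis, the sketch below shows the classic must-repair rule that many RA algorithms apply before searching for a full solution: a row holding more faults than the remaining spare columns can cover has to take a spare row, and the same holds for columns. This is a generic textbook step, not the pivot-line GPU method of the paper, which the abstract does not detail. The function name `must_repair` and the fault lists are hypothetical.

```python
from collections import Counter


def must_repair(faults, spare_rows, spare_cols):
    """Apply the must-repair rule to a list of (row, col) fault addresses.

    Returns (rows, cols) that are forced to use spares, or None when the
    forced repairs alone already exceed the available spares.
    """
    rows, cols = set(), set()
    while True:
        # Faults not yet covered by an already-forced spare line.
        remaining = [(r, c) for r, c in set(faults) if r not in rows and c not in cols]
        rows_left = spare_rows - len(rows)
        cols_left = spare_cols - len(cols)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)

        # A row with more faults than the spare columns left needs a spare row.
        forced_row = next((r for r, n in row_count.items() if n > cols_left), None)
        if forced_row is not None:
            rows.add(forced_row)
        else:
            # Likewise, a column with more faults than the spare rows left.
            forced_col = next((c for c, n in col_count.items() if n > rows_left), None)
            if forced_col is None:
                return rows, cols  # nothing more is forced
            cols.add(forced_col)

        if len(rows) > spare_rows or len(cols) > spare_cols:
            return None  # unrepairable with the given spares


# Row 0 has three faults but only two spare columns exist, so it is forced;
# afterwards no spare row is left, so the fault at (3, 3) forces column 3.
result = must_repair([(0, 0), (0, 1), (0, 2), (3, 3)], spare_rows=1, spare_cols=2)
print(result)
```

The rule only prunes the search space. Faults that remain after it has been applied still require a search over row and column assignments, which is the part that parallel approaches aim to accelerate.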
Embedded real-time systems impose rigorous timing constraints, where the failure to complete critical tasks within prescribed deadlines can lead to system crashes and catastrophic errors. Control latency, encompassing I/O and interrupt latency, significantly impacts system performance. Previous studies have primarily concentrated on architectural design to meet timing requirements or optimize for performance enhancement. Although the TI programmable real-time unit (PRU) addresses both timing requirements and control latency, it remains a proprietary commercial chip. This article introduces a deterministic response architecture called Sophon, founded on the open and freely available reduced instruction set computer five (RISC-V). The essential part of this architecture is a tiny and flexible Sophon core that has fixed instruction latency. We propose an enhanced instruction set architecture (ISA) extension interface (EEI) capable of transmitting up to 32 operands in a single instruction, facilitating the development of domain-specific applications. In addition, we have devised two custom instructions to minimize control latency. The Sophon core requires a minimum of 28.6k gate equivalents. Experimental results demonstrate that the Sophon architecture eliminates execution time deviations while preserving low control latency. The highest achievable general purpose I/O (GPIO) flipping frequency is half of the core frequency, and the fastest interrupt latency is three clock cycles.
{"title":"Sophon: A Time-Repeatable and Low-Latency Architecture for Embedded Real-Time Systems Based on RISC-V","authors":"Zhe Huang;Xingyao Chen;Feng Gao;Ruige Li;Xiguang Wu;Fan Zhang","doi":"10.1109/TVLSI.2024.3447279","DOIUrl":"10.1109/TVLSI.2024.3447279","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"221-233"},"PeriodicalIF":2.8,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-11 DOI: 10.1109/TVLSI.2024.3445631
Haitao Du;Hairui Zhu;Song Chen;Yi Kang
A dynamic random access memory (DRAM) relies on periodic refresh operations to prevent data loss caused by charge leakage. As memory capacities continue to grow, refresh power consumption accounts for an increasing proportion of the total DRAM power, and in some contexts, it even becomes a major contributor to power consumption. To address this issue, previous research has explored the tradeoff between DRAM reliability and refresh overhead. However, DRAM reliability degrades as technology nodes advance, making these approaches inapplicable in scenarios, such as servers, where high data reliability is critical. Furthermore, these approaches require modifications to the standard DRAM interface protocol and memory controller (MC), rendering them infeasible for standalone use in computer systems. In this article, we propose an energy-efficient charge-recycling DRAM (CR-DRAM), which enables multiple rounds of charge (i.e., energy) recycling between subarrays within a single autorefresh (AR) process. After refreshing a row, CR-DRAM reuses the charge stored in the bitline (BL) capacitors to supply power for refreshing the next row in another subarray, rather than discharging them directly. Since CR-DRAM is compatible with the joint electron device engineering council (JEDEC) interface standard, it can be easily integrated into modern computer systems. Our circuit-level simulation shows that CR-DRAM significantly reduces AR power consumption by 33.9% compared with conventional DRAM, with a modest area overhead of less than 0.9%. Furthermore, our system-level evaluation shows that CR-DRAM offers an average energy savings of 9.2% (maximum of 11.9%) compared with 8-Gb double data rate 4 (DDR4) DRAM across SPEC-2006 benchmark workloads.
{"title":"CR-DRAM: Improving DRAM Refresh Energy Efficiency With Inter-Subarray Charge Recycling","authors":"Haitao Du;Hairui Zhu;Song Chen;Yi Kang","doi":"10.1109/TVLSI.2024.3445631","DOIUrl":"10.1109/TVLSI.2024.3445631","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"21-34"},"PeriodicalIF":2.8,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10 DOI: 10.1109/TVLSI.2024.3452032
Arunkumar P Chavan;Shrish Shrinath Vaidya;Sanket M. Mantrashetti;Abhishek Gurunath Dastikopp;Kishan S. Murthy;H. V. Ravish Aradhya;Prakash Pawar
Analog integrated circuit (IC) design and its automation pose significant challenges due to the time-consuming mathematical computations and complexity of circuit design. Though efforts have been made to automate the analog design flow, the current approach falls short of meeting the exact design requirements and is plagued by inaccuracies, highlighting the necessity for a more robust approach capable of accurately predicting circuits. In addition, there is a need for an improved dataset collection technique to enhance the overall performance of the automation process. The power management unit (PMU) is a crucial block in any IC and requires the design of a low-dropout regulator (LDO), for which amplifiers are fundamental blocks. This research harnesses the capabilities of deep neural networks (DNNs) to automate essential amplifier blocks, such as the differential amplifier (DiffAmp) and two-stage operational amplifier (OpAmp). In addition, it proposes an automation framework for the higher level circuitry of the LDO. This article introduces a novel “TriNet” architecture designed for various parameters of amplifiers, including gain, bandwidth, and power, facilitating precise predictions for DiffAmp and OpAmp, and presents a decoder architecture tailored for LDO. A notable aspect is the development of an efficient technique for acquiring larger datasets in a condensed timeframe. The presented methodologies demonstrate high accuracy rates, achieving 97.3% for DiffAmp and OpAmp circuits and 94.3% for LDO design.
{"title":"A Novel TriNet Architecture for Enhanced Analog IC Design Automation","authors":"Arunkumar P Chavan;Shrish Shrinath Vaidya;Sanket M. Mantrashetti;Abhishek Gurunath Dastikopp;Kishan S. Murthy;H. V. Ravish Aradhya;Prakash Pawar","doi":"10.1109/TVLSI.2024.3452032","DOIUrl":"10.1109/TVLSI.2024.3452032","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2046-2059"},"PeriodicalIF":2.8,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10672521","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09 DOI: 10.1109/TVLSI.2024.3449293
Xiang Yan;Kefan Qin;Xinyue Zheng;Weibo Hu;Wei Ma;Haitao Cui
A dual-channel interleaved analog-to-digital converter (ADC) operating at 320 MS/s is prototyped to validate a fast-converging foreground time calibration algorithm that is independent of ADC offset errors. An input polarity switching technique is introduced to eliminate the impact of sub-ADC offset mismatches during foreground time calibration. After foreground calibration, the signal-to-noise and distortion ratio (SNDR) and spurious free dynamic range (SFDR) are improved by 8.6 and 18.4 dB, respectively. In the sub-ADC design, a comparison functionality is enabled in the digital circuits to prevent metastability and expedite data conversion. The single-channel conversion rate reaches 160 MS/s. The ADC is implemented in 40-nm digital CMOS technology, achieving an SNDR of 52.01 dB at near-Nyquist input while sampling at 320 MS/s. The overall power consumption is 3.65 mW, which includes an on-chip reference buffer and a clock circuit.
{"title":"A Two-Channel Interleaved ADC With Fast-Converging Foreground Time Calibration and Comparison-Based Control Logic","authors":"Xiang Yan;Kefan Qin;Xinyue Zheng;Weibo Hu;Wei Ma;Haitao Cui","doi":"10.1109/TVLSI.2024.3449293","DOIUrl":"10.1109/TVLSI.2024.3449293","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2001-2011"},"PeriodicalIF":2.8,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
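The ADC figures in this entry can be turned into two commonly used derived metrics. The sketch below computes the effective number of bits (ENOB) from the reported SNDR and the Walden figure of merit from the reported power and sampling rate. Neither derived value is stated in the abstract, and the power figure includes the on-chip reference buffer and clock circuit, so the result is only an indicative estimate.

```python
# Derived metrics from the reported ADC figures (not stated in the abstract).

SNDR_DB = 52.01      # SNDR at near-Nyquist input, dB
FS_HZ = 320e6        # aggregate sampling rate, samples/s
POWER_W = 3.65e-3    # total power, including reference buffer and clock

# Standard conversion for a full-scale sine input: SNDR = 6.02 * ENOB + 1.76 dB.
enob = (SNDR_DB - 1.76) / 6.02

# Walden figure of merit: energy per conversion step, here in femtojoules.
walden_fom_fj = POWER_W / (2 ** enob * FS_HZ) * 1e15

print(f"ENOB       ~ {enob:.2f} bits")
print(f"Walden FoM ~ {walden_fom_fj:.1f} fJ/conversion-step")
```

With these inputs the ENOB comes out a little above 8.3 bits and the figure of merit around 35 fJ per conversion step.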
To address the complex nonlinear problem of determining class-F voltage-controlled oscillator (VCO) dimensions, this article introduces an electronic design automation (EDA) framework that rapidly optimizes multiple design objectives yielding superior outcomes. The framework incorporates fast frequency determination, harmonic alignment, and extremal optimization of multiobjective particle swarm optimization with crowding distance (FHE-MOPSO-CD), an efficient algorithm we developed specifically for class-F VCOs, which includes transformer-based tank circuit strategies and extremum optimization techniques. Using a 55-nm CMOS process, this algorithm optimized various class-F VCO topologies, achieving excellent metrics and confirming its versatility. Optimization results indicate that at a 10-MHz offset, the figure of merit (FoM) is at least 8.81 dBc/Hz higher than values reported in the literature. Compared with other analog/RF dimension optimization methods, our approach yielded a higher hypervolume, indicating better convergence and greater diversity of solutions.
{"title":"Multiobjective Optimization of Class-F Oscillators","authors":"Zhan Qu;Zhenjiao Chen;Xingqiang Shi;Ya Zhao;Guohe Zhang;Feng Liang","doi":"10.1109/TVLSI.2024.3449567","DOIUrl":"10.1109/TVLSI.2024.3449567","url":null,"PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"303-314"},"PeriodicalIF":2.8,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
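The "crowding distance" in the algorithm's name refers to a diversity measure widely used in multiobjective optimizers. The sketch below implements the standard crowding-distance calculation in its usual NSGA-II form for a set of objective vectors. It is a generic illustration and not the authors' FHE-MOPSO-CD code, and the sample points are invented.

```python
def crowding_distance(points):
    """Crowding distance of each objective vector in a non-dominated set.

    Boundary solutions of every objective get infinite distance; interior
    solutions accumulate the normalized gap between their two neighbours.
    """
    n = len(points)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(points[0])):
        # Indices sorted by the k-th objective.
        order = sorted(range(n), key=lambda i: points[i][k])
        lo, hi = points[order[0]][k], points[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: this objective adds no spread
        for j in range(1, n - 1):
            gap = points[order[j + 1]][k] - points[order[j - 1]][k]
            dist[order[j]] += gap / (hi - lo)
    return dist


# Three hypothetical trade-off points for two objectives to be minimized.
sample = [(0.0, 4.0), (1.0, 2.0), (4.0, 0.0)]
distances = crowding_distance(sample)
print(distances)
```

Solutions with a larger crowding distance sit in sparser regions of the front, so favoring them when selecting leaders or pruning an archive tends to preserve solution diversity.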
Pub Date: 2024-09-05 DOI: 10.1109/TVLSI.2024.3450452
Xu Fang;Xiaofeng Zhao
Monolithic 3-D (M3D) integrated circuits (ICs) have the potential to achieve significantly higher device density than conventional ICs. Nanoscale interlayer vias (ILVs) play a pivotal role in M3D integration, enabling higher transistor density and greater flexibility in circuit design. However, the high integration density and aggressive scaling of the interlayer dielectric make ILVs especially prone to defects. In this article, we propose a post-bond ILV test method for detecting and diagnosing open faults, stuck-at faults (SAFs), and short faults introduced in ILVs during the fabrication of M3D ICs. Testing ILVs after bonding can significantly reduce cost and improve yield. HSPICE simulation results demonstrate that the proposed method can effectively detect and localize open, stuck-at-0, stuck-at-1, and short faults in ILVs. Evaluation results on M3D benchmarks demonstrate that the proposed method incurs a small power-performance-area (PPA) overhead and a relatively low test-time overhead.
A Post-Bond ILV Test Method in Monolithic 3-D ICs. Xu Fang; Xiaofeng Zhao. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 12, pp. 2377-2388. Pub Date: 2024-09-05. DOI: 10.1109/TVLSI.2024.3450452