Pub Date : 2025-10-15 DOI: 10.1109/LSSC.2025.3622233
Basem Abdelaziz Abdelmagid;Yuqi Liu;Hua Wang
This work presents a compact 110–140 GHz bidirectional D-band passive phase shifter that combines a 5-stage capacitively loaded reflective-type phase shifter (RTPS) with a wideband 0°/180° stage. The design achieves a 360° phase range with a resolution of 11.25°. By applying: 1) a wideband RTPS design methodology at the stage level; 2) frequency/switching-staggering techniques among the RTPS stages; and 3) a balun-staggering technique in the 0°/180° stage, the design achieves calibration-free operation with frequency-invariant codes over 24% fractional bandwidth (FBW). The design is implemented in GlobalFoundries 22-nm CMOS FD-SOI technology with a compact core area of $130~\mu\mathrm{m} \times 480~\mu\mathrm{m}$. It achieves measured RMS phase and magnitude errors lower than 2.38° and 0.63 dB, respectively, across the entire operating bandwidth using the same frequency-invariant codes.
{"title":"A Wideband Calibration-Free D-Band Passive Phase Shifter With Frequency-Invariant Codes Over 24% Fractional Bandwidth","authors":"Basem Abdelaziz Abdelmagid;Yuqi Liu;Hua Wang","doi":"10.1109/LSSC.2025.3622233","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3622233","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"329-332"},"publicationDate":"2025-10-15","publicationTypes":"Journal Article"}
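The headline figures in the phase-shifter abstract above can be cross-checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, the fractional bandwidth is assumed to be defined relative to the band center, and the bit count is simply the number of bits needed to index the phase states, not a statement about how the authors organize their control word.

```python
def fractional_bandwidth(f_low_ghz: float, f_high_ghz: float) -> float:
    """Bandwidth divided by the arithmetic center frequency (assumed definition)."""
    center = (f_low_ghz + f_high_ghz) / 2.0
    return (f_high_ghz - f_low_ghz) / center


def phase_states(phase_range_deg: float, step_deg: float) -> int:
    """Number of distinct phase settings for a given range and resolution."""
    return round(phase_range_deg / step_deg)


fbw = fractional_bandwidth(110.0, 140.0)   # 30 GHz over a 125 GHz center
states = phase_states(360.0, 11.25)        # number of 11.25-degree steps in 360 degrees
index_bits = (states - 1).bit_length()     # bits needed to index every state

print(f"FBW = {fbw:.0%}, states = {states}, index bits = {index_bits}")
```

Under these assumptions the 110–140 GHz band gives the quoted 24% FBW, and an 11.25° step over 360° corresponds to 32 phase states.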
Pub Date : 2025-10-14 DOI: 10.1109/LSSC.2025.3621413
Berkay Özbek;Timothy G. Constandinou
This letter presents an adaptable ring oscillator (RO)-true random number generator (TRNG) that removes the fixed power–throughput tradeoff by selecting delay-cell physics at run time. A hybrid core uses a current-starved inverter in low-power (LP) mode to amplify slew-limited jitter for high bit-efficiency at low frequency, and a latched cross-coupled cell in high-performance (HP) mode to exploit regeneration-time jitter over a wider band; both share a unified fast-by-slow sampling path with XOR combining. Fabricated in 180 nm CMOS ($265 \times 490~\mu\mathrm{m}$), the TRNG spans 0.8–2.0 V and $-20~^{\circ}$C to $80~^{\circ}$C, achieves 168 nW (3.95 pJ/bit) in LP and 44.3 Mb/s in HP, and reaches near-ideal HP entropy (0.999999999984). Long datasets pass NIST SP 800-22 (including under 400 mV injection at the second harmonic), SP 800-90B, and AIS31. A single, digitally-tunable IP thus delivers nanowatt standby entropy and burst-mode throughput without architectural change.
{"title":"A 168 nW to 44.3 Mb/s Adaptable TRNG With 400 mV Attack-Resilient Hybrid RO Core","authors":"Berkay Özbek;Timothy G. Constandinou","doi":"10.1109/LSSC.2025.3621413","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3621413","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"325-328"},"publicationDate":"2025-10-14","publicationTypes":"Journal Article"}
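Two quantities in the TRNG abstract above lend themselves to a quick numerical illustration. The sketch below is not the authors' method: `implied_bit_rate` simply divides the quoted LP power by the quoted energy per bit, and `xor_bias` applies the standard piling-up lemma, which assumes the two combined bit streams are statistically independent (an assumption made here for illustration only). All function names are hypothetical.

```python
LP_POWER_W = 168e-9            # quoted LP-mode power
LP_ENERGY_PER_BIT_J = 3.95e-12  # quoted LP-mode energy per bit


def implied_bit_rate(power_w: float, energy_per_bit_j: float) -> float:
    """Bit rate implied by power / energy-per-bit, in bits per second."""
    return power_w / energy_per_bit_j


def xor_bias(e1: float, e2: float) -> float:
    """Magnitude of the bias of a XOR b for independent bits.

    Each input bit has P(bit = 1) = 0.5 + e. By the piling-up lemma,
    P(a XOR b = 1) = 0.5 - 2*e1*e2, so the bias magnitude is 2*|e1*e2|.
    """
    return 2.0 * abs(e1) * abs(e2)


lp_rate = implied_bit_rate(LP_POWER_W, LP_ENERGY_PER_BIT_J)
print(f"Implied LP throughput: {lp_rate / 1e3:.1f} kb/s")
print(f"Bias after XOR of two 1%-biased streams: {xor_bias(0.01, 0.01):.4%}")
```

The first figure works out to roughly 42.5 kb/s in LP mode; the second shows why XOR combining is attractive, since two independent streams with 1% bias combine to a bias of about 0.02%.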
Pub Date : 2025-10-07 DOI: 10.1109/LSSC.2025.3618942
Prashanth Mohan;Siddharth Das;Ken Mai
This letter presents a flexible and energy-efficient RISC-V system-on-chip (SoC) in 22-nm FinFET technology, achieving state-of-the-art performance by tightly integrating the CPU with a synthesized embedded field-programmable gate array (eFPGA), enabling the implementation of reconfigurable custom instructions. The tight integration of the eFPGA with SoC scratchpad memory facilitates parallel high-bandwidth (16 GB/s) access to scratchpad SRAMs and enables rapid eFPGA reconfiguration ($2~\mu$s) to switch between custom instruction sets. The eFPGA fabric itself is optimized for compute-intensive tasks, featuring fused logic tiles that amortize routing overheads to achieve a compute density of 22.3 GOPS/mm$^{2}$. The measurement results from the fabricated chip demonstrate a peak energy efficiency of 748 GOPS/W (INT8) while improving throughput and energy by one to two orders of magnitude compared to the CPU for accelerated applications.
{"title":"A RISC-V SoC With Reconfigurable Custom Instructions on a Synthesized eFPGA Fabric in 22nm FinFET","authors":"Prashanth Mohan;Siddharth Das;Ken Mai","doi":"10.1109/LSSC.2025.3618942","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3618942","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"321-324"},"publicationDate":"2025-10-07","publicationTypes":"Journal Article"}
Pub Date : 2025-10-06 DOI: 10.1109/LSSC.2025.3618161
Bram Veraverbeke;Filip Tavernier
All cryo-CMOS quantum-classical control interfaces require an analog-to-digital converter (ADC) bridging the analog qubits and the digital control logic. Dynamic comparators play a crucial role in the precision, speed, and power consumption of these ADCs. Yet, their performance is severely impacted by the cryogenic environment. Therefore, this letter benchmarks, for the first time in the literature, four comparator topologies at room temperature (RT) and 6 K. Their noise, delay, and energy consumption are characterized, allowing the identification of the best topology for a given application. This analysis shows that the strongARM (SA) comparator is the most energy efficient, closely followed by the double-tail comparator. However, the reduced voltage headroom at 6 K almost doubles the SA’s delay compared to RT and leaves it susceptible to common-mode and supply voltage variations. In contrast, the recently proposed triple-tail comparator with capacitive over-neutralization limits the delay increase to only 13 ps by separating the preamplifier and the latch. Furthermore, its boosted preamplification gain makes it notably more resilient to voltage variations, ensuring a highly robust cryogenic operation favorable for scaled technologies.
{"title":"A Benchmark of Cryo-CMOS Dynamic Comparators in a 40 nm Bulk CMOS Technology","authors":"Bram Veraverbeke;Filip Tavernier","doi":"10.1109/LSSC.2025.3618161","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3618161","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"317-320"},"publicationDate":"2025-10-06","publicationTypes":"Journal Article"}
In this letter, we present a high-entropy strong physically unclonable function (PUF) utilizing weak-inversion current mirrors implemented in a standard 65-nm CMOS technology. Each response bit of the proposed PUF relies on the threshold voltage differences of minimum-sized transistors arranged in a $32 \times 32$ matrix. The analog operating principle enables encoding at least three effective bits per transistor pair, significantly improving entropy density. Leveraging a bit-masking technique, the design achieves remarkable robustness, attaining a bit error rate (BER) as low as 0.22% even under substantial supply voltage and temperature variations, with less than 10% discarded bits. The presented architecture exhibits a record area-to-entropy ratio of $166~\mathrm{F^{2}}$/bit, confirming its suitability for highly secure, compact applications in hardware security.
{"title":"High-Entropy Analog-Based Strong Physical Unclonable Function With Area-to-Entropy-ratio of 166 F²/bit","authors":"Alessandro Catania;Sebastiano Strangio;Maksym Paliy;Christian Sbrana;Michele Bertozzi;Giuseppe Iannaccone","doi":"10.1109/LSSC.2025.3616263","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3616263","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"309-312"},"publicationDate":"2025-09-30","publicationTypes":"Journal Article"}
Pub Date : 2025-09-29 DOI: 10.1109/LSSC.2025.3615656
Jelle H. T. Bakker;Nimit Jain;Paul Klatser;Mark S. Oude Alink;Bram Nauta
We present a monolithically integrated (MI) dual-junction monitoring photodiode (PD) and transimpedance amplifier (TIA). The photocurrent originates from the deep N-well (DNW)/P-type substrate (PSUB) ($<5~\mathrm{GHz}$) and the P-well (PW)/DNW ($>1~\mathrm{GHz}$) junctions. The presented combination of bulk PD and 22 nm fully-depleted silicon-on-insulator (FDSOI) TIA (18-GHz bandwidth (BW), $17.8~\mathrm{pA}/\sqrt{\mathrm{Hz}}$ noise level) advances the state-of-the-art in MI high-speed optical monitoring and reduces the inherent tradeoff in MI solutions regarding PD (responsivity & BW) and RF circuitry ($f_{t}$) performance.
{"title":"Dual-Junction Monolithically Integrated Monitoring Photodiode With a Two-Stage 18 GHz 18 pA/√Hz TIA in 22-nm FDSOI","authors":"Jelle H. T. Bakker;Nimit Jain;Paul Klatser;Mark S. Oude Alink;Bram Nauta","doi":"10.1109/LSSC.2025.3615656","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3615656","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"313-316"},"publicationDate":"2025-09-29","publicationTypes":"Journal Article"}
This letter presents an RC oscillator featuring a mixed-signal compensation loop that simultaneously mitigates comparator offset, loop delay, switch on-resistance, and temperature dependency. The oscillator employs an auxiliary comparator, a charge pump, and a differential difference amplifier (DDA)-based main comparator to suppress ramping voltage overshoots caused by device and loop imperfections. Moreover, the auxiliary comparator employs chopper and digital demodulation techniques to suppress its own offset, further enhancing the accuracy of the calibration loop. By generating both ramping and reference voltages, the proportional-to-absolute-temperature (PTAT) core ensures that the temperature coefficient (TC) of the oscillation frequency primarily depends on passive RC components. Fabricated in a $0.18~\mu$m BCD process, the design achieves an average frequency TC of 42 ppm/°C from −20 °C to 125 °C across ten samples. Benefiting from the proposed architecture, the oscillator operates at 8.45 MHz with a fast startup calibration time of $50~\mu$s.
{"title":"An 8.5 MHz 42 ppm/°C Relaxation Oscillator With Charge-Pump Delay Cancellation and Digital Chopping Demodulation","authors":"Yongjia Li;Jianlin Xia;Feng Cheng;Yifan Cao;Jin Wu;Encheng Zhu;Xiaofeng Sun;Dejin Wang;Long Zhang;Zhongyuan Fang;Weifeng Sun","doi":"10.1109/LSSC.2025.3615833","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3615833","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"333-336"},"publicationDate":"2025-09-29","publicationTypes":"Journal Article"}
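To put the oscillator figures above in perspective, the following sketch converts the quoted temperature coefficient into a total frequency excursion over the stated range and expresses the startup calibration time in oscillation cycles. It assumes the TC is a box-method average (total excursion divided by temperature span), which the abstract does not state explicitly; the variable names are hypothetical.

```python
F_NOMINAL_HZ = 8.45e6          # quoted oscillation frequency
TC_PPM_PER_C = 42.0            # quoted average temperature coefficient
T_MIN_C, T_MAX_C = -20.0, 125.0
STARTUP_TIME_S = 50e-6         # quoted startup calibration time

# Total excursion over the full temperature range (box-method assumption).
span_c = T_MAX_C - T_MIN_C
drift_ppm = TC_PPM_PER_C * span_c
drift_hz = F_NOMINAL_HZ * drift_ppm * 1e-6

# Number of nominal oscillation periods that fit in the calibration time.
startup_cycles = STARTUP_TIME_S * F_NOMINAL_HZ

print(f"Span: {span_c:.0f} C, drift: {drift_ppm:.0f} ppm ({drift_hz / 1e3:.1f} kHz)")
print(f"Startup calibration lasts about {startup_cycles:.1f} cycles")
```

Under that assumption, 42 ppm/°C over a 145 °C span corresponds to about 0.61% of the nominal frequency, and the 50 µs startup calibration completes in a few hundred oscillation cycles.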
This letter presents a frequency quadrupler with 32% fractional bandwidth (66–92 GHz) and 5% peak power-added efficiency (PAE), capable of operating with an input power of 0 dBm. The quadrupler, consisting of two cascaded frequency doublers, uses a multiport-driven push-push complementary architecture for the first stage to generate differential signals for the second doubler with high fundamental harmonic rejection. The second doubler, based on an nMOS push-push architecture, uses gain enhancement to achieve a maximum conversion gain of –4 dB for the quadrupler. The quadrupler, with an output saturation power ($P_{\text{sat}}$) of –2.6 dBm, achieves first- to third-harmonic rejections of more than 36 dBc across the 3-dB bandwidth. The compact quadrupler has a core area of 0.09 mm$^{2}$, while consuming a DC power of 6.2 mW from a 0.8 V supply with an input power of 0 dBm at 20 GHz.
{"title":"A Compact Current-Reusing 6-mW 66–92 GHz Frequency Quadrupler With 5% Peak Power Added Efficiency and >36 dBc Harmonic Rejection in 22-nm FDSOI CMOS","authors":"Shankkar Balasubramanian;Kristof Vaesen;Piet Wambacq;Carsten Wulff","doi":"10.1109/LSSC.2025.3614381","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3614381","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"301-304"},"publicationDate":"2025-09-24","publicationTypes":"Journal Article","openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11178244"}
This letter presents a backscatter chip that features bidirectional communication with commodity Bluetooth Low Energy (BLE) transceivers. For uplink, the chip reflects a reverse-whitened BLE tone into single-sideband (SSB) GFSK-modulated BLE packets via a proposed replica VCO-based GFSK modulator and an inductor-free SSB reflector. For downlink, the BLE packets are frequency down-converted passively by utilizing a reverse-whitened BLE tone followed by a proposed self-calibrated GFSK demodulator to recover the downlink data at ultralow power with improved robustness. A dual-linearly polarized microstrip patch antenna (DPMPA) is integrated to enable concurrent RF energy harvesting and communication in a wearable form factor. Implemented in 65-nm CMOS, the chip consumes $1.4~\mu$W for downlink and $15.8~\mu$W for uplink. Wireless tests demonstrated ranges of 50 cm for downlink and >3 m for uplink at 20 dBm EIRP.
{"title":"A Battery-Free BLE Backscatter Communication Chip for Wearable Systems","authors":"Yongling Zhang;Ji Xiong;Junzai Chen;Xiaoyu Li;Jinrui Zuo;Yan Wang;Xiaoyi Wang;Miao Meng","doi":"10.1109/LSSC.2025.3612423","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3612423","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"297-300"},"publicationDate":"2025-09-22","publicationTypes":"Journal Article"}
Sparsity has recently attracted increased attention in the machine learning (ML) community due to its potential to improve performance and energy efficiency by eliminating ineffectual computations. As ML models evolve rapidly, reconfigurable architectures, such as coarse-grained reconfigurable arrays (CGRAs), are being explored to adapt to and accelerate emerging models. Previous CGRA designs have supported unstructured sparsity and reported promising speedups and energy savings for compute-intensive kernels. However, these approaches still face performance bottlenecks when accelerating entire sparse ML networks. In this letter, we identify the primary sources of inefficiency in prior CGRA-based approaches and present Opal, a CGRA SoC with three key contributions: 1) a flexible dataflow architecture supporting Gustavson’s dataflow for sparse matrix multiplication; 2) high-throughput sparse hardware primitives; and 3) enhanced processing elements to support mapping all ML operations on the CGRA. As a result, Opal achieves a 66% to 79% reduction in runtime and energy consumption across our evaluated sparse graph neural network benchmarks compared to prior CGRA solutions that target only kernel acceleration.
{"title":"Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications","authors":"Po-Han Chen;Bo Wun Cheng;Michael Oduoza;Zhouhua Xie;Rupert Lu;Sai Gautham Ravipati;Kalhan Koul;Alex Carsello;Yuchen Mei;Mark Horowitz;Priyanka Raina","doi":"10.1109/LSSC.2025.3613245","DOIUrl":"https://doi.org/10.1109/LSSC.2025.3613245","journal":{"name":"IEEE Solid-State Circuits Letters","volume":"8","pages":"293-296"},"publicationDate":"2025-09-22","publicationTypes":"Journal Article"}
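The Opal abstract above refers to Gustavson's dataflow for sparse matrix multiplication. As a software illustration of that row-wise scheme (not of Opal's hardware mapping, which the abstract does not describe), the sketch below multiplies two sparse matrices stored as nested dictionaries. The storage format and the function name `spgemm_gustavson` are assumptions chosen for readability.

```python
from typing import Dict

SparseMatrix = Dict[int, Dict[int, float]]  # row index -> {column index -> value}


def spgemm_gustavson(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Row-wise sparse matrix product C = A @ B.

    For each nonzero A[i][k], the whole row B[k] is scaled by A[i][k] and
    accumulated into row i of C, so only nonzero operands are ever touched.
    """
    c: SparseMatrix = {}
    for i, a_row in a.items():
        accumulator: Dict[int, float] = {}
        for k, a_ik in a_row.items():
            # Rows of B that are entirely zero are simply absent.
            for j, b_kj in b.get(k, {}).items():
                accumulator[j] = accumulator.get(j, 0.0) + a_ik * b_kj
        if accumulator:
            c[i] = accumulator
    return c


A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0, 2: 7.0}}
C = spgemm_gustavson(A, B)
print(C)
```

The key property is that each output row is built from one pass over a row of A and the matching rows of B, which is why the scheme suits streaming, row-oriented sparse formats.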