This paper presents a lightweight hybrid random number generator (HRNG), implemented and evaluated on a Field-Programmable Gate Array (FPGA). The proposed design enhances security and randomness by combining jitter and metastability in a feedforward topology, which achieves a near-perfect Shannon entropy. Moreover, it is validated using three distinct entropy metrics, supporting its use as a source of statistically robust random numbers for security-sensitive applications. In addition to entropy evaluations, this design is also rigorously analyzed using multiple industry-standard randomness test suites. Beyond the FPGA implementation, this work presents performance metrics, including area utilization, power consumption, maximum frequency, and energy usage per random bit, obtained by synthesis across three different technology nodes in Synopsys Design Compiler (SDC). All of the results from the FPGA and the SDC implementations demonstrate significant improvements. These results confirm the design’s scalability to advanced technology nodes and its suitability for applications that require secure and reliable random number generation, such as resource-efficient Internet of Things (IoT) devices.
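As a rough illustration of what an entropy evaluation of a generated bitstream can look like, the sketch below estimates Shannon entropy per bit from fixed-size words and min-entropy from the single-bit bias. The abstract does not name its three entropy metrics, so the choice of these two estimators, the function names, and the default block size are assumptions made for illustration, not the authors' evaluation procedure.

```python
import math
from collections import Counter


def shannon_entropy_per_bit(bits, block=8):
    """Estimate Shannon entropy per bit from non-overlapping `block`-bit words.

    `bits` is a list of 0/1 integers. Trailing bits that do not fill a
    complete word are ignored.
    """
    if len(bits) < block:
        raise ValueError("need at least one full block of bits")
    words = [tuple(bits[i:i + block])
             for i in range(0, len(bits) - block + 1, block)]
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over the observed words, normalised per bit.
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / block


def min_entropy_per_bit(bits):
    """Min-entropy per bit, based only on the more frequent bit value."""
    p_max = max(bits.count(0), bits.count(1)) / len(bits)
    return -math.log2(p_max)
```

The block size matters: a strictly alternating stream `[0, 1] * 512` scores 1.0 bit per bit with `block=1`, yet 0.0 with `block=8`, because every 8-bit word is identical. This is one reason a single entropy figure is usually complemented by statistical test suites.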
Title: A Lightweight Hybrid Random Number Generator With Dynamic Entropy Injection. Authors: Sonia Akter; Shelby Williams; Prosen Kirtonia; Magdy Bayoumi; Kasem Khalil. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 257-269. Publication date: 2025-08-01. DOI: https://doi.org/10.1109/OJCAS.2025.3582975. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11106931
Pub Date: 2025-07-31 DOI: 10.1109/OJCAS.2025.3594022
Katsutoshi Ito;Yusaku Shiotsu;Satoshi Sugahara
A new ultralow-voltage retention (ULVR) SRAM cell is proposed that substantially enhances the noise margin (NM) for the ULVR mode at ultralow voltages ($V_{\mathrm{UL}}$). This 8T cell is configured with new-type Schmitt-trigger (ST) inverters that can nearly maximize the hysteresis width of the voltage transfer characteristics (VTC). The design methodology of the cell is developed with careful consideration for the process variation of the constituent transistors, and the optimally designed cell can ensure sufficient NMs that satisfy the $6\sigma$ failure probability for all the operating modes. In particular, for the ULVR mode at $V_{\mathrm{UL}} = 0.2$ V, the proposed 8T cell exhibits much stronger noise immunity than various previously proposed low-voltage cells. In addition, the proposed 8T cell can achieve stable data retention even at $V_{\mathrm{UL}} = 0.16$ V with sufficient noise immunity satisfying the $6\sigma$ failure probability. An 8 kB ULVR-SRAM macro configured with the proposed 8T-cell array is also developed. Using the ULVR mode, the macro can reduce the standby power by ~93% compared with the standby mode of a conventional 6T-SRAM macro.
Title: A New Ultralow-Voltage Retention SRAM Cell Enhancing Noise Immunity. Authors: Katsutoshi Ito; Yusaku Shiotsu; Satoshi Sugahara. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 370-382. Publication date: 2025-07-31. DOI: https://doi.org/10.1109/OJCAS.2025.3594022. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11106369
Pub Date: 2025-07-25 DOI: 10.1109/OJCAS.2025.3592773
Konstantinos Metaxas;Paul P. Sotiriadis;Yannis Kominis
This work introduces a rigorous time-domain approach for studying the complex synchronization dynamics of periodically forced electronic oscillators, based on the well-developed theories of Phase-Amplitude reduction via the Koopman operator and dynamics of circle maps. The paper is structured in two parts. Part I presents the theoretical foundation and the numerical application of the theory. Under suitable forcing, the reduced equations simplify to a one-dimensional phase model—represented by a circle map—whose bifurcations are determined by the Phase Response Curves. This map efficiently captures the oscillator’s dynamics and enables accurate computation of resonance regions in the forcing parameter space. The influence of global isochron geometry on the map validates their critical role in phase locking, extending previous results in the theory of electronic oscillators. For more general forcing scenarios, the full Phase-Amplitude reduction effectively describes the synchronization dynamics. The developed time-domain approach demonstrates that the same limit cycle oscillator can produce periodic output with tunable spectral characteristics, operating as a frequency divider, or function as a chaotic or quasiperiodic signal generator, depending on the driving signal. As an illustrative example, the synchronization dynamics of differential LC oscillators is studied in detail. Part II is dedicated to confirming the validity, generality, and robustness of the introduced approach, which is first presented as a detailed step-by-step methodology, suitable for direct application to any oscillator. The Colpitts and ring oscillators are analyzed theoretically, and their resonance diagrams are numerically computed, following the approach established in Part I. Simulations of realistically implemented models in the Cadence IC Suite show that both synchronized and chaotic/quasiperiodic states are accurately predicted by the reduced circle map. 
Notably, despite the use of simplified analytical models, the theoretical framework effectively captures the qualitative behavior observed in simulation. The consistency between the theoretical and simulation results confirms both the robustness and general applicability of the proposed approach.
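To make the role of a circle map in phase locking more concrete, the sketch below iterates the textbook sine circle map and estimates its rotation number, which is rational inside a resonance (locked) region and varies with the forcing outside it. The sine map is a generic stand-in chosen here for illustration; the paper's map is derived from the oscillator's Phase Response Curves and is not reproduced. The names `omega` (normalised frequency detuning) and `k` (coupling strength) are assumptions of this sketch.

```python
import math


def circle_map_lift(theta, omega, k):
    """One step of the lifted sine circle map.

    The phase is not reduced modulo 1, so full windings accumulate and
    the rotation number can be read off directly.
    """
    return theta + omega - (k / (2 * math.pi)) * math.sin(2 * math.pi * theta)


def rotation_number(omega, k, n_transient=500, n_iter=5000, theta0=0.0):
    """Estimate the average phase advance per iteration."""
    theta = theta0
    for _ in range(n_transient):      # discard the transient
        theta = circle_map_lift(theta, omega, k)
    start = theta
    for _ in range(n_iter):           # accumulate windings
        theta = circle_map_lift(theta, omega, k)
    return (theta - start) / n_iter
```

Sweeping `omega` at fixed `k` and plotting `rotation_number` gives the familiar staircase of plateaus; each plateau corresponds to one resonance region in the forcing parameter space. For example, `rotation_number(0.5, 1.0)` sits on the 1:2 plateau, the regime in which a forced oscillator would act as a divide-by-two.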
Title: Complex Synchronization Dynamics of Electronic Oscillators–Part I: A Time-Domain Approach via Phase-Amplitude Reduced Models. Authors: Konstantinos Metaxas; Paul P. Sotiriadis; Yannis Kominis. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 329-342. Publication date: 2025-07-25. DOI: https://doi.org/10.1109/OJCAS.2025.3592773. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11096569
Pub Date: 2025-07-25 DOI: 10.1109/OJCAS.2025.3592750
Konstantinos Metaxas;Nikolaos P. Eleftheriou;Yannis Kominis;Paul P. Sotiriadis
This work introduces a rigorous time-domain approach for studying the complex synchronization dynamics of periodically forced electronic oscillators, based on the well-developed theories of Phase-Amplitude reduction via the Koopman operator and dynamics of circle maps. The paper is structured in two parts. Part I presents the theoretical foundation and the numerical application of the theory. Under suitable forcing, the reduced equations simplify to a one-dimensional phase model—represented by a circle map—whose bifurcations are determined by the Phase Response Curves. This map efficiently captures the oscillator’s dynamics and enables accurate computation of resonance regions in the forcing parameter space. The influence of global isochron geometry on the map validates their critical role in phase locking, extending previous results in the theory of electronic oscillators. For more general forcing scenarios, the full Phase-Amplitude reduction effectively describes the synchronization dynamics. The developed time-domain approach demonstrates that the same limit cycle oscillator can produce periodic output with tunable spectral characteristics, operating as a frequency divider, or function as a chaotic or quasiperiodic signal generator, depending on the driving signal. As an illustrative example, the synchronization dynamics of differential LC oscillators is studied in detail. Part II is dedicated to confirming the validity, generality, and robustness of the introduced approach, which is first presented as a detailed step-by-step methodology, suitable for direct application to any oscillator. The Colpitts and ring oscillators are analyzed theoretically, and their resonance diagrams are numerically computed, following the approach established in Part I. Simulations of realistically implemented models in the Cadence IC Suite show that both synchronized and chaotic/quasiperiodic states are accurately predicted by the reduced circle map. 
Notably, despite the use of simplified analytical models, the theoretical framework effectively captures the qualitative behavior observed in simulation. The consistency between the theoretical and simulation results confirms both the robustness and general applicability of the proposed approach.
Title: Complex Synchronization Dynamics of Electronic Oscillators–Part II: Simulations and Validation of Phase-Amplitude Reduced Models. Authors: Konstantinos Metaxas; Nikolaos P. Eleftheriou; Yannis Kominis; Paul P. Sotiriadis. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 343-355. Publication date: 2025-07-25. DOI: https://doi.org/10.1109/OJCAS.2025.3592750. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11096566
Pub Date: 2025-07-22 DOI: 10.1109/OJCAS.2025.3591136
Haesung Jung;Quang Dang Truong;Hanho Lee
The advent of quantum computers, with their immense computational potential, poses significant threats to traditional cryptographic systems. In response, NIST announced the quantum-resistant Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM) standard in 2024. This paper presents an efficient hardware architecture for the ML-KEM scheme, capable of supporting all algorithms and flexibly adapting to different security levels. The proposed design achieves a balance between high performance and low hardware resource consumption, making it suitable for deployment across various FPGA platforms. Key innovations include the Unified Polynomial Arithmetic Module (UniPAM), capable of handling all polynomial arithmetic operations, and an optimized hash module for the SHA-3 variants integral to ML-KEM. Additionally, the design introduces an efficient timing diagram and conflict-free memory management strategy, enabling seamless parallelism and reducing execution time while minimizing hardware resource consumption. Furthermore, the implementation incorporates several methods to effectively mitigate side-channel attacks, a common concern in hardware-based cryptosystem deployments. The proposed architecture is validated through implementation on an Artix-7 FPGA and in Synopsys 14 nm ASIC technology. Compared to state-of-the-art designs, our approach demonstrates superior performance while maintaining comparable hardware resource efficiency. Specifically, the hardware implementation on the Xilinx Artix-7 utilizes 12k LUTs, 6.9k FFs, 4 DSPs, and 9 BRAMs at a clock frequency of 220 MHz.
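For readers unfamiliar with the polynomial arithmetic that a module such as UniPAM has to cover, the following is a minimal software reference for addition and multiplication in the ML-KEM ring, i.e. polynomials of degree below 256 with coefficients modulo 3329, reduced by X^256 + 1. It uses plain schoolbook multiplication for clarity; hardware designs normally use the number-theoretic transform instead, and nothing here reflects the paper's actual datapath. The function names are hypothetical.

```python
Q = 3329   # ML-KEM coefficient modulus
N = 256    # number of coefficients per polynomial


def poly_add(a, b):
    """Coefficient-wise addition modulo Q."""
    return [(x + y) % Q for x, y in zip(a, b)]


def poly_mul_negacyclic(a, b):
    """Schoolbook product of a and b modulo (X^N + 1) and Q.

    Because X^N = -1 in this ring, any term that overflows past degree
    N - 1 wraps around to degree (i + j - N) with a sign flip.
    """
    if len(a) != N or len(b) != N:
        raise ValueError("both operands must have exactly N coefficients")
    out = [0] * N
    for i, ai in enumerate(a):
        if ai == 0:
            continue                      # skip zero terms
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:
                out[k - N] = (out[k - N] - ai * bj) % Q
    return out
```

A quick sanity check of the wrap-around rule: multiplying X by X^255 gives X^256, which equals -1 in this ring, so the result has coefficient `Q - 1` at degree 0 and zeros elsewhere.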
Title: Highly-Efficient Hardware Architecture for ML-KEM PQC Standard. Authors: Haesung Jung; Quang Dang Truong; Hanho Lee. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 356-369. Publication date: 2025-07-22. DOI: https://doi.org/10.1109/OJCAS.2025.3591136. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11088254
Pub Date: 2025-07-08 DOI: 10.1109/OJCAS.2025.3584317
Yuntao Han;Yihan Pan;Xiongfei Jiang;Cristian Sestito;Shady Agwa;Themis Prodromakis;Shiwei Wang
Spike sorting is a critical process for decoding large-scale neural activity from extracellular recordings. Advances in neural probes make it possible to record from a larger number of neurons with higher channel counts, which increases the data volume and challenges current on-chip spike sorters. This paper introduces L-Sort, a novel on-chip spike sorting solution featuring median-of-median spike detection and localization-based clustering. By combining the median-of-median approximation and the proposed incremental median calculation scheme, our detection module achieves a reduction in memory consumption. Moreover, the localization-based clustering utilizes geometric features instead of morphological features, thus eliminating the memory-consuming buffer that would otherwise hold the spike waveform during feature extraction. Evaluation using Neuropixels datasets demonstrates that L-Sort achieves competitive sorting accuracy with reduced hardware resource consumption. Implementations on FPGA and ASIC (180 nm technology) demonstrate significant improvements in area and power efficiency compared to state-of-the-art designs while maintaining comparable accuracy. If normalized to 22 nm technology, our design can achieve roughly $10\times$ the area and power efficiency with similar accuracy, compared with the state-of-the-art design evaluated with the same dataset. Therefore, L-Sort is a promising solution for real-time, high-channel-count neural processing in implantable devices.
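The idea behind median-based spike detection can be sketched in a few lines: estimate the noise level from the median of the absolute signal, approximating that median by taking the median of per-chunk medians so that only a small chunk has to be held at a time, then flag samples above a multiple of the estimate. The chunk size, the 0.6745 scaling constant (the usual Gaussian-noise convention), and the multiplier `k` are assumptions of this sketch; the paper's incremental median scheme and its hardware parameters are not reproduced here.

```python
import statistics


def median_of_medians(values, chunk=5):
    """Approximate the median as the median of per-chunk medians."""
    if not values:
        raise ValueError("values must not be empty")
    chunk_medians = [statistics.median(values[i:i + chunk])
                     for i in range(0, len(values), chunk)]
    return statistics.median(chunk_medians)


def detection_threshold(samples, k=4.0, chunk=5):
    """Threshold = k * (approximate median of |x|) / 0.6745."""
    abs_vals = [abs(s) for s in samples]
    sigma_est = median_of_medians(abs_vals, chunk) / 0.6745
    return k * sigma_est


def detect_spikes(samples, k=4.0, chunk=5):
    """Return indices of samples whose magnitude exceeds the threshold."""
    thr = detection_threshold(samples, k, chunk)
    return [i for i, s in enumerate(samples) if abs(s) > thr]
```

Because the median is insensitive to a few large values, the occasional spike barely moves the noise estimate, which is why a median-based threshold is preferred over one based on the standard deviation when spikes are frequent.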
Title: L-Sort: On-Chip Spike Sorting With Efficient Median-of-Median Detection and Localization-Based Clustering. Authors: Yuntao Han; Yihan Pan; Xiongfei Jiang; Cristian Sestito; Shady Agwa; Themis Prodromakis; Shiwei Wang. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 205-216. Publication date: 2025-07-08. DOI: https://doi.org/10.1109/OJCAS.2025.3584317. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11072521
We present BAG3++, an extensible analog/mixed-signal (AMS) design framework for layout-aware design. BAG3++ realizes a unified design environment that merges schematic, layout, and verification views into a single development interface. We further introduce new automated design features that enable rapid automation and optimization across a range of performance specifications, processes, and applications. We demonstrate the practical use of these features through (a) a bit-reconfigurable successive-approximation-register (SAR) analog-to-digital converter (ADC) implemented in the open-source Skywater 130 nm process and (b) an ultra-high-speed output driver optimized in two modern processes. BAG3++ interfaces with both commercial and open-source design frameworks, and the extensibility of BAG3++ is further illustrated through the integration of an open-source simulator.
Title: BAG3++: An Extensible Generator Framework for Automated Layout-Aware AMS Design. Authors: Felicia Guo; Bob Zhou; Ayan Biswas; Paul Kwon; Zhaokai Liu; Ken Ho; Vladimir Stojanović; Borivoje Nikolić. Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 181-191. Publication date: 2025-06-26. DOI: https://doi.org/10.1109/OJCAS.2024.3502641. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11052889
Pub Date: 2025-06-26 DOI: 10.1109/OJCAS.2024.3518754
Yifei Zhu;Zhenxuan Luan;Dawei Feng;Weiwei Chen;Lei Ren;Zhangxi Tan
The escalating demand for high-performance and energy-efficient electronics has propelled 3D integrated circuits (3D ICs) as a promising solution. However, major obstacles have been the lack of specialized electronic design automation (EDA) software and standardized design flows for 3D chiplets. To bridge the gap, we introduce Open3DFlow, an open-source design platform for 3D ICs. It is a seven-step workflow that incorporates essential ASIC back-end processes while supporting multi-physics analysis, such as through-silicon via (TSV) modeling, thermal analysis, and signal integrity (SI) evaluations. To illustrate all functionalities of Open3DFlow, we use it to implement a 3D RISC-V CPU design with a vertically stacked L2 cache on a separate die. We harden both the CPU logic and the 3D-cache die in a GlobalFoundries $0.18\,\mu$m (GF180) process with open-source PDK support. We enable face-to-face (F2F) coupling of the top and bottom die by constructing a bonding layer based on the original technology file. Open3DFlow’s open-source nature allows seamless integration of custom AI optimization algorithms. As a showcase, we leverage large language models (LLMs) to assist with bonding pad placement. In addition, we apply LLMs to back-end Tcl script generation to improve design productivity. We expect Open3DFlow to open up a brand-new paradigm for future 3D IC innovations.
{"title":"Revolutionize 3D-Chip Design With Open3DFlow, an Open-Source AI-Enhanced Solution","authors":"Yifei Zhu;Zhenxuan Luan;Dawei Feng;Weiwei Chen;Lei Ren;Zhangxi Tan","doi":"10.1109/OJCAS.2024.3518754","DOIUrl":"https://doi.org/10.1109/OJCAS.2024.3518754","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6","pages":"169-180"},"PeriodicalIF":2.4,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11052893","platform":"Semanticscholar","paperid":"144492383","PMCID":"OA"}
Pub Date: 2025-06-18 DOI: 10.1109/OJCAS.2025.3580744
Inès Winandy;Arnaud Dion;Florent Manni;Pierre-Loïc Garoche;Dorra Ben Khalifa;Matthieu Martel
Precision tuning of fixed-point arithmetic is a powerful technique for optimizing hardware designs on FPGAs, where computing resources and memory are often severely constrained. While fixed-point arithmetic offers significant performance and area advantages over floating-point implementations, deriving an appropriate fixed-point representation remains a challenging task. In particular, developers must carefully select the number of bits assigned to the integer and fractional parts of each variable to balance accuracy and resource consumption. In this article, we introduce an original precision tuning technique for synthesizing fixed-point programs from floating-point code, specifically targeting FPGA platforms. The distinguishing feature of our technique lies in its formal approach to error analysis: it systematically propagates numerical errors through computations to infer variable-specific fixed-point formats that guarantee user-specified accuracy bounds. Unlike heuristic or ad hoc methods, our technique provides formal guarantees on the final accuracy of the generated code, ensuring safe deployment on hardware platforms. To enable hardware-friendly implementations, the resulting fixed-point programs use the ap_fixed data types provided by High-Level Synthesis (HLS) tools, allowing fine-grained control over the precision of each variable. Our method has been implemented within the POPiX 2.0 framework, which automatically generates optimized fixed-point code ready for synthesis. Experimental results on a set of embedded benchmarks show that our fixed-point codes use predominantly fewer machine cycles than floating-point codes when compiled for an FPGA with the state-of-the-art HLS compiler by AMD. In addition, our generated fixed-point codes reduce hardware resource usage, such as LUTs, flip-flops, and DSP blocks, with typical reductions ranging from 67% to 83% compared to double-precision floating-point codes, depending on the application.
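As a rough illustration of the bit-selection problem described in this abstract, the sketch below picks a signed fixed-point format for a single variable from a value range and an absolute error bound, then quantizes a value into that format. It is not the POPiX 2.0 algorithm: the function names `fixed_point_format` and `quantize` are hypothetical, round-to-nearest with saturation is an assumption, and the error propagation through a whole program that the article formalizes is not modeled. The returned pair follows the (total width, integer width including sign) ordering used by ap_fixed template parameters.

```python
import math


def fixed_point_format(lo, hi, max_abs_error):
    """Return (total_bits, int_bits) for a signed format that covers [lo, hi]
    with round-to-nearest quantization error <= max_abs_error.

    Hypothetical helper for illustration; handles one variable in isolation.
    """
    # Round-to-nearest error is at most 2**-(frac_bits + 1).
    frac_bits = max(0, math.ceil(-math.log2(max_abs_error)) - 1)
    # int_bits includes the sign bit: the representable range is
    # [-2**(int_bits - 1), 2**(int_bits - 1) - 2**-frac_bits].
    int_bits = 1
    while (-(2 ** (int_bits - 1)) > lo
           or hi > 2 ** (int_bits - 1) - 2 ** -frac_bits):
        int_bits += 1
    return int_bits + frac_bits, int_bits


def quantize(x, total_bits, int_bits):
    """Round x to the nearest representable value, saturating on overflow."""
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    q = round(x * scale)
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    q = max(q_min, min(q_max, q))
    return q / scale


# Example: a variable known to lie in [-3.5, 10.0], tolerated error 0.01.
width, int_width = fixed_point_format(-3.5, 10.0, 0.01)
approx = quantize(3.14159, width, int_width)
```

With these inputs the helper selects 6 fractional bits and 5 integer bits, which would correspond to a declaration along the lines of `ap_fixed<11, 5>` in HLS code. A real tuner has to account for how errors accumulate across operations, so per-variable widths chosen in isolation like this are only a starting point.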
{"title":"Automated Fixed-Point Precision Optimization for FPGA Synthesis","authors":"Inès Winandy;Arnaud Dion;Florent Manni;Pierre-Loïc Garoche;Dorra Ben Khalifa;Matthieu Martel","doi":"10.1109/OJCAS.2025.3580744","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3580744","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6","pages":"192-204"},"PeriodicalIF":2.4,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11039693","platform":"Semanticscholar","paperid":"144581739","PMCID":"OA"}
Pub Date: 2025-04-10 DOI: 10.1109/OJCAS.2025.3559774
Jiovana S. Gomes;Mateus Grellert;Fábio L. L. Ramos;Sergio Bampi
The pervasive presence of video content has spurred the development of advanced technologies to manage, process, and deliver high-quality content efficiently. Video compression is crucial in providing high-quality video services under limited network and storage capacities, and has traditionally been achieved through hybrid codecs. However, as these frameworks reach a performance bottleneck and compression gains become harder to achieve with conventional methods, Deep Neural Networks (DNNs) offer a promising alternative. By leveraging their nonlinear representation capacity, DNNs can enhance compression efficiency and visual quality. Neural Video Coding (NVC) has recently received significant attention, with Neural Image Coding models surpassing traditional codecs in compression ratios. This survey therefore explores the state of the art in NVC, examining recent works, frameworks, and the potential of this approach to revolutionize video compression. We find that NVC models have come a long way since the first proposals and are now on par in compression efficiency with the latest hybrid codec, VVC. Still, many improvements are required to enable the practical usage of NVC, such as hardware-friendly development to enable faster inference and execution on mobile and energy-constrained devices.
{"title":"End-to-End Neural Video Compression: A Review","authors":"Jiovana S. Gomes;Mateus Grellert;Fábio L. L. Ramos;Sergio Bampi","doi":"10.1109/OJCAS.2025.3559774","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3559774","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6","pages":"120-134"},"PeriodicalIF":2.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10962175","platform":"Semanticscholar","paperid":"143848781","PMCID":"OA"}