Pub Date: 2025-12-01; Epub Date: 2025-10-11; DOI: 10.1016/j.micpro.2025.105207
Marcello Barbirotta, Marco Angioli, Antonio Mastrandrea, Francesco Menichelli, Marco Pisani, Mauro Olivieri
As device dimensions shrink and operating frequencies increase in modern technologies, Single Event Transient (SET) faults present significant challenges. Decreasing feature sizes make circuits more susceptible to radiation-induced errors, and the resulting transients can propagate through logic circuits and cause incorrect system behavior, reducing reliability, particularly when they strike the internal nodes of combinational voting circuits.
This paper emphasizes the importance of voting schemes, focusing on Dual Modular Redundancy (DMR) lock-step architectures in which the voting system consists of a comparator with parity and a recovery signal. The study includes both theoretical and practical fault injection analyses and proposes a novel voting structure that reduces the failure rate to 0.4% for Input-Internal faults. This is the lowest failure rate reported in the literature when compared to conventional DMR lock-step comparators and self-voter approaches without filtering mechanisms. The proposed solution significantly enhances fault resilience with only a slight overhead in hardware utilization and frequency performance.
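To make the voting scheme concrete, the following is a minimal behavioral sketch in Python of a DMR lock-step check that combines a comparator, per-copy parity bits, and a recovery signal. The names `dmr_vote` and `VoteResult`, the even-parity convention, and the rule used to arbitrate between the two copies are illustrative assumptions; this is not the voting structure proposed in the paper, whose contribution concerns SET tolerance of the voter's own internal nodes at circuit level, which a word-level model like this cannot capture.

```python
from dataclasses import dataclass


@dataclass
class VoteResult:
    value: int      # value forwarded downstream
    mismatch: bool  # comparator detected a difference between the copies
    recovery: bool  # request a lock-step recovery (e.g., rollback or re-execution)


def even_parity(word: int) -> int:
    """Even-parity bit of a non-negative integer word."""
    return bin(word).count("1") % 2


def dmr_vote(a: int, parity_a: int, b: int, parity_b: int) -> VoteResult:
    """Hypothetical DMR lock-step voter: compare first, then arbitrate with parity.

    Each copy carries the parity bit produced by its own pipeline; a copy whose
    stored parity disagrees with its recomputed parity is assumed to be faulty.
    """
    if a == b:
        # Copies agree: forward the value, no recovery needed.
        return VoteResult(a, mismatch=False, recovery=False)
    a_ok = even_parity(a) == parity_a
    b_ok = even_parity(b) == parity_b
    if a_ok and not b_ok:
        return VoteResult(a, mismatch=True, recovery=False)
    if b_ok and not a_ok:
        return VoteResult(b, mismatch=True, recovery=False)
    # Parity cannot tell the copies apart: forward copy a but request recovery.
    return VoteResult(a, mismatch=True, recovery=True)
```

For example, a single bit flip in the second copy breaks its parity, so the voter forwards the first copy without requesting recovery; a double flip preserves parity, so the voter can only detect the mismatch and raise the recovery signal.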
Title: Fault tolerant voting circuits: A Dual-Modular-Redundancy approach for Single-Event-Transient mitigation; Journal: Microprocessors and Microsystems, Vol. 119, Article 105207
Pub Date: 2025-12-01; Epub Date: 2025-10-20; DOI: 10.1016/j.micpro.2025.105219
Roberto Rocco, Francesco Gianchino, Antonio Miele, Gianluca Palermo
Nowadays, most computing systems experience highly dynamic workloads, with performance-demanding applications entering and leaving the system unpredictably. The need to guarantee their performance has led to the design of adaptive mechanisms, including (i) application autotuners, which optimize algorithmic parameters (e.g., frame resolution in a video processing application), and (ii) runtime resource managers, which distribute computing resources among the running applications and tune architectural knobs (e.g., frequency scaling). Past work has investigated these two directions separately, acting on a limited set of control knobs and objective functions; instead, this work proposes a framework that integrates these two complementary approaches into a single two-level governor acting on the overall hardware/software stack. The resource manager incorporates a policy for distributing computing resources and setting architectural knobs that guarantees the required performance of each application while limiting the side effects on result quality and minimizing system power consumption. Meanwhile, the autotuner manages each application's software knobs, ensuring that quality and performance constraints are satisfied while hiding application details from the controller. Experimental evaluation on a homogeneous workstation architecture demonstrates that the proposed framework is stable and can save more than 72% of the power consumed by one-layer solutions.
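The two-level governor can be pictured with a deliberately small Python model: an upper-level resource manager scans hardware settings from the lowest power upward, and a lower-level autotuner picks the highest-quality software knob that still meets the throughput target on that setting. All class names, the linear throughput model, and the power figures are hypothetical; the paper's actual policies, knobs, and multi-application resource distribution are not described in the abstract and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HwConfig:
    """One architectural setting the resource manager may choose (hypothetical)."""
    cores: int
    freq_ghz: float
    power_w: float  # assumed power model for this setting


@dataclass(frozen=True)
class SwKnob:
    """One software-knob setting exposed by the application autotuner (hypothetical)."""
    name: str
    gcycles_per_item: float  # work per processed item, in giga-cycles
    quality: float           # normalized result quality in [0, 1]


def throughput(hw: HwConfig, knob: SwKnob) -> float:
    """Items per second, assuming ideal scaling across cores."""
    return hw.cores * hw.freq_ghz / knob.gcycles_per_item


def autotune(hw: HwConfig, knobs: List[SwKnob],
             min_throughput: float, min_quality: float) -> Optional[SwKnob]:
    """Lower level: highest-quality knob meeting both constraints on this hardware."""
    feasible = [k for k in knobs
                if k.quality >= min_quality and throughput(hw, k) >= min_throughput]
    return max(feasible, key=lambda k: k.quality, default=None)


def govern(configs: List[HwConfig], knobs: List[SwKnob], min_throughput: float,
           min_quality: float) -> Optional[Tuple[HwConfig, SwKnob]]:
    """Upper level: lowest-power configuration on which the autotuner succeeds."""
    for hw in sorted(configs, key=lambda h: h.power_w):
        knob = autotune(hw, knobs, min_throughput, min_quality)
        if knob is not None:
            return hw, knob
    return None
```

Keeping the knob search inside the lower level mirrors the abstract's point that the autotuner hides application details from the controller: `govern` only learns whether a feasible knob exists on a given hardware setting.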
Title: Power/accuracy-aware dynamic workload optimization combining application autotuning and runtime resource management on homogeneous architectures; Journal: Microprocessors and Microsystems, Vol. 119, Article 105219
Pub Date: 2025-11-01; Epub Date: 2025-09-03; DOI: 10.1016/j.micpro.2025.105187
Patrick Plagwitz, Frank Hannig, Jürgen Teich, Oliver Keszocze
Neural Networks (NNs) are a very active field of research with wide-ranging industrial applications. Spiking Neural Networks (SNNs) are an emerging type of NN that is promising for hardware acceleration with low energy requirements. However, design automation for accelerator circuit generation still lacks proper search techniques for optimizing network parameters, including the selection of suitable neuron models and spike encodings. Existing approaches are often restricted to implementing a single network setting and/or a fixed hardware architecture.
In this paper, we present a novel multi-layer Domain-Specific Language (DSL) for constructing sequential circuits, including building blocks for pipelines with hazard detection. As the host language, we use Chisel, a hardware construction language that allows hardware to be expressed at the Register-Transfer Level and above. Instead of applying High-Level Synthesis, we introduce a DSL for SNN accelerator design based on Chisel by defining building blocks for SNNs. After introducing this DSL, we present a full SNN accelerator generation framework that covers all phases, from training to deployment. We also propose a design space exploration over various SNN accelerator designs using different neuron models, their parametrizations, and spike encodings. The generated designs are evaluated in terms of execution time, power consumption, classification accuracy, and resource usage when mapped to Field-Programmable Gate Arrays (FPGAs) for the MNIST, Fashion-MNIST, SVHN, and CIFAR-10 data sets.
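As an illustration of what neuron models and spike encodings mean as design-space parameters, here is a small software model in Python (not Chisel) of a deterministic rate encoder and an integer leaky integrate-and-fire (LIF) neuron whose leak is a right shift, a hardware-friendly choice; passing `leak_shift=None` yields a non-leaky integrate-and-fire neuron as a second model. The function names, the shift-based leak, and the reset-to-zero behavior are assumptions for illustration and are not taken from the paper's DSL.

```python
import math
from typing import List, Optional


def rate_encode(intensity: float, steps: int) -> List[int]:
    """Deterministic rate coding: floor(intensity * steps) evenly spread spikes."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    return [int(math.floor((t + 1) * intensity) > math.floor(t * intensity))
            for t in range(steps)]


def lif_run(spikes: List[int], weight: int, threshold: int,
            leak_shift: Optional[int] = 2) -> List[int]:
    """Integer leaky integrate-and-fire neuron with reset-to-zero.

    The leak v -= v >> leak_shift needs only a shifter and a subtractor in
    hardware; leak_shift=None disables the leak (plain integrate-and-fire).
    """
    v = 0
    out = []
    for s in spikes:
        leak = (v >> leak_shift) if leak_shift is not None else 0
        v = v - leak + weight * s
        if v >= threshold:
            out.append(1)
            v = 0  # reset the membrane potential after firing
        else:
            out.append(0)
    return out
```

Swapping the encoder, the leak, or the threshold changes both classification behavior and the per-neuron hardware cost, which is the kind of trade-off a design space exploration over neuron models and encodings has to weigh.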
Title: DSL-based SNN accelerator design using Chisel; Journal: Microprocessors and Microsystems, Vol. 118, Article 105187
Pub Date: 2025-11-01; Epub Date: 2025-08-26; DOI: 10.1016/j.micpro.2025.105191
Nima Kolahimahmoudi, Giorgio Insinga, Paolo Bernardi
The low observability of analog signals inside modern low-area systems-on-chip (SoCs) results in an increasing need for Design for Testability (DfT) solutions. These solutions demand an optimal circuit design in terms of area, power consumption, and precision, with a focus on minimizing the area overhead per SoC circuit block. To address this demand, we present a 6-bit, low-area Hybrid Analog-to-Digital Converter (ADC) that measures analog voltages locally inside SoCs. The proposed Hybrid ADC consists of two sub-ADCs: a 3-bit successive-approximation-register (SAR) ADC for coarse measurements and a 3-bit Flash ADC for fine measurements.
The advantage of the proposed ADC is the low additional area it adds to each IP block of the SoC, owing to its specific design. The design can also use a shared fine Flash part, which has the dominant area in the design. The ADC converts analog signals, which are difficult to read from SoC pins, into the digital domain, where they are easy to route and observe.
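The coarse/fine split can be illustrated with an idealized behavioral model in Python: a 3-bit SAR stage resolves the upper bits by binary search, and a 3-bit Flash stage quantizes the remaining residue with seven parallel comparators whose thermometer output is converted to binary. Ideal comparators, an exact residue, and a 1 V reference (`vref`) are assumptions; the abstract does not describe how the two sub-ADCs are interfaced, and non-idealities such as offset and noise, which determine the measured linearity, are not modeled.

```python
def sar_coarse(vin: float, vref: float, bits: int) -> int:
    """Successive approximation: resolve one bit per step, MSB first."""
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        # Keep the bit if the input is at or above the trial DAC level.
        if vin >= trial * vref / (1 << bits):
            code = trial
    return code


def flash_fine(residue: float, coarse_lsb: float, bits: int) -> int:
    """Flash: 2**bits - 1 parallel comparators produce a thermometer code."""
    levels = 1 << bits
    thermometer = [residue >= k * coarse_lsb / levels for k in range(1, levels)]
    return sum(thermometer)  # thermometer-to-binary: count the ones


def hybrid_convert(vin: float, vref: float = 1.0,
                   coarse_bits: int = 3, fine_bits: int = 3) -> int:
    """Ideal two-step conversion: coarse SAR code, then Flash on the residue."""
    coarse = sar_coarse(vin, vref, coarse_bits)
    coarse_lsb = vref / (1 << coarse_bits)
    residue = vin - coarse * coarse_lsb
    fine = flash_fine(residue, coarse_lsb, fine_bits)
    return (coarse << fine_bits) | fine
```

In this ideal model, concatenating the two 3-bit codes reproduces a uniform 6-bit quantizer: mathematically, `hybrid_convert(v)` equals `floor(64 * v)` for inputs in [0, 1).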
The proposed ADC is designed and analyzed in Infineon's 130 nm technology and has a total area of 0.007 mm². The areas of the fine Flash and coarse SAR parts are 0.0015 mm² and 0.0042 mm², respectively. The Signal-to-Noise-and-Distortion Ratio (SNDR) of the design is 37 dB, and the Figure of Merit (FoM) is 2.15 pJ/conv.
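The reported SNDR translates into an effective number of bits through the standard relation ENOB = (SNDR − 1.76 dB) / 6.02 dB, so 37 dB corresponds to about 5.85 of the 6 nominal bits. A pJ/conv figure of merit is commonly the Walden FoM, P / (2^ENOB · f_s); the abstract does not state which definition is used and gives neither power nor sampling rate, so the sketch below only provides the formulas and does not attempt to reproduce the 2.15 pJ/conv value.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR (full-scale sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, fs_hz: float, enob: float) -> float:
    """Walden FoM in joules per conversion step: P / (2**ENOB * fs)."""
    return power_w / (2 ** enob * fs_hz)
```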
Title: Extended design and linearity analysis of a 6-bit low-area hybrid ADC design for local system-on-chip measurements; Journal: Microprocessors and Microsystems, Vol. 118, Article 105191
Pub Date: 2025-11-01; Epub Date: 2025-09-25; DOI: 10.1016/j.micpro.2025.105205
Khushboo Jain, Akansha Singh
Wireless Sensor Networks (WSNs) are increasingly embedded in mission-critical infrastructures, yet their constrained resources make conventional cryptographic solutions unsuitable. Existing hierarchical key management schemes, such as the RB method, provide partial protection but remain vulnerable to impersonation, replay, and node capture attacks. To address these challenges, we propose IHKEM (Improved Hierarchical Key Establishment and Management), a lightweight yet robust protocol that integrates symmetric and asymmetric primitives for mutual authentication, dynamic session key establishment, and end-to-end confidentiality. Unlike static key distribution methods, IHKEM eliminates unilateral key control, employs nonce- and timestamp-based validation for replay resistance, and supports adaptive key refreshing to preserve forward and backward secrecy. Extensive NS-2.35 simulations demonstrate that IHKEM reduces energy consumption by ∼15–20% relative to RB, improves resilience against node compromise (>80% of links remain uncompromised when 15% of nodes are captured), extends network lifetime (delayed First Node Dead and Half Nodes Dead points, FND/HND), and lowers the memory footprint by ∼20–25%, while incurring only ∼3% higher overhead than lightweight schemes such as SEE2PK. Beyond these immediate gains, IHKEM's modular architecture ensures post-quantum readiness, enabling seamless integration of lattice-based key encapsulation and signature schemes. This work bridges the gap between efficiency, resilience, and long-term cryptographic sustainability in WSNs.
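Nonce- and timestamp-based replay resistance combined with per-session keys is a general technique; the Python sketch below shows one generic form of it using the standard `hmac`, `hashlib`, `secrets`, and `struct` modules. The message layout, the HMAC-SHA256 key derivation, the 5-second freshness window, and the hash-ratchet refresh are all assumptions for illustration, not the IHKEM protocol, whose message formats and asymmetric primitives are not described in the abstract.

```python
import hashlib
import hmac
import secrets
import struct
from typing import Set


def new_nonce() -> bytes:
    """Fresh 128-bit random nonce (fixed length, so concatenation is unambiguous)."""
    return secrets.token_bytes(16)


def derive_session_key(master_key: bytes, nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Session key bound to both parties' nonces (HMAC-SHA256 used as a simple KDF)."""
    return hmac.new(master_key, b"session" + nonce_a + nonce_b, hashlib.sha256).digest()


def refresh_key(key: bytes) -> bytes:
    """One-way hash ratchet: a leaked new key does not reveal the old one.

    This provides forward secrecy only; backward secrecy would additionally
    need fresh secret key material, which this sketch does not model.
    """
    return hashlib.sha256(b"refresh" + key).digest()


class ReplayGuard:
    """Receiver-side check of timestamp window, nonce reuse, and MAC tag."""

    def __init__(self, key: bytes, window_s: float = 5.0) -> None:
        self.key = key
        self.window_s = window_s
        # A real node would evict nonces older than the window to bound memory.
        self.seen: Set[bytes] = set()

    def tag(self, payload: bytes, nonce: bytes, ts: float) -> bytes:
        # Length-prefix variable fields so that field boundaries are unambiguous.
        msg = (struct.pack(">I", len(payload)) + payload +
               struct.pack(">I", len(nonce)) + nonce + struct.pack(">d", ts))
        return hmac.new(self.key, msg, hashlib.sha256).digest()

    def accept(self, payload: bytes, nonce: bytes, ts: float, tag: bytes,
               now: float) -> bool:
        if abs(now - ts) > self.window_s:
            return False  # stale or future-dated message
        if nonce in self.seen:
            return False  # nonce reused: replay
        if not hmac.compare_digest(tag, self.tag(payload, nonce, ts)):
            return False  # forged or tampered message
        self.seen.add(nonce)
        return True
```

A receiver built this way accepts a message only once, only inside the freshness window, and only if its tag verifies; the caller supplies `now` (for instance from `time.time()`) so that the check stays deterministic and testable.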
Title: IHKEM: A post-quantum ready hierarchical key establishment and management scheme for wireless sensor networks; Journal: Microprocessors and Microsystems, Vol. 118, Article 105205
Pub Date: 2025-11-01; Epub Date: 2025-09-22; DOI: 10.1016/j.micpro.2025.105202
Wei Zeng, Yuzhou Xiao, Yiru Wang, Caihua Chen, Sulan He
Real-time multi-point, full-scene monitoring with low cost, low power consumption, low communication overhead, and front-end deployment is a current research focus in fire detection technology. This paper investigates and implements deep-learning-based fire detection on the low-compute ZYNQ platform, aiming to provide a cost-effective, highly efficient, and reliable fire detection solution. First, we propose a lightweight network model, YOLO-Fire, which replaces standard convolutions with depthwise separable convolutions, adds the ECA attention mechanism, and introduces multi-scale feature fusion to fit the memory and computational limitations of the ZYNQ device. Second, we design a hardware accelerator IP core for the ZYNQ7020 platform using a specific loop tiling strategy, constraint statements, and dual-dimensional parallelization over the convolution input and output channels. Combined with fixed-point quantization and resource optimization, this implementation efficiently accelerates the convolution, pooling, and upsampling layers. Experimental results show that YOLO-Fire improves accuracy, recall, and F1-score on the public BoWFire flame dataset and on a self-constructed flame dataset. In addition, average inference on the ZYNQ platform is approximately 74.43 times faster than on mainstream ARM AI platforms, verifying the effectiveness of the proposed acceleration approach.
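The savings from depthwise separable convolutions can be checked with simple arithmetic: a k × k standard convolution needs k²·C_in·C_out weights, whereas a depthwise k × k convolution followed by a 1 × 1 pointwise convolution needs k²·C_in + C_in·C_out, a ratio of 1/C_out + 1/k². The Python helpers below compute both costs (bias ignored, same output size assumed); the layer sizes in the usage note are hypothetical and not taken from YOLO-Fire, and neither the ECA module nor the loop tiling is modeled.

```python
from typing import Tuple


def standard_conv_cost(k: int, c_in: int, c_out: int,
                       h_out: int, w_out: int) -> Tuple[int, int]:
    """(weights, multiply-accumulates) of a k x k standard convolution, bias ignored."""
    weights = k * k * c_in * c_out
    return weights, weights * h_out * w_out


def depthwise_separable_cost(k: int, c_in: int, c_out: int,
                             h_out: int, w_out: int) -> Tuple[int, int]:
    """Same output shape, built as a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 filters that mix the channels
    weights = depthwise + pointwise
    return weights, weights * h_out * w_out
```

For a hypothetical 3 × 3 layer with 32 input and 64 output channels, the weight count drops from 18,432 to 2,336, about 12.7% of the original, and the multiply-accumulate count scales by the same factor because both variants produce the same output size.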
Title: Design and implementation of a hardware accelerator IP core for improved lightweight deep learning model; Journal: Microprocessors and Microsystems, Vol. 118, Article 105202
Pub Date: 2025-11-01; Epub Date: 2025-08-22; DOI: 10.1016/j.micpro.2025.105192
Domenico Zito
This manuscript addresses the severe design challenge of implementing the microwave and mm-wave control-and-readout ICs that enable monolithic Silicon quantum processors (QPs).
For the first time, we describe this circuit design challenge within a unified framework and provide some general considerations about requirements, technology, and performance, as a reference for future developments. In support of the discussion, we also report some results from our research and development work toward monolithic QPs. In particular, we address the key aspects leading to a new design paradigm that enables qubit-size, low-power CMOS ICs for qubit control and readout in monolithic QPs, and we summarize the main characteristics and results of some representative key building blocks. These circuit solutions open the way to a new class of low-power mm-wave circuits made of a few MOSFETs, without spiral inductors or other large and lossy distributed passive components, resulting in a characteristic size close to that of our qubit devices: qubit-size, low-power cryogenic ICs, which are key enabling solutions for monolithic QPs scalable to a large number of qubits.
Title: Qubit-size low-power cryogenic CMOS ICs for monolithic quantum processors; Journal: Microprocessors and Microsystems, Vol. 118, Article 105192
Pub Date: 2025-11-01; Epub Date: 2025-10-03; DOI: 10.1016/j.micpro.2025.105206
Mohammed Chghaf, Sergio Rodríguez Flórez, Abdelhafid El Ouardi
Place recognition plays a crucial role in the Simultaneous Localization and Mapping (SLAM) process of self-driving cars. Over time, motion estimation accumulates errors, leading to drift. Accurately recognizing previously visited areas allows the place recognition system to correct this drift in real time. Recognizing places from the structural aspects of the environment tends to be more resilient to lighting variations, which can cause incorrect identifications when feature-based descriptors are used. Nevertheless, research has predominantly focused on using depth sensors for this purpose. Inspired by a LiDAR-based approach, we introduce an inter-modal geometric descriptor that leverages the structural information obtained through a stereo camera.
Using this descriptor, we achieve real-time place recognition based on the structural appearance of the scene derived from a 3D vision system. Our experiments on the KITTI dataset and on our self-collected dataset show that the proposed approach performs comparably to state-of-the-art methods while remaining low-cost. We studied the algorithm's complexity to propose an optimized parallelization on GPU and FPGA architectures. Performance evaluation on different hardware (Jetson AGX Xavier and Arria 10 SoC) shows that the real-time requirements of an embedded system are met. Compared to a CPU implementation, processing was 4x to 10x faster, depending on the architecture.
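The abstract does not name the LiDAR method that inspired the descriptor. As an assumption, the sketch below follows a Scan Context-style idea: project 3D points (which could come from stereo depth) into a ring × sector polar grid that stores the maximum height per bin, then compare two descriptors by the mean column cosine distance minimized over all sector shifts, which makes the match insensitive to heading. Grid sizes, the 20 m range, and the function names are hypothetical; the block illustrates the kind of structural descriptor discussed, not the authors' algorithm.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = List[List[float]]


def polar_descriptor(points: Sequence[Tuple[float, float, float]], rings: int = 4,
                     sectors: int = 8, max_range: float = 20.0) -> Descriptor:
    """Ring x sector grid storing the maximum point height per bin (0.0 if empty)."""
    desc = [[0.0] * sectors for _ in range(rings)]
    for x, y, z in points:
        r = math.hypot(x, y)
        if r >= max_range:
            continue  # ignore points beyond the descriptor's range
        theta = math.atan2(y, x) % (2 * math.pi)
        ring = int(r / max_range * rings)
        sector = int(theta / (2 * math.pi) * sectors) % sectors
        desc[ring][sector] = max(desc[ring][sector], z)
    return desc


def shift_invariant_distance(d1: Descriptor, d2: Descriptor) -> Tuple[float, int]:
    """Min over sector shifts of the mean column cosine distance, plus the best shift."""
    rings, sectors = len(d1), len(d1[0])
    best = (math.inf, 0)
    for shift in range(sectors):
        total, count = 0.0, 0
        for s in range(sectors):
            c1 = [d1[r][s] for r in range(rings)]
            c2 = [d2[r][(s + shift) % sectors] for r in range(rings)]
            n1 = math.sqrt(sum(v * v for v in c1))
            n2 = math.sqrt(sum(v * v for v in c2))
            if n1 == 0.0 or n2 == 0.0:
                continue  # skip empty columns
            total += 1.0 - sum(p * q for p, q in zip(c1, c2)) / (n1 * n2)
            count += 1
        dist = total / count if count else 1.0
        if dist < best[0]:
            best = (dist, shift)
    return best
```

Each shift and each column can be scored independently, which is the kind of data parallelism that GPU and FPGA implementations can exploit.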
Title: Towards an embedded architecture based back-end processing for AGV SLAM applications; Journal: Microprocessors and Microsystems, Vol. 118, Article 105206
Pub Date: 2025-11-01; Epub Date: 2025-09-02; DOI: 10.1016/j.micpro.2025.105199
Luca Müller, Rolf Drechsler
Verification is an essential step in the design process of microprocessors. Complete coverage can only be ensured by formal methods, which tend to have exponential runtimes in the general case. Polynomial Formal Verification (PFV) addresses this issue, opening a research field focused on formal methods that ensure 100% correctness with predictable and manageable time and space complexity. In this work, two SAT-based verification approaches in the field of PFV are presented. For both the verification of the cutwidth decomposition on the Circuit-CNF and the verification of the cutwidth decomposition on the Circuit-AIG, it is proven that their time complexity is parameterized by their respective cutwidth. This enables the definition of a class of circuits with constant cutwidth, for which verification can be ensured in linear time. After the theoretical considerations, both approaches are experimentally evaluated on adder circuits as a case study, confirming the established theoretical bounds. Finally, both approaches are compared and their significance for the research field of PFV is discussed.
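Cutwidth can be made concrete: for a fixed linear ordering of a graph's vertices, the width of the cut between positions i and i + 1 is the number of edges with one endpoint on each side, and the cutwidth of the ordering is the maximum over all cuts (the cutwidth of the graph is the minimum over all orderings). The Python sketch below computes it with a difference array and applies it to a hypothetical block-level graph of a ripple-carry adder (two inputs, a full-adder block, a sum output, and a carry edge per bit); the paper works on Circuit-CNF and Circuit-AIG graphs, which this abstraction does not reproduce.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def ordering_cutwidth(order: Sequence[Hashable], edges: Iterable[Edge]) -> int:
    """Maximum number of edges crossing any cut of the given vertex ordering."""
    pos = {v: i for i, v in enumerate(order)}
    n = len(order)
    # An edge spanning positions lo < hi crosses cuts lo .. hi-1:
    # +1 where it starts crossing, -1 where it stops.
    diff = [0] * (n + 1)
    for u, w in edges:
        lo, hi = sorted((pos[u], pos[w]))
        diff[lo] += 1
        diff[hi] -= 1
    width, running = 0, 0
    for i in range(n - 1):  # cut i lies between positions i and i + 1
        running += diff[i]
        width = max(width, running)
    return width


def ripple_carry_graph(n_bits: int) -> Tuple[List[str], List[Edge]]:
    """Hypothetical block-level adder graph in natural bit-by-bit order."""
    order: List[str] = []
    edges: List[Edge] = []
    for i in range(n_bits):
        a, b, fa, s = f"a{i}", f"b{i}", f"fa{i}", f"s{i}"
        order += [a, b, fa, s]
        edges += [(a, fa), (b, fa), (fa, s)]
        if i + 1 < n_bits:
            edges.append((fa, f"fa{i + 1}"))  # carry to the next full adder
    return order, edges
```

Under the natural bit-by-bit ordering, the width stays at 3 for every adder of two or more bits, which illustrates why adders are a natural example of a constant-cutwidth circuit class.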
Title: Polynomial formal verification parameterized by cutwidth properties of a circuit using Boolean satisfiability. Microprocessors and Microsystems, vol. 118, Article 105199 (2025-11-01).
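The abstract does not state the exact cutwidth definition used for the Circuit-CNF or the Circuit-AIG. The sketch below assumes the standard graph notion: the cutwidth of a vertex ordering is the largest number of edges crossing any gap between two consecutive vertices. It computes this value in O(n + m) time with a difference array. The builder `adder_like_chain` is a hypothetical toy loosely shaped like a ripple-carry carry chain. It is not the authors' circuit encoding or verification procedure.

```python
from collections.abc import Hashable, Iterable, Sequence
from typing import List, Tuple

Edge = Tuple[Hashable, Hashable]


def cutwidth(order: Sequence[Hashable], edges: Iterable[Edge]) -> int:
    """Cutwidth of one vertex ordering: the largest number of edges that
    cross any gap between two consecutive vertices in `order`."""
    n = len(order)
    if n < 2:
        return 0
    pos = {v: i for i, v in enumerate(order)}
    # Difference array: an edge between positions a < b crosses gaps a..b-1,
    # so +1 at a and -1 at b make the prefix sum at gap k equal the number
    # of edges spanning the gap between positions k and k+1.
    diff = [0] * n
    for u, v in edges:
        a, b = sorted((pos[u], pos[v]))
        if a == b:
            continue  # a self-loop never crosses a gap
        diff[a] += 1
        diff[b] -= 1
    best = running = 0
    for k in range(n - 1):
        running += diff[k]
        best = max(best, running)
    return best


def adder_like_chain(n_bits: int) -> Tuple[List[str], List[Edge]]:
    """Hypothetical toy graph loosely shaped like a ripple-carry adder: bit
    cell i has inputs a_i, b_i, sum s_i and carry c_i, and c_{i-1} feeds
    cell i. This is NOT the paper's Circuit-CNF or Circuit-AIG encoding."""
    order: List[str] = []
    edges: List[Edge] = []
    for i in range(n_bits):
        a, b, s, c = f"a{i}", f"b{i}", f"s{i}", f"c{i}"
        order += [a, b, s, c]  # bit-by-bit vertex ordering
        edges += [(a, s), (b, s), (a, c), (b, c)]
        if i > 0:
            prev = f"c{i - 1}"
            edges += [(prev, s), (prev, c)]
    return order, edges


if __name__ == "__main__":
    for n in (4, 16, 64):
        order, edges = adder_like_chain(n)
        # Alternative ordering that groups vertices by signal type (a, b, c, s).
        grouped = sorted(order, key=lambda v: (v[0], int(v[1:])))
        print(n, cutwidth(order, edges), cutwidth(grouped, edges))
```

With the bit-by-bit ordering, the toy graph's cutwidth stays at 6 for every n_bits of 2 or more. Grouping all inputs first instead makes it grow with n_bits. This loosely illustrates why a circuit class with constant cutwidth can admit cost bounds that do not blow up with circuit size.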
Pub Date: 2025-11-01; Epub Date: 2025-09-25; DOI: 10.1016/j.micpro.2025.105203
Noïc Crouzet, Thomas Carle, Christine Rochange
This paper presents architectural design solutions aimed at improving the timing predictability of GPU pipelines, with a particular focus on the behavior of hardware schedulers in the fetch and issue stages. We argue that without coordination between these schedulers at each cycle, the timing behavior of the GPU is unpredictable. We show how coordination can be enforced and prove that our solution achieves predictable behavior. We have implemented it in a modified version of the open-source Vortex GPU, synthesized for an AMD Xilinx FPGA. We evaluate the overhead of the approach in terms of both FPGA resources and execution time.
Title: Time-predictable warp scheduling in a GPU. Microprocessors and Microsystems, vol. 118, Article 105203 (2025-11-01).