Pub Date : 2025-09-25 DOI: 10.1016/j.micpro.2025.105203
Noïc Crouzet, Thomas Carle, Christine Rochange
This paper presents architectural design solutions aimed at improving the timing predictability of GPU pipelines, with a particular focus on the behavior of hardware schedulers in the fetch and issue stages. We argue that without coordination between these schedulers at each cycle, the timing behavior of the GPU is unpredictable. We show how coordination can be enforced and prove that our solution achieves a predictable behavior. We have implemented it in a modified version of the open-source Vortex GPU, synthesized for an AMD Xilinx FPGA. We evaluate the overhead of the approach both in terms of FPGA resources and execution time.
Title: Time-predictable warp scheduling in a GPU. Journal: Microprocessors and Microsystems, vol. 118, Article 105203.
Pub Date : 2025-09-22 DOI: 10.1016/j.micpro.2025.105202
Wei Zeng, Yuzhou Xiao, Yiru Wang, Caihua Chen, Sulan He
Real-time multi-point, full-scene monitoring with low cost, low power consumption, low communication overhead, and front-end deployment is a current research focus in fire detection technology. This paper investigates and implements fire detection technology on the low-computation ZYNQ platform based on deep learning, aiming to provide a cost-effective, highly efficient, and reliable fire detection solution. Firstly, we propose a lightweight network model, YOLO-Fire, which incorporates modifications like replacing standard convolutions with depthwise separable convolutions, adding the ECA attention mechanism, and introducing multi-scale feature fusion to suit the memory and computational limitations of the ZYNQ device. Additionally, we designed a hardware accelerator IP core for the ZYNQ7020 platform using a specific loop tiling strategy, constraint statements, and a dual-dimensional parallel optimization of convolution input and output channels. Combined with fixed-point quantization and resource optimization, this implementation achieves efficient acceleration of convolution, pooling, and upsampling layers. Experimental results show that YOLO-Fire improves accuracy, recall, and F1-score on the BoWFire public flame dataset and a self-constructed flame dataset. Additionally, the average inference time on the ZYNQ platform is approximately 74.43 times faster than on mainstream ARM AI platforms, verifying the effectiveness of the proposed acceleration approach.
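To illustrate why replacing standard convolutions with depthwise separable ones suits a memory-constrained device, the following minimal sketch compares weight counts for a single layer. The layer shape (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are assumptions chosen for illustration; they are not taken from YOLO-Fire.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, used only to make the comparison concrete.
C_IN, C_OUT, K = 64, 128, 3
std = standard_conv_params(C_IN, C_OUT, K)
sep = depthwise_separable_params(C_IN, C_OUT, K)
print(f"standard: {std} weights, depthwise separable: {sep} weights")
```

For this assumed shape the separable variant needs roughly one eighth of the weights, which is the kind of saving that makes such a layer easier to fit into on-chip memory.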
Title: Design and implementation of a hardware accelerator IP core for improved lightweight deep learning model. Journal: Microprocessors and Microsystems, vol. 118, Article 105202.
Pub Date : 2025-09-15 DOI: 10.1016/j.micpro.2025.105204
Marko Andjelkovic, Nebojsa Maletic, Nicola Miglioranza, Milos Krstic, Enrico Koeck, Jan Buchholz, Maike Taddiken, Markus Fehrenz, Shaden Baradie, Dirk Wübben, Markus Breitbach
The integration of conventional terrestrial wireless communication networks and non-terrestrial networks (NTNs) is the main prerequisite for achieving global connectivity in next-generation (6G) wireless communications. Such integrated communication networks are usually referred to as unified 3D networks. These networks need to meet the requirements for 6G communications in terms of higher data rates, as well as enhanced reliability, security and network reconfigurability. To achieve these goals, new technologies and components have to be developed. This work introduces the German project 6G-TakeOff, aimed at the development of innovative solutions for unified 3D networks. The project consortium brings together leading academic and industrial partners, covering the entire value chain from the design of electronics to applications. In this work, the focus is on the development of key hardware components to support wireless communication in unified 3D networks. The design concept for each component and the planned demonstrators are presented.
Title: Key components for unified 3D wireless communication networks. Journal: Microprocessors and Microsystems, vol. 119, Article 105204.
Pub Date : 2025-09-03 DOI: 10.1016/j.micpro.2025.105187
Patrick Plagwitz, Frank Hannig, Jürgen Teich, Oliver Keszocze
Neural Networks (NNs) are a very active field of research that also has wide-ranging applications in industry. An emerging type of NN that is promising for hardware acceleration and low energy requirements is the Spiking Neural Network (SNN). However, design automation for accelerator circuit generation still lacks proper search techniques for optimizing network parameters, including the selection of suitable neuron models and spike encodings. Existing approaches are often restricted to implementing a single network setting and/or a fixed hardware architecture.
In this paper, we present a novel multi-layer Domain-Specific Language (DSL) for constructing sequential circuits, including building blocks for pipelines supporting hazard detection. As the host language, we use Chisel, a hardware construction language that allows hardware to be expressed at the Register-Transfer Level and above. In contrast to applying High-Level Synthesis, we introduce a DSL for SNN accelerator design based on Chisel by defining building blocks for SNNs. After introducing this DSL, we present a full SNN accelerator generation framework that covers all phases, from training to deployment. We also propose a design space exploration for various SNN accelerator designs using different neuron models, their parametrizations, and spike encodings. The generated designs are evaluated in terms of execution time, power consumption, classification accuracy, and resource usage when mapped to Field-Programmable Gate Arrays (FPGAs) for the MNIST, Fashion-MNIST, SVHN, and CIFAR-10 data sets.
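As a point of reference for what a "neuron model" means in this setting, here is a minimal software sketch of a discrete-time leaky integrate-and-fire (LIF) neuron, a model commonly used in the SNN literature. The abstract does not state which neuron models the framework supports, so the choice of LIF, the function name `lif_run`, and the parameter values are all assumptions made for illustration.

```python
def lif_run(inputs, leak=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of input currents.

    At each step the membrane potential decays by `leak`, integrates the input,
    and emits a spike (1) with a reset to zero once it reaches `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


# Hypothetical constant input; the neuron needs three steps to reach threshold.
print(lif_run([0.6, 0.6, 0.6, 0.6]))
```

In a design space exploration of the kind the abstract describes, parameters such as `leak` and `threshold` would be among the knobs varied per design; here they are fixed, illustrative values.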
Title: DSL-based SNN accelerator design using Chisel. Journal: Microprocessors and Microsystems, vol. 118, Article 105187.
Pub Date : 2025-09-02 DOI: 10.1016/j.micpro.2025.105199
Luca Müller, Rolf Drechsler
Verification is an essential step in the design process of microprocessors. Complete coverage can only be ensured by formal methods, which tend to have exponential runtimes in the general case. Polynomial Formal Verification (PFV) addresses this issue, opening a research field focused on providing formal methods that ensure 100% correctness along with predictable and manageable time and space complexity. In this work, two SAT-based verification approaches in the field of PFV are presented. For both the verification of the cutwidth decomposition on the Circuit-CNF and the verification of the cutwidth decomposition on the Circuit-AIG, it is proven that their time complexity is parameterized by their respective cutwidth. This enables the definition of a class of circuits with constant cutwidth, for which verification can be ensured in linear time. After the theoretical considerations, both approaches are experimentally evaluated on the case study of adder circuits, underlining the established theoretical bounds. Finally, both approaches are compared and their significance in the research field of PFV is stated.
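For readers unfamiliar with the parameter, the sketch below computes the cutwidth of a graph under a given linear ordering of its nodes: the largest number of edges that cross any gap between consecutive positions. It uses a generic undirected graph; how the paper derives such a graph from the Circuit-CNF or Circuit-AIG is not shown here, and the function name `cutwidth` is an assumption for illustration.

```python
def cutwidth(order, edges):
    """Return the cutwidth of `edges` under the linear arrangement `order`.

    For each gap between positions i and i + 1, count the edges with one
    endpoint at or before i and the other after i; the cutwidth is the maximum.
    """
    position = {node: index for index, node in enumerate(order)}
    width = 0
    for gap in range(len(order) - 1):
        crossing = sum(
            1
            for u, v in edges
            if min(position[u], position[v]) <= gap < max(position[u], position[v])
        )
        width = max(width, crossing)
    return width


# A chain of five nodes: in its natural order only one edge crosses each gap.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(cutwidth([0, 1, 2, 3, 4], chain))
# The same chain under a poor ordering has a larger cut.
print(cutwidth([0, 2, 4, 1, 3], chain))
```

The chain example hints at why structures with a regular, local layout can admit an ordering whose cutwidth stays constant as the structure grows, which is the property the abstract ties to linear-time verification.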
Title: Polynomial formal verification parameterized by cutwidth properties of a circuit using Boolean satisfiability. Journal: Microprocessors and Microsystems, vol. 118, Article 105199.
Pub Date : 2025-08-26 DOI: 10.1016/j.micpro.2025.105191
Nima Kolahimahmoudi, Giorgio Insinga, Paolo Bernardi
The low observability of analog signals inside modern low-area systems-on-chip (SoCs) results in an increasing need for Design for Testability (DfT) solutions. These solutions demand an optimal circuit design in terms of area, power consumption, and precision, with a focus on minimizing area overhead per SoC circuit block. To address this demand, we present a 6-bit, low-area Hybrid Analog-to-Digital Converter (ADC) that measures analog voltage inside SoCs locally. The proposed Hybrid ADC consists of two sub-ADCs: a 3-bit SAR ADC for coarse measurements and a 3-bit Flash ADC for fine measurements.
The advantage of the proposed ADC design is its low additional area cost to each IP of SoCs due to its specific design. It can also have a shared fine Flash part, which has the dominant area in the design. This ADC design converts the analog signals, which are difficult to read from SoC pins, to the digital domain, where they are easy to route and observe.
The suggested ADC is designed and analyzed using the 130 nm technology of Infineon, and it has a total area of 0.007 mm². The areas of the fine Flash and coarse SAR parts are 0.0015 mm² and 0.0042 mm², respectively. The Signal-to-Noise-and-Distortion Ratio (SNDR) of the design is 37 dB, and the Figure of Merit (FoM) is 2.15 pJ/conv.
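The reported SNDR can be related to an effective number of bits (ENOB) with the standard relation ENOB = (SNDR − 1.76 dB) / 6.02 dB, which assumes a full-scale sine-wave input. The sketch below applies that relation and also shows, as an idealized behavioral model only, how a 6-bit code splits into a 3-bit coarse part and a 3-bit fine part. The function names and the reference voltage are assumptions for illustration; the model does not represent the paper's SAR or Flash circuits.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR, assuming a full-scale sine input."""
    return (sndr_db - 1.76) / 6.02


def hybrid_convert(v_in: float, v_ref: float, coarse_bits: int = 3, fine_bits: int = 3):
    """Ideal two-step conversion: return (coarse code, fine code, combined code)."""
    total_levels = 1 << (coarse_bits + fine_bits)
    code = int(v_in / v_ref * total_levels)
    code = max(0, min(total_levels - 1, code))  # clamp to the valid code range
    coarse = code >> fine_bits                  # upper bits: coarse segment
    fine = code & ((1 << fine_bits) - 1)        # lower bits: position within the segment
    return coarse, fine, code


print(f"ENOB for 37 dB SNDR: {enob_from_sndr(37.0):.2f} bits")
# Hypothetical 1.0 V reference and 0.7 V input.
print(hybrid_convert(0.7, 1.0))
```

Under this relation, 37 dB corresponds to roughly 5.85 effective bits, close to the nominal 6-bit resolution.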
Title: Extended design and linearity analysis of a 6-bit low-area hybrid ADC design for local system-on-chip measurements. Journal: Microprocessors and Microsystems, vol. 118, Article 105191.
Pub Date : 2025-08-22 DOI: 10.1016/j.micpro.2025.105192
Domenico Zito
This manuscript addresses the severe design challenge of implementing microwave and mm-wave control-and-readout ICs that enable monolithic silicon quantum processors (QPs).
For the first time, we describe the circuit design challenge within a unitary frame and provide some general considerations about requirements, technology and performance, as a reference for future developments. In support of the discussion and considerations, we also report some results that emerged from our research and development work toward monolithic QPs. In particular, we address the key aspects leading to the new design paradigm enabling qubit-size low-power CMOS ICs for qubit control and readout for monolithic QPs, and summarize the main characteristics and results exhibited by some representative key building blocks. These circuit solutions open up a new class of low-power mm-wave circuits made of a few MOSFETs, without spiral inductors or other large and lossy distributed passive components, resulting in a characteristic size close to that of our qubit devices. These qubit-size low-power cryogenic ICs are key enabling solutions for monolithic QPs scalable to a large number of qubits.
Title: Qubit-size low-power cryogenic CMOS ICs for monolithic quantum processors. Journal: Microprocessors and Microsystems, vol. 118, Article 105192.
Pub Date : 2025-08-19 DOI: 10.1016/j.micpro.2025.105189
Dorian Ronga, Gianmarco Mongelli, Eric Faehn, Patrick Girard, Arnaud Virazel
Memory testing is crucial as memories play an increasingly important role in modern computing systems, in which a memory malfunction can lead to a system failure. Memory testing is commonly addressed by a functional testing approach that consists of verifying the function of the manufactured memory. Functional testing focuses on identifying memory functional failure mechanisms, which are modeled by Functional Fault Models (FFMs), and for which dedicated test algorithms are developed to ensure their detection. However, as technology shrinks, fault mechanisms in memories become more complex, as do their detection conditions. To anticipate any limitation, memory structural testing is investigated. Structural testing proposes to study the defect before the fault, as one or several manufacturing defects or imperfections may be responsible for a fault. A structural test methodology for memory has recently been published and proposes to adapt the Cell-Aware test methodology from the digital domain to analog memories. As the resulting Structural Fault Models (SFMs) for analog memory are compatible with digital test environments, this work proposes a digital SRAM modeling methodology, compatible with digital simulation and test environments, leveraging a fault simulator for test algorithm coverage analysis and an Automatic Test Pattern Generator for dedicated and optimized defect-specific test generation.
Title: Analog to digital memory modeling for test. Journal: Microprocessors and Microsystems, vol. 118, Article 105189.
Pub Date : 2025-08-12 DOI: 10.1016/j.micpro.2025.105188
Adebayo Omotosho, Christian Hammer
Code reuse attacks exploit existing code in applications to hijack control flow and cause security breaches. However, reusing code on architectures with a register window or windowed register application binary interface (Winreg ABI), as found on Xtensa, poses significant challenges due to their unique architectural behavior. Winreg ABI aims to enhance register performance by reducing stack operations during procedure calls in reduced instruction set computer architectures. Rudimentary investigations have explored Winreg ABI exception handlers as potential sources of vulnerability in register window operations. Despite these efforts, the approach has been limited, even in synthetic examples, as it cannot technically reuse code beyond changing register values.
In this paper, we present a novel approach to producing gadget-based code reuse attacks on Xtensa cores utilizing Winreg ABI, as found in embedded systems like ESP32 and ESP8266. At the same time, we show that established detection methods, such as leveraging hardware performance counters, can also detect these attack schemes. Finally, we identify an additional potential loophole in the Winreg ABI. Our evaluation results using a number of benchmark applications demonstrate that successful attacks exhibit a consistent pattern that can be accurately detected.
Title: CRAX: Code reuse attacks on Xtensa’s register window ABI. Journal: Microprocessors and Microsystems, vol. 117, Article 105188.
Blockchain technology enables the creation of a timestamped, shared, and replicated history of events among participants who do not trust each other. To agree on the shared history, the blockchain uses a consensus protocol, such as Nakamoto’s protocol in Bitcoin. This protocol relies on a proof that statistically ensures the elapsed time between two blocks by design through the Proof of Work (PoW) mechanism. However, PoW relies heavily on computation and is not suitable for embedded systems. Proof of Hardware Time (PoHT) aims to provide a secure by design elapsed time proof mechanism with low power consumption. PoHT is embedded in a System on Module (SoM) that features an ARM Cortex-A7 microprocessor with a TrustZone and a Trusted Platform Module. This paper focuses on the security of the elapsed time measurement during PoHT, conducting experimental attacks targeting clock oscillators under temperature variations. It presents a consolidation of the various available time sources, as well as a solution for detecting time drifts. Furthermore, an embedded architecture for the time drift detection system is outlined and experimental testing of the system is performed.
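One simple way to picture drift detection across several time sources is to compare the elapsed time each source reports against a consensus value and flag the outliers. The sketch below does this with the median as the consensus. This is an assumed, illustrative scheme, not the method of the paper: the source names, the tolerance, and the function `detect_drift` are all hypothetical.

```python
from statistics import median


def detect_drift(elapsed_by_source, tolerance=0.01):
    """Return the names of time sources whose elapsed time deviates from the
    median of all sources by more than `tolerance` (relative deviation)."""
    reference = median(elapsed_by_source.values())
    if reference <= 0:
        raise ValueError("elapsed times must be positive")
    return sorted(
        name
        for name, elapsed in elapsed_by_source.items()
        if abs(elapsed - reference) / reference > tolerance
    )


# Hypothetical readings in seconds; the third source runs about 3% fast,
# as might happen if its oscillator were heated or cooled.
readings = {"rtc": 1000.0, "cpu_counter": 1000.2, "tpm": 1030.0}
print(detect_drift(readings))
```

A median-based consensus tolerates a minority of disturbed sources, but it offers no protection if most sources share the same oscillator or are attacked together, which is why the independence of the time sources matters in any such scheme.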
Title: Detecting time drifts for securing Proof of Hardware Time in blockchain. Authors: Quentin Jayet, Christine Hennebert, Yann Kieffer, Vincent Beroulle. Pub Date : 2025-08-06 DOI: 10.1016/j.micpro.2025.105185. Journal: Microprocessors and Microsystems, vol. 117, Article 105185.