Model predictive-based DNN control model for automated steering deployed on FPGA using an automatic IP generator tool
Pub Date : 2024-07-25, DOI: 10.1007/s10617-024-09287-x
Ahmad Reda, Afulay Ahmed Bouzid, Alhasan Zghaibe, Daniel Drótos, Vásárhelyi József
With the increase in the non-linearity and complexity of the driving environment, developing and optimizing the related applications is becoming more crucial and remains an open challenge for researchers and automotive companies alike. Model predictive control (MPC) is a well-known classical control strategy used to solve online optimization problems, but it is computationally expensive and resource-consuming. Recently, machine learning has become an effective alternative to classical control systems. This paper presents a deep neural network (DNN)-based control strategy for automated steering deployed on an FPGA. The DNN model was designed and trained to reproduce the behavior of a traditional MPC controller, and its performance is evaluated against that of the designed MPC, which has already proved its merit in automated driving tasks. A new automatic intellectual-property generator based on the Xilinx System Generator (XSG) has been developed, not only to perform the deployment but also to optimize it. Performance was evaluated by the controllers' ability to drive the vehicle's lateral deviation and yaw angle as close as possible to zero. The DNN model was implemented on the FPGA using two data types, fixed-point and floating-point, in order to evaluate efficiency in terms of performance and resource consumption. The results show that the proposed DNN model performed satisfactorily and successfully imitated the behavior of the traditional MPC with a very small root mean square error (RMSE = 0.011228 rad). They also show that the fixed-point deployment greatly reduced resource consumption compared to the floating-point one while maintaining satisfactory performance and meeting the safety conditions.
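To make the evaluation metric concrete, here is a minimal sketch of how the RMSE between the teacher MPC's steering command and the DNN's prediction could be computed; the traces below are made-up placeholders, not data from the paper.

```python
import numpy as np

# Hypothetical steering-angle traces (rad): the reference MPC output and the
# DNN that imitates it. Real traces would come from closed-loop simulation.
mpc_steering = np.array([0.000, 0.012, 0.025, 0.018, -0.004, -0.015])
dnn_steering = np.array([0.001, 0.010, 0.027, 0.016, -0.006, -0.013])

# Root mean square error between the imitation controller and its teacher.
rmse = np.sqrt(np.mean((dnn_steering - mpc_steering) ** 2))
print(f"RMSE = {rmse:.6f} rad")
```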
{"title":"Model predictive-based DNN control model for automated steering deployed on FPGA using an automatic IP generator tool","authors":"Ahmad Reda, Afulay Ahmed Bouzid, Alhasan Zghaibe, Daniel Drótos, Vásárhelyi József","doi":"10.1007/s10617-024-09287-x","DOIUrl":"https://doi.org/10.1007/s10617-024-09287-x","url":null,"abstract":"<p>With the increase in the non-linearity and complexity of the driving system’s environment, developing and optimizing related applications is becoming more crucial and remains an open challenge for researchers and automotive companies alike. Model predictive control (MPC) is a well-known classic control strategy used to solve online optimization problems. MPC is computationally expensive and resource-consuming. Recently, machine learning has become an effective alternative to classical control systems. This paper provides a developed deep neural network (DNN)-based control strategy for automated steering deployed on FPGA. The DNN model was designed and trained based on the behavior of the traditional MPC controller. The performance of the DNN model is evaluated compared to the performance of the designed MPC which already proved its merit in automated driving task. A new automatic intellectual property generator based on the Xilinx system generator (XSG) has been developed, not only to perform the deployment but also to optimize it. The performance was evaluated based on the ability of the controllers to drive the lateral deviation and yaw angle of the vehicle to be as close as possible to zero. The DNN model was implemented on FPGA using two different data types, fixed-point and floating-point, in order to evaluate the efficiency in the terms of performance and resource consumption. The obtained results show that the suggested DNN model provided a satisfactory performance and successfully imitated the behavior of the traditional MPC with a very small root mean square error (RMSE = 0.011228 rad). Additionally, the results show that the deployments using fixed-point data greatly reduced resource consumption compared to the floating-point data type while maintaining satisfactory performance and meeting the safety conditions</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"41 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and analysis of an adaptive radiation resilient RRAM subsystem for processing systems in satellites
Pub Date : 2024-04-10, DOI: 10.1007/s10617-024-09285-z
Daniel Reiser, Junchao Chen, Johannes Knödtel, Andrea Baroni, Miloš Krstić, Marc Reichenbach
Among the numerous benefits that novel RRAM devices offer over conventional memory technologies is an inherent resilience to the effects of radiation. Hence, they appear suitable for use as a memory subsystem in a computer architecture for satellites. In addition to memory devices resistant to radiation, the concept of applying protective measures dynamically promises a system with low susceptibility to errors during radiation events, while also ensuring efficient performance in the absence of radiation events. This paper presents the first RRAM-based memory subsystem for satellites with a dynamic response to radiation events. We integrate this subsystem into a computing platform that employs the same dynamic principles for its processing system and implements modules for timely detection and even prediction of radiation events. To determine which protection mechanism is optimal, we examine various approaches and simulate the probability of errors in memory. Additionally, we study the impact on the overall system by investigating different software algorithms and their radiation-robustness requirements using fault-injection simulation. Finally, we propose a potential implementation of the dynamic RRAM-based memory subsystem that includes different levels of protection and can be used for real applications in satellites.
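The dynamic-protection idea can be illustrated with a small sketch: a predictor supplies an expected radiation level and the memory subsystem picks a protection mode accordingly. The mode names, thresholds, and interface below are illustrative assumptions, not the subsystem proposed in the paper.

```python
from enum import Enum

class Protection(Enum):
    NONE = 0   # no redundancy, lowest overhead
    ECC = 1    # single-error correction
    TMR = 2    # triple modular redundancy, highest overhead

def select_protection(predicted_flux: float) -> Protection:
    """Map a predicted particle-flux level to a protection mode.

    The thresholds are placeholders; in a real system they would be derived
    from the simulated bit-error probabilities of the RRAM array.
    """
    if predicted_flux < 1e2:
        return Protection.NONE
    if predicted_flux < 1e4:
        return Protection.ECC
    return Protection.TMR

# A radiation event is predicted, so the memory subsystem hardens itself.
print(select_protection(5e4))   # Protection.TMR
```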
{"title":"Design and analysis of an adaptive radiation resilient RRAM subsystem for processing systems in satellites","authors":"Daniel Reiser, Junchao Chen, Johannes Knödtel, Andrea Baroni, Miloš Krstić, Marc Reichenbach","doi":"10.1007/s10617-024-09285-z","DOIUrl":"https://doi.org/10.1007/s10617-024-09285-z","url":null,"abstract":"<p>Among the numerous benefits that novel RRAM devices offer over conventional memory technologies is an inherent resilience to the effects of radiation. Hence, they appear suitable for use as a memory subsystem in a computer architecture for satellites. In addition to memory devices resistant to radiation, the concept of applying protective measures dynamically promises a system with low susceptibility to errors during radiation events, while also ensuring efficient performance in the absence of radiation events. This paper presents the first RRAM-based memory subsystem for satellites with a dynamic response to radiation events. We integrate this subsystem into a computing platform that employs the same dynamic principles for its processing system and implements modules for timely detection and even prediction of radiation events. To determine which protection mechanism is optimal, we examine various approaches and simulate the probability of errors in memory. Additionally, we are studying the impact on the overall system by investigating different software algorithms and their radiation robustness requirements using a fault injection simulation. Finally, we propose a potential implementation of the dynamic RRAM-based memory subsystem that includes different levels of protection and can be used for real applications in satellites.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"4 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140578611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving edge AI for industrial IoT applications with distributed learning using consensus
Pub Date : 2024-04-09, DOI: 10.1007/s10617-024-09284-0
Samuel Fidelis, Márcio Castro, Frank Siqueira
Internet of Things (IoT) devices produce massive amounts of data in a very short time. Transferring these data to the cloud for analysis may be prohibitive for applications that require near real-time processing. One solution to meet such timing requirements is to bring most data processing closer to the IoT devices (i.e., to the edge). In this context, the present work proposes a distributed architecture that meets the timing requirements imposed by Industrial IoT (IIoT) applications that need to apply Machine Learning (ML) models with high accuracy and low latency. This is done by dividing the tasks of storing and processing data into different layers—mist, fog, and cloud—using the cloud layer only for the tasks related to long-term storage of summarized data and hosting of the necessary reports and dashboards. The proposed architecture employs ML inference in the edge layer in a distributed fashion, where each edge node is responsible for applying either a different ML technique or the same technique trained on a different data set. A consensus algorithm then combines the inference results from the edge nodes to decide the final result, thus improving the system's overall accuracy. Results obtained with two different data sets show that the proposed approach can improve the accuracy of the ML models without significantly compromising the response time.
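As an illustration of the consensus step, the sketch below takes a simple majority vote over the labels reported by the edge nodes; the node names and labels are hypothetical, and the paper's consensus algorithm may differ.

```python
from collections import Counter

def consensus(predictions):
    """Majority vote over the class labels reported by the edge nodes.

    `predictions` maps node id -> predicted label. Ties are broken
    deterministically by label order; a real deployment could instead
    weight votes by each node's validation accuracy.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    winners = sorted(label for label, c in counts.items() if c == best)
    return winners[0]

# Three edge nodes running different models (or the same model trained on
# different data sets) classify the same sensor window.
print(consensus({"edge-1": "fault", "edge-2": "normal", "edge-3": "fault"}))  # fault
```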
{"title":"Improving edge AI for industrial IoT applications with distributed learning using consensus","authors":"Samuel Fidelis, Márcio Castro, Frank Siqueira","doi":"10.1007/s10617-024-09284-0","DOIUrl":"https://doi.org/10.1007/s10617-024-09284-0","url":null,"abstract":"<p>Internet of Things (IoT) devices produce massive amounts of data in a very short time. Transferring these data to the cloud to be analyzed may be prohibitive for applications that require near real-time processing. One solution to meet such timing requirements is to bring most data processing closer to IoT devices (i.e., to the edge). In this context, the present work proposes a distributed architecture that meets the timing requirements imposed by Industrial IoT (IIoT) applications that need to apply Machine Learning (ML) models with high accuracy and low latency. This is done by dividing the tasks of storing and processing data into different layers—mist, fog, and cloud—using the cloud layer only for the tasks related to long-term storage of summarized data and hosting of necessary reports and dashboards. The proposed architecture employs ML inferences in the edge layer in a distributed fashion, where each edge node is either responsible for applying a different ML technique or the same technique but with a different training data set. Then, a consensus algorithm takes the ML inference results from the edge nodes to decide the result of the inference, thus improving the system’s overall accuracy. Results obtained with two different data sets show that the proposed approach can improve the accuracy of the ML models without significantly compromising the response time.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"43 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140578540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Profiling with trust: system monitoring from trusted execution environments
Pub Date : 2024-02-16, DOI: 10.1007/s10617-024-09283-1
Christian Eichler, Jonas Röckl, Benedikt Jung, Ralph Schlenk, Tilo Müller, Timo Hönig
Large-scale attacks on IoT and edge computing devices pose a significant threat. As a prominent example, Mirai is an IoT botnet with 600,000 infected devices around the globe, capable of conducting effective and targeted DDoS attacks on (critical) infrastructure. Driven by the substantial impact of such attacks, manufacturers and system integrators have turned to Trusted Execution Environments (TEEs), which have gained significant importance recently. TEEs offer an execution environment to run small portions of code isolated from the rest of the system, even if the operating system is compromised. In this publication, we examine TEEs in the context of system monitoring and introduce the Trusted Monitor (TM), a novel anomaly detection system that runs within a TEE. The TM continuously profiles the system using hardware performance counters and utilizes an application-specific machine-learning model for anomaly detection. In our evaluation, we demonstrate that the TM accurately classifies 86% of 183 tested workloads, with an overhead of less than 2%. Notably, we show that a real-world kernel-level rootkit has observable effects on performance counters, allowing the TM to detect it. Major parts of the TM are implemented in the Rust programming language, eliminating common security-critical programming errors.
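A toy sketch of counter-based workload classification: each known workload has a profile vector of performance-counter means, and a sample that is far from every profile is flagged as an anomaly. The counters, profiles, and threshold are invented for illustration; the Trusted Monitor itself uses an application-specific ML model inside the TEE.

```python
import numpy as np

# Hypothetical per-workload profiles: mean vectors of hardware performance
# counters (e.g. instructions, cache misses, branch mispredictions) collected
# during benign runs. A sample far from every profile is flagged as anomalous.
profiles = {
    "idle":    np.array([1.0e6, 2.0e3, 1.0e2]),
    "crypto":  np.array([8.0e6, 5.0e4, 9.0e3]),
    "network": np.array([3.0e6, 7.0e4, 4.0e3]),
}

def classify(sample: np.ndarray, threshold: float = 0.5):
    """Nearest-centroid classification with a relative-distance threshold."""
    name, centroid = min(profiles.items(),
                         key=lambda kv: np.linalg.norm(sample - kv[1]))
    rel_dist = np.linalg.norm(sample - centroid) / np.linalg.norm(centroid)
    return name if rel_dist <= threshold else "anomaly"

print(classify(np.array([8.2e6, 5.2e4, 9.5e3])))   # close to "crypto"
print(classify(np.array([2.0e7, 9.0e5, 4.0e4])))   # far from everything: anomaly
```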
{"title":"Profiling with trust: system monitoring from trusted execution environments","authors":"Christian Eichler, Jonas Röckl, Benedikt Jung, Ralph Schlenk, Tilo Müller, Timo Hönig","doi":"10.1007/s10617-024-09283-1","DOIUrl":"https://doi.org/10.1007/s10617-024-09283-1","url":null,"abstract":"<p>Large-scale attacks on IoT and edge computing devices pose a significant threat. As a prominent example, Mirai is an IoT botnet with 600,000 infected devices around the globe, capable of conducting effective and targeted DDoS attacks on (critical) infrastructure. Driven by the substantial impacts of attacks, manufacturers and system integrators propose Trusted Execution Environments (TEEs) that have gained significant importance recently. TEEs offer an execution environment to run small portions of code isolated from the rest of the system, even if the operating system is compromised. In this publication, we examine TEEs in the context of system monitoring and introduce the Trusted Monitor (TM), a novel anomaly detection system that runs within a TEE. The TM continuously profiles the system using hardware performance counters and utilizes an application-specific machine-learning model for anomaly detection. In our evaluation, we demonstrate that the TM accurately classifies 86% of 183 tested workloads, with an overhead of less than 2%. Notably, we show that a real-world kernel-level rootkit has observable effects on performance counters, allowing the TM to detect it. Major parts of the TM are implemented in the Rust programming language, eliminating common security-critical programming errors.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"8 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139768834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel adaptive quantization methodology for 8-bit floating-point DNN training
Pub Date : 2024-02-16, DOI: 10.1007/s10617-024-09282-2
Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn
There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion of the overall energy consumption. The number of off-chip memory transactions can be reduced by quantizing the data words to a low bit-width (e.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) quantized DNN training methodology is presented, which adapts to the required dynamic range on the fly. Our methodology relies on varying the bias values of the FP8 format to fit its dynamic range to the range required by the DNN parameters and input feature maps. The range fitting during training is performed adaptively by an online statistical-analysis hardware unit without stalling the computation units or their data accesses. Our approach is compatible with any DNN compute core without major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before being written to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07× when using the 8-bit data format instead of 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is approximately 1% for various networks with image and natural language processing datasets.
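The range-fitting idea can be sketched as follows: choose the exponent bias of an E4M3-style FP8 format so that the largest magnitude observed in a tensor lands on the format's largest normal exponent, then round values to that grid. The format parameters and the behavioural quantizer below are assumptions for illustration, not the paper's hardware unit.

```python
import numpy as np

E_BITS, M_BITS = 4, 3                 # E4M3-style layout (an assumption)
MAX_EXP_FIELD = 2 ** E_BITS - 2       # top exponent code kept reserved

def fit_bias(tensor: np.ndarray) -> int:
    """Pick the exponent bias so that the largest magnitude observed in the
    tensor maps onto the largest normal exponent of the FP8 format."""
    top = int(np.floor(np.log2(np.max(np.abs(tensor)))))
    return MAX_EXP_FIELD - top

def quantize_fp8(tensor: np.ndarray, bias: int) -> np.ndarray:
    """Round each element to the nearest value representable with this bias
    (a behavioural model only; real hardware packs sign/exponent/mantissa)."""
    sign = np.sign(tensor)
    mag = np.maximum(np.abs(tensor), 1e-30)        # avoid log2(0)
    exp = np.clip(np.floor(np.log2(mag)), 1 - bias, MAX_EXP_FIELD - bias)
    step = 2.0 ** (exp - M_BITS)                   # value spacing in that binade
    return sign * np.round(np.abs(tensor) / step) * step

acts = np.random.randn(4, 4).astype(np.float32) * 0.05   # small feature maps
bias = fit_bias(acts)
print(bias, np.max(np.abs(acts - quantize_fp8(acts, bias))))
```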
{"title":"Novel adaptive quantization methodology for 8-bit floating-point DNN training","authors":"Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn","doi":"10.1007/s10617-024-09282-2","DOIUrl":"https://doi.org/10.1007/s10617-024-09282-2","url":null,"abstract":"<p>There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion to the overall energy consumption. Reduction in the number of off-chip memory transactions can be achieved by quantizing the data words to low data bit-width (E.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) data format quantized DNN training methodology is presented, which adapts to the required dynamic range on-the-fly. Our methodology relies on varying the bias values of FP8 format to fit the dynamic range to the required range of DNN parameters and input feature maps. The range fitting during the training is adaptively performed by an online statistical analysis hardware unit without stalling the computation units or its data accesses. Our approach is compatible with any DNN compute cores without any major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before writing to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07<span>(times )</span> while using an 8-bit data format instead of using 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is <span>(approx 1%)</span> for various networks with image and natural language processing datasets.\u0000</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"41 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139768862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transparent integration of autonomous vehicles simulation tools with a data-centric middleware
Pub Date : 2024-01-06, DOI: 10.1007/s10617-023-09280-w
José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Antônio Augusto Fröhlich
Simulations are key steps in the design, implementation, and verification of autonomous vehicles (AV). At the same time, typical simulation tools fail to integrate all the aspects related to the complexity of AV applications, such as data communication delay, security, and the integration of software/hardware-in-the-loop and other simulation tools. This work proposes a SmartData-based middleware to integrate AV simulators and external tools. The interface models the data used in a simulator and creates an intermediary layer between the simulator and the external tools by defining the inputs and outputs as SmartData. A message bus is used for communication between SmartData following their Interest relations, and messages are exchanged following a specific protocol; nevertheless, the presented architecture is protocol-agnostic. Moreover, we present a data-centric AV design integrated into the middleware. The design considers the standardization of the data interfaces between AV components, including sensing, perception, planning, decision, and actuation. Therefore, the presented design promotes a transparent integration of the AV simulation with other simulators (e.g., network simulators), cloud services, fault injection mechanisms, digital twins, and hardware-in-the-loop scenarios. Moreover, the design allows for transparent runtime component replacement and time synchronization, the modularization of the vehicle components, and the addition of security aspects in the simulation. We present a case-study application with an AV simulation using CARLA, and we measure the end-to-end delay and overhead incurred in the simulation by our middleware. An increase in the end-to-end delay was measured because, in the original scenario, data communication was not accounted for: data was assumed to be ready for processing with no communication delay between sensors, decision-making, and actuation units.
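A minimal sketch of the interest-based message bus: components subscribe to a named SmartData and producers publish updates under that name. The class and the names below are hypothetical and stand in for the middleware described in the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class MessageBus:
    """Tiny interest-based bus: subscribers register a callback for a
    SmartData name, and producers publish values under that name."""

    def __init__(self) -> None:
        self._interests: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, name: str, callback: Callable) -> None:
        self._interests[name].append(callback)

    def publish(self, name: str, value) -> None:
        for callback in self._interests[name]:
            callback(value)

bus = MessageBus()
# The planner declares interest in lidar SmartData produced by the simulator.
bus.subscribe("lidar.range", lambda v: print("planner received", v))
# The simulator-side adapter publishes a new sample.
bus.publish("lidar.range", {"t": 0.05, "ranges": [12.3, 11.9, 12.1]})
```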
{"title":"Transparent integration of autonomous vehicles simulation tools with a data-centric middleware","authors":"José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Antônio Augusto Fröhlich","doi":"10.1007/s10617-023-09280-w","DOIUrl":"https://doi.org/10.1007/s10617-023-09280-w","url":null,"abstract":"<p>Simulations are key steps in the design, implementation, and verification of autonomous vehicles (AV). Parallel to this, typical simulation tools fail to integrate the entirety of the aspects related to the complexity of AV applications, such as data communication delay, security, and the integration of software/hardware-in-the-loop and other simulation tools. This work proposes a SmartData-based middleware to integrate AV simulators and external tools. The interface models the data used on a simulator and creates an intermediary layer between the simulator and the external tools by defining the inputs and outputs as SmartData. A message bus is used for communication between SmartData following their Interest relations. Messages are exchanged following a specific protocol. Nevertheless, the architecture presented is agnostic of protocol. Moreover, we present a data-centric AV design integrated into the middleware. The design considers the standardization of the data interfaces between AV components, including sensing, perception, planning, decision, and actuation. Therefore, the presented design promotes a transparent integration of the AV simulation with other simulators (e.g., network simulators), cloud services, fault injection mechanisms, digital twins, and hardware-in-the-loop scenarios. Moreover, the design allows for transparent, runtime component replacement and time synchronization, the modularization of the vehicle components, and the addition of security aspects in the simulation. We present a case-study application with an AV simulation using CARLA, and we measure the end-to-end delay and overhead incurred in the simulation by our middleware. An increase in the end-to-end delay was measured once data communication was not acknowledged in the original scenario, and data was assumed to be ready for processing with no communication delay between sensors, decision-making, and actuation units.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"26 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the impact of hardware-related events on the execution of real-time programs
Pub Date : 2023-12-31, DOI: 10.1007/s10617-023-09281-9
Estimating safe upper bounds on the execution times of programs is required in the design of predictable real-time systems. When multi-core processors, instruction pipelines, branch prediction, or cache memories are in place, traditional static timing analysis faces considerable complexity, and measurement-based timing analysis (MBTA) is a more tractable option. MBTA estimates upper bounds on execution times using data measured during the execution of representative execution scenarios. In this context, understanding how hardware-related events affect the program under analysis provides useful information for MBTA. This paper contributes to this need by modeling the execution behavior of programs as a function of hardware-related events. More specifically, for a program under analysis, we show that the number of cycles per executed instruction can be correlated with hardware-related event occurrences. We apply our modeling methodology to two architectures, ARMv7 Cortex-M4 and Cortex-A53. While all hardware events can be monitored at once in the former, the latter allows simultaneous monitoring of only up to 6 out of 59 events. We then describe a method to select the most relevant hardware events that affect the execution of the program under analysis. These events are then used to model the program's behavior via machine learning techniques under different execution scenarios. The effectiveness of this method is evaluated by extensive experiments. The obtained results revealed prediction errors below 20%, showing that the chosen events can largely explain the execution behavior of programs.
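The event-selection and modeling step can be illustrated with synthetic data: rank hardware events by their correlation with the observed cycles-per-instruction and fit a least-squares model on the top-ranked ones. The event names, data, and choice of a linear model are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic measurement windows: per-window event counts and observed CPI.
events = {
    "L1D_MISS":    rng.poisson(500, 200),
    "BRANCH_MISP": rng.poisson(80, 200),
    "BUS_ACCESS":  rng.poisson(300, 200),
}
X = np.column_stack(list(events.values())).astype(float)
cpi = 1.0 + 0.002 * events["L1D_MISS"] + 0.01 * events["BRANCH_MISP"] \
          + rng.normal(0, 0.05, 200)

# Rank events by absolute correlation with CPI and keep the top two.
corr = [abs(np.corrcoef(X[:, i], cpi)[0, 1]) for i in range(X.shape[1])]
keep = np.argsort(corr)[::-1][:2]
print("selected events:", [list(events)[i] for i in keep])

# Least-squares model of CPI from the selected events (with an intercept).
A = np.column_stack([X[:, keep], np.ones(len(cpi))])
coef, *_ = np.linalg.lstsq(A, cpi, rcond=None)
pred = A @ coef
print("mean relative error:", np.mean(np.abs(pred - cpi) / cpi))
```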
{"title":"On the impact of hardware-related events on the execution of real-time programs","authors":"","doi":"10.1007/s10617-023-09281-9","DOIUrl":"https://doi.org/10.1007/s10617-023-09281-9","url":null,"abstract":"<h3>Abstract</h3> <p>Estimating safe upper bounds on execution times of programs is required in the design of predictable real-time systems. When multi-core, instruction pipeline, branch prediction, or cache memory are in place, due to the considerable complexity traditional static timing analysis faces, measurement-based timing analysis (MBTA) is a more tractable option. MBTA estimates upper bounds on execution times using data measured under the execution of representative execution scenarios. In this context, understanding how hardware-related events affect the executing program under analysis brings about useful information for MBTA. This paper contributes to this need by modeling the execution behavior of programs in function of hardware-related events. More specifically, for a program under analysis, we show that the number of cycles per executed instruction can be correlated to hardware-related event occurrences. We apply our modeling methodology to two architectures, ARMv7 Cortex-M4 and Cortex-A53. While all hardware events can be monitored at once in the former, the latter allows simultaneous monitoring of up to 6 out of 59 events. We then describe a method to select the most relevant hardware events that affect the execution of a program under analysis. These events are then used to model the program behavior via machine learning techniques under different execution scenarios. The effectiveness of this method is evaluated by extensive experiments. Obtained results revealed prediction errors below 20%, showing that the chosen events can largely explain the execution behavior of programs.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"119 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139066478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiprovision: a Design Space Exploration tool for multi-tenant resource provisioning in CPU–GPU environments
Pub Date : 2023-12-21, DOI: 10.1007/s10617-023-09279-3
M. Jordan, J. Vicenzi, Tiago Knorst, Guilherme Korol, Antonio Carlos Schneider Beck, M. B. Rutzig
{"title":"Multiprovision: a Design Space Exploration tool for multi-tenant resource provisioning in CPU–GPU environments","authors":"M. Jordan, J. Vicenzi, Tiago Knorst, Guilherme Korol, Antonio Carlos Schneider Beck, M. B. Rutzig","doi":"10.1007/s10617-023-09279-3","DOIUrl":"https://doi.org/10.1007/s10617-023-09279-3","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"47 2","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138952408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monitoring the performance of multicore embedded systems without disrupting its timing requirements
Pub Date : 2023-12-16, DOI: 10.1007/s10617-023-09278-4
Leonardo Passig Horstmann, José Luis Conradi Hoffmann, Antônio Augusto Fröhlich
Monitoring the performance of multicore embedded systems is crucial to properly ensure their timing requirements. Collecting performance data is also very relevant for optimization and validation efforts. However, the strategies used to monitor and capture data in such systems are complex to design and implement, since they must not interfere with the running system to the point at which its timing and performance characteristics start to be affected by the monitoring itself. In this paper, we extend a monitoring framework developed in previous work to encompass three monitoring strategies, namely Active and Passive Periodic monitoring and Job-based monitoring. Periodic monitoring follows a given sampling rate. Active Periodic relies on periodic timer interrupts to guarantee deterministic sampling, while Passive Periodic trades determinism for a less invasive strategy, sampling data only when ordinary system events are handled. Job-based follows an event-driven approach that samples data whenever a job leaves the CPU, thus building isolated traces for each job. We evaluate them in terms of overhead, latency, and jitter, and none of them presented an average impact on the system execution time higher than 0.3%. Moreover, a qualitative analysis is conducted in terms of data quality. On one hand, while Periodic monitoring allows for configurable sampling rates, it does not account for the rescheduling of jobs and may capture mixed traces. On the other hand, Job-based monitoring provides data samples tied to the execution of each job while disregarding sampling-rate configuration and may lose track of instantaneous measures.
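To contrast the sampling triggers, the toy sketch below implements an active periodic sampler driven by a timer and a job-based sampler that records one sample when a job leaves the CPU; the counter source is a stand-in, not the framework's instrumentation.

```python
import time
from typing import Callable, Dict, List

def read_counters() -> Dict[str, int]:
    # Stand-in for reading hardware PMU registers.
    return {"cycles": time.perf_counter_ns(), "instructions": 0}

def periodic_monitor(period_s: float, rounds: int) -> List[dict]:
    """Active periodic sampling: a timer fires every `period_s` seconds."""
    samples = []
    for _ in range(rounds):
        samples.append(read_counters())
        time.sleep(period_s)
    return samples

def run_job(job: Callable, trace: List[dict]) -> None:
    """Job-based sampling: one sample is taken when the job leaves the CPU,
    giving a trace tied to that job rather than to wall-clock time."""
    job()
    trace.append(read_counters())

trace: List[dict] = []
run_job(lambda: sum(range(100_000)), trace)
print(len(periodic_monitor(0.01, 3)), len(trace))
```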
{"title":"Monitoring the performance of multicore embedded systems without disrupting its timing requirements","authors":"Leonardo Passig Horstmann, José Luis Conradi Hoffmann, Antônio Augusto Fröhlich","doi":"10.1007/s10617-023-09278-4","DOIUrl":"https://doi.org/10.1007/s10617-023-09278-4","url":null,"abstract":"<p>Monitoring the performance of multicore embedded systems is crucial to properly ensure their timing requirements. Collecting performance data is also very relevant for optimization and validation efforts. However, the strategies used to monitor and capture data in such systems are complex to design and implement since they must not interfere with the running system beyond the point at which the system’s timing and performance characteristics start to get affected by the monitoring strategies. In this paper, we extend a monitoring framework developed in previous work to encompass three monitoring strategies, namely Active and Passive Periodic monitoring and Job-based monitoring. Periodic monitoring follows a given sampling rate. Active Periodic relies on periodic timer interrupts to guarantee deterministic sampling, while Passive Periodic trades determinism for a less invasive strategy, sampling data only when ordinary system events are handled. Job-based follows an event-driven monitoring that samples data whenever a job leaves the CPU, thus building isolated traces for each job. We evaluate them according to overhead, latency, and jitter, where none of them presented an average impact on the system execution time higher than <span>(0.3%)</span>. Moreover, a qualitative analysis is conducted in terms of data quality. On one hand, while Periodic monitoring allows for configurable sampling rates, it does not account for the rescheduling of jobs and may capture mixed traces. On the other hand, Job-based monitoring provides data samples tied to the execution of each job while disregarding sampling rate configuration and may lose track of instant measures.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"32 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138686297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On vulnerabilities in EVT-based timing analysis: an experimental investigation on a multi-core architecture
Pub Date : 2023-10-17, DOI: 10.1007/s10617-023-09277-5
Jamile Vasconcelos, George Lima, Marwan Wehaiba El Khazen, Adriana Gogonel, Liliana Cucu-Grosjean
{"title":"On vulnerabilities in EVT-based timing analysis: an experimental investigation on a multi-core architecture","authors":"Jamile Vasconcelos, George Lima, Marwan Wehaiba El Khazen, Adriana Gogonel, Liliana Cucu-Grosjean","doi":"10.1007/s10617-023-09277-5","DOIUrl":"https://doi.org/10.1007/s10617-023-09277-5","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135944710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}