Model predictive-based DNN control model for automated steering deployed on FPGA using an automatic IP generator tool
Pub Date : 2024-07-25, DOI: 10.1007/s10617-024-09287-x
Ahmad Reda, Afulay Ahmed Bouzid, Alhasan Zghaibe, Daniel Drótos, Vásárhelyi József
With the increase in the non-linearity and complexity of the driving environment, developing and optimizing the related applications is becoming more crucial and remains an open challenge for researchers and automotive companies alike. Model predictive control (MPC) is a well-known classical control strategy used to solve online optimization problems, but it is computationally expensive and resource-consuming. Recently, machine learning has become an effective alternative to classical control systems. This paper presents a deep neural network (DNN)-based control strategy for automated steering deployed on an FPGA. The DNN model was designed and trained to reproduce the behavior of a traditional MPC controller, and its performance is evaluated against that of the designed MPC, which has already proved its merit in automated driving tasks. A new automatic intellectual-property generator based on the Xilinx System Generator (XSG) has been developed, not only to perform the deployment but also to optimize it. Performance was evaluated by the controllers' ability to drive the vehicle's lateral deviation and yaw angle as close as possible to zero. The DNN model was implemented on the FPGA using two data types, fixed-point and floating-point, in order to evaluate efficiency in terms of performance and resource consumption. The results show that the proposed DNN model performed satisfactorily and successfully imitated the behavior of the traditional MPC with a very small root mean square error (RMSE = 0.011228 rad). They also show that the fixed-point deployment greatly reduced resource consumption compared to the floating-point one while maintaining satisfactory performance and meeting the safety conditions.
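To make the evaluation metric concrete, here is a minimal sketch of how the RMSE between the teacher MPC's steering command and the DNN's prediction could be computed; the traces below are made-up placeholders, not data from the paper.

```python
import numpy as np

# Hypothetical steering-angle traces (rad): the reference MPC output and the
# DNN that imitates it. Real traces would come from closed-loop simulation.
mpc_steering = np.array([0.000, 0.012, 0.025, 0.018, -0.004, -0.015])
dnn_steering = np.array([0.001, 0.010, 0.027, 0.016, -0.006, -0.013])

# Root mean square error between the imitation controller and its teacher.
rmse = np.sqrt(np.mean((dnn_steering - mpc_steering) ** 2))
print(f"RMSE = {rmse:.6f} rad")
```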
{"title":"Model predictive-based DNN control model for automated steering deployed on FPGA using an automatic IP generator tool","authors":"Ahmad Reda, Afulay Ahmed Bouzid, Alhasan Zghaibe, Daniel Drótos, Vásárhelyi József","doi":"10.1007/s10617-024-09287-x","DOIUrl":"https://doi.org/10.1007/s10617-024-09287-x","url":null,"abstract":"<p>With the increase in the non-linearity and complexity of the driving system’s environment, developing and optimizing related applications is becoming more crucial and remains an open challenge for researchers and automotive companies alike. Model predictive control (MPC) is a well-known classic control strategy used to solve online optimization problems. MPC is computationally expensive and resource-consuming. Recently, machine learning has become an effective alternative to classical control systems. This paper provides a developed deep neural network (DNN)-based control strategy for automated steering deployed on FPGA. The DNN model was designed and trained based on the behavior of the traditional MPC controller. The performance of the DNN model is evaluated compared to the performance of the designed MPC which already proved its merit in automated driving task. A new automatic intellectual property generator based on the Xilinx system generator (XSG) has been developed, not only to perform the deployment but also to optimize it. The performance was evaluated based on the ability of the controllers to drive the lateral deviation and yaw angle of the vehicle to be as close as possible to zero. The DNN model was implemented on FPGA using two different data types, fixed-point and floating-point, in order to evaluate the efficiency in the terms of performance and resource consumption. The obtained results show that the suggested DNN model provided a satisfactory performance and successfully imitated the behavior of the traditional MPC with a very small root mean square error (RMSE = 0.011228 rad). Additionally, the results show that the deployments using fixed-point data greatly reduced resource consumption compared to the floating-point data type while maintaining satisfactory performance and meeting the safety conditions</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"41 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and analysis of an adaptive radiation resilient RRAM subsystem for processing systems in satellites
Pub Date : 2024-04-10, DOI: 10.1007/s10617-024-09285-z
Daniel Reiser, Junchao Chen, Johannes Knödtel, Andrea Baroni, Miloš Krstić, Marc Reichenbach
Among the numerous benefits that novel RRAM devices offer over conventional memory technologies is an inherent resilience to the effects of radiation. Hence, they appear suitable for use as a memory subsystem in a computer architecture for satellites. In addition to memory devices resistant to radiation, the concept of applying protective measures dynamically promises a system with low susceptibility to errors during radiation events, while also ensuring efficient performance in the absence of radiation events. This paper presents the first RRAM-based memory subsystem for satellites with a dynamic response to radiation events. We integrate this subsystem into a computing platform that employs the same dynamic principles for its processing system and implements modules for timely detection and even prediction of radiation events. To determine which protection mechanism is optimal, we examine various approaches and simulate the probability of errors in memory. Additionally, we study the impact on the overall system by investigating different software algorithms and their radiation-robustness requirements using fault-injection simulation. Finally, we propose a potential implementation of the dynamic RRAM-based memory subsystem that includes different levels of protection and can be used for real applications in satellites.
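The dynamic-protection idea can be illustrated with a small sketch: a predictor supplies an expected radiation level and the memory subsystem picks a protection mode accordingly. The mode names, thresholds, and interface below are illustrative assumptions, not the subsystem proposed in the paper.

```python
from enum import Enum

class Protection(Enum):
    NONE = 0   # no redundancy, lowest overhead
    ECC = 1    # single-error correction
    TMR = 2    # triple modular redundancy, highest overhead

def select_protection(predicted_flux: float) -> Protection:
    """Map a predicted particle-flux level to a protection mode.

    The thresholds are placeholders; in a real system they would be derived
    from the simulated bit-error probabilities of the RRAM array.
    """
    if predicted_flux < 1e2:
        return Protection.NONE
    if predicted_flux < 1e4:
        return Protection.ECC
    return Protection.TMR

# A radiation event is predicted, so the memory subsystem hardens itself.
print(select_protection(5e4))   # Protection.TMR
```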
{"title":"Design and analysis of an adaptive radiation resilient RRAM subsystem for processing systems in satellites","authors":"Daniel Reiser, Junchao Chen, Johannes Knödtel, Andrea Baroni, Miloš Krstić, Marc Reichenbach","doi":"10.1007/s10617-024-09285-z","DOIUrl":"https://doi.org/10.1007/s10617-024-09285-z","url":null,"abstract":"<p>Among the numerous benefits that novel RRAM devices offer over conventional memory technologies is an inherent resilience to the effects of radiation. Hence, they appear suitable for use as a memory subsystem in a computer architecture for satellites. In addition to memory devices resistant to radiation, the concept of applying protective measures dynamically promises a system with low susceptibility to errors during radiation events, while also ensuring efficient performance in the absence of radiation events. This paper presents the first RRAM-based memory subsystem for satellites with a dynamic response to radiation events. We integrate this subsystem into a computing platform that employs the same dynamic principles for its processing system and implements modules for timely detection and even prediction of radiation events. To determine which protection mechanism is optimal, we examine various approaches and simulate the probability of errors in memory. Additionally, we are studying the impact on the overall system by investigating different software algorithms and their radiation robustness requirements using a fault injection simulation. Finally, we propose a potential implementation of the dynamic RRAM-based memory subsystem that includes different levels of protection and can be used for real applications in satellites.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"4 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140578611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving edge AI for industrial IoT applications with distributed learning using consensus
Pub Date : 2024-04-09, DOI: 10.1007/s10617-024-09284-0
Samuel Fidelis, Márcio Castro, Frank Siqueira
Internet of Things (IoT) devices produce massive amounts of data in a very short time. Transferring these data to the cloud for analysis may be prohibitive for applications that require near real-time processing. One solution to meet such timing requirements is to bring most data processing closer to the IoT devices (i.e., to the edge). In this context, the present work proposes a distributed architecture that meets the timing requirements imposed by Industrial IoT (IIoT) applications that need to apply Machine Learning (ML) models with high accuracy and low latency. This is done by dividing the tasks of storing and processing data into different layers—mist, fog, and cloud—using the cloud layer only for the tasks related to long-term storage of summarized data and hosting of the necessary reports and dashboards. The proposed architecture employs ML inference in the edge layer in a distributed fashion, where each edge node is responsible for applying either a different ML technique or the same technique trained on a different data set. A consensus algorithm then combines the inference results from the edge nodes to decide the final result, thus improving the system's overall accuracy. Results obtained with two different data sets show that the proposed approach can improve the accuracy of the ML models without significantly compromising the response time.
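As an illustration of the consensus step, the sketch below takes a simple majority vote over the labels reported by the edge nodes; the node names and labels are hypothetical, and the paper's consensus algorithm may differ.

```python
from collections import Counter

def consensus(predictions):
    """Majority vote over the class labels reported by the edge nodes.

    `predictions` maps node id -> predicted label. Ties are broken
    deterministically by label order; a real deployment could instead
    weight votes by each node's validation accuracy.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    winners = sorted(label for label, c in counts.items() if c == best)
    return winners[0]

# Three edge nodes running different models (or the same model trained on
# different data sets) classify the same sensor window.
print(consensus({"edge-1": "fault", "edge-2": "normal", "edge-3": "fault"}))  # fault
```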
{"title":"Improving edge AI for industrial IoT applications with distributed learning using consensus","authors":"Samuel Fidelis, Márcio Castro, Frank Siqueira","doi":"10.1007/s10617-024-09284-0","DOIUrl":"https://doi.org/10.1007/s10617-024-09284-0","url":null,"abstract":"<p>Internet of Things (IoT) devices produce massive amounts of data in a very short time. Transferring these data to the cloud to be analyzed may be prohibitive for applications that require near real-time processing. One solution to meet such timing requirements is to bring most data processing closer to IoT devices (i.e., to the edge). In this context, the present work proposes a distributed architecture that meets the timing requirements imposed by Industrial IoT (IIoT) applications that need to apply Machine Learning (ML) models with high accuracy and low latency. This is done by dividing the tasks of storing and processing data into different layers—mist, fog, and cloud—using the cloud layer only for the tasks related to long-term storage of summarized data and hosting of necessary reports and dashboards. The proposed architecture employs ML inferences in the edge layer in a distributed fashion, where each edge node is either responsible for applying a different ML technique or the same technique but with a different training data set. Then, a consensus algorithm takes the ML inference results from the edge nodes to decide the result of the inference, thus improving the system’s overall accuracy. Results obtained with two different data sets show that the proposed approach can improve the accuracy of the ML models without significantly compromising the response time.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"43 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140578540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Profiling with trust: system monitoring from trusted execution environments
Pub Date : 2024-02-16, DOI: 10.1007/s10617-024-09283-1
Christian Eichler, Jonas Röckl, Benedikt Jung, Ralph Schlenk, Tilo Müller, Timo Hönig
Large-scale attacks on IoT and edge computing devices pose a significant threat. As a prominent example, Mirai is an IoT botnet with 600,000 infected devices around the globe, capable of conducting effective and targeted DDoS attacks on (critical) infrastructure. Driven by the substantial impact of such attacks, manufacturers and system integrators have turned to Trusted Execution Environments (TEEs), which have gained significant importance recently. TEEs offer an execution environment to run small portions of code isolated from the rest of the system, even if the operating system is compromised. In this publication, we examine TEEs in the context of system monitoring and introduce the Trusted Monitor (TM), a novel anomaly detection system that runs within a TEE. The TM continuously profiles the system using hardware performance counters and utilizes an application-specific machine-learning model for anomaly detection. In our evaluation, we demonstrate that the TM accurately classifies 86% of 183 tested workloads, with an overhead of less than 2%. Notably, we show that a real-world kernel-level rootkit has observable effects on performance counters, allowing the TM to detect it. Major parts of the TM are implemented in the Rust programming language, eliminating common security-critical programming errors.
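A toy sketch of counter-based workload classification: each known workload has a profile vector of performance-counter means, and a sample that is far from every profile is flagged as an anomaly. The counters, profiles, and threshold are invented for illustration; the Trusted Monitor itself uses an application-specific ML model inside the TEE.

```python
import numpy as np

# Hypothetical per-workload profiles: mean vectors of hardware performance
# counters (e.g. instructions, cache misses, branch mispredictions) collected
# during benign runs. A sample far from every profile is flagged as anomalous.
profiles = {
    "idle":    np.array([1.0e6, 2.0e3, 1.0e2]),
    "crypto":  np.array([8.0e6, 5.0e4, 9.0e3]),
    "network": np.array([3.0e6, 7.0e4, 4.0e3]),
}

def classify(sample: np.ndarray, threshold: float = 0.5):
    """Nearest-centroid classification with a relative-distance threshold."""
    name, centroid = min(profiles.items(),
                         key=lambda kv: np.linalg.norm(sample - kv[1]))
    rel_dist = np.linalg.norm(sample - centroid) / np.linalg.norm(centroid)
    return name if rel_dist <= threshold else "anomaly"

print(classify(np.array([8.2e6, 5.2e4, 9.5e3])))   # close to "crypto"
print(classify(np.array([2.0e7, 9.0e5, 4.0e4])))   # far from everything: anomaly
```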
{"title":"Profiling with trust: system monitoring from trusted execution environments","authors":"Christian Eichler, Jonas Röckl, Benedikt Jung, Ralph Schlenk, Tilo Müller, Timo Hönig","doi":"10.1007/s10617-024-09283-1","DOIUrl":"https://doi.org/10.1007/s10617-024-09283-1","url":null,"abstract":"<p>Large-scale attacks on IoT and edge computing devices pose a significant threat. As a prominent example, Mirai is an IoT botnet with 600,000 infected devices around the globe, capable of conducting effective and targeted DDoS attacks on (critical) infrastructure. Driven by the substantial impacts of attacks, manufacturers and system integrators propose Trusted Execution Environments (TEEs) that have gained significant importance recently. TEEs offer an execution environment to run small portions of code isolated from the rest of the system, even if the operating system is compromised. In this publication, we examine TEEs in the context of system monitoring and introduce the Trusted Monitor (TM), a novel anomaly detection system that runs within a TEE. The TM continuously profiles the system using hardware performance counters and utilizes an application-specific machine-learning model for anomaly detection. In our evaluation, we demonstrate that the TM accurately classifies 86% of 183 tested workloads, with an overhead of less than 2%. Notably, we show that a real-world kernel-level rootkit has observable effects on performance counters, allowing the TM to detect it. Major parts of the TM are implemented in the Rust programming language, eliminating common security-critical programming errors.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"8 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139768834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel adaptive quantization methodology for 8-bit floating-point DNN training
Pub Date : 2024-02-16, DOI: 10.1007/s10617-024-09282-2
Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn
There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion of the overall energy consumption. The number of off-chip memory transactions can be reduced by quantizing the data words to a low bit-width (e.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) quantized DNN training methodology is presented, which adapts to the required dynamic range on the fly. Our methodology relies on varying the bias values of the FP8 format to fit its dynamic range to the range required by the DNN parameters and input feature maps. The range fitting during training is performed adaptively by an online statistical-analysis hardware unit without stalling the computation units or their data accesses. Our approach is compatible with any DNN compute core without major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before being written to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07× when using the 8-bit data format instead of 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is approximately 1% for various networks with image and natural language processing datasets.
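The range-fitting idea can be sketched as follows: choose the exponent bias of an E4M3-style FP8 format so that the largest magnitude observed in a tensor lands on the format's largest normal exponent, then round values to that grid. The format parameters and the behavioural quantizer below are assumptions for illustration, not the paper's hardware unit.

```python
import numpy as np

E_BITS, M_BITS = 4, 3                 # E4M3-style layout (an assumption)
MAX_EXP_FIELD = 2 ** E_BITS - 2       # top exponent code kept reserved

def fit_bias(tensor: np.ndarray) -> int:
    """Pick the exponent bias so that the largest magnitude observed in the
    tensor maps onto the largest normal exponent of the FP8 format."""
    top = int(np.floor(np.log2(np.max(np.abs(tensor)))))
    return MAX_EXP_FIELD - top

def quantize_fp8(tensor: np.ndarray, bias: int) -> np.ndarray:
    """Round each element to the nearest value representable with this bias
    (a behavioural model only; real hardware packs sign/exponent/mantissa)."""
    sign = np.sign(tensor)
    mag = np.maximum(np.abs(tensor), 1e-30)        # avoid log2(0)
    exp = np.clip(np.floor(np.log2(mag)), 1 - bias, MAX_EXP_FIELD - bias)
    step = 2.0 ** (exp - M_BITS)                   # value spacing in that binade
    return sign * np.round(np.abs(tensor) / step) * step

acts = np.random.randn(4, 4).astype(np.float32) * 0.05   # small feature maps
bias = fit_bias(acts)
print(bias, np.max(np.abs(acts - quantize_fp8(acts, bias))))
```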
{"title":"Novel adaptive quantization methodology for 8-bit floating-point DNN training","authors":"Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn","doi":"10.1007/s10617-024-09282-2","DOIUrl":"https://doi.org/10.1007/s10617-024-09282-2","url":null,"abstract":"<p>There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion to the overall energy consumption. Reduction in the number of off-chip memory transactions can be achieved by quantizing the data words to low data bit-width (E.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) data format quantized DNN training methodology is presented, which adapts to the required dynamic range on-the-fly. Our methodology relies on varying the bias values of FP8 format to fit the dynamic range to the required range of DNN parameters and input feature maps. The range fitting during the training is adaptively performed by an online statistical analysis hardware unit without stalling the computation units or its data accesses. Our approach is compatible with any DNN compute cores without any major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before writing to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07<span>(times )</span> while using an 8-bit data format instead of using 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is <span>(approx 1%)</span> for various networks with image and natural language processing datasets.\u0000</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"41 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139768862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transparent integration of autonomous vehicles simulation tools with a data-centric middleware
Pub Date : 2024-01-06, DOI: 10.1007/s10617-023-09280-w
José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Antônio Augusto Fröhlich
Simulations are key steps in the design, implementation, and verification of autonomous vehicles (AV). At the same time, typical simulation tools fail to integrate all the aspects related to the complexity of AV applications, such as data communication delay, security, and the integration of software/hardware-in-the-loop and other simulation tools. This work proposes a SmartData-based middleware to integrate AV simulators and external tools. The interface models the data used in a simulator and creates an intermediary layer between the simulator and the external tools by defining the inputs and outputs as SmartData. A message bus is used for communication between SmartData following their Interest relations, and messages are exchanged following a specific protocol; nevertheless, the presented architecture is protocol-agnostic. Moreover, we present a data-centric AV design integrated into the middleware. The design considers the standardization of the data interfaces between AV components, including sensing, perception, planning, decision, and actuation. Therefore, the presented design promotes a transparent integration of the AV simulation with other simulators (e.g., network simulators), cloud services, fault injection mechanisms, digital twins, and hardware-in-the-loop scenarios. Moreover, the design allows for transparent runtime component replacement and time synchronization, the modularization of the vehicle components, and the addition of security aspects in the simulation. We present a case-study application with an AV simulation using CARLA, and we measure the end-to-end delay and overhead incurred in the simulation by our middleware. An increase in the end-to-end delay was measured because, in the original scenario, data communication was not accounted for: data was assumed to be ready for processing with no communication delay between sensors, decision-making, and actuation units.
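A minimal sketch of the interest-based message bus: components subscribe to a named SmartData and producers publish updates under that name. The class and the names below are hypothetical and stand in for the middleware described in the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class MessageBus:
    """Tiny interest-based bus: subscribers register a callback for a
    SmartData name, and producers publish values under that name."""

    def __init__(self) -> None:
        self._interests: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, name: str, callback: Callable) -> None:
        self._interests[name].append(callback)

    def publish(self, name: str, value) -> None:
        for callback in self._interests[name]:
            callback(value)

bus = MessageBus()
# The planner declares interest in lidar SmartData produced by the simulator.
bus.subscribe("lidar.range", lambda v: print("planner received", v))
# The simulator-side adapter publishes a new sample.
bus.publish("lidar.range", {"t": 0.05, "ranges": [12.3, 11.9, 12.1]})
```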
{"title":"Transparent integration of autonomous vehicles simulation tools with a data-centric middleware","authors":"José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Antônio Augusto Fröhlich","doi":"10.1007/s10617-023-09280-w","DOIUrl":"https://doi.org/10.1007/s10617-023-09280-w","url":null,"abstract":"<p>Simulations are key steps in the design, implementation, and verification of autonomous vehicles (AV). Parallel to this, typical simulation tools fail to integrate the entirety of the aspects related to the complexity of AV applications, such as data communication delay, security, and the integration of software/hardware-in-the-loop and other simulation tools. This work proposes a SmartData-based middleware to integrate AV simulators and external tools. The interface models the data used on a simulator and creates an intermediary layer between the simulator and the external tools by defining the inputs and outputs as SmartData. A message bus is used for communication between SmartData following their Interest relations. Messages are exchanged following a specific protocol. Nevertheless, the architecture presented is agnostic of protocol. Moreover, we present a data-centric AV design integrated into the middleware. The design considers the standardization of the data interfaces between AV components, including sensing, perception, planning, decision, and actuation. Therefore, the presented design promotes a transparent integration of the AV simulation with other simulators (e.g., network simulators), cloud services, fault injection mechanisms, digital twins, and hardware-in-the-loop scenarios. Moreover, the design allows for transparent, runtime component replacement and time synchronization, the modularization of the vehicle components, and the addition of security aspects in the simulation. We present a case-study application with an AV simulation using CARLA, and we measure the end-to-end delay and overhead incurred in the simulation by our middleware. An increase in the end-to-end delay was measured once data communication was not acknowledged in the original scenario, and data was assumed to be ready for processing with no communication delay between sensors, decision-making, and actuation units.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"26 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the impact of hardware-related events on the execution of real-time programs
Pub Date : 2023-12-31, DOI: 10.1007/s10617-023-09281-9
Estimating safe upper bounds on the execution times of programs is required in the design of predictable real-time systems. When multi-core processors, instruction pipelines, branch prediction, or cache memories are in place, traditional static timing analysis faces considerable complexity, and measurement-based timing analysis (MBTA) is a more tractable option. MBTA estimates upper bounds on execution times using data measured during the execution of representative execution scenarios. In this context, understanding how hardware-related events affect the program under analysis provides useful information for MBTA. This paper contributes to this need by modeling the execution behavior of programs as a function of hardware-related events. More specifically, for a program under analysis, we show that the number of cycles per executed instruction can be correlated with hardware-related event occurrences. We apply our modeling methodology to two architectures, ARMv7 Cortex-M4 and Cortex-A53. While all hardware events can be monitored at once in the former, the latter allows simultaneous monitoring of only up to 6 out of 59 events. We then describe a method to select the most relevant hardware events that affect the execution of the program under analysis. These events are then used to model the program's behavior via machine learning techniques under different execution scenarios. The effectiveness of this method is evaluated by extensive experiments. The obtained results revealed prediction errors below 20%, showing that the chosen events can largely explain the execution behavior of programs.
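The event-selection and modeling step can be illustrated with synthetic data: rank hardware events by their correlation with the observed cycles-per-instruction and fit a least-squares model on the top-ranked ones. The event names, data, and choice of a linear model are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic measurement windows: per-window event counts and observed CPI.
events = {
    "L1D_MISS":    rng.poisson(500, 200),
    "BRANCH_MISP": rng.poisson(80, 200),
    "BUS_ACCESS":  rng.poisson(300, 200),
}
X = np.column_stack(list(events.values())).astype(float)
cpi = 1.0 + 0.002 * events["L1D_MISS"] + 0.01 * events["BRANCH_MISP"] \
          + rng.normal(0, 0.05, 200)

# Rank events by absolute correlation with CPI and keep the top two.
corr = [abs(np.corrcoef(X[:, i], cpi)[0, 1]) for i in range(X.shape[1])]
keep = np.argsort(corr)[::-1][:2]
print("selected events:", [list(events)[i] for i in keep])

# Least-squares model of CPI from the selected events (with an intercept).
A = np.column_stack([X[:, keep], np.ones(len(cpi))])
coef, *_ = np.linalg.lstsq(A, cpi, rcond=None)
pred = A @ coef
print("mean relative error:", np.mean(np.abs(pred - cpi) / cpi))
```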
{"title":"On the impact of hardware-related events on the execution of real-time programs","authors":"","doi":"10.1007/s10617-023-09281-9","DOIUrl":"https://doi.org/10.1007/s10617-023-09281-9","url":null,"abstract":"<h3>Abstract</h3> <p>Estimating safe upper bounds on execution times of programs is required in the design of predictable real-time systems. When multi-core, instruction pipeline, branch prediction, or cache memory are in place, due to the considerable complexity traditional static timing analysis faces, measurement-based timing analysis (MBTA) is a more tractable option. MBTA estimates upper bounds on execution times using data measured under the execution of representative execution scenarios. In this context, understanding how hardware-related events affect the executing program under analysis brings about useful information for MBTA. This paper contributes to this need by modeling the execution behavior of programs in function of hardware-related events. More specifically, for a program under analysis, we show that the number of cycles per executed instruction can be correlated to hardware-related event occurrences. We apply our modeling methodology to two architectures, ARMv7 Cortex-M4 and Cortex-A53. While all hardware events can be monitored at once in the former, the latter allows simultaneous monitoring of up to 6 out of 59 events. We then describe a method to select the most relevant hardware events that affect the execution of a program under analysis. These events are then used to model the program behavior via machine learning techniques under different execution scenarios. The effectiveness of this method is evaluated by extensive experiments. Obtained results revealed prediction errors below 20%, showing that the chosen events can largely explain the execution behavior of programs.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"119 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139066478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiprovision: a Design Space Exploration tool for multi-tenant resource provisioning in CPU–GPU environments
Pub Date : 2023-12-21, DOI: 10.1007/s10617-023-09279-3
M. Jordan, J. Vicenzi, Tiago Knorst, Guilherme Korol, Antonio Carlos Schneider Beck, M. B. Rutzig
{"title":"Multiprovision: a Design Space Exploration tool for multi-tenant resource provisioning in CPU–GPU environments","authors":"M. Jordan, J. Vicenzi, Tiago Knorst, Guilherme Korol, Antonio Carlos Schneider Beck, M. B. Rutzig","doi":"10.1007/s10617-023-09279-3","DOIUrl":"https://doi.org/10.1007/s10617-023-09279-3","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"47 2","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138952408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monitoring the performance of multicore embedded systems without disrupting its timing requirements
Pub Date : 2023-12-16, DOI: 10.1007/s10617-023-09278-4
Leonardo Passig Horstmann, José Luis Conradi Hoffmann, Antônio Augusto Fröhlich
Monitoring the performance of multicore embedded systems is crucial to properly ensure their timing requirements. Collecting performance data is also very relevant for optimization and validation efforts. However, the strategies used to monitor and capture data in such systems are complex to design and implement, since they must not interfere with the running system to the point at which its timing and performance characteristics start to be affected by the monitoring itself. In this paper, we extend a monitoring framework developed in previous work to encompass three monitoring strategies, namely Active and Passive Periodic monitoring and Job-based monitoring. Periodic monitoring follows a given sampling rate. Active Periodic relies on periodic timer interrupts to guarantee deterministic sampling, while Passive Periodic trades determinism for a less invasive strategy, sampling data only when ordinary system events are handled. Job-based follows an event-driven approach that samples data whenever a job leaves the CPU, thus building isolated traces for each job. We evaluate them in terms of overhead, latency, and jitter, and none of them presented an average impact on the system execution time higher than 0.3%. Moreover, a qualitative analysis is conducted in terms of data quality. On one hand, while Periodic monitoring allows for configurable sampling rates, it does not account for the rescheduling of jobs and may capture mixed traces. On the other hand, Job-based monitoring provides data samples tied to the execution of each job while disregarding sampling-rate configuration and may lose track of instantaneous measures.
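To contrast the sampling triggers, the toy sketch below implements an active periodic sampler driven by a timer and a job-based sampler that records one sample when a job leaves the CPU; the counter source is a stand-in, not the framework's instrumentation.

```python
import time
from typing import Callable, Dict, List

def read_counters() -> Dict[str, int]:
    # Stand-in for reading hardware PMU registers.
    return {"cycles": time.perf_counter_ns(), "instructions": 0}

def periodic_monitor(period_s: float, rounds: int) -> List[dict]:
    """Active periodic sampling: a timer fires every `period_s` seconds."""
    samples = []
    for _ in range(rounds):
        samples.append(read_counters())
        time.sleep(period_s)
    return samples

def run_job(job: Callable, trace: List[dict]) -> None:
    """Job-based sampling: one sample is taken when the job leaves the CPU,
    giving a trace tied to that job rather than to wall-clock time."""
    job()
    trace.append(read_counters())

trace: List[dict] = []
run_job(lambda: sum(range(100_000)), trace)
print(len(periodic_monitor(0.01, 3)), len(trace))
```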
{"title":"Monitoring the performance of multicore embedded systems without disrupting its timing requirements","authors":"Leonardo Passig Horstmann, José Luis Conradi Hoffmann, Antônio Augusto Fröhlich","doi":"10.1007/s10617-023-09278-4","DOIUrl":"https://doi.org/10.1007/s10617-023-09278-4","url":null,"abstract":"<p>Monitoring the performance of multicore embedded systems is crucial to properly ensure their timing requirements. Collecting performance data is also very relevant for optimization and validation efforts. However, the strategies used to monitor and capture data in such systems are complex to design and implement since they must not interfere with the running system beyond the point at which the system’s timing and performance characteristics start to get affected by the monitoring strategies. In this paper, we extend a monitoring framework developed in previous work to encompass three monitoring strategies, namely Active and Passive Periodic monitoring and Job-based monitoring. Periodic monitoring follows a given sampling rate. Active Periodic relies on periodic timer interrupts to guarantee deterministic sampling, while Passive Periodic trades determinism for a less invasive strategy, sampling data only when ordinary system events are handled. Job-based follows an event-driven monitoring that samples data whenever a job leaves the CPU, thus building isolated traces for each job. We evaluate them according to overhead, latency, and jitter, where none of them presented an average impact on the system execution time higher than <span>(0.3%)</span>. Moreover, a qualitative analysis is conducted in terms of data quality. On one hand, while Periodic monitoring allows for configurable sampling rates, it does not account for the rescheduling of jobs and may capture mixed traces. On the other hand, Job-based monitoring provides data samples tied to the execution of each job while disregarding sampling rate configuration and may lose track of instant measures.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"32 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138686297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On vulnerabilities in EVT-based timing analysis: an experimental investigation on a multi-core architecture
Pub Date : 2023-10-17, DOI: 10.1007/s10617-023-09277-5
Jamile Vasconcelos, George Lima, Marwan Wehaiba El Khazen, Adriana Gogonel, Liliana Cucu-Grosjean
{"title":"On vulnerabilities in EVT-based timing analysis: an experimental investigation on a multi-core architecture","authors":"Jamile Vasconcelos, George Lima, Marwan Wehaiba El Khazen, Adriana Gogonel, Liliana Cucu-Grosjean","doi":"10.1007/s10617-023-09277-5","DOIUrl":"https://doi.org/10.1007/s10617-023-09277-5","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135944710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}