Entropy-Based Analysis of Benchmarks for Instruction Set Simulators
Nils Bosbach, Lukas Jünger, Rebecca Pelke, Niko Zurstraßen, R. Leupers
DOI: https://doi.org/10.1145/3579170.3579267

Instruction-Set Simulators (ISSs) are widely used to simulate the execution of programs for a target architecture on a host machine. They translate the instructions of the program to be executed into instructions of the host Instruction-Set Architecture (ISA). The performance of an ISS strongly depends on its implementation and on the instructions it has to execute. Benchmarks used to compare the performance of ISSs should therefore contain a wide variety of instructions. Since many benchmarks are written in high-level programming languages, it is usually not clear to the user which instructions underlie a benchmark. In this work, we present a tool that analyzes the variety of instructions used by a benchmark. A multi-stage analysis collects the properties of the benchmark, and an entropy-based metric measures the diversity of the instructions it executes. In a case study, we present results for the benchmarks Whetstone, Dhrystone, Coremark, STREAM, and stdcbench. We show the diversity of these benchmarks for different compiler optimizations and indicate which benchmarks should be used to test the general performance of an ISS.

Fast Yet Accurate Timing and Power Prediction of Artificial Neural Networks Deployed on Clock-Gated Multi-Core Platforms
Quentin Dariol, S. Le Nours, D. Helms, R. Stemmer, S. Pillement, Kim Grüttner
DOI: https://doi.org/10.1145/3579170.3579263

When deploying Artificial Neural Networks (ANNs) onto multi-core embedded platforms, an intensive evaluation flow is necessary to find implementations that optimize resource usage, timing, and power. ANNs require significant computational and memory resources, while embedded execution platforms offer limited resources under strict power budgets. Concurrent accesses from processors to shared resources on multi-core platforms can lead to bottlenecks that affect performance and power. Existing approaches are limited in their ability to deliver fast yet accurate evaluation ahead of ANN deployment on the targeted hardware. In this paper, we present a modeling flow for timing and power prediction of fully-connected ANNs on multi-core platforms in early design stages. Our flow offers fast yet accurate predictions that account for shared communication resources and scale with the number of cores used. The flow is evaluated against real measurements for 42 mappings of 3 fully-connected ANNs executed on a clock-gated multi-core platform featuring two different communication modes: polling and interrupt-based. Our modeling flow predicts timing and power accurately on the tested mappings, with an average simulation time of 0.23 s for 100 iterations. We then illustrate the application of our approach for efficient design space exploration of ANN implementations.

Towards a European Network of Enabling Technologies for Drones
R. Nouacer, Mahmoud Hussein, Paul Detterer, E. Villar, F. Herrera, Carlo Tieri, E. Grolleau
DOI: https://doi.org/10.1145/3579170.3579264

Drone-based service and product innovation is curtailed by the growing dependence on poorly interoperable proprietary technologies, as well as by the risks posed to people on the ground, to other vehicles, and to property (e.g., critical infrastructure). On the innovation side, the Single European Sky ATM Research (SESAR) Joint Undertaking is developing U-space, a set of services and procedures to help drones access airspace safely and efficiently. The aim of COMP4DRONES is to complement the SESAR JU efforts by providing a framework of key enabling technologies for safe and autonomous drones, with a specific focus on U2 and U3. The COMP4DRONES project has contributed to support (1) efficient customization and incremental assurance of drone-embedded platforms, (2) safe autonomous decision making for individual or cooperative missions, (3) trustworthy drone-to-drone and drone-to-ground communications, even in the presence of malicious attackers and under intrinsic platform constraints, and (4) agile and cost-effective design and assurance of drone modules and systems. In this paper, we discuss the results of the COMP4DRONES project complementing the SESAR JU efforts, with a particular focus on safe software and hardware drone architectures.

Fast Instruction Cache Simulation is Trickier than You Think
M. Badaroux, J. Dumas, F. Pétrot
DOI: https://doi.org/10.1145/3579170.3579261

Given the performance it achieves, dynamic binary translation is the most compelling simulation approach for cross-emulation of software-centric systems. This speed comes at a cost: simulation is purely functional. Modeling instruction caches by instrumenting each target instruction is feasible, but severely degrades performance. Since translation occurs per target instruction block, we propose to model instruction caches at that granularity. This raises a few issues, which we detail and mitigate. We implement this solution in the QEMU dynamic binary translation engine, which brings up an interesting problem inherent to this simulation strategy. Using a multicore RISC-V-based platform as a test vehicle, we show that a properly constructed model can be nearly as accurate as an instruction-accurate one. On the PolyBench/C and PARSEC benchmarks, our model slows down simulation by a factor of 2 to 10 compared to vanilla QEMU. Although not negligible, this should be weighed against the factor of 20 to 60 for the instruction-accurate approach.

An Analytical Model of Configurable Systolic Arrays to find the Best-Fitting Accelerator for a given DNN Workload
Tim Hotfilter, Patrick Schmidt, Julian Höfer, Fabian Kreß, T. Harbaum, Juergen Becker
DOI: https://doi.org/10.1145/3579170.3579258

Since their breakthrough, the complexity of Deep Neural Networks (DNNs) has been rising steadily. As a result, accelerators for DNNs are now used in many domains. However, designing and configuring an accelerator that perfectly meets the requirements of a given application is a challenging task. In this paper, we therefore present our approach to support the accelerator design process. With an analytical model of a systolic array, we can estimate performance, energy consumption, and area for each design option. To determine these metrics, a cycle-accurate simulation is usually performed, which is time-consuming; hence, the design space has to be restricted heavily. Analytical modelling, in contrast, allows fast evaluation of a design using a mathematical abstraction of the accelerator. For DNNs, this works especially well since the dataflow and memory accesses are highly regular. To show the correctness of our model, we perform an exemplary realization with the state-of-the-art systolic array generator Gemmini and compare it with a cycle-accurate simulation and state-of-the-art modelling tools, showing less than 1% deviation. We also conducted a design space exploration, showing the analytical model's capability to support accelerator design. In a case study on ResNet-34, we demonstrate that our model and DSE tool reduce the time to find the best-fitting solution by four and two orders of magnitude compared to a cycle-accurate simulation and state-of-the-art modelling tools, respectively.

Automatic DRAM Subsystem Configuration with irace
Lukas Steiner, Gustavo Delazeri, Iron Prando da Silva, Matthias Jung, N. Wehn
DOI: https://doi.org/10.1145/3579170.3579259

Nowadays, DRAM subsystem configuration involves a large number of parameters, resulting in an extensive design space. Setting these parameters is a challenging step in system design, as the parameter-workload interactions are complex. Since design space exploration by exhaustive simulation is infeasible given limited computing resources and development time, semi-automatic configuration involving both manual and simulation-based decisions is the state of the art. However, it requires considerable expertise in the DRAM domain as well as application knowledge, and there is no guarantee that the resulting subsystem performs well. In this paper, we present a new framework that fully automates DRAM subsystem configuration for a given parameter space and set of target applications. It is based on irace, a software package originally developed for the automatic configuration of optimization algorithms. We show that the framework finds near-optimal configurations while evaluating only a fraction of all application-configuration combinations. In addition, all returned configurations perform better than a predefined standard configuration. Our framework thus enables designers to automatically determine a suitable DRAM subsystem for their platform.

Towards an Ontological Methodology for Dynamic Dependability Management of Unmanned Aerial Vehicles
Guillaume Ollier, F. Arnez, Morayo Adedjouma, Raphaël Lallement, Simos Gerasimou, C. Mraidha
DOI: https://doi.org/10.1145/3579170.3579265

Dynamic Dependability Management (DDM) is a promising approach to guarantee and monitor the ability of safety-critical Automated Systems (ASs) to deliver the intended service with an acceptable risk level. However, the non-interpretability and lack of specifications of the Learning-Enabled Components (LECs) used in ASs make this mission particularly challenging. Some existing DDM techniques overcome these limitations by combining probabilistic environmental perception knowledge with predictions of behavior changes for the agents in the environment. Ontology-based methods allow a formal and traceable representation of AS usage scenarios to be used to support the design of the DDM component of such ASs. This paper presents a methodology for this design process, starting from the AS specification stage and including threat analysis and requirements identification. The paper focuses on the formalization of an ontology modeling language that allows the interpretation of logical usage scenarios, i.e., formal descriptions of scenarios represented by state variables. The proposed supervisory system also considers uncertainty estimation and the interaction between AS components throughout the perception-planning-control pipeline. The methodology is illustrated on a use case involving Unmanned Aerial Vehicles (UAVs).

Non-Intrusive Runtime Monitoring for Manycore Prototypes
Fabian Lesniak, Nidhi Anantharajaiah, T. Harbaum, Juergen Becker
DOI: https://doi.org/10.1145/3579170.3579262

Rapid prototyping is a widely used, essential technique for developing novel computing architectures. While simulation-based approaches make it easy to examine the Design Under Test, the observability of FPGA-based prototypes is limited, as they can behave like a black box. However, for verification and design space exploration it is crucial to obtain detailed information on the internal state of such a prototype. In this work, we propose an architecture that gathers detailed internal measurements during execution and extracts them from the design under test without affecting its runtime behavior. It is specifically designed for low resource usage and minimal impact on timing, leaving more resources for the actual prototyped system. The proposed architecture offers several interface modules for various signal sources, including register capturing, event counters, and bus snooping. We present an estimate of the achievable bandwidth and maximum sample rate, as well as a demanding case study with a tiled manycore platform on a multi-FPGA prototyping system. Experimental results show up to 32 million 4-byte measurements per second, saturating a gigabit Ethernet connection. The monitoring system has proven very useful when working with an FPGA-based manycore prototype, as it is an essential tool to reveal incorrect behavior and bottlenecks in hardware, operating system, and applications at an early stage.

Faster Functional Warming with Cache Merging
Gustaf Borgström, C. Rohner, D. Black-Schaffer
DOI: https://doi.org/10.1145/3579170.3579256

SMARTS-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing the warming until sufficient accuracy is achieved. Unfortunately, each increase requires that the previous warming be redone, nearly doubling the total warming. We address this re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on a multi-level LRU cache hierarchy and evaluate and address the errors it introduces. Our experiments show that Cache Merging delivers an average speedup of 1.44x, 1.84x, and 1.87x for 128 kB, 2 MB, and 8 MB L2 caches, respectively (vs. a 2x theoretical maximum speedup), with 95th-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal loss of accuracy.

ReDroSe — Reconfigurable Drone Setup for Resource-Efficient SLAM
Sebastian Rahn, Philipp Gehricke, Can-Leon Petermöller, Eric Neumann, Philipp Schlinge, Leon Rabius, Henning Termühlen, Christopher Sieh, M. Tassemeier, T. Wiemann, Mario Porrmann
DOI: https://doi.org/10.1145/3579170.3579266

In this paper, we present ReDroSe, a heterogeneous compute system based on embedded CPUs, FPGAs, and GPUs, which is integrated into an existing UAV platform to enable real-time SLAM based on a Truncated Signed Distance Field (TSDF) directly on the drone. The system is fully integrated into the existing infrastructure, allowing ground control to manage and monitor the data acquisition process. ReDroSe is evaluated in terms of power consumption and computing capabilities. The results show that the proposed architecture enables computations on the UAV that were previously only possible in post-processing, while keeping the power consumption low enough to match the available flight time of the UAV.