Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems: Latest Papers

Exploiting Predictability in Dynamic Network Communication for Power-Efficient Data Transmission in LTE Radio Systems
Peter Brand, Jonathan Ah Sue, J. Brendel, J. Falk, R. Hasholzner, Jürgen Teich, S. Wildermann
In battery-powered embedded systems, power is a critical resource, which makes power management an important topic in the design phase. Even though power management is heavily researched, most approaches focus on improving the way the power manager reacts to outside control events. In this paper, we propose techniques that not only react to these outside control events but also try to predict them in advance, thus broadening the capabilities of any employed power manager by allowing for better transition decisions and even saving redundant calculations. We present results on employing a predictive power management system that couples a classic dynamic power manager with a machine learning subsystem in the context of a mobile device in a Long Term Evolution (LTE) system, with emphasis on evaluating the potential for saving power as well as the handling of the induced prediction uncertainty. First, we examine the LTE communication protocol and showcase certain control data that has to be received periodically but may contain no information for the receiver. Finally, we show, in a proof of concept based on real LTE traces and hardware simulation, that prediction of this information can be leveraged to allow for a far superior decision process compared to a non-predicting system. Here, we achieve a theoretical best-case power saving of 15% for an idealized prediction with 100% accuracy and no additional power consumption.
DOI: https://doi.org/10.1145/3078659.3078670. Published: 2017-06-12.
Citations: 3
TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings
Andrés Goens, R. Khasanov, J. Castrillón, Marcus Hähnel, Till Smejkal, Hermann Härtig
For embedded system software, it is common to use static mappings of tasks to cores. This becomes considerably more challenging in multi-application scenarios. In this paper, we propose TETRiS, a multi-application run-time system for static mappings on heterogeneous system-on-chip architectures. It leverages compile-time information to map and migrate tasks in a fashion that preserves the predictable performance of using static mappings, allowing the system to accommodate multiple applications. TETRiS runs on off-the-shelf embedded systems and is Linux-compatible. We embed our approach in a state-of-the-art compiler for multicore systems and evaluate the proposed run-time system on a modern heterogeneous platform using realistic benchmarks. We present two experiments whose execution time and energy consumption are comparable to those obtained by the highly optimized Linux scheduler CFS, and in which execution time variance is reduced by a factor of 510 and energy consumption variance by a factor of 83.
DOI: https://doi.org/10.1145/3078659.3078663. Published: 2017-06-12.
Citations: 28
Combining Dataflow Applications and Real-time Task Sets on Multi-core Platforms
H. Ali, B. Akesson, L. M. Pinho
Future real-time embedded systems will increasingly incorporate mixed application models with timing constraints running on the same multi-core platform. These application models are dataflow applications with timing constraints and traditional real-time applications modelled as independent arbitrary-deadline tasks. These systems require guarantees that all running applications execute satisfying their timing constraints. Also, to be cost-efficient in terms of design, they require efficient mapping strategies that maximize the use of system resources to reduce the overall cost. This work proposes an approach to integrate mixed application models (dataflow and traditional real-time applications) with timing requirements on the same multi-core platform. It comprises three main algorithms: 1) Slack-Based Merging, 2) Timing Parameter Extraction, and 3) Communication-Aware Mapping. Together, these three algorithms play a part in allowing mapping and scheduling of mixed application models in embedded real-time systems. The complete approach and the three algorithms presented have been validated through proofs and experimental evaluation.
DOI: https://doi.org/10.1145/3078659.3078671. Published: 2017-06-12.
Citations: 1
Enabling zero-copy OpenMP offloading on the PULP many-core accelerator
Alessandro Capotondi, A. Marongiu
Many-core heterogeneous designs are nowadays widely available among embedded systems. Initiatives such as the HSA push for a model where the host processor and the accelerator(s) communicate via coherent, Unified Virtual Memory (UVM). In this paper we describe our experience in porting the OpenMP v4 programming model to a low-end, heterogeneous embedded system based on the PULP many-core accelerator featuring lightweight (software-managed) UVM support. We describe a GCC-based toolchain which enables: i) the automatic generation of host and accelerator binaries from a single, high-level, OpenMP parallel program; ii) the automatic instrumentation of the accelerator program to transparently manage UVM. This enables up to 4x faster execution compared to traditional copy-based offload mechanisms.
DOI: https://doi.org/10.1145/3078659.3079071. Published: 2017-06-12.
Citations: 5
Stencil Autotuning with Ordinal Regression: Extended Abstract
Biagio Cosenza, J. Durillo, Stefano Ermon, B. Juurlink
The increasing performance of today's computer architectures comes with an unprecedented increase in hardware complexity. Unfortunately, this results in difficult-to-tune software and consequently in a gap between the potential peak performance and the actual performance. Automatic tuning is an emerging approach that assists the programmer in managing this complexity. State-of-the-art autotuners are limited, though: they either require long tuning times, e.g., due to iterative searches, or cannot tackle the complexity of the problem due to the limitations of the supervised machine learning (ML) methodologies used. In particular, traditional ML autotuning approaches exploiting classification algorithms (such as neural networks and support vector machines) face difficulties in capturing all features of large search spaces. We propose a new way of performing automatic tuning based on structural learning: the tuning problem is formulated as a version-ranking prediction model and solved using ordinal regression. We demonstrate its potential on a well-known autotuning problem: stencil computations. We compare state-of-the-art iterative compilation methods with our ordinal regression approach and analyze the quality of the obtained ranking in terms of Kendall rank correlation coefficients.
DOI: https://doi.org/10.1145/3078659.3078664. Published: 2017-06-12.
Citations: 3
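The stencil autotuning abstract above judges a predicted ranking of code versions by its Kendall rank correlation with the measured ranking. The following is a minimal sketch of that metric only, not the authors' autotuner: the function name `kendall_tau` and the sample scores and runtimes are assumptions made for illustration, and the tau-a variant shown here assumes there are no tied values.

```python
from itertools import combinations


def kendall_tau(predicted, measured):
    """Kendall tau-a between two equally long score lists (assumes no ties)."""
    if len(predicted) != len(measured):
        raise ValueError("both lists must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two items to compare rankings")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        # A pair is concordant when both lists order items i and j the same way.
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: four code versions, lower value means faster.
measured_runtime_ms = [12.0, 15.5, 9.8, 20.1]
predicted_score = [3.0, 2.0, 1.0, 4.0]  # one pair is ranked in the wrong order

tau = kendall_tau(predicted_score, measured_runtime_ms)
print(f"Kendall tau = {tau:.3f}")
```

A value of 1 means the predicted ranking matches the measured one exactly, and -1 means it is fully reversed. A ranking with a single swapped pair out of six, as in the sample data, lands in between.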
Numerical Accuracy Improvement by Interprocedural Program Transformation
Nasrine Damouche, M. Martel, Alexandre Chapoutot
Floating-point numbers are used to approximate the exact real numbers in a wide range of domains such as numerical simulations, embedded software, etc. However, floating-point numbers are a finite approximation of real numbers. In practice, this approximation may introduce round-off errors, which can lead to catastrophic results. To cope with this issue, we have developed a tool which partly corrects these round-off errors and which consequently improves the numerical accuracy of computations by automatically transforming programs in a source-to-source manner. Our transformation relies on static analysis by abstract interpretation and operates on pieces of code with assignments, conditionals and loops. In former work, we focused on the intraprocedural transformation of programs; in this article, we introduce the interprocedural transformation to improve accuracy.
DOI: https://doi.org/10.1145/3078659.3078662. Published: 2017-06-12.
Citations: 6
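The numerical accuracy abstract above rests on the fact that two expressions that are equal over the real numbers can differ in floating-point arithmetic, which is what makes an accuracy-improving source-to-source rewrite possible. The sketch below only illustrates that fact and is not the authors' tool: the function names and the input values are assumptions chosen for this example, and the exact reference is computed with Python's `fractions.Fraction`.

```python
from fractions import Fraction


def evaluate_left(a, b, c):
    """Evaluate (a + b) + c in double precision."""
    return (a + b) + c


def evaluate_right(a, b, c):
    """Evaluate a + (b + c), which is equal to evaluate_left over the reals."""
    return a + (b + c)


def exact_sum(a, b, c):
    """Exact sum of the three doubles using rational arithmetic."""
    return Fraction(a) + Fraction(b) + Fraction(c)


# Hypothetical inputs: a large cancellation combined with a small term.
a, b, c = 1e16, -1e16, 1.0

reference = exact_sum(a, b, c)
error_left = abs(Fraction(evaluate_left(a, b, c)) - reference)
error_right = abs(Fraction(evaluate_right(a, b, c)) - reference)

print("(a + b) + c =", evaluate_left(a, b, c), "absolute error:", error_left)
print("a + (b + c) =", evaluate_right(a, b, c), "absolute error:", error_right)
```

With these inputs the first ordering cancels the large terms before adding the small one and is exact, while the second ordering absorbs the small term into the large one and loses it entirely. A transformation tool of the kind described would aim to pick the more accurate of such equivalent forms.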
Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics
Jutta Pirkl, Andreas Becher, Jorge Echavarria, J. Teich, S. Wildermann
Approximate Computing aims at trading off computational accuracy against improvements in performance, resource utilization and power consumption by making use of the capability of many applications to tolerate a certain loss of quality. A key issue is that the impact of approximation depends on the input data as well as on user preferences and environmental conditions. In this context, we therefore investigate the concept of self-adaptive image processing that is able to autonomously adapt 2D-convolution filter operators of different degrees of accuracy by means of partial reconfiguration on Field-Programmable Gate Arrays (FPGAs). Experimental evaluation shows that the dynamic system is able to better exploit a given error tolerance than any static approximation technique due to its responsiveness to changes in input data. Additionally, it provides a user control knob to select the desired output quality via the metric threshold at runtime.
DOI: https://doi.org/10.1145/3078659.3078669. Published: 2017-06-12.
Citations: 2
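The self-adaptive filter abstract above describes switching between 2D-convolution operators of different accuracy so that a quality metric stays within a user-chosen threshold. The following software sketch only illustrates that selection idea under several loudly stated assumptions: approximate arithmetic is mimicked by truncating low-order pixel bits, the metric is mean absolute error against the exact result, and all function names are hypothetical. The paper itself works with partial reconfiguration on FPGAs, which is not modelled here.

```python
def convolve3x3(image, kernel, drop_bits=0):
    """3x3 convolution without padding on an integer image.

    drop_bits clears that many low-order bits of every pixel first,
    a stand-in (assumption) for a cheaper approximate operator.
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    pixel = (image[y + dy][x + dx] >> drop_bits) << drop_bits
                    acc += kernel[dy + 1][dx + 1] * pixel
            row.append(acc)
        output.append(row)
    return output


def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def select_variant(image, kernel, threshold, candidates=(4, 2, 1, 0)):
    """Return the most aggressive drop_bits whose error is within threshold.

    Computing the exact result as a reference is for illustration only;
    a real adaptive system would need a cheaper quality estimate.
    """
    exact = convolve3x3(image, kernel, 0)
    for bits in candidates:  # ordered from most to least approximate
        approx = convolve3x3(image, kernel, bits)
        if mean_abs_error(approx, exact) <= threshold:
            return bits
    return 0


# Hypothetical 3x3 image and box kernel.
box_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
bright_image = [[255, 255, 255], [255, 255, 255], [255, 255, 255]]
print(select_variant(bright_image, box_kernel, threshold=30))
```

The chosen variant depends on the input: an image whose pixels are all multiples of 16 loses nothing when four bits are dropped, whereas the all-255 image above tolerates only a milder truncation for the same threshold. This mirrors the abstract's point that a fixed, static approximation cannot exploit the error budget as well as an input-aware one.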
Robust Mapping of Process Networks to Many-Core Systems using Bio-Inspired Design Centering
G. Hempel, Andrés Goens, J. Castrillón, Josefine Asmus, I. Sbalzarini
Embedded systems are often designed as complex architectures with numerous processing elements. Effectively programming such systems requires parallel programming models, e.g., task-based or dataflow-based models. With these types of models, the mapping of the abstract application model to the existing hardware architecture plays a decisive role and is usually optimized to achieve an ideal resource footprint or a near-minimal execution time. However, when mapping several independent programs to the same platform, resource conflicts can arise. This can be circumvented by remapping some of the tasks of an application, which in turn affects its timing behavior, possibly leading to constraint violations. In this work we present a novel method to compute mappings that are robust against local task remapping. The underlying method is based on the bio-inspired design centering algorithm of Lp-Adaptation. We evaluate this with several benchmarks on different platforms and show that mappings obtained with our algorithm are indeed robust. In all experiments, our robust mappings tolerated significantly more run-time perturbations without violating constraints than mappings devised with optimization heuristics.
DOI: https://doi.org/10.1145/3078659.3078667. Published: 2017-06-12.
Citations: 5
On the Accuracy of Near-Optimal GPU-Based Path Planning for UAVs
D. Palossi, A. Marongiu, L. Benini
Path planning is one of the key functional blocks for any autonomous unmanned aerial vehicle (UAV). The goal of a path planner module is to constantly update the route of the vehicle based on information sensed in real time. Given the high computational requirements of this task, heterogeneous many-cores are appealing candidates for its execution. Approximate path computation has proven a promising approach to reduce total execution time, at the cost of a slight loss in accuracy. In this work we study the performance and accuracy of state-of-the-art, near-optimal parallel path planning in combination with program transformations aimed at ensuring efficient use of embedded GPU resources. We propose a profile-based algorithmic variant which speeds up GPU execution by up to ≈7x, while keeping the accuracy loss below 5%.
DOI: https://doi.org/10.1145/3078659.3079072. Published: 2017-06-12.
Citations: 1
Automatic Conversion of Simulink Models to SysteMoC Actor Networks
Martín Letras, J. Falk, S. Wildermann, J. Teich
Simulink has gained wide acceptance due to its intuitive block-based algorithm design, simulation, and rapid prototyping capabilities for signal processing as well as control applications. However, automatic code generation for heterogeneous architectures is currently not supported by Simulink. In the literature, there exist automatic translation toolchains that generate C or C++ code from Simulink models, which is then used for implementation or validation purposes. However, few of them address the generation of models that can be used in well-established Electronic System Level (ESL) design methodologies and tools. To address this issue, we present a methodology to extract an executable specification based on Data Flow Graphs (DFGs) from a given Simulink model. Such a specification can then be used by ESL tools to perform a Design Space Exploration (DSE) and to generate code for hardware/software partitions directly from the ESL model. In a case study from signal processing, we validate that the simulation results obtained in Simulink are equivalent to those obtained by simulating the DFG, which is generated fully automatically from the Simulink model in the SystemC-based actor language SysteMoC.
DOI: 10.1145/3078659.3078668 (https://doi.org/10.1145/3078659.3078668) · Published: 2017-06-12
Citations: 6