
Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems: Latest Articles

Compiler-Directed Data Locality Optimization in MATLAB
Christakis Lezos, I. Latifis, G. Dimitroulakos, K. Masselos
Array programming languages, such as MATLAB, are often used for algorithm development by scientists and engineers without taking into consideration implementation related issues and with limited emphasis on relevant optimizations. Application code optimization, especially in terms of data storage and transfer behavior, is still an important issue and heavily affects implementations' quality in terms of performance, power consumption etc. Efficient approaches for the optimization of high level application code are required to derive high quality implementations while still reducing development time and cost. This paper presents MemAssist, a software tool supporting application developers in detecting parts of the application code in MATLAB that do not exploit efficiently the targeted processor architecture and especially the memory hierarchy. Furthermore, the proposed tool guides application developers in applying code transformations in MATLAB for the optimization of the algorithm's temporal data locality. An image processing algorithm has been optimized using MemAssist as a practical usage scenario. Experimental results prove that the use of MemAssist can heavily reduce cache misses (up to 40%) and improve execution time (up to 30% speedup) on two different processor architectures. Thus, MemAssist can be used for optimized application code development that can lead to efficient implementations while still reducing development time and cost.
DOI: 10.1145/2906363.2906378 | Published: 2016-05-23
Citations: 3
Studying the Impact of Bit Switching on CPU Energy
Ghassan Shobaki, Najm Eldeen Abu Rmaileh, J. Jamal
It has been proposed in previous work that compiler instruction scheduling may reduce energy consumption by reordering instructions to minimize bit switching. Multiple algorithms have been proposed in the literature for performing this form of instruction scheduling. However, the impact of these algorithms on actual energy consumption has not been quantified using real hardware measurements; only simulation results have been reported. In this paper, we study the impact of bit switching on the CPU energy consumption using direct hardware measurements on a modern ARM processor. The measurements are performed using an energy probe provided by ARM. The experimental results show that the switching energy is significant and measurable, thus negating the hypothesis that compiling for performance is equivalent to compiling for energy. Yet, our experimental evaluation of multiple bit-switching-aware algorithms suggests that developing a compiler scheduling algorithm for reducing energy consumption by minimizing bit switching is quite challenging, because bit switching may conflict with execution time. An instruction order that minimizes bit switching but increases execution time may result in an overall increase in CPU energy, because the execution time has a higher impact on CPU energy than bit switching. In conclusion, our experimental results show that although performance is a primary factor that affects energy, it is not the only factor; switching energy is another significant factor.
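As a rough illustration of the quantity such schedulers try to minimize, the sketch below counts bit toggles between consecutive instruction words as the Hamming distance of adjacent encodings. The 8-bit values are made-up placeholders rather than real ARM encodings, the function name `bit_switches` is hypothetical, and the metric and algorithms evaluated in the paper may differ. A real scheduler could also only reorder instructions where data dependencies allow it.

```python
def bit_switches(words):
    """Total number of bit positions that toggle between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Hypothetical 8-bit "encodings" (assumption: not real ARM instructions).
original = [0b00001111, 0b11110000, 0b00001110, 0b11110001]
# Same words, reordered so that similar bit patterns sit next to each other.
reordered = [0b00001111, 0b00001110, 0b11110000, 0b11110001]

print(bit_switches(original))   # 23
print(bit_switches(reordered))  # 9
```

In this toy case the reordered sequence toggles 9 bits instead of 23. As the abstract cautions, such a reordering only saves energy if it does not lengthen execution time.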
DOI: 10.1145/2906363.2906382 | Published: 2016-05-23
Citations: 1
A Task-Level Monitoring Framework for Multi-Processor Platforms
Philipp Ittershagen, Kim Grüttner, W. Nebel
In this paper, a monitoring framework for observing properties of tasks running on a multi-processor platform is proposed. We describe the implementation of the framework on a TLM-based virtual platform containing an ARM Cortex A9 multi-core instruction-set simulator and shared memory modules. An application model consisting of periodic tasks and communication channels is used to demonstrate the applicability of the monitoring framework. Based on the application model, we describe a method for deriving a monitor implementation at design time that is able to check the execution order and the platform mapping during run-time. The model is implemented on top of a POSIX-compatible real-time operating system and the monitor is instantiated as a TLM component in the virtual platform. The monitor implementation is then able to check the execution order and the platform mapping of the application against the specification at run-time. Finally, we discuss the monitoring capability and its contribution to a safety concept for fail-safe systems.
DOI: 10.1145/2906363.2906373 | Published: 2016-05-23
Citations: 1
Machine Learning Approach to Generate Pareto Front for List-scheduling Algorithms
Pham Nam Khanh, Akash Kumar, Khin Mi Mi Aung
List scheduling is one of the most widely used scheduling techniques due to its simplicity and efficiency. In traditional list-based schedulers, a cost/priority function is used to compute the priority of tasks/jobs and put them in an ordered list. The cost function has become more and more complex to cover an increasing number of constraints in system design. However, most existing list-based schedulers implement a static priority function that usually provides only one schedule for each input task graph. Therefore, they may not satisfy system designers who want to examine the trade-off between a number of design requirements (performance, power, energy, reliability, ...). To address this problem, we propose a framework that uses a Genetic Algorithm (GA) to explore the design space and obtain Pareto-optimal design points. Furthermore, multiple regression techniques are used to build predictive models for the Pareto fronts to limit the execution time of the GA. The models are built using training task graph datasets and applied to incoming task graphs. The Pareto fronts for incoming task graphs are produced two orders of magnitude faster than with the traditional GA, with only 4% degradation in quality.
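To make "Pareto-optimal design points" concrete, here is a minimal sketch that filters a set of candidate schedules down to its Pareto front. It assumes every objective is minimized and uses invented (execution time, energy) pairs; it shows only the dominance filter, not the GA or the regression models described in the abstract.

```python
def pareto_front(points):
    """Return the points that no other point dominates (all objectives minimized)."""
    def dominates(p, q):
        # p dominates q if it is no worse everywhere and strictly better somewhere.
        no_worse = all(a <= b for a, b in zip(p, q))
        better = any(a < b for a, b in zip(p, q))
        return no_worse and better

    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Hypothetical (execution_time, energy) results for five candidate schedules.
schedules = [(10, 9.0), (12, 6.5), (11, 9.5), (15, 6.0), (13, 7.0)]
front = pareto_front(schedules)
print(front)  # [(10, 9.0), (12, 6.5), (15, 6.0)]
```

The two discarded points are each beaten on both objectives by another schedule, so a designer weighing time against energy would never pick them.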
DOI: 10.1145/2906363.2906380 | Published: 2016-05-23
Citations: 3
A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms
Shuoxin Lin, Yanzhou Liu, W. Plishker, S. Bhattacharyya
Heterogeneous computing platforms with multicore central processing units (CPUs) and graphics processing units (GPUs) are of increasing interest to designers of embedded signal processing systems since they offer the potential for significant performance boost while maintaining the flexibility of software-based design flows. Developing optimized implementations for CPU-GPU platforms is challenging due to complex, inter-related design issues, including task scheduling, interprocessor communication, memory management, and modeling and exploitation of different forms of parallelism. In this paper, we present an automated, dataflow based, design framework called DIF-GPU for application mapping and software synthesis on heterogeneous CPU-GPU platforms. DIF-GPU is based on novel extensions to the dataflow interchange format (DIF) package, which is a software environment for developing and experimenting with dataflow-based design methods and synthesis techniques for embedded signal processing systems. DIF-GPU exploits multiple forms of parallelism by deeply incorporating efficient vectorization and scheduling techniques for synchronous dataflow specifications, and incorporating techniques for streamlining interprocessor communication. DIF-GPU also provides software synthesis capabilities to help accelerate the process of moving from high-level application models to optimized implementations.
DOI: 10.1145/2906363.2906374 | Published: 2016-05-23
Citations: 10
Exploring Single Source Shortest Path Parallelization on Shared Memory Accelerators
D. Palossi, A. Marongiu
Single Source Shortest Path (SSSP) algorithms are widely used in embedded systems for several applications. The emerging trend towards the adoption of heterogeneous designs in embedded devices, where low-power parallel accelerators are coupled to the main processor, opens new opportunities to deliver superior performance/watt, but calls for efficient parallel SSSP implementation. In this work we provide a detailed exploration of the Δ-stepping algorithm performance on a representative heterogeneous embedded system, TI Keystone II, considering the impact of several parallelization parameters (threading, load balancing, synchronization).
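For readers unfamiliar with Δ-stepping, the following is a sequential sketch of the algorithm's bucket structure. It assumes non-negative edge weights and a plain adjacency-dictionary graph; the function name and the example graph are illustrative, and this is not the TI Keystone II implementation studied in the paper. In a parallel version, the relaxations issued for one frontier are the work that would be split across threads.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """graph: node -> list of (neighbor, weight >= 0). Returns reachable distances."""
    dist = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)  # bucket i holds nodes with distance in [i*delta, (i+1)*delta)

    def relax(v, d):
        if d < dist[v]:
            if dist[v] != math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0)
    while any(buckets.values()):
        i = min(k for k, b in buckets.items() if b)  # first non-empty bucket
        settled = set()
        while buckets[i]:
            frontier, buckets[i] = buckets[i], set()
            settled |= frontier
            # Light edges (w <= delta) may put nodes back into bucket i.
            for u in frontier:
                for v, w in graph.get(u, []):
                    if w <= delta:
                        relax(v, dist[u] + w)
        # Heavy edges are relaxed once, after bucket i has fully drained.
        for u in settled:
            for v, w in graph.get(u, []):
                if w > delta:
                    relax(v, dist[u] + w)
    return dict(dist)


# Hypothetical example graph.
graph = {"a": [("b", 1), ("c", 5)], "b": [("c", 2), ("d", 7)], "c": [("d", 1)], "d": []}
result = delta_stepping(graph, "a", delta=3)
print(result)
```

The choice of `delta` trades work for parallelism: a small value behaves like Dijkstra's algorithm with little concurrent work per bucket, while a large value approaches Bellman-Ford with more redundant relaxations.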
DOI: 10.1145/2906363.2915925 | Published: 2016-05-23
Citations: 1
In-Place Update in a Dataflow Synchronous Language: A Retiming-Enabled Language Experiment
Ulysse Beaugnon, Albert Cohen, Marc Pouzet
Dataflow synchronous languages such as Lustre have a purely functional semantics. This incurs a high overhead when dealing with arrays, as they have to be copied at each update. We propose to tackle this problem at the source, by constraining programs so that every functional array definition can be optimized into an in-place update. Our solution handles aliasing between function arguments. It also allows more programs with in-place updates to be accepted thanks to a new retiming framework, effectively rescheduling computations across time steps. Our proposed language and compilation method enforces zero-copy purely functional arrays while preserving expressiveness and programmer control through explicit copies.
DOI: 10.1145/2906363.2906379 | Published: 2016-05-23
Citations: 1
An Extensible Platform Description Language Supporting Retargetable Toolchains and Adaptive Execution
C. Kessler, Lu Li, A. Atalar, A. Dobre
XPDL is a modular, extensible platform description language for heterogeneous multicore systems and clusters. XPDL models provide metadata about hardware and installed system software that are relevant for adaptive static and dynamic optimizations of application programs and system settings for improved performance and energy efficiency. XPDL is based on XML and uses hyperlinks and inheritance to create modular, distributed libraries of platform models. We also provide a retargetable toolchain that browses and processes XPDL models and generates driver code for microbenchmarking to bootstrap empirical performance and energy models at deployment time. A C++ API enables convenient introspection of platform models, even at run-time, which allows for adaptive dynamic program optimizations such as tuned selection of implementation variants.
DOI: 10.1145/2906363.2906366 | Published: 2016-05-23
Citations: 2
Introducing MoC Drivers for the Integration of Sensor-Actuator Behaviors in Model-Based Design Flows of Embedded Systems
Omair Rafique, K. Schneider
Model-based design flows for embedded systems have been introduced to allow late design changes while still keeping tight time-to-market deadlines. In general, these design flows start with abstract models and refine them into a final implementation while maintaining already implemented properties. However, essentially all of these design flows suffer from a deployment gap in the sense that the finally generated files are general program files which assume a particular model of computation (MoC) that may not be provided by the chosen target architecture. For this reason, the final deployment is usually a non-trivial manual design step that can break all correctness-by-construction guarantees of the previous model-based design. In this paper, we therefore introduce the idea of MoC drivers, which wrap the real sensor and actuator interaction in a shell that provides the MoC of the generated software. As a particular example, we discuss how MoC drivers bridge the deployment gap between automatically generated dataflow programs and event-driven behaviors of the target architecture. The approach is illustrated with a Speedometer application on a distributed automotive embedded platform.
DOI: 10.1145/2906363.2906368 | Published: 2016-05-23
Citations: 1
Practical Challenges of ILP-based SPM Allocation Optimizations
Dominic Oehlert, Arno Luppold, H. Falk
Scratchpad Memory (SPM) allocation is a well-known technique for compiler-based code optimization. Integer Linear Programming (ILP) has proven to be a powerful technique for determining which parts of a program should be moved to the SPM. Although the idea is straightforward in theory, the technique poses several challenges when applied to modern embedded systems. In this paper, we aim to highlight the main issues that arise when trying to apply those optimizations to existing hardware platforms, along with possible solutions.
DOI: https://doi.org/10.1145/2906363.2906371 | Published: 2016-05-23
Citations: 9