Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems: Latest Publications

Computation in Memory for Data-Intensive Applications: Beyond CMOS and beyond Von-Neumann
S. Hamdioui
One of the most critical challenges for today's and future data-intensive and big-data problems (ranging from economics and business activities to public administration, from national security to many scientific research areas) is data storage and analysis. The primary goal is to increase the understanding of processes by extracting highly useful values hidden in huge volumes of data. The growth in data size has already surpassed the capabilities of today's computing architectures, which suffer from limited bandwidth (due to communication and memory-access bottlenecks), energy inefficiency, and limited scalability (due to CMOS technology). This talk first addresses CMOS scaling and its impact on different aspects of ICs and electronics; it shows the major limitations that scaling is facing (such as leakage, yield, and reliability) and motivates the need for a new technology. Thereafter, it gives an overview of computing systems developed since John von Neumann introduced stored-program computers in the forties, and discusses the shortcomings of today's architectures in dealing with data-intensive applications. Because data is growing faster than these architectures can handle, given their communication bottleneck and energy inefficiency, a new architecture is needed. Finally, the talk introduces a new architecture paradigm for big-data problems. It is based on integrating storage and computation in the same physical location (using a crossbar topology) and on non-volatile resistive-switching technology, based on memristors, instead of CMOS technology. The potential of such an architecture to deliver order-of-magnitude improvements is illustrated by comparing it with state-of-the-art architectures (multi-core, GPUs, FPGAs) for different data-intensive applications.
DOI: https://doi.org/10.1145/2764967.2771820 (published 2015-06-01)
Citations: 3
High-level software-pipelining in LLVM
Roel Jordans, H. Corporaal
Software-pipelining is an important technique for increasing the instruction-level parallelism of loops during compilation. Currently, the LLVM compiler infrastructure does not offer this optimization, although some target-specific implementations do exist. We have implemented a high-level method for software-pipelining within the LLVM framework. By implementing this within LLVM's optimization layer we have taken the first steps towards a target-independent software-pipelining method.
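To make the idea of software-pipelining concrete, here is a minimal sketch in Python of what the transformation does to a loop. It is a hand-applied illustration, not the method from the paper: the three-stage loop body (load, compute, store) and the function names `plain_loop` and `pipelined_loop` are assumptions made for this example. Python gains no instruction-level parallelism from the reordering; the sketch only shows the prologue/kernel/epilogue shape a compiler would produce.

```python
def plain_loop(a):
    """Reference loop: each iteration runs load, compute, store back to back."""
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]          # stage 1: load
        y = x * 3 + 1     # stage 2: compute
        out[i] = y        # stage 3: store
    return out


def pipelined_loop(a):
    """Same loop with stages of different iterations overlapped.

    One kernel iteration holds the store of iteration i, the compute of
    iteration i+1 and the load of iteration i+2, so on real hardware the
    three operations no longer depend on each other within the loop body.
    """
    n = len(a)
    if n < 2:
        return plain_loop(a)  # too short to fill the pipeline
    out = [0] * n
    # Prologue: fill the pipeline.
    x = a[0]              # load 0
    y = x * 3 + 1         # compute 0
    x = a[1]              # load 1
    # Kernel: steady state.
    for i in range(n - 2):
        out[i] = y        # store i
        y = x * 3 + 1     # compute i+1
        x = a[i + 2]      # load i+2
    # Epilogue: drain the pipeline.
    out[n - 2] = y        # store n-2
    y = x * 3 + 1         # compute n-1
    out[n - 1] = y        # store n-1
    return out
```

Both functions return the same list for any input; only the order in which the stages are issued differs.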
DOI: https://doi.org/10.1145/2764967.2771935 (published 2015-06-01)
Citations: 9
Use of Previously Acquired Positioning of Optimizations for Phase Ordering Exploration
Ricardo Nobre, L. G. A. Martins, João MP Cardoso
This paper presents a new approach to efficiently search for suitable compiler pass sequences, a challenge known as phase ordering. Our approach relies on information about the relative positions of compiler passes in pass sequences previously generated for a set of functions when compiling for a specific processor. We enhanced two iterative compiler pass exploration schemes, one relying on simple sequential compiler pass insertion and the other implementing an auto-tuned simulated annealing process, with a data structure that holds information about the relative positions of passes in those sequences. The data structure reduces the set of passes considered for insertion at a given position of a candidate sequence to only those passes that have a higher probability of performing well at that relative position, which shortens the exploration time. We tested our approach with two different compilers and two different targets: the ReflectC and the LLVM compilers, targeting a MicroBlaze processor and a LEON3 processor, respectively. The experimental results show that we can reduce the number of algorithm iterations by up to more than an order of magnitude when targeting the MicroBlaze or the LEON3, while still finding compiler sequences whose binaries, when executed on the target processor/simulator, outperform (i.e., use fewer CPU cycles than) all the standard optimization levels (for each kernel we compare against the best-performing optimization level flag, e.g., -O1, -O2 or -O3 in the case of LLVM). The geometric mean performance improvements are 1.23x and 1.20x when targeting the MicroBlaze processor, and 1.94x and 2.65x when targeting the LEON3 processor, for each of the two exploration algorithms and two kernel sets considered.
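The position-based pruning described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' data structure: the bucketing of relative positions, the function names, and the fallback to all passes are assumptions, and the pass names are used only as example labels.

```python
from collections import defaultdict


def bucket_of(index, length, buckets):
    """Map a position in a sequence of `length` passes to a relative bucket."""
    return index * buckets // length


def build_position_table(known_sequences, buckets):
    """Record which passes appeared in which relative bucket of earlier sequences."""
    table = defaultdict(set)
    for seq in known_sequences:
        for index, pass_name in enumerate(seq):
            table[bucket_of(index, len(seq), buckets)].add(pass_name)
    return table


def candidates(table, all_passes, position, new_length, buckets):
    """Passes worth trying at `position` of a sequence that will have `new_length` passes.

    Falls back to every pass when nothing was recorded for that bucket
    (an assumption of this sketch).
    """
    seen = table.get(bucket_of(position, new_length, buckets))
    return sorted(seen) if seen else sorted(all_passes)


# Sequences assumed to have been found earlier for other functions on the same target.
KNOWN = [["inline", "licm", "dce"], ["inline", "gvn", "dce"]]
ALL_PASSES = {"inline", "licm", "gvn", "dce", "sroa"}
TABLE = build_position_table(KNOWN, buckets=3)
```

With this table, an exploration step that inserts a pass at the front of a three-pass candidate would try only `inline` instead of all five passes, which is the source of the reduced iteration count.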
DOI: https://doi.org/10.1145/2764967.2764978 (published 2015-06-01)
Citations: 23
A framework for optimizing OpenVX applications performance on embedded manycore accelerators
Giuseppe Tagliavini, Germain Haugou, A. Marongiu, L. Benini
Nowadays computer vision applications are ubiquitous, and their presence on embedded devices is more and more widespread. Heterogeneous embedded systems featuring a clustered manycore accelerator are a very promising target for executing embedded vision algorithms, but code optimization for these platforms is a challenging task. Moreover, designers need support tools that are both fast and accurate. In this work we introduce ADRENALINE, an environment for the development and optimization of OpenVX applications targeting manycore accelerators. ADRENALINE consists of a custom OpenVX run-time and a virtual platform, and overall it is intended to provide support for enhancing the performance of embedded vision applications.
DOI: https://doi.org/10.1145/2764967.2776858 (published 2015-06-01)
Citations: 4
Utilization Improvement by Enforcing Mutual Exclusive Task Execution in Modal Stream Processing Applications
G. Kuiper, Stefan J. Geuns, M. Bekooij
Real-time dataflow analysis techniques for multiprocessor systems ignore that the executions of tasks belonging to different operation modes are mutually exclusive. As a result, more resources are reserved than strictly needed and resource utilization is low. In this paper we present a dataflow analysis approach which takes into account that tasks belonging to different modes often execute mutually exclusively. Therefore fewer resources need to be reserved to satisfy a throughput constraint and a higher processor utilization can be obtained. Furthermore, we introduce a lock which, when beneficial, is used to enforce mutually exclusive execution of tasks during a mode transition. The effects of mutually exclusive execution are included in a Structured Variable-Rate Phased Dataflow (SVPDF) temporal analysis model, which is used to determine whether adding a lock results in satisfaction of the throughput constraint. This model is generated from a sequential input specification of the application such that deadlock-free execution, even after the addition of locks, is guaranteed. The applicability and benefits of the approach are demonstrated using a WLAN 802.11g application which switches between a detection and a decoding mode. It is shown that the use of two locks improves the worst-case response times of 3 tasks such that they can share the same processor, which improves the utilization of this processor and frees 2 other processors.
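The resource argument can be illustrated with a deliberately simplified calculation. The sketch below is not the SVPDF analysis from the paper: it assumes each task is described by a made-up (execution time, period) pair and compares the processor share reserved when all modes are treated as potentially concurrent with the share reserved when modes are known to be mutually exclusive. The mode names and numbers are invented for the example.

```python
from fractions import Fraction


def reserved_utilization(modes, exclusive):
    """Processor share to reserve for the tasks of all modes on one processor.

    modes: dict mapping a mode name to a list of (exec_time, period) pairs.
    exclusive: True if at most one mode executes at any time.
    """
    per_mode = [
        sum(Fraction(exec_time, period) for exec_time, period in tasks)
        for tasks in modes.values()
    ]
    # Concurrent modes add up; exclusive modes only need the largest share.
    return max(per_mode) if exclusive else sum(per_mode)


# Invented numbers: one detection task, two decoding tasks.
MODES = {
    "detect": [(2, 10)],
    "decode": [(3, 10), (1, 10)],
}
```

For these numbers the reservation drops from 3/5 to 2/5 of the processor once mutual exclusion is taken into account, which is the kind of slack that lets tasks share a processor.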
DOI: https://doi.org/10.1145/2764967.2764970 (published 2015-06-01)
Citations: 4
Schedulability Aware WCET-Optimization of Periodic Preemptive Hard Real-Time Multitasking Systems
Arno Luppold, H. Falk
In hard real-time multitasking systems, applying code optimizations oriented towards the worst-case execution time (WCET) to individual tasks may not lead to optimal results with regard to the system's schedulability. We propose an approach based on Integer-Linear Programming which is able to perform schedulability-aware code optimizations for periodic task sets with fixed priorities. We evaluate our approach by using a static instruction scratchpad memory (SPM) optimization for the Infineon TriCore microcontroller.
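Why per-task WCET optimization is not enough can be seen with the classic response-time test for fixed-priority preemptive periodic tasks, R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j. The sketch below uses that textbook test, not the paper's ILP formulation; the task parameters are invented and deadlines are assumed equal to periods.

```python
def response_time(tasks, i):
    """Worst-case response time of task i, or None if it misses its deadline.

    tasks: list of (wcet, period) pairs sorted from highest to lowest priority,
    with integer time units and deadline equal to period (assumptions of this sketch).
    """
    wcet_i, period_i = tasks[i]
    r = wcet_i
    while True:
        # Interference from every higher-priority task released within r.
        nxt = wcet_i + sum(-(-r // period_j) * wcet_j for wcet_j, period_j in tasks[:i])
        if nxt > period_i:
            return None
        if nxt == r:
            return r
        r = nxt


def schedulable(tasks):
    """True if every task meets its deadline under the test above."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))
```

For the invented task set `[(1, 4), (2, 6), (6, 12)]` the lowest-priority task misses its deadline, while shaving one time unit off the middle task, giving `[(1, 4), (1, 6), (6, 12)]`, makes the whole set schedulable. The WCET that matters for a task's deadline may therefore belong to a different task, which is the interaction a schedulability-aware optimization has to account for.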
DOI: https://doi.org/10.1145/2764967.2771930 (published 2015-06-01)
Citations: 3
Bytewise Register Allocation
P. K. Krause
Traditionally, variables have been considered as atoms by register allocation: Each variable was to be placed in one register, or spilt (placed in main memory) or rematerialized (recalculated as needed). Some flexibility arose from what would be considered a register: Register aliasing allowed treating a register meant to hold a 16-bit variable as two registers that could hold an 8-bit variable each. We allow for far more flexibility in register allocation: We decide on the storage of variables bytewise, i.e., we decide for each individual byte in a variable whether to store it in memory or a register, and consider any byte of any register as a possible storage location. We implemented a backend for the STM8 architecture (STMicroelectronics' current 8-bit architecture) in the C compiler sdcc, and experimentally evaluated the benefits of bytewise register allocation. The results show that bytewise register allocation can result in substantial improvements in the generated code. Optimizing for code size, we obtained 27.2%, 13.2% and 9.2% reductions in code size in the Whetstone, Dhrystone and Coremark benchmarks, respectively, when using bytewise allocation and spilling compared to conventional allocation.
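The difference between whole-variable and bytewise placement can be shown with a toy allocator. The sketch below is a naive greedy assignment, not the allocator implemented in sdcc: the register file (an 8-bit `A` and 16-bit `X` and `Y` split into `XL`/`XH` and `YL`/`YH`), the variable sizes, and both function names are assumptions for illustration, and the whole-variable baseline ignores register pairing.

```python
def allocate_whole(variables, registers):
    """Conventional view: a variable goes into one register that can hold it, or is spilled.

    variables: dict name -> size in bytes; registers: dict name -> size in bytes.
    """
    free = dict(registers)
    placement = {}
    for name, size in variables.items():
        reg = next((r for r, capacity in free.items() if capacity >= size), None)
        if reg is None:
            placement[name] = "mem"
        else:
            placement[name] = reg
            del free[reg]
    return placement


def allocate_bytewise(variables, register_bytes):
    """Bytewise view: every byte of every variable is placed on its own."""
    free = list(register_bytes)
    placement = {}
    for name, size in variables.items():
        for byte_index in range(size):
            placement[(name, byte_index)] = free.pop(0) if free else "mem"
    return placement


VARIABLES = {"i": 2, "j": 2, "k": 2}          # three 16-bit variables
REGISTERS = {"A": 1, "X": 2, "Y": 2}          # whole registers
REGISTER_BYTES = ["A", "XL", "XH", "YL", "YH"]  # the same storage, byte by byte
```

With these inputs the whole-variable allocator spills `k` entirely (two bytes), while the bytewise one spills only the upper byte of `k`. Whether a given split actually yields smaller code depends on the instructions available for each register byte, which is what the paper's evaluation measures.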
DOI: https://doi.org/10.1145/2764967.2764971 (published 2015-06-01)
Citations: 8
Synchronous Reactive Nano-Kernels: Exploring the Limits of Power and Energy Efficiency in Embedded Systems
Bartosz Ziólek, Mariusz Ryndzionek, Z. Chamski, P. Romaniuk
MpicOS is a reactive nano-kernel designed for controlling power- and energy-bound multicore embedded systems. Contrary to the mainstream approach of providing a multithreading framework with context saving, MpicOS is articulated around the reactive trigger-response abstraction with ultra-low power waits and a minimal API based on events and continuations. This change of paradigm keeps the cost of re-engineering existing software low, yet it results in major gains in the power and energy usage of the system. Additionally, the reactive approach enables the deployment of novel applications on existing hardware platforms, resulting in new market opportunities and improved user experience.
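A trigger-response kernel built on events and continuations can be sketched in a few lines. Everything below is hypothetical: the class `NanoKernel`, its methods `on`, `post` and `run`, and the event names are invented for illustration and are not the MpicOS API. Handlers run to completion, so no thread context is saved, and the point where a real kernel would enter a low-power wait is only counted.

```python
from collections import deque


class NanoKernel:
    """Toy reactive kernel: one-shot continuations triggered by events."""

    def __init__(self):
        self.continuations = {}   # event name -> function to run when it fires
        self.pending = deque()    # events waiting to be dispatched
        self.sleeps = 0           # times the kernel found nothing left to do

    def on(self, event, continuation):
        """Register what should happen the next time `event` fires."""
        self.continuations[event] = continuation

    def post(self, event, payload=None):
        """Signal an event, e.g. from an interrupt handler."""
        self.pending.append((event, payload))

    def run(self):
        """Dispatch pending events, then 'sleep' until new ones arrive."""
        while self.pending:
            event, payload = self.pending.popleft()
            continuation = self.continuations.pop(event, None)
            if continuation is not None:
                continuation(self, payload)   # runs to completion, no context switch
        self.sleeps += 1  # a real kernel would enter an ultra-low power wait here


def demo():
    """A timer tick triggers a sensor read, whose result is then logged."""
    log = []

    def on_timer(kernel, tick):
        kernel.on("sample", on_sample)    # continuation for the next step
        kernel.post("sample", tick * 10)  # stand-in for a sensor reading

    def on_sample(kernel, value):
        log.append(value)

    kernel = NanoKernel()
    kernel.on("timer", on_timer)
    kernel.post("timer", 1)
    kernel.run()
    return log, kernel.sleeps
```

The design choice to highlight is that state between steps lives in the registered continuation rather than on a per-thread stack, which is what removes the need for context saving.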
DOI: https://doi.org/10.1145/2764967.2771934 (published 2015-06-01)
Citations: 1
Programming Strategies for Contextual Runtime Specialization
Tiago Carvalho, Pedro Pinto, João MP Cardoso
Runtime adaptability is expected to adjust the application and the mapping of computations according to usage contexts, operating environments, resource availability, etc. However, extending applications with adaptive features can be a complex task, especially due to the current lack of programming models and compiler support. One of the runtime adaptability possibilities is the use of specialized code according to data workloads and environments. Traditional approaches use multiple code versions generated offline, and at runtime a strategy is responsible for selecting a code version. Moving code generation to runtime can achieve important improvements but may impose unacceptable overhead. This paper presents an aspect-oriented programming approach for runtime adaptability. We focus on a separation of concerns (strategies vs. application) promoted by a domain-specific language for programming runtime strategies. Our strategies allow runtime specialization based on contextual information. We use a template-based runtime code generation approach to achieve program specialization. We demonstrate our approach with examples from image processing, which depict the benefits of runtime specialization and illustrate how several factors need to be considered to efficiently adapt the application.
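Template-based runtime specialization can be illustrated with a small filter kernel. The sketch below is not the paper's DSL or code generator: the template, the 1-D filter, and the names `specialize_filter`, `generic_filter` and `get_filter` are assumptions for this example. The strategy here is simply to generate a version with the coefficients baked in the first time a coefficient set is seen, and to reuse it afterwards.

```python
TEMPLATE = """
def filt(xs):
    return [{body} for i in range(len(xs) - {width} + 1)]
"""


def generic_filter(xs, coeffs):
    """Unspecialized version: coefficients are read on every output element."""
    width = len(coeffs)
    return [
        sum(c * xs[i + k] for k, c in enumerate(coeffs))
        for i in range(len(xs) - width + 1)
    ]


def specialize_filter(coeffs):
    """Fill the template at runtime: unroll the inner loop and drop zero taps."""
    terms = [f"{c!r} * xs[i + {k}]" for k, c in enumerate(coeffs) if c != 0]
    source = TEMPLATE.format(body=" + ".join(terms) or "0", width=len(coeffs))
    namespace = {}
    exec(source, namespace)   # runtime code generation
    return namespace["filt"], source


_specialized = {}


def get_filter(coeffs):
    """Strategy: pay the generation cost once per coefficient set, then reuse."""
    key = tuple(coeffs)
    if key not in _specialized:
        _specialized[key] = specialize_filter(key)[0]
    return _specialized[key]
```

For coefficients `[1, 0, -1]` the generated body contains only two multiplications per element, because the zero tap is removed. The cache in `get_filter` reflects the trade-off named in the abstract: generation has a cost, so it pays off only when the same context recurs.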
DOI: https://doi.org/10.1145/2764967.2764973 (published 2015-06-01)
Citations: 3
Modular translation validation of a full-sized synchronous compiler using off-the-shelf verification tools
V. Ngo, J. Talpin, T. Gautier, L. Besnard, P. Guernic
This presentation demonstrates a scalable, modular, refinable methodology for translation validation applied to a mature (20 years old), large (500k lines of C), open-source (Eclipse/Polarsys IWG project POP) code generation suite. It relies only on off-the-shelf, open-source SAT/SMT verification tools (Yices) and adapts and optimizes the translation validation principle introduced by Pnueli et al. in 1998. This methodology results from the ANR project VERISYNC, in which we aimed to revisit Pnueli's seminal work on translation validation using off-the-shelf, up-to-date verification technology. Faced with the enormous task of verifying a compiler infrastructure comprising around 500,000 lines of C code, we chose to narrow down and isolate the problem to the very data structures manipulated by the infrastructure at the successive steps of code generation, in order both to optimize the whole verification process and to make the implementation of a working prototype feasible at all. Our presentation outlines the successive steps of this endeavour, from clock synthesis and static scheduling to target code production.
DOI: https://doi.org/10.1145/2764967.2775291 (published 2015-06-01)
Citations: 7