Software and Compilers for Embedded Systems: Latest Articles

The PROMPT design principles for predictable multi-core architectures
Pub Date : 2009-04-23 DOI: 10.1145/1543820.1543826
R. Wilhelm
Embedded hard real-time systems need reliable guarantees for the satisfaction of their timing constraints. The precision of the results and the efficiency of timing-analysis methods are highly dependent on the predictability of the execution platform. The possibility of proving the safety of embedded systems is seriously compromised by architectural developments aiming exclusively at improving average-case performance. Proving the correctness of a modern high-performance processor is beyond the reach of verification methods. Even the chances of deriving reliable and precise bounds on execution times are endangered by exactly these developments. We propose design principles for multi-core architectures that provide efficiently predictable, good worst-case performance as needed for embedded control in the aeronautics and automotive industries, supporting the Integrated Modular Avionics (IMA) and Automotive Open System Architecture (AUTOSAR) development trends. This talk presents a development process oriented toward achieving predictability at all levels of the architecture hierarchy.
Citations: 4
Communication between nested loop programs via circular buffers in an embedded multiprocessor system
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361104
T. Bijlsma, M. Bekooij, P. Jansen, G. Smit
Multimedia applications executed by embedded multiprocessor systems can in some cases be represented as task graphs whose tasks contain nested loop programs. The nested loop programs communicate via arrays and can be executed on different processors. Typically, an array can be communicated via a circular buffer with a capacity smaller than the array. For such buffers, the communicating nested loop programs have to synchronize, and a sufficient buffer capacity needs to be computed. In a circular buffer we use a write window and a read window to support rereading, out-of-order reading or writing, and skipping of locations. A cyclo-static dataflow model is derived from the application and used to compute buffer capacities that guarantee deadlock-free execution. Our case study applies circular buffers in a Digital Audio Broadcasting channel decoder application, where the frequency deinterleaver reads according to a non-affine pseudo-random function. For this application, buffer capacities are calculated that guarantee deadlock-free execution.
Citations: 26
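The write and read windows mentioned in the circular-buffer abstract above can be illustrated with a small single-threaded sketch. This is not the authors' implementation: the class name `WindowedCircularBuffer`, its method names, and the acquire/release protocol are assumptions made for illustration. Each side acquires a window of locations, may access any location inside its window in any order (including rereading), and releases locations from the front of the window when it is done with them.

```python
class WindowedCircularBuffer:
    """Illustrative circular buffer with a sliding write window and read window."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        # Absolute (unwrapped) positions; they are wrapped with % capacity on access.
        self.write_base = 0  # first location of the write window
        self.write_size = 0  # locations currently held by the producer
        self.read_base = 0   # first location of the read window
        self.read_size = 0   # locations currently held by the consumer

    def acquire_write(self, n=1):
        """Grow the write window by n locations; return False if there is no room."""
        # Everything from read_base onward has not been released by the consumer yet.
        if self.write_base + self.write_size + n - self.read_base > self.capacity:
            return False
        self.write_size += n
        return True

    def write(self, offset, value):
        """Write to any location inside the write window (out-of-order is allowed)."""
        if not 0 <= offset < self.write_size:
            raise IndexError("offset outside the write window")
        self.slots[(self.write_base + offset) % self.capacity] = value

    def release_write(self, n=1):
        """Hand the first n locations of the write window over to the consumer."""
        if not 0 <= n <= self.write_size:
            raise ValueError("cannot release more than the write window holds")
        self.write_base += n
        self.write_size -= n

    def acquire_read(self, n=1):
        """Grow the read window by n locations; return False if data is missing."""
        # Only locations below write_base have been released by the producer.
        if self.read_base + self.read_size + n > self.write_base:
            return False
        self.read_size += n
        return True

    def read(self, offset):
        """Read any location inside the read window (rereading is allowed)."""
        if not 0 <= offset < self.read_size:
            raise IndexError("offset outside the read window")
        return self.slots[(self.read_base + offset) % self.capacity]

    def release_read(self, n=1):
        """Free the first n locations of the read window (read or skipped)."""
        if not 0 <= n <= self.read_size:
            raise ValueError("cannot release more than the read window holds")
        self.read_base += n
        self.read_size -= n
```

In a real multiprocessor setting the two `acquire_*` calls would block or be guarded by synchronization instead of returning `False`; choosing a capacity large enough that no deadlock can occur is exactly the question the abstract answers with a cyclo-static dataflow model, which this sketch does not attempt.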
Optimal vs. heuristic integrated code generation for clustered VLIW architectures
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361099
Mattias V. Eriksson, Oskar Skoog, C. Kessler
In this paper we present two algorithms for integrated code generation for clustered VLIW architectures. One algorithm is a heuristic based on genetic algorithms; the other is based on integer linear programming. The performance of the algorithms is compared on a portion of the Mediabench [10] benchmark suite. We found the results of the genetic algorithm to be within one or two clock cycles of optimal for the cases where the optimum is known. In addition, the heuristic algorithm produces results in predictable time even in cases where the optimal integer linear programming algorithm fails.
Citations: 22
Fast source-level data assignment to dual memory banks
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361105
A. Murray, Björn Franke
Because of their streaming nature, most digital signal processing applications depend critically on memory bandwidth. To accommodate these bandwidth requirements, digital signal processors are typically equipped with dual memory banks that enable simultaneous access to two operands if the data is partitioned appropriately. Fully automated and compiler-integrated approaches to data partitioning and memory bank assignment, however, have found little acceptance among DSP software developers. This is partly due to their inflexibility and their inability to cope with certain manual data pre-assignments, e.g. those due to I/O constraints. In this paper we present a different and more flexible approach, namely source-level dual memory assignment, where code generation targets DSP-C, a standardised C language extension widely supported by industrial C compilers for DSPs. Additionally, we present a novel partitioning algorithm based on soft colouring that is more efficient and scalable than the best currently known integer linear programming algorithm, whilst achieving competitive code quality. We have evaluated our scheme on an Analog Devices TigerSHARC DSP and achieved speedups of up to 1.57 on 13 UTDSP benchmarks.
Citations: 11
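To make the partitioning problem in the dual-memory-bank abstract above concrete, the sketch below assigns variables to two banks so that operands wanted in the same cycle tend to land in different banks. It is a plain greedy heuristic and explicitly not the soft-colouring algorithm of the paper, whose details the abstract does not give. The function name `assign_banks`, the bank labels `"X"` and `"Y"`, and the weighted-pair input format are assumptions made for illustration.

```python
from collections import defaultdict


def assign_banks(pair_weights):
    """Greedily place variables in bank "X" or "Y".

    pair_weights maps a pair (u, v) to how often u and v would ideally be
    fetched in the same cycle. Placing such a pair in the same bank forfeits
    that many simultaneous accesses, so heavy pairs should be separated.
    """
    weight = defaultdict(dict)
    for (u, v), w in pair_weights.items():
        weight[u][v] = weight[u].get(v, 0) + w
        weight[v][u] = weight[v].get(u, 0) + w

    # Handle the most constrained variables first; ties are broken by name
    # so that the result is deterministic.
    order = sorted(weight, key=lambda var: (-sum(weight[var].values()), var))

    bank = {}
    for var in order:
        # Cost of a bank = weight of already placed partners sharing that bank.
        cost = {"X": 0, "Y": 0}
        for other, w in weight[var].items():
            if other in bank:
                cost[bank[other]] += w
        bank[var] = "X" if cost["X"] <= cost["Y"] else "Y"
    return bank
```

For the hypothetical input `{("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 1}` the heuristic separates the two heavy pairs and accepts the conflict on the lightest one. Manual pre-assignments of the kind the abstract mentions could be modelled by seeding `bank` before the loop.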
Memory footprint reduction for embedded systems
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361102
K. D. Bosschere
The memory footprint is considered an important constraint for embedded systems. This is especially important in the context of the increasing sophistication of embedded software and the increasing use of modern software engineering techniques like component-based design. Since reusability is the major motivation for using components, most components are not optimized for the (limited) functionality they have to realize in an embedded system. All this leads to an increasing amount of code and data that might not be needed for a given functionality. The memory footprint of an embedded system consists of two parts: the footprint of the application and the footprint of the operating system. In this keynote talk, I will focus on the memory footprint reduction of applications as well as of the Linux kernel. I will report memory footprint reductions that have been obtained by the Diablo binary rewriter, which has been used to substantially reduce the memory footprint of both applications and system software. For the applications, the optimizer is capable of reducing the code size of statically linked ARM programs compiled with two proprietary ARM tool chains (ADS 1.1 and RVCT 2.1) by 16% on average, while making them 12.8% faster. Execution of the rewritten programs also consumes on average 10.7% less energy. For the system software, we specialize the kernel both for the system calls that actually occur in the application program and for the boot parameters of the kernel. We also assume that the hardware is fixed, so that part of the bootstrap process is completely deterministic and can be optimized based on actual trace information. Finally, we compress frozen code, and we swap cold code to flash memory. All combined, these compaction techniques can reduce the Linux kernel's RAM footprint by up to 48%. The slowdown was limited to 1-2%. This shows that binary rewriting can help in substantially reducing the memory footprint of both the application and the system software. The nice thing is that it can be done automatically, and that it also reduces the execution time and the power consumption.
Citations: 1
A new heuristic for SOA problem based on effective tie break function
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361106
H. Shokry, H. M. El-Boghdadi, S. Shaheen
Producing efficient and compact code for embedded DSP processors is very important for today's faster and smaller devices. Because such processors have highly irregular data paths, conventional code generation techniques typically result in inefficient code. Embedded software compilers are expected to make use of the Address Generation Unit (AGU), a feature commonly found in modern embedded DSP processors. This helps in generating optimized offset assignments for program variables in memory and consequently minimizes the overhead instructions dedicated to address computations. This paper addresses one of the problems of code optimization, namely the Simple Offset Assignment (SOA) problem. In this paper, we study the tie break function introduced by Leupers and Marwedel [1] and show that this function does not represent the actual tie break that could happen in the graph. Then we introduce the notion of an Effective Tie Break Function (ETBF) and use it to propose a new algorithm for solving the SOA problem. We apply the algorithm to randomly generated graphs. Our results show an improvement in offset assignment cost of up to 7% over well-known offset assignment algorithms [1,2,3].
Citations: 2
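The cost that SOA heuristics such as the one in the abstract above try to minimize can be sketched as follows. The sketch assumes the commonly used cost model, in which the AGU moves the address register by at most one location for free and any larger jump costs one explicit address instruction; the abstract itself does not spell this model out. The function names `access_graph` and `soa_cost` are hypothetical, and the tie break functions discussed in the abstract are not implemented here.

```python
from collections import Counter


def access_graph(access_sequence):
    """Count how often each unordered pair of distinct variables is accessed back to back."""
    edges = Counter()
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if prev != cur:
            edges[frozenset((prev, cur))] += 1
    return edges


def soa_cost(layout, access_sequence):
    """Number of explicit address computations needed for a given memory layout.

    layout lists the variables in memory order. Assumed cost model: a step of
    0 or 1 locations between consecutive accesses is free (auto-increment or
    auto-decrement), and any larger step costs one extra instruction.
    """
    position = {var: index for index, var in enumerate(layout)}
    cost = 0
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if abs(position[cur] - position[prev]) > 1:
            cost += 1
    return cost
```

For the hypothetical access sequence `a b c a d c`, the access graph has five edges of weight one. Under the assumed model the layout `a, b, c, d` needs two extra instructions and the layout `a, c, b, d` needs three. SOA heuristics typically pick heavy edges of the access graph to become memory neighbours, and a tie break function decides between edges of equal weight.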
Fast cycle-approximate instruction set simulation
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361109
Björn Franke
Instruction set simulators are indispensable tools in both ASIP design space exploration and the software development and optimisation process for existing platforms. Despite recent progress in improving the speed of functional instruction set simulators, cycle-accurate simulation is still prohibitively slow for all but the simplest programs. This severely limits the applicability of cycle-accurate simulators in the performance evaluation of complex embedded applications. In this paper we present a novel approach, namely the prediction of cycle counts based on information gathered during fast functional simulation and prior training. We have evaluated our approach against a cycle-accurate ARM v5 architecture simulator and a large set of benchmarks. We demonstrate its capability to provide highly accurate performance predictions, with an average error of less than 5.8%, in a fraction of the time required for cycle-accurate simulation.
Citations: 39
WCET-driven, code-size critical procedure cloning
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361100
Paul Lokuciejewski, H. Falk, P. Marwedel, Henrik Theiling
In the domain of worst-case execution time (WCET) analysis, loops are an inherent source of unpredictability and loss of precision, since the determination of tight and safe information on the number of loop iterations is a difficult task. In particular, data-dependent loops whose iteration counts depend on function parameters cannot be handled precisely by a timing analysis. Procedure Cloning can be exploited to make these loops explicit within the source code, allowing a highly precise WCET analysis. In this paper we extend the standard Procedure Cloning optimization by WCET-aware concepts with the objective of improving the tightness of the WCET estimation. Our novel approach is driven by WCET information and successively eliminates code structures that lead to overestimated timing results, thus making the code more suitable for the analysis. In addition, the code size increase during the optimization is monitored and large increases are avoided. The effectiveness of our optimization is shown by tests on real-world benchmarks. After performing our optimization, the estimated WCET is reduced by up to 64.2%, while the employed code transformations yield an additional code size increase of 22.6% on average. In contrast, the average-case performance, which is the original objective of Procedure Cloning, showed a slight decrease.
Citations: 19
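Why cloning can tighten a WCET estimate, as claimed in the abstract above, can be shown with a deliberately simplified model. The model is an assumption made for illustration, not the analysis used in the paper: a function contains one loop whose bound equals a parameter, the analyzer must use a single loop bound per function body, and every iteration costs the same number of cycles. All names and numbers are hypothetical.

```python
def wcet_without_cloning(call_site_bounds, cycles_per_iteration):
    """One shared function body: every call is charged the largest loop bound."""
    worst_bound = max(call_site_bounds)
    return len(call_site_bounds) * worst_bound * cycles_per_iteration


def wcet_with_cloning(call_site_bounds, cycles_per_iteration):
    """One specialized clone per call site: each call is charged its own bound."""
    return sum(bound * cycles_per_iteration for bound in call_site_bounds)
```

With hypothetical call sites passing loop bounds 4, 4, and 100 and 10 cycles per iteration, the shared body is estimated at 3000 cycles and the cloned version at 1080 cycles. The executed code does the same work in both cases; only the estimate becomes tighter, and the price is the extra code for the clones, which is why the abstract monitors code size during the optimization.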
A fully-non-transparent approach to the code location problem
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361108
Hugo Venturini, F. Riss, Jean-Claude Fernandez, M. Santana
In the context of embedded systems such as cell phones, PDAs, or car and aircraft software, code optimizations are required because of the timing and memory constraints imposed. Many problems arise when trying to debug optimized code. One of them is the irrelevance of the mapping between the source code and the optimized target program: the Code Location Problem. This paper proposes a solution to this problem for highly optimized code in the context of embedded systems. Two approaches exist: non-transparent and transparent debugging. Our approach is non-transparent. The idea is to reveal the execution of the optimized program to the user so that the user understands the mapping to the source code in spite of the transformations applied to the program. We do not emulate the execution of the unoptimized program. We make good use of the programmer's knowledge of the development platform. Standard debuggers do not provide the required mechanisms, and compilers do not provide the relevant debug information. We propose a novel method to maintain accurate debug information when optimizing at compilation, and we evaluate this method on the MMDSP+ C compiler and the IDBug debugger.
Citations: 6
Integrated code generation by using fuzzy control system
Pub Date : 2008-03-13 DOI: 10.1145/1361096.1361098
Xiaoyan Jia, Jie Guo, G. Fettweis
High-quality code generation for DSPs with irregular architectures is challenging in terms of problem complexity. Because traditional compiler backends divide the problem into several separate subtasks, code quality suffers from ignoring the interdependencies among these subtasks. Thus, an integrated compiler backend using a fuzzy control system is developed for an irregular architecture called Synchronous Transfer Architecture (STA). According to the experimental results, our novel method proves more efficient than the traditional method. The code size and execution time of the generated code are reduced to about 42.7% to 62.5% of those achieved by a traditional compiler backend. Moreover, power consumption is greatly reduced owing to the efficient utilization of the STA data paths.
Citations: 1