
Software and Compilers for Embedded Systems: Latest Articles

Scalable DFA Compilation for High-Performance Regular-Expression Matching
Pub Date : 2016-05-23 DOI: 10.1145/2906363.2907053
J. V. Lunteren
Regular-expression accelerators often rely on sophisticated compilers to fully exploit the available hardware capabilities for achieving wire-speed scan rates of multiple tens of gigabits per second. This paper presents a method for the efficient compilation of pattern-matching functions specified by deterministic finite automata (DFAs) into executable structures targeted at accelerators based on B-FSM programmable state machines. The compilation scheme presented is able to effectively exploit an adaptive compression mechanism to obtain one of the most compact state-transition-table structures in the industry, in combination with fast compilation times. The heuristic-based approach scales to very large DFAs having tens of millions of transitions, while achieving an approximately linear growth of the storage needs as a function of the DFA size.
Citations: 4
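To make the idea of a compressed state-transition table concrete, here is a minimal Python sketch. It builds a small DFA that reports every occurrence of a single pattern (a KMP-style construction), then compresses each state's row by storing one default target plus only the transitions that differ from it. This default-transition scheme is a generic compression technique chosen for illustration; it is not the B-FSM encoding or the adaptive compression mechanism described in the paper, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Table = List[Dict[str, int]]


def build_search_dfa(pattern: str, alphabet: str) -> Table:
    """KMP-style DFA that reports every (possibly overlapping) occurrence."""
    m = len(pattern)
    dfa: Table = [dict() for _ in range(m + 1)]
    dfa[0] = {c: 0 for c in alphabet}
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached after reading pattern[1:j]
    for j in range(1, m):
        dfa[j] = dict(dfa[restart])      # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1       # match: advance one state
        restart = dfa[restart][pattern[j]]
    dfa[m] = dict(dfa[restart])          # after a match, keep scanning
    return dfa


def compress(dfa: Table) -> List[Tuple[int, Dict[str, int]]]:
    """Keep one default target per state plus only the differing transitions."""
    rows = []
    for row in dfa:
        default = Counter(row.values()).most_common(1)[0][0]
        exceptions = {c: t for c, t in row.items() if t != default}
        rows.append((default, exceptions))
    return rows


def scan(rows, accept: int, text: str) -> List[int]:
    """Return the end positions of matches, walking the compressed table."""
    state, hits = 0, []
    for i, c in enumerate(text):
        default, exceptions = rows[state]
        state = exceptions.get(c, default)
        if state == accept:
            hits.append(i)
    return hits
```

For the pattern `abab` over a 26-letter alphabet, the dense table holds 130 entries, while the compressed form keeps 5 defaults plus 7 exceptions, and `scan(rows, 4, "ababab")` reports matches ending at positions 3 and 5.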
Improving ESL power models using switching activity information from timed functional models
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609250
Stefan Schürmans, Diandian Zhang, R. Leupers, G. Ascheid, Xiaotao Chen
Early design space exploration at the Electronic System Level (ESL) can be done using untimed functional models, timed functional models, or performance models, the last of which use random or zero data instead of the actual data. To be applicable to the latter two types, ESL power estimation approaches often rely only on sub-block activity information. This work shows the benefit of additionally using, for power estimation, the switching activity of the actual data available in timed functional models. A case study shows that a considerable gain in accuracy can be achieved at the cost of only a moderate simulation slowdown.
Citations: 3
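The difference between activity-only and data-aware power estimation can be illustrated with a toy energy model. In this sketch (the coefficients and function names are hypothetical, not taken from the paper), a block's energy has a per-access term that needs only activity counts, plus a switching term proportional to the number of bit toggles between consecutive data words. The switching term can only be computed when actual data values are available, as in timed functional models.

```python
from typing import Sequence


def toggle_count(words: Sequence[int], width: int) -> int:
    """Number of bit toggles between consecutive words on a `width`-bit bus."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))


def block_energy(words: Sequence[int], width: int,
                 e_access: float, e_toggle: float) -> float:
    """Activity-only term plus a data-dependent switching term.

    e_access and e_toggle are hypothetical per-block coefficients; in practice
    they would come from characterizing the block at a lower abstraction level.
    """
    return len(words) * e_access + toggle_count(words, width) * e_toggle
```

For the 4-bit words `0b0000, 0b1111, 0b1111, 0b0101` there are 6 toggles, so with `e_access=1.0` and `e_toggle=0.5` the estimate is 7.0, whereas four all-zero words (as a zero-data model would supply) give only 4.0.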
A verified transformation: from polychronous programs to a variant of clocked guarded actions
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609259
Zhibin Yang, J. Bodeveix, M. Filali, Kai Hu, Dian-fu Ma
SIGNAL belongs to the family of synchronous languages. Such languages are widely used in the design of safety-critical real-time systems such as avionics, space systems, and nuclear power plants. This paper reports a key step of a verified SIGNAL compiler prototype: the transformation from a subset of SIGNAL to S-CGA (a variant of clocked guarded actions) and the proof of semantics preservation. Unlike the existing SIGNAL compiler, we use clocked guarded actions as the intermediate representation, in order to integrate more synchronous programs into our verified compiler prototype in the future. However, in contrast to SIGNAL, clocked guarded actions can evaluate a variable even if its clock does not hold. We therefore propose a variant of clocked guarded actions, S-CGA, which constrains variable accesses as SIGNAL does. To conform to the revised semantics, we also adjust the existing translation rules from SIGNAL to clocked guarded actions. Finally, the verified transformation is mechanized in the Coq theorem prover.
Citations: 9
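The variable-access restriction described above can be illustrated with a toy interpreter for one synchronous instant. This is a simplified sketch under stated assumptions, not the paper's formal S-CGA semantics: actions fire in list order (real synchronous semantics require causality analysis), every variable keeps its last stored value, and the names `GuardedAction`, `Instant`, `step`, and `AbsentRead` are hypothetical. In permissive mode a read returns a stale value even when the variable's clock is absent; in strict mode, mimicking the SIGNAL-style restriction, such a read is rejected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


class AbsentRead(Exception):
    """Raised when a read targets a variable whose clock is absent this instant."""


@dataclass
class GuardedAction:
    guard: Callable[["Instant"], bool]   # e.g. lambda s: s.present("clk")
    target: str
    expr: Callable[["Instant"], int]     # reads inputs through Instant.read


class Instant:
    def __init__(self, memory: Dict[str, int], present: Set[str], strict: bool):
        self.memory = memory        # last stored value of every variable
        self.clocks = set(present)  # variables whose clock holds in this instant
        self.strict = strict        # True: reject reads of absent variables

    def present(self, name: str) -> bool:
        return name in self.clocks

    def read(self, name: str) -> int:
        if self.strict and name not in self.clocks:
            raise AbsentRead(name)
        return self.memory[name]


def step(actions: List[GuardedAction], inst: Instant) -> Dict[str, int]:
    """Fire, in list order, every action whose guard holds; writes make the target present."""
    for act in actions:
        if act.guard(inst):
            inst.memory[act.target] = act.expr(inst)
            inst.clocks.add(act.target)
    return inst.memory
```

With `x` absent in an instant, an always-enabled action `y := x + 1` yields `y = 6` from the stale value `x = 5` in permissive mode, but raises `AbsentRead` in strict mode; once `x` is present, both modes agree.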
Software code generation for dynamic dataflow programs
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609260
Gustav Cedersjö, J. Janneck
In this paper we address the problem of generating efficient software implementations for a large class of dataflow programs that is characterized by highly data-dependent behavior and is therefore, in general, not amenable to compile-time scheduling. Previous work on implementing dataflow programs has emphasized classes of stream processing algorithms whose behavior is regular enough to permit extensive compile-time analysis and scheduling; however, many real-world stream programs do not fall into these classes and exhibit behavior that can, for example, depend on the values and even the timing of their input data. Based on an abstract machine model, we partition the problem of implementing such programs in software into three parts, viz. reduction, composition, and code emission, and present solutions for each of them. Using the reference code of an MPEG decoder, we evaluate the quality of the resulting code and compare it to state-of-the-art compilers for the same class of stream programs, with favorable results.
Citations: 12
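Why data-dependent behavior defeats compile-time scheduling can be seen in a small example. In this sketch (a generic round-robin dynamic scheduler, not the paper's abstract machine model; class names are hypothetical), the `BlockSum` actor first reads a count `n` and then consumes `n` more tokens, so its consumption rate depends on a token value. The scheduler therefore has to test firing rules at run time.

```python
from collections import deque
from typing import Deque, Iterable, List


class Source:
    def __init__(self, data: Iterable[int], out: Deque[int]):
        self.data, self.out = deque(data), out

    def can_fire(self) -> bool:
        return bool(self.data)

    def fire(self) -> None:
        self.out.append(self.data.popleft())


class BlockSum:
    """Reads a count n, then n values, and emits their sum.

    The number of tokens consumed per firing depends on a token *value*,
    so there is no fixed rate from which to build a static schedule.
    """
    def __init__(self, inp: Deque[int], out: Deque[int]):
        self.inp, self.out = inp, out

    def can_fire(self) -> bool:
        return len(self.inp) >= 1 and len(self.inp) >= 1 + self.inp[0]

    def fire(self) -> None:
        n = self.inp.popleft()
        self.out.append(sum(self.inp.popleft() for _ in range(n)))


def run(actors: List) -> None:
    """Round-robin dynamic scheduler: check firing rules until nothing can fire."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True
```

Feeding the stream `[2, 10, 20, 1, 30]` through `Source` into `BlockSum` produces `[30, 30]`: one firing consumes three tokens, the next consumes two.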
Fast and efficient dataflow graph generation
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609258
Bruno Bodin, Youen Lesparre, J. Delosme, Alix Munier Kordon
Dataflow modeling is a highly regarded method for the design of embedded systems. Measuring the performance of the associated analysis and compilation tools requires an efficient dataflow graph generator. This paper presents a new graph generator for Phased Computation Graphs (PCG), which augment Cyclo-Static Dataflow Graphs with both initial phases and thresholds. A sufficient condition for liveness is first extended to the PCG model. The determination of initial conditions that minimize the total amount of initial data in the channels while ensuring liveness can then be expressed as an Integer Linear Program. This contribution and other improvements over previous work are incorporated into Turbine, a new dataflow graph generator. Its effectiveness is demonstrated experimentally by comparing it with two existing generators, DFTools and SDF3.
Citations: 21
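A basic check that a graph generator typically has to satisfy is consistency: for plain SDF graphs (a special case of the cyclo-static and phased graphs discussed above), the balance equations `q[src] * prod == q[dst] * cons` must have a positive integer solution, the repetition vector. The following is a minimal sketch of that standard computation, assuming a connected graph; it handles SDF only, does not address liveness or PCG phases, and is not Turbine's algorithm.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, int, int]  # (producer, consumer, tokens produced, tokens consumed)


def repetition_vector(actors: List[str], edges: List[Edge]) -> Optional[Dict[str, int]]:
    """Smallest positive integer q with q[src]*prod == q[dst]*cons on every edge.

    Returns None if the graph is inconsistent. Assumes the graph is connected.
    """
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:  # propagate relative firing rates along edges in both directions
        a = stack.pop()
        for src, dst, prod, cons in edges:
            if src == a and dst not in rate:
                rate[dst] = rate[a] * prod / cons
                stack.append(dst)
            elif dst == a and src not in rate:
                rate[src] = rate[a] * cons / prod
                stack.append(src)
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale rational rates to integers, then reduce to the smallest solution.
    lcm = reduce(lambda x, y: x * y // gcd(x, y), (r.denominator for r in rate.values()), 1)
    q = {a: int(rate[a] * lcm) for a in actors}
    g = reduce(gcd, q.values())
    q = {a: v // g for a, v in q.items()}
    # Rates derived along a spanning tree must also balance every remaining edge.
    if any(q[s] * p != q[d] * c for s, d, p, c in edges):
        return None
    return q
```

For the chain `A -(2,3)-> B -(1,2)-> C` the result is `{"A": 3, "B": 2, "C": 1}`; adding an edge `A -(1,1)-> C` makes the graph inconsistent and the function returns `None`.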
Temporal analysis flow based on an enabling rate characterization for multi-rate applications executed on MPSoCs with non-starvation-free schedulers
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609262
J. Hausmans, Stefan J. Geuns, M. Wiggers, M. Bekooij
Real-time stream processing applications often exhibit multi-rate behavior, which can be accurately modeled using Synchronous Dataflow (SDF) graphs. However, no existing temporal analysis technique is applicable to arbitrary cyclic SDF graphs while also handling cyclic resource dependencies. This paper presents a temporal analysis flow for SDF graphs that is applicable to systems with non-starvation-free schedulers, such as static-priority preemptive schedulers. The analysis flow uses an enabling rate characterization to calculate response times. This characterization is determined using multi-dimensional periodic schedules and models enabling patterns more accurately than a characterization based on periods and enabling jitters. The presented approach is applicable to arbitrary (cyclic) graph topologies and can take buffer capacity constraints into account during analysis. Cyclic resource dependencies can also be analyzed. The presented analysis flow is the first approach that considers arbitrary SDF graph topologies in combination with the cyclic resource dependencies caused by non-starvation-free schedulers. The proposed flow is evaluated on a radio processing application, and the analysis results are obtained with a tool in which the flow is implemented. This case study shows that the enabling rate characterization achieves response times up to 87% better than an enabling-jitter-based characterization.
Citations: 16
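The abstract compares its enabling rate characterization against one based on periods and enabling jitters. For reference, the classic period/jitter-based response-time iteration for static-priority preemptive scheduling on a single processor is sketched below: it iterates `R = C + sum_j ceil((R + J_j) / T_j) * C_j` over all higher-priority tasks until a fixed point is reached. This is the conventional baseline style of analysis, not the paper's method, and the function name and `bound` cut-off are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def response_time(c: int, higher: List[Tuple[int, int, int]],
                  bound: int = 10**6) -> Optional[int]:
    """Worst-case response time of a task with execution time `c`.

    `higher` holds (C_j, T_j, J_j) for each higher-priority task: execution
    time, period, and enabling (release) jitter, all integers. Returns None
    once the iterate exceeds `bound`, which is treated as unschedulable.
    """
    r = c
    while True:
        # -(-a // b) is integer ceiling division, avoiding float rounding.
        nxt = c + sum(-(-(r + j) // t) * cj for cj, t, j in higher)
        if nxt == r:
            return r
        if nxt > bound:
            return None
        r = nxt
```

With higher-priority tasks `(1, 4, 0)` and `(2, 6, 0)`, a task with `c = 1` has a response time of 4; giving the first task a jitter of 1 raises it to 5, showing how jitter terms inflate the bound.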
A parallelizing compiler for multicore systems
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609254
This manuscript summarizes the main ideas introduced in [1]. We propose a compiler that automatically transforms a sequential application into a parallel counterpart for multicore processors. It is based on an intermediate representation, named KIR, which exposes multiple levels of parallelism and hides the complexity of the implementation details thanks to the domain-independent kernels (e.g., assignment, reduction). The effectiveness and performance of our approach, built on top of GCC, has been tested with a large variety of codes.
Citations: 2
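Reduction is one of the kernels the abstract names. The transformation a parallelizing compiler applies to a recognized reduction can be sketched as follows: split the iteration space into chunks, reduce each chunk into a private partial result, and combine the partials, which is valid because `+` is associative and commutative. This is a generic illustration, not KIR or the GCC-based implementation; Python threads are used only to show the structure, since CPython threads do not speed up CPU-bound code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def sequential_sum(xs: Sequence[int]) -> int:
    acc = 0
    for x in xs:          # loop-carried dependence on `acc`: a reduction
        acc += x
    return acc


def parallel_sum(xs: Sequence[int], workers: int = 4) -> int:
    """Chunk the iteration space, reduce each chunk privately, then combine."""
    n = len(xs)
    step = -(-n // workers) if n else 1   # ceiling division for the chunk size
    chunks = [xs[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sequential_sum, chunks))
    return sum(partials)
```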
A lightweight incremental analysis and profiling framework for embedded devices
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609263
Sara Elshobaky, A. El-Mahdy, Erven Rohou, Layla A. A. El-Sayed, Mohamed Nazih ElDerini
Embedded systems such as mobile devices are now ubiquitous. Their performance potential is improving rapidly through the incorporation of multi-core and GPU technologies and is quickly catching up with that of workstation platforms. Nevertheless, the heterogeneity of the underlying hardware and the low-power constraints severely limit performance portability. In this paper we consider leveraging JIT compilers to provide portable parallelization while hiding the corresponding expensive runtime analysis. We propose a novel lightweight JIT framework that exploits device idle time and the large storage space generally available on these devices. The framework performs 'incremental' analysis while the processor is idle (such as during charging) and uses the storage space to cache intermediate analysis results. Such an approach requires re-engineering existing complex optimization analyses. In this paper, we focus on traditional loop parallelization analysis and implement a working prototype in the LLVM framework, integrating a lightweight dynamic profiling method to identify hotspots. Initial results demonstrate the low overhead of our method for parallelizing simple loops on an embedded GPU.
Citations: 3
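The idea of analyzing code in small slices during idle time and caching the results can be sketched as a budgeted work queue backed by a content-addressed cache. This is a hypothetical illustration, not the paper's LLVM-based framework: the class and method names are invented, the cache is an in-memory dict standing in for persistent storage, and the analysis callback is a placeholder.

```python
import hashlib
from collections import deque
from typing import Callable, Deque, Dict, Optional


class IncrementalAnalyzer:
    """Analyze code units a few at a time (e.g. while idle) and cache results."""

    def __init__(self, analyze: Callable[[str], object]):
        self.analyze = analyze                 # placeholder analysis callback
        self.cache: Dict[str, object] = {}     # would live on persistent storage
        self.pending: Deque[str] = deque()

    @staticmethod
    def key(source: str) -> str:
        return hashlib.sha256(source.encode()).hexdigest()

    def submit(self, source: str) -> None:
        if self.key(source) not in self.cache:
            self.pending.append(source)

    def run_idle(self, budget: int) -> int:
        """Process at most `budget` pending units; return how many were processed."""
        done = 0
        while self.pending and done < budget:
            src = self.pending.popleft()
            k = self.key(src)
            if k not in self.cache:
                self.cache[k] = self.analyze(src)
            done += 1
        return done

    def lookup(self, source: str) -> Optional[object]:
        """At JIT time: reuse a cached result, or None if not analyzed yet."""
        return self.cache.get(self.key(source))
```

Splitting the work with `budget` lets the analysis stop and resume across idle periods, while keying on a hash of the code means results survive as long as the code itself is unchanged.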
Energy-aware parallelization flow and toolset for C code
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609264
M. Lazarescu, Albert Cohen, Adrien Guatto, Nhat Minh Lê, L. Lavagno, Antoniu Pop, M. Prieto, A. Terechko, A. Sutii
Multicore architectures are increasingly used in embedded systems to achieve higher throughput with lower energy consumption. This trend accentuates the need to convert existing sequential code so that it effectively exploits the resources of these architectures. We present a parallelization flow and toolset for legacy C code that includes a performance estimation tool, a parallelization tool, and a streaming-oriented parallelization framework. These are part of the ongoing EU FP7 PHARAON project, which aims to develop a complete set of techniques and tools to guide and assist software development for heterogeneous parallel architectures. We demonstrate the effectiveness of the toolset in an experiment that measures the parallelization quality and time achieved by inexperienced users, and we report the parallelization flow and performance results for a practical stereo vision application.
Citations: 3
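Streaming-oriented parallelization typically turns a sequential chain of processing steps into pipeline stages connected by FIFO channels. The following is a minimal sketch of that structure using one thread per stage and unbounded `queue.Queue` channels, so the producer never blocks; it is not the PHARAON framework, and the function names and end-of-stream marker are hypothetical. In CPython it illustrates the structure rather than delivering a speedup.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # end-of-stream marker


def _stage(fn: Callable, inq: "queue.Queue", outq: "queue.Queue") -> None:
    while True:
        item = inq.get()
        if item is _DONE:
            outq.put(_DONE)   # propagate end-of-stream downstream
            return
        outq.put(fn(item))


def run_pipeline(items: Iterable, stages: List[Callable]) -> List:
    """Run each stage in its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for x in items:
        queues[0].put(x)
    queues[0].put(_DONE)
    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each stage is a single thread reading from a FIFO, output order matches input order: `run_pipeline(range(1, 6), [lambda x: 2 * x, lambda x: x + 1])` yields `[3, 5, 7, 9, 11]`.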
Optimal general offset assignment
Pub Date : 2014-06-10 DOI: 10.1145/2609248.2609251
Sven Mallach, Roberto Castañeda Lozano
We present an exact approach to the General Offset Assignment problem arising in the domain of address code generation for application specific and digital signal processors. General Offset Assignment is composed of two subproblems, namely to find a permutation of variables in memory and to select a responsible address register for each access to one of these variables. Our method is a combination of established techniques to solve both subproblems using integer linear programming. To the best of our knowledge, it is the first approach capable of solving almost all instances of the established OffsetStone benchmark set to global optimality within reasonable time. We provide a first comprehensive evaluation of the quality of several state-of-the-art heuristics relative to the optimal solutions.
Citations: 2
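The two subproblems named above, choosing a memory layout and choosing an address register per access, can be made concrete with a simple cost model. The sketch below assumes the common offset-assignment convention that an address register moves to the next access for free when the offset changes by at most 1 (post-increment/decrement) and otherwise needs one explicit update. It finds an optimal single-register layout by brute-force enumeration, which stands in for the paper's integer linear programming and is feasible only for tiny instances; the function names are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def addressing_cost(seq: Sequence[str], layout: Sequence[str],
                    assign: Sequence[int], num_regs: int) -> int:
    """Count accesses needing an explicit address-register update.

    `layout` gives the memory order of variables; `assign[i]` is the address
    register used for access i. Moves of at most one slot are assumed free.
    """
    pos = {v: i for i, v in enumerate(layout)}
    last = [None] * num_regs
    cost = 0
    for var, reg in zip(seq, assign):
        p = pos[var]
        if last[reg] is not None and abs(p - last[reg]) > 1:
            cost += 1
        last[reg] = p
    return cost


def optimal_single_register_layout(seq: Sequence[str]) -> Tuple[Tuple[str, ...], int]:
    """Exhaustively try every layout with one register (tiny instances only)."""
    variables = sorted(set(seq))
    one_reg = [0] * len(seq)
    best = min(permutations(variables),
               key=lambda layout: addressing_cost(seq, layout, one_reg, 1))
    return best, addressing_cost(seq, best, one_reg, 1)
```

For the access sequence `a c a c b d b d`, the layout `a b c d` costs 6 updates with one register, an optimal layout (such as `a c b d`) costs 0, and a second register that serves every other access also brings the cost of `a b c d` down to 0.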