Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems: Latest Articles

Determining Performance Boundaries on High-Level System Specifications

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906386

W. V. Teijlingen, R. V. Leuken, C. Galuzzi, B. Kienhuis

We can significantly reduce the time required to realize designs if it is possible to find limits to the performance of an embedded system based solely on high-level system specifications. For that purpose, we present in this paper the cprof profiler, which determines the number of clock cycles needed to execute a C program in hardware. The cprof tool uses the Clang compiler front-end to parse C programs and to produce instrumented source code for profiling. Using cprof, we determine a lower and an upper bound for all 29 cases of the PolyBench/C benchmark suite. The lower and upper bounds are determined using absolute performance estimations, which assume that all statements are mapped onto the same processing resource, and unbounded performance estimations, which assume unlimited resources. We also compared the clock cycles found by cprof with RTL implementations for all 29 PolyBench/C cases and found that cprof determines the correct number of clock cycles with 1.2% accuracy. It does this in a fraction of the time needed for a full RTL simulation.
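
The two kinds of estimate can be illustrated with a small sketch. This is not cprof's implementation; it assumes a hypothetical statement-dependency graph with made-up per-statement cycle costs (the names `COST` and `DEPS` are illustrative only). If every statement is mapped onto one processing resource, the statements run one after another and the cycle count is the sum of all costs. With unlimited resources, only data dependencies delay a statement, so the cycle count is the length of the longest dependency chain.

```python
from functools import lru_cache

# Hypothetical per-statement cycle costs and data dependencies
# (illustrative values, not taken from the paper or from PolyBench/C).
COST = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1, "store": 2}
DEPS = {
    "load_a": [],
    "load_b": [],
    "mul": ["load_a", "load_b"],
    "add": ["mul"],
    "store": ["add"],
}


def single_resource_cycles(cost):
    """All statements share one processing resource, so they run sequentially."""
    return sum(cost.values())


def unbounded_resource_cycles(cost, deps):
    """Unlimited resources: a statement starts as soon as its inputs are ready,
    so the total is the length of the critical path through the graph."""

    @lru_cache(maxsize=None)
    def finish(stmt):
        start = max((finish(d) for d in deps[stmt]), default=0)
        return start + cost[stmt]

    return max(finish(s) for s in cost)


if __name__ == "__main__":
    print(single_resource_cycles(COST))           # 10
    print(unbounded_resource_cycles(COST, DEPS))  # 8
```

In this toy graph the two loads can overlap, which is why the unbounded estimate (8 cycles) is lower than the single-resource estimate (10 cycles); any real mapping would fall between the two.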

Citations: 0

A Rule-based Methodology for Hardware Configuration Validation in Embedded Systems

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906377

Lin Li, Philipp Wagner, Ramesh Ramaswamy, A. Mayer, Thomas Wild, A. Herkersdorf

As the complexity of multicore SoCs increases, more potential system issues arise. Hardware-related configuration issues are becoming more complicated owing to the introduction of more cores and various complex peripherals. Given the complexity of multicore programming, consulting the main source of guidance, i.e. the user manual, is not an efficient way to identify such problems. Improper hardware-related configurations can lead to either functional or performance issues, some of which are subtle and hard to detect. Therefore, a rule-based validation methodology is proposed to deal with hardware-related configuration issues in an efficient and reliable way. Hardware trace is applied in this methodology to detect issues even before symptoms appear. The method directly observes register accesses and detects bugs based on trace data. It is independent of the application as long as the application runs on the given platform, which means the same implementation of the method can be applied to any application on the same platform. In this paper, an initial proof-of-concept of the proposed methodology has been implemented and demonstrated on the Infineon TC29 device.
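
The idea of checking rules against traced register accesses can be sketched as follows. This is only an illustration under stated assumptions, not the authors' tool: the trace record format, the register names `CLK_EN` and `UART_CFG`, and the rule itself ("a module must be enabled before its configuration register is written") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One traced register access (hypothetical trace record format)."""
    time: int
    kind: str       # "read" or "write"
    register: str
    value: int


def check_enable_before_config(trace, enable_reg, config_reg):
    """Rule: config_reg must not be written while enable_reg is unset.

    Returns the list of write accesses that violate the rule.
    """
    enabled = False
    violations = []
    for acc in trace:
        if acc.kind != "write":
            continue
        if acc.register == enable_reg:
            enabled = acc.value != 0
        elif acc.register == config_reg and not enabled:
            violations.append(acc)
    return violations


# Hypothetical register names; a real rule set would be derived from the
# device's user manual.
TRACE = [
    Access(10, "write", "UART_CFG", 0x13),  # too early: module not enabled yet
    Access(20, "write", "CLK_EN", 0x1),
    Access(30, "write", "UART_CFG", 0x13),  # allowed: module is enabled
]

if __name__ == "__main__":
    for bad in check_enable_before_config(TRACE, "CLK_EN", "UART_CFG"):
        print(f"t={bad.time}: {bad.register} written before CLK_EN was set")
```

Because the checker only looks at register accesses, the same rule applies unchanged to any application traced on the same platform, which matches the application independence described above.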

Citations: 1

Sporadic Task Handling in Time-Triggered Systems

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906383

Matthias Freier, Jian-Jia Chen

Scheduling of real-time applications is an important research topic. We consider a large-scale application consisting of 100 to 1000 tasks with inter-task communication, which can be represented by a task graph. For scheduling such applications, previous research has shown that the time-triggered scheduling approach can effectively utilize real-time platforms. However, the time-triggered scheduling approach only supports periodically activated tasks. Sporadic (aperiodic) tasks, which are also common in industrial applications, require additional treatment in time-triggered approaches. In this paper, we present a method to handle sporadic tasks by shifting the time-triggered schedule. This method improves the responsiveness of the real-time sporadic tasks while the schedule of the time-triggered tasks remains feasible. We define a time-triggered server to handle sporadic events and reserve time slots to ensure a safe recovery of the delayed time-triggered schedule. If a sporadic task arrives, it starts its execution during the time-triggered server slot and the current time-triggered schedule is shifted. This paper provides the feasibility analysis for the time-triggered and the sporadic tasks under this slot shifting method. We determine time-triggered scheduling parameters that maximize the performance of the time-triggered server. Experiments confirm that our slot shifting approach achieves a higher reachable system utilization.

Citations: 2

From dataflow analysis basics to the programming of ASICs

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2930673

M. Bekooij

Programming stream processing multiprocessor systems is a challenging task, especially if there are real-time requirements. Therefore, it is desirable to use formal models and real-time analysis techniques. However, the classical periodic task model does not match stream processing applications well, which results in suboptimal designs. In this talk we show that data-driven execution of stream processing applications improves the robustness against faulty workload assumptions. Using the earlier-the-better refinement theory, practically useful deterministic timed-dataflow analysis models of these applications can be created. Strong analytical properties are obtained by reserving resources in the multiprocessor system. Compilation tools can hide the modelling effort from the programmers of the multiprocessor systems. Future cyber-physical systems can benefit from the higher level of non-determinism that is supported by the presented timed-dataflow analysis techniques.

Citations: 0

RDBG: a Reactive Programs Extensible Debugger

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906372

Erwan Jahier

Debugging reactive programs requires providing many inputs, at each reaction step. Moreover, because a reactive system reacts to an environment it tries to control, providing realistic inputs can be hard. The same considerations apply to automatic testing. This work takes advantage of previous work on automated testing of reactive programs that closes this feedback loop. This article demonstrates how to opportunistically implement such a debugging command interpreter by taking advantage of an existing (OCaml) toplevel Read-Eval-Print Loop (REPL). It then shows how a small kernel is enough to build a full-featured debugger with little effort. The given examples provide a tutorial for end-users who wish to write their own debugging primitives, fitted to their needs, or to tune existing ones. An orthogonal contribution of this article is to present an efficient way to implement the debugger coroutining using continuations. The Reactive programs DeBuGger (RDBG) prototype aims to be versatile and general enough to deal with any reactive language. We have experimented with it on two synchronous programming languages: Lustre and Lutin.

Citations: 5

CSDFa: A Model for Exploiting the Trade-Off between Data and Pipeline Parallelism

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906364

Peter Koek, Stefan J. Geuns, J. Hausmans, H. Corporaal, M. Bekooij

Real-time stream processing applications, such as Software Defined Radio applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism required to meet the temporal constraints of such applications. However, this unified model is only defined for Synchronous Data Flow (SDF) graphs. Defining a unified model for a more expressive model such as Cyclo-Static Data Flow (CSDF) is not possible, because auto-concurrency can cause a time-dependent order of tokens and dependencies. This paper introduces the Cyclo-Static Data Flow with Auto-concurrency (CSDFa) model. In CSDFa, tokens have indices and the consumption order of tokens is static and time-independent. This allows expressing and trading off pipeline and coarse-grained data parallelism in a single, unified model. Furthermore, we introduce a new type of circular buffer that implements the same static order as is used by the CSDFa model. The overhead of operations on this buffer is independent of the amount of auto-concurrency, which corresponds to the constant firing durations in the CSDFa model. Exploiting the trade-off between data and pipeline parallelism with the CSDFa model is demonstrated with part of an FMCW radar processing pipeline. We show that the CSDFa model enables optimizing the balance between processing units and memory, resulting in a significant reduction of silicon area. Additionally, it is shown that reducing the maximum allowed latency increases the minimum required amount of data parallelism by up to a factor of 16.

Citations: 3

Compositional Temporal Analysis Method for Fixed Priority Pre-emptive Scheduled Modal Stream Processing Applications

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906375

G. Kuiper, Stefan J. Geuns, J. Hausmans, M. Bekooij

Modal real-time stream processing applications often contain cyclic dependencies and are typically executed on multiprocessor systems with processor sharing. Most real-time operating system kernels for these systems support Static Priority Pre-emptive (SPP) scheduling; however, there is currently no suitable temporal analysis technique available for this type of system. In this paper, we present a compositional temporal analysis approach for modal and cyclic stream processing applications executed on SPP scheduled multiprocessor systems. In this approach, locks and barriers are added such that the temporal behavior of modes can be characterized independently. As a result, the composition of modes does not change their characterization. This enables the use of an existing dataflow analysis technique based on the Structured Variable-Rate Phased Dataflow (SVPDF) model to determine the worst-case temporal behavior. The SVPDF model and the parallel implementation, including locks and barriers, are generated by a multiprocessor compiler. The applicability of the analysis approach is demonstrated using a WLAN 802.11p application. Conditions under which pipelined execution can be achieved are identified. The analysis results are verified with a dataflow simulator that supports sharing of resources.

Citations: 2

Automatic Generation of Thread Communication Graphs from SystemC Source Code

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906365

T. Schmidt, Guantao Liu, R. Dömer

In an ideal top-down system design flow, graphical diagrams are designed before an executable specification in a System Level Description Language (SLDL) is derived. Such initial charts typically also serve as visual documentation of the textual specification and aid in maintaining the model. In the absence of graphical charts, e.g. in the case of legacy or third-party code, a textual SLDL model is hard to comprehend for any designer unfamiliar with it. Here, we propose to automatically extract graphical charts from given SystemC code to ease the understanding of the source code with a visual representation. Specifically, we extract the communication flow between the threads from the design model by use of an automatic SystemC compiler infrastructure that statically analyzes the code and generates custom Thread Communication Graphs (TCGs) similar to message sequence charts. Our experimental results on embedded applications demonstrate that our novel static analysis can quickly extract accurate TCGs that are highly useful for designers in becoming familiar with new source code.

Citations: 8

Vectorization in PyPy's Tracing Just-In-Time Compiler

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2906384

Richard Plangger, A. Krall

PyPy is a widely known virtual machine for the Python programming language. PyPy itself is implemented in the statically typed subset of Python called RPython. RPython includes a tracing Just-In-Time (JIT) compiler and is capable of generating the compiler for a language from the specification of the interpreter for that language. In PyPy 4.0.0 we extended the tracing JIT compiler to support vectorization of loops and to emit code for the SSE4 vector operations of the x86 instruction set. This article presents the details of PyPy's new vectorizer. The vectorizer uses a loop unrolling approach to vectorization. It has been designed for efficient compilation, as the compilation is done during the execution of the application. The scientific library NumPy introduced arrays that are homogeneous, primitive-typed and contiguous in memory. These kinds of arrays are used to avoid the problems of dynamic typing. Our contribution to PyPy's new vectorizer supports scalar and constant expansion, accumulator splitting for reductions, guard strengthening and array bounds check removal. The empirical evaluation shows that the vectorizer can gain speedups close to the theoretical optimum of the SSE4 instruction set.
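
Accumulator splitting for reductions, one of the transformations named above, can be illustrated at the source level. The sketch below is plain Python written for illustration; it is not PyPy's JIT code, and the function names and the default of four lanes are assumptions. A single running sum creates a dependency from every iteration to the next, whereas several independent partial sums can be updated together, which is the shape a SIMD add requires.

```python
def sum_scalar(values):
    """Reference reduction: one accumulator, one element per iteration."""
    acc = 0
    for v in values:
        acc += v
    return acc


def sum_split(values, lanes=4):
    """Reduction with the accumulator split into `lanes` partial sums.

    Each pass over a full chunk updates every lane independently; the
    partial sums are combined once after the loop.
    """
    partial = [0] * lanes
    full = len(values) - len(values) % lanes
    for i in range(0, full, lanes):
        for lane in range(lanes):        # conceptually a single vector add
            partial[lane] += values[i + lane]
    total = sum(partial)                 # horizontal combine of the lanes
    for v in values[full:]:              # scalar tail for leftover elements
        total += v
    return total


if __name__ == "__main__":
    data = list(range(1, 11))
    print(sum_scalar(data), sum_split(data))  # 55 55
```

For integers both versions return the same result. For floating-point data the split version adds the elements in a different order, so the result can differ in the last bits because floating-point addition is not associative.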

Citations: 7

Cross-Layer Reliability Modeling and Optimization: Compiler and Run-Time System Interactions

Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems

Pub Date : 2016-05-23 DOI: 10.1145/2906363.2911171

M. Shafique, Semeen Rehman, F. Kriebel, J. Henkel

This paper presents a cross-layer reliability modeling and optimization approach that leverages multiple software layers, such as the compiler and the run-time system, to improve overall reliability on unreliable or partially reliable hardware. To bridge the gap between hardware and software and achieve high efficiency, our technique incorporates knowledge from the hardware layers during reliability modeling and the design of optimization techniques. We demonstrate how different software layers operate synergistically to achieve a high degree of reliability.

Citations: 0
