International Conference on Hardware/Software Codesign and System Synthesis: Latest Publications


Traversal caches: a first step towards FPGA acceleration of pointer-based data structures

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450150

G. Stitt, Gaurav Chaudhari, J. Coole

Field-programmable gate arrays (FPGAs) often achieve order of magnitude speedups compared to microprocessors, but typically have been unable to improve the performance of applications with irregular memory access patterns, such as traversals of pointer-based data structures. Due to the common use of these data structures, the applicability and widespread success of FPGAs has been limited. In this paper, we introduce the traversal cache framework - a first step towards improving the performance of FPGA applications that utilize pointer-based data structures. The traversal cache is a local FPGA memory that stores repeated traversals of pointer-based data structures, allowing for these traversals to be efficiently streamed into the FPGA. Although the cache is generally limited to improving applications that exhibit repeated traversals, we show that many applications in fact have this characteristic. Furthermore, we show that few repetitions are needed to achieve performance improvements. We present experimental results showing that FPGA implementations using the traversal cache framework achieve speedups ranging from 7x to 29x compared to pointer-based software on a 3.2 GHz Xeon.
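
As a rough software model of the idea (not the authors' FPGA design), the sketch below records the values a pointer-chasing traversal visits the first time it runs and serves later identical traversals from a flat array, which is the sequential access pattern an FPGA can stream. The `Node` and `TraversalCache` names are hypothetical, and the sketch assumes the list is not modified between traversals, so no invalidation is modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical singly linked list node."""
    value: int
    next: Optional["Node"] = None


class TraversalCache:
    """Software model of a traversal cache: maps a traversal's start node to the
    flat sequence of values it visits, so a repeated traversal is read
    sequentially instead of by chasing pointers. Assumes the list is not
    modified between traversals (no invalidation is modeled)."""

    def __init__(self) -> None:
        self._cache: Dict[int, List[int]] = {}
        self.misses = 0
        self.hits = 0

    def traverse(self, head: Optional[Node]) -> List[int]:
        key = id(head)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]  # repeated traversal: stream from local memory
        self.misses += 1
        values: List[int] = []
        node = head
        while node is not None:  # first traversal: irregular pointer chasing
            values.append(node.value)
            node = node.next
        self._cache[key] = values
        return values


# Build a small list 3 -> 1 -> 4 and traverse it three times.
head = Node(3, Node(1, Node(4)))
cache = TraversalCache()
for _ in range(3):
    total = sum(cache.traverse(head))
print(total, cache.misses, cache.hits)  # 8 1 2
```

Only the first traversal pays the pointer-chasing cost; the abstract's observation that few repetitions are needed corresponds to the hit count growing while the miss count stays at one.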

Citations: 15

SPaC: a symbolic pareto calculator

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450176

H. Shojaei, T. Basten, M. Geilen, Phillip Stanley-Marbell

The compositional computation of Pareto points in multi-dimensional optimization problems is an important means to efficiently explore the optimization space. This paper presents a symbolic Pareto calculator, SPaC, for the algebraic computation of multidimensional trade-offs. SPaC uses BDDs as a representation for solution sets and operations on them. The tool can be used in multi-criteria optimization and design-space exploration of embedded systems. The paper describes the design and implementation of Pareto algebra operations, and it shows that BDDs can be used effectively in Pareto optimization.
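
Pareto minimization, which discards every configuration for which another configuration is at least as good in every dimension and strictly better in at least one, is the core operation such a calculator composes. SPaC represents these sets symbolically with BDDs; the sketch below uses plain Python tuples instead, only to show what the operations compute. It assumes all dimensions are minimized and that component costs combine by addition; the function names and the (latency, energy) values are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every dimension and differs in at
    least one (all dimensions are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_min(points: Iterable[Point]) -> List[Point]:
    """Keep only points not dominated by any other point (duplicates removed)."""
    pts = sorted(set(points))
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


def compose_additive(a: List[Point], b: List[Point]) -> List[Point]:
    """Combine two components whose costs add up dimension-wise, then minimize."""
    return pareto_min(tuple(x + y for x, y in zip(p, q)) for p, q in product(a, b))


# Hypothetical (latency, energy) trade-offs of two components.
decoder = pareto_min([(10, 5), (6, 9), (12, 6), (8, 7)])
filter_ = [(3, 4), (5, 2)]
print(decoder)                             # [(6, 9), (8, 7), (10, 5)]
print(compose_additive(decoder, filter_))  # [(9, 13), (11, 11), (13, 9), (15, 7)]
```

Minimizing after every composition step keeps intermediate sets small, which is what makes a compositional approach cheaper than enumerating the full product space first.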

Citations: 6

Software optimization for MPSoC: a mpeg-2 decoder case study

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450146

Eric Cheung, H. Hsieh, F. Balarin

Using traditional software profiling to optimize embedded software in an MPSoC design is not reliable. With multiple processors running concurrently and programs interacting, traditional profiling on individual processors cannot capture useful execution information to assist software optimization. A new method to model parallel executions of interacting programs is needed. In this paper, we consider the software optimization problem for throughput-constrained MPSoC designs. We define the "longest delay path" as a sequence of steps leading to a throughput constraint violation and propose an algorithm to build up the path dynamically during simulation. Using an industrial-strength MPEG-2 decoder design in our case study and custom instructions for software optimization, we show that we can optimize the software efficiently in MPSoC designs using frequently executed statement information from the longest delay path.
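
The paper builds its longest delay path dynamically during simulation. As a much simpler, static illustration of the same notion, the sketch below finds the chain of dependent steps with the largest total delay in a known dependency DAG. It is not the authors' algorithm; the step names and delays are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def longest_delay_path(delay: Dict[str, int], deps: Dict[str, List[str]]) -> Tuple[int, List[str]]:
    """Return (total delay, path) of the longest chain in a DAG.
    delay[v] is the time spent in step v; deps[v] lists the steps v waits for."""
    best: Dict[str, int] = {}  # latest finish time of each step
    pred: Dict[str, str] = {}  # predecessor on the longest chain into each step
    for v in TopologicalSorter(deps).static_order():  # predecessors come first
        start = 0
        for u in deps.get(v, []):
            if best[u] > start:
                start, pred[v] = best[u], u
        best[v] = start + delay[v]
    end = max(best, key=best.get)
    path = [end]
    while path[-1] in pred:  # walk back along the recorded predecessors
        path.append(pred[path[-1]])
    return best[end], path[::-1]


# Hypothetical decoder steps and per-step delays.
delay = {"vld": 4, "iq": 2, "idct": 6, "mc": 5, "add": 1, "out": 1}
deps = {"iq": ["vld"], "idct": ["iq"], "mc": ["vld"], "add": ["idct", "mc"], "out": ["add"]}
print(longest_delay_path(delay, deps))  # (14, ['vld', 'iq', 'idct', 'add', 'out'])
```

Statements executed along such a path are the natural candidates for custom instructions, whereas speeding up `mc` here would not shorten the path at all.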

Citations: 0

Dynamic tuning of configurable architectures: the AWW online algorithm

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450158

Chen-Chun Huang, David Sheldon, F. Vahid

Architectures with software-writable parameters, or configurable architectures, allow computing platforms to be reconfigured at runtime to suit the applications they execute. Such dynamic tuning can improve application performance as well as energy efficiency. However, reconfiguring incurs a temporary performance cost. Thus, online algorithms are needed that decide when to reconfigure and which configuration to choose so that overall performance is optimized. We introduce the adaptive weighted window (AWW) algorithm and compare it with several other algorithms, including algorithms previously developed by the online algorithm community. Our experiments show that AWW results are on average within 4% of the offline optimum. AWW outperforms the other algorithms and is robust across three datasets and across three categories of application sequences. On average, AWW improves on a non-dynamic approach by 6%, and by up to 30% in situations with low reconfiguration time.
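
The AWW algorithm itself is not described in this abstract, so the sketch below swaps in a classic rent-or-buy style threshold rule to illustrate the trade-off the abstract describes: the tuner switches configurations only once the time it has lost by not switching exceeds the one-time reconfiguration cost. The configuration names, costs, and the `threshold_tuner` function are hypothetical.

```python
from typing import Dict, List, Sequence


def threshold_tuner(costs: Sequence[Dict[str, float]], start: str,
                    reconfig_cost: float) -> List[str]:
    """Rent-or-buy style online tuner (illustrative; not the AWW algorithm).

    costs[t][c] is the runtime of interval t under configuration c, observed
    only after interval t has run. For each alternative the tuner accumulates
    how much time it lost by staying put, and switches once that loss exceeds
    the one-time reconfiguration cost.
    """
    current = start
    regret = {c: 0.0 for c in costs[0]}
    chosen: List[str] = []
    for interval in costs:
        chosen.append(current)  # configuration actually used for this interval
        for c, cost in interval.items():
            # Positive when c would have been faster; clamp so gains do not go negative.
            regret[c] = max(regret[c] + interval[current] - cost, 0.0)
        best = max(regret, key=regret.get)
        if best != current and regret[best] > reconfig_cost:
            current = best
            regret = {c: 0.0 for c in regret}  # start counting afresh after a switch
    return chosen


# Hypothetical workload: "large" is faster, switching costs 7 time units.
costs = [{"small": 10.0, "large": 6.0}] * 5
print(threshold_tuner(costs, "small", 7.0))
# ['small', 'small', 'large', 'large', 'large']
```

A low reconfiguration cost makes the tuner switch sooner, which mirrors the abstract's observation that dynamic tuning pays off most when reconfiguration time is low.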

Citations: 10

Specification-based compaction of directed tests for functional validation of pipelined processors

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450167

Heon-Mo Koo, P. Mishra

Functional validation is a major bottleneck in microprocessor design methodology. Simulation using billions of random and biased-random test programs is the most widely used method for functional validation. Although directed tests require a smaller test set than random tests to achieve the same functional coverage goal, there is a lack of automated techniques for directed test generation. Furthermore, the number of directed tests can still be prohibitively large. This paper presents a methodology for specification-based coverage analysis and test generation. The primary contribution of this paper is a compaction technique that can drastically reduce the number of directed test programs required to achieve a coverage goal. Our experimental results using a MIPS processor and an industrial processor (e500) demonstrate a reduction of more than 90% in the number of directed tests without sacrificing the functional coverage goal.
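
Compaction can be framed as choosing a small subset of tests that still covers every coverage item. The sketch below uses a generic greedy set-cover heuristic, which is not necessarily the specification-based technique of the paper; the test names and pipeline coverage items are hypothetical.

```python
from typing import Dict, List, Set


def greedy_compact(tests: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the test covering the most
    still-uncovered items until every coverable item is covered.
    Ties are broken by test name so the result is deterministic."""
    uncovered = set().union(*tests.values())
    selected: List[str] = []
    while uncovered:
        name = max(sorted(tests), key=lambda t: len(tests[t] & uncovered))
        selected.append(name)
        uncovered -= tests[name]
    return selected


# Hypothetical directed tests and the pipeline interactions each one exercises.
tests = {
    "t1": {"IF-stall", "EX-fwd"},
    "t2": {"EX-fwd", "MEM-stall", "WB-hazard"},
    "t3": {"IF-stall"},
    "t4": {"IF-stall", "MEM-stall"},
}
print(greedy_compact(tests))  # ['t2', 't1']
```

Here two of the four tests achieve the same coverage as the full set; greedy selection is a simple heuristic and does not guarantee the smallest possible test set.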

Citations: 8

You can catch more bugs with transaction level honey

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450163

M. Abramovici, K. Goossens, B. Vermeulen, J. Greenbaum, N. Stollon, A. Donlin

In this special session we explore holistic approaches to hardware/software debug that use or integrate transaction-level models (TLMs). We present several TLM-based approaches to system-level diagnostics, ranging from the use of the most popular transaction-level modeling languages to hybrid technologies that combine TLMs with other well-known diagnostic tools such as in-silicon trace logic.

Citations: 7

Guaranteed scheduling for repetitive hard real-time tasks under the maximal temperature constraint

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450196

Gang Quan, Yan Zhang, William Wiles, Pei Pei

We study the problem of scheduling repetitive real-time tasks under the Earliest Deadline First (EDF) policy such that a given maximal temperature constraint is guaranteed. We show that the traditional scheduling approach, i.e., repeating a schedule that is feasible over one hyper-period, no longer applies. We then present necessary and sufficient conditions for real-time schedules to guarantee the maximal temperature constraint. Based on these conditions, we propose a novel scheduling algorithm that constructs a schedule guaranteeing the maximal temperature constraint. Finally, we evaluate the performance of our approach experimentally.
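
The thermal effect behind the claim that repeating a feasible hyper-period schedule is not enough can be shown with a lumped RC thermal model, a common simplification that may differ from the model used in the paper: a hyper-period usually does not end at the temperature it started from, so each repetition starts hotter and can reach a higher peak. All parameter values and the schedule below are hypothetical, and EDF itself is not modeled.

```python
import math
from typing import List, Tuple

# Hypothetical lumped RC thermal model (illustrative values, not from the paper).
T_AMB = 25.0  # ambient temperature, degrees C
R = 2.0       # thermal resistance, degrees C per W
C = 5.0       # thermal capacitance, J per degree C


def run_hyperperiod(t_start: float, schedule: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Simulate one hyper-period of (duration in s, power in W) segments.
    Returns (temperature at the end, peak temperature during the hyper-period)."""
    temp = peak = t_start
    for duration, power in schedule:
        steady = T_AMB + R * power  # temperature this segment would settle at
        temp = steady + (temp - steady) * math.exp(-duration / (R * C))
        peak = max(peak, temp)  # exponential segments peak at a boundary
    return temp, peak


# Hypothetical schedule: 2 s busy at 10 W, then 3 s idle at 0.5 W.
schedule = [(2.0, 10.0), (3.0, 0.5)]
temp, peaks = T_AMB, []
for _ in range(10):
    temp, peak = run_hyperperiod(temp, schedule)
    peaks.append(peak)
print(f"first peak {peaks[0]:.1f}, tenth peak {peaks[-1]:.1f}")
```

Because every repetition starts from the previous end temperature, the peaks climb toward a periodic steady state; checking the temperature constraint against the first hyper-period alone would underestimate the long-run peak.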

Citations: 39

Asynchronous transient resilient links for NoC

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450182

S. Ogg, B. Al-Hashimi, A. Yakovlev

This paper proposes a new link for asynchronous NoC communications that is resilient to transient faults on the wires of the link without affecting its data transfer capability. Resilience to transients is achieved by exploiting the phase relationship between data symbols and a common reference symbol, where the symbols are transmitted on additional wires. Transient faults are detected by comparing the data symbol with the reference symbol. We demonstrate that, from a power consumption point of view, it is possible to achieve a number of transitions per bit similar to that of existing delay-insensitive codes while also achieving resilience to transient faults. The link has been synthesized and validated using 0.12 μm technology, and its power, area, and performance are reported. The link area cost is 409 μm² per data bit and the energy per bit is 356 fJ/bit. Latency through the link is 0.8 ns, and the maximum operating frequency (throughput) of the link is 1.056 GHz.

Citations: 22

Extending open core protocol to support system-level cache coherence

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450173

K. Aisopos, Chien-Chun Chou, L. Peh

Open Core Protocol (OCP) is a standard on-chip core interface specification. The current release is flexible and configurable enough to support the communication needs of a wide range of Intellectual Property cores, and is now in widespread use. However, it does not support system-level coherence. This paper summarizes an effort within the OCP-IP cache coherence working group on incorporating cache coherence extensions into OCP, which is expected to have a strong impact on the MPSoC industry. In this paper, we propose a backward-compatible coherent Open Core Protocol interface and discuss the design challenges and implications it introduces. This interface is flexible and can support a range of coherence protocols and schemes: we show how it can specify a snoopy bus-based scheme as well as a directory-based scheme. The correctness of the specification and models was verified using NuSMV by exploring the entire state space for the two basic coherence schemes.
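
For readers unfamiliar with the snoopy scheme mentioned here, the sketch below models a textbook MSI snooping protocol for a single cache line shared by two caches. It is purely illustrative: the state and bus-request names are standard textbook terms, not the signals defined by the OCP coherence extensions, and the `access` helper is hypothetical.

```python
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


def on_processor(state: State, op: str) -> Tuple[State, Optional[str]]:
    """Local read/write: return (next state, bus request to broadcast or None)."""
    if op == "read":
        return (state, None) if state != State.INVALID else (State.SHARED, "BusRd")
    if op == "write":
        # Writing from S or I needs exclusive ownership; from M it hits locally.
        return (State.MODIFIED, None) if state == State.MODIFIED else (State.MODIFIED, "BusRdX")
    raise ValueError(op)


def on_snoop(state: State, req: str) -> Tuple[State, bool]:
    """Another cache's bus request: return (next state, whether to flush dirty data)."""
    if req == "BusRd":
        return (State.SHARED, True) if state == State.MODIFIED else (state, False)
    if req == "BusRdX":
        return State.INVALID, state == State.MODIFIED
    raise ValueError(req)


caches: List[State] = [State.INVALID, State.INVALID]


def access(cpu: int, op: str) -> None:
    """Apply a local access, then let every other cache snoop the bus request."""
    caches[cpu], req = on_processor(caches[cpu], op)
    if req:
        for other in range(len(caches)):
            if other != cpu:
                caches[other], _ = on_snoop(caches[other], req)


access(0, "read")
access(1, "read")
access(1, "write")
print([s.value for s in caches])  # ['I', 'M']
```

The final state satisfies the single-writer invariant (at most one cache in M, with all others in I), which is the kind of property a model checker such as NuSMV can verify over the whole state space.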

Citations: 8

Scratchpad allocation for concurrent embedded software

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450145

Vivy Suhendra, Abhik Roychoudhury, T. Mitra

Software-controlled scratchpad memory is increasingly employed in embedded systems as it offers better timing predictability compared to caches. Previous scratchpad allocation algorithms typically consider single process applications. But embedded applications are mostly multi-tasking with real-time constraints, where the scratchpad memory space has to be shared among interacting processes that may preempt each other. In this paper, we develop a novel dynamic scratchpad allocation technique that takes these process interferences into account to improve the performance and predictability of the memory system. We model the application as a Message Sequence Chart (MSC) to best capture the interprocess interactions. Our goal is to optimize the worst-case response time (WCRT) of the application through runtime reloading of the scratchpad memory content at appropriate execution points. We propose an iterative allocation algorithm that consists of two critical steps: (1) analyze the MSC along with the existing allocation to determine potential interference patterns, and (2) exploit this interference information to tune the scratchpad reloading points and content so as to best improve the WCRT. We evaluate our memory allocation scheme on a real-world embedded application controlling an Unmanned Aerial Vehicle (UAV).
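
Static scratchpad allocation for a single task is often formulated as a 0/1 knapsack problem: choose which memory objects to place in the scratchpad so as to maximize estimated savings within its capacity. The sketch below shows only that baseline; it ignores preemption, interacting processes, and runtime reloading, which are exactly what the paper's WCRT-driven technique adds. The object names, sizes, and savings are hypothetical.

```python
from typing import Dict, List, Tuple


def knapsack_allocate(objects: Dict[str, Tuple[int, int]], capacity: int) -> Tuple[int, List[str]]:
    """0/1 knapsack: objects[name] = (size in bytes, estimated cycles saved if
    placed in the scratchpad). Returns (total cycles saved, chosen names)."""
    # best[c] = (max savings, chosen names) using at most c bytes.
    # The shared empty list is never mutated; new lists are built with +.
    best: List[Tuple[int, List[str]]] = [(0, [])] * (capacity + 1)
    for name in sorted(objects):
        size, gain = objects[name]
        for c in range(capacity, size - 1, -1):  # descending: each object used once
            cand = best[c - size][0] + gain
            if cand > best[c][0]:
                best[c] = (cand, best[c - size][1] + [name])
    return best[capacity]


# Hypothetical memory objects: (size in bytes, cycles saved).
objects = {
    "coeffs": (256, 900),
    "frame_buf": (1024, 1500),
    "lut": (512, 1000),
    "stack": (128, 300),
}
print(knapsack_allocate(objects, 1024))  # (2200, ['coeffs', 'lut', 'stack'])
```

The single large buffer loses to three smaller objects with higher combined savings; with preemptive, interacting processes the savings of each object would depend on which process currently owns the scratchpad, which is why a static formulation like this one is insufficient there.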

Citations: 8
