First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721): Latest Articles

A low-cost and low-power multi-standard video encoder

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944675

R. Llopis, R. Sethuraman, C. A. Pinto, H. Peters, S. Maul, M. Oosterhuis

Video encoders are an important IP block in mobile multimedia systems. In this paper, we describe a low-cost low-power multi-standard (MPEG4, JPEG, and H.263) video/image encoder. The low-cost and low-power aspects are achieved by the right choice of algorithms and architectures. On the algorithm front, an embedded compression technique for reducing the size of loop memory has enabled a single-chip low-cost realization of the encoder. Further, the hardware components that accelerate the kernels of encoding are implemented as application specific instruction-set processors (ASIPs), thereby providing flexibility to address multi-standard encoding. The power and area estimates for the encoder for QCIF@15fps in 0.18 μm CMOS technology are 30 mW and 20 mm², respectively, including the loop memory.
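As a rough illustration of what the quoted estimates imply, the following sketch derives per-frame and per-macroblock figures. It assumes the standard QCIF luma resolution of 176×144 and 16×16 macroblocks; the function name `encoder_budget` and the dictionary keys are hypothetical, and the derived numbers are back-of-the-envelope values, not results reported in the abstract.

```python
# Back-of-the-envelope figures derived from the abstract's estimates.
QCIF_WIDTH, QCIF_HEIGHT = 176, 144   # standard QCIF luma resolution (assumption)
MACROBLOCK = 16                      # 16x16 macroblocks, as in MPEG-4 / H.263
FPS = 15                             # frame rate quoted in the abstract
POWER_W = 30e-3                      # 30 mW power estimate quoted in the abstract


def encoder_budget(width=QCIF_WIDTH, height=QCIF_HEIGHT, fps=FPS, power_w=POWER_W):
    """Return throughput and energy figures implied by a power/frame-rate pair."""
    mbs_per_frame = (width // MACROBLOCK) * (height // MACROBLOCK)
    energy_per_frame_j = power_w / fps
    return {
        "macroblocks_per_frame": mbs_per_frame,
        "macroblocks_per_second": mbs_per_frame * fps,
        "energy_per_frame_j": energy_per_frame_j,
        "energy_per_macroblock_j": energy_per_frame_j / mbs_per_frame,
    }


budget = encoder_budget()
for key, value in budget.items():
    print(f"{key}: {value}")
```

Under these assumptions the encoder has to process 99 macroblocks per frame, or 1485 per second, within an energy budget of about 2 mJ per frame.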

Citations: 9

RTOS scheduling in transaction level models

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944653

Haobo Yu, A. Gerstlauer, D. Gajski

Raising the level of abstraction in system design promises to enable faster exploration of the design space at early stages. Because scheduling decisions for embedded software have a great impact on system performance, it is highly desirable that the designer be able to select the right scheduling algorithm at high abstraction levels, avoiding the error-prone and time-consuming task of tuning code delays or task priority assignments at the final stage of system design. In this paper we tackle this problem by introducing an RTOS model and an approach to refine any unscheduled transaction level model (TLM) to a TLM with RTOS scheduling support. The refinement process provides a useful tool for the system designer to quickly evaluate different dynamic scheduling algorithms and make the optimal choice at an early stage of system design.

Citations: 45

A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944694

H. Zimmer, A. Jantsch

The reliability of a network-on-chip will be significantly influenced by the reliability of the switch-to-switch connections. Faults on these buses may cause disturbances on multiple adjacent wires, so that errors on these wires can no longer be considered statistically independent of one another, as is expected due to deep submicron effects. A new fault model notation for buses is proposed which can represent multiple-wire, multiple-cycle faults. An estimation method based on this notation is presented which can accurately predict error probabilities. This method is used to examine bus encoding schemes. Finally, an encoding scheme for four quality-of-service classes is proposed which can be dynamically selected for each packet.

Citations: 145

A low power scheduler using game theory

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944681

N. Ranganathan, A. Murugavel

In this paper, we describe a new methodology based on game theory for minimizing the average power of a circuit during scheduling in behavioral synthesis. The problem of scheduling in data-path synthesis is formulated as an auction based non-cooperative finite game, for which solutions are developed based on the Nash equilibrium function. Each operation in the data-path is modeled as a player bidding for executing an operation in the given control cycle, with the estimated power consumption as the bid. Also, a combined scheduling and binding algorithm is developed using a similar approach in which the two tasks are modeled together such that the Nash equilibrium function needs to be applied only once to accomplish both the scheduling and binding tasks together. The combined algorithm yields further power reduction due to additional savings during binding. The proposed algorithms yield better power reduction than ILP-based methods with comparable run times and no increase in area overhead.

Citations: 11

Tracking object life cycle for leakage energy optimization

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944701

Guangyu Chen, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, M. Wolczko

The focus of this work is on utilizing the state of objects during their lifespan in optimizing the leakage energy consumed in the data caches when executing embedded Java applications. Our analysis reveals that a major portion of the leakage energy is actually wasted in retaining the objects beyond their last use. In order to eliminate this wastage, we investigate three approaches that use the garbage collector, escape analysis and last use analysis for reducing leakage energy. Finally, we track the access gap between successive object accesses to reduce leakage energy of live objects. A combination of these schemes is shown to provide 21% data cache leakage energy reduction in our default configuration.

Citations: 5

Transaction level modeling: an overview

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944651

Lukai Cai, D. Gajski

Recently, transaction-level modeling has been widely referred to in the system-level design community. However, transaction-level models (TLMs) are not well defined, and the usage of TLMs in the existing design domains, namely modeling, validation, refinement, exploration, and synthesis, is not well coordinated. This paper introduces a TLM taxonomy and compares the benefits of TLMs' use.

Citations: 632

Architecture and synthesis for multi-cycle on-chip communication

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944667

J. Cong, Yiping Fan, Guoling Han, Xun Yang, Zhiru Zhang

There are two important inflection points in the development of deep submicron (DSM) process technologies. The first point is when the average interconnect delay exceeds the gate delay, which happened during the mid-1990s and led to the so-called timing closure problem. The second point is when single-cycle full chip synchronization is no longer possible, which is about to happen soon. It can be shown that, even with aggressive interconnect optimization techniques (e.g., buffer insertion and wire-sizing), 5 clock cycles are still needed to go from corner to corner for a die of 28.3 mm × 28.3 mm in the 0.07 μm technology generation, assuming a 5.63 GHz clock by 2006 as predicted in ITRS'01 (2001). This clearly suggests that multi-cycle on-chip communication is a necessity in multi-gigahertz synchronous designs. However, it is not supported in the current design tools and methodologies, as most of these implicitly assume that full chip synchronization in a single clock cycle is feasible. Our contributions are as follows: (i) we propose a regular distributed register (RDR) microarchitecture which offers high regularity and direct support of multi-cycle communication; (ii) we develop a set of novel architectural synthesis algorithms to efficiently synthesize behavior-level designs onto the RDR architecture.
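The figures quoted in the abstract can be turned into a small timing budget. The sketch below is only an arithmetic illustration: it assumes that "corner to corner" is measured as Manhattan distance (twice the die edge), which the abstract does not state, and the function name `cross_chip_budget` and its return keys are hypothetical.

```python
def cross_chip_budget(die_mm=28.3, clock_ghz=5.63, cycles=5):
    """Derive timing figures implied by the die size, clock rate and cycle count."""
    period_ps = 1e3 / clock_ghz      # a 1 GHz clock has a 1000 ps period
    manhattan_mm = 2 * die_mm        # assumed corner-to-corner routing distance
    return {
        "clock_period_ps": period_ps,
        "corner_to_corner_mm": manhattan_mm,
        "total_latency_ps": cycles * period_ps,
        "reach_per_cycle_mm": manhattan_mm / cycles,
    }


for key, value in cross_chip_budget().items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the clock period is roughly 178 ps, so a signal covers only about 11 mm of the roughly 57 mm corner-to-corner route per cycle, which is why registers along the path become necessary.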

Citations: 5

Pareto-optimization-based run-time task scheduling for embedded systems

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944680

Peng Yang, F. Catthoor

Pareto-set-based optimization can be found in several different areas of embedded system design. One example is task scheduling, where different task mapping and ordering choices for a target platform will lead to different performance/cost tradeoffs. To explore this design space at runtime, a fast and effective heuristic is needed. We have modeled the problem as the well-known Multiple Choice Knapsack Problem (MCKP) and have developed a fast greedy heuristic for run-time task scheduling. To show the effectiveness of our algorithm, examples from randomly generated task graphs and realistic applications are studied. Compared to the optimal dynamic programming solver, the heuristic is more than ten times faster while the result is less than 5% away from the optimum. Moreover, due to its iterative nature, the algorithm is well suited to be used as an online algorithm.
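To make the MCKP formulation concrete, here is a minimal sketch of a generic greedy heuristic. It is not the authors' algorithm, whose details the abstract does not give. It assumes each task comes with a Pareto curve of (execution time, energy) points, that exactly one point must be chosen per task, and that total energy is minimized subject to a deadline. The function name `greedy_mckp` and the sample data are hypothetical.

```python
def greedy_mckp(curves, deadline):
    """Pick one (time, energy) point per task, greedily trading time for energy.

    curves   -- list of per-task Pareto curves, each a list of (time, energy)
    deadline -- upper bound on the summed execution time
    Returns (indices into the time-sorted curves, total_time, total_energy),
    or None when even the fastest points miss the deadline.
    """
    # Along a Pareto curve sorted by time, energy decreases as time increases.
    curves = [sorted(curve) for curve in curves]
    choice = [0] * len(curves)                 # start from the fastest points
    total_time = sum(curve[0][0] for curve in curves)
    total_energy = sum(curve[0][1] for curve in curves)
    if total_time > deadline:
        return None

    while True:
        best = None                            # (ratio, task, dt, de)
        for task, curve in enumerate(curves):
            nxt = choice[task] + 1
            if nxt >= len(curve):
                continue                       # task already at its slowest point
            dt = curve[nxt][0] - curve[nxt - 1][0]
            de = curve[nxt - 1][1] - curve[nxt][1]
            if de <= 0 or total_time + dt > deadline:
                continue                       # no saving, or deadline violated
            ratio = de / dt if dt > 0 else float("inf")
            if best is None or ratio > best[0]:
                best = (ratio, task, dt, de)
        if best is None:
            return choice, total_time, total_energy
        _, task, dt, de = best
        choice[task] += 1                      # take the most profitable step
        total_time += dt
        total_energy -= de


tasks = [[(2, 10), (4, 6), (6, 5)], [(1, 8), (3, 3)]]
print(greedy_mckp(tasks, deadline=8))
```

Each iteration moves a single task one step along its curve, so the loop can be stopped at any time with a feasible schedule in hand, which is the property that makes this style of heuristic attractive for online use.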

Citations: 81

An efficient retargetable framework for instruction-set simulation

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944649

Mehrdad Reshadi, N. Bansal, P. Mishra, N. Dutt

Instruction-set architecture (ISA) simulators are an integral part of today's processor and software design process. While the increasing complexity of architectures demands high performance simulation, the increasing variety of available architectures makes retargetability a critical feature of an instruction-set simulator. Retargetability requires generic models, while high performance demands target-specific customizations. To address these contradictory requirements, we have developed a generic instruction model and a generic decode algorithm that facilitate easy and efficient retargetability of the ISA simulator for a wide range of processor architectures such as RISC, CISC, VLIW and variable length instruction set processors. The instruction model is used to generate compact and easy to debug instruction descriptions that are very similar to those in the architecture manual. These descriptions are used to generate high performance simulators. The generation of the simulator is completely separate from the simulation engine. Hence, we can incorporate any fast simulation technique in our retargetable framework without losing performance. We illustrate the retargetability of our approach using two popular, yet different realistic architectures: the Sparc and the ARM.
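The idea of a table-driven, target-independent decoder can be illustrated with a small sketch. This is not the paper's generic instruction model. It is a common mask/value matching scheme, shown here for a made-up 16-bit ISA. The `Pattern` class, the `decode` function and the `TOY_ISA` table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """One instruction description: fixed bits plus named operand fields."""
    name: str
    mask: int      # bits that must match
    value: int     # required value of the masked bits
    fields: dict   # operand name -> (shift, width)


def decode(word, patterns):
    """Return (mnemonic, operands) for the first pattern matching `word`."""
    for p in patterns:
        if (word & p.mask) == p.value:
            operands = {
                name: (word >> shift) & ((1 << width) - 1)
                for name, (shift, width) in p.fields.items()
            }
            return p.name, operands
    raise ValueError(f"no pattern matches {word:#06x}")


# Hypothetical 16-bit toy ISA: the top four bits select the operation.
TOY_ISA = [
    Pattern("ADD", 0xF000, 0x1000, {"rd": (8, 4), "rs": (4, 4), "rt": (0, 4)}),
    Pattern("LI", 0xF000, 0x2000, {"rd": (8, 4), "imm": (0, 8)}),
]

print(decode(0x1234, TOY_ISA))
print(decode(0x2A7F, TOY_ISA))
```

Retargeting such a decoder means swapping the pattern table while the `decode` routine stays unchanged, which mirrors the separation between instruction descriptions and simulation engine that the abstract describes.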

Citations: 47

The analysis and design of architecture systems for speech recognition on modern handheld-computing devices

First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)

Pub Date : 2003-10-01 DOI: 10.1145/944645.944661

Andreas Hagen, D. Connors, B. Pellom

Growing demand for high performance in embedded systems is creating new opportunities to use speech recognition systems. In several ways, the needs of embedded computing differ from those of more traditional general-purpose systems. Embedded systems have more stringent constraints on cost and power consumption that lead to design bottlenecks for many computationally-intensive applications. This paper characterizes the speech recognition process on handheld mobile devices and evaluates the use of modern architecture features and compiler techniques for performing real-time speech recognition. We evaluate the University of Colorado sonic speech recognition software on the IMPACT architectural simulator and compiler framework. Experimental results show that by using a strategic set of compiler optimizations, a 500 MHz processor with moderate levels of instruction-level parallelism and cache resources can meet the real-time computing and power constraints of an advanced speech recognition application.

Citations: 30