"Area-efficient instruction set synthesis for reconfigurable system-on-chip designs," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996679
P. Brisk, A. Kaplan, M. Sarrafzadeh
Silicon compilers are often used in conjunction with Field Programmable Gate Arrays (FPGAs) to deliver flexibility, fast prototyping, and accelerated time-to-market. Many of these compilers produce hardware that is larger than necessary, as they do not allow instructions to share hardware resources. This study presents an efficient heuristic which transforms a set of custom instructions into a single hardware datapath on which they can execute. Our approach is based on the classic problems of finding the longest common subsequence and substring of two (or more) sequences. This heuristic produces circuits which are as much as 85.33% smaller than those synthesized by integer linear programming (ILP) approaches which do not explore resource sharing. On average, we obtained 55.41% area reduction for pipelined datapaths, and 66.92% area reduction for VLIW datapaths. Our solution is simple and effective, and can easily be integrated into an existing silicon compiler.
"A fast hardware/software co-verification method for system-on-a-chip by using a C/C++ simulator and FPGA emulator with shared register communication," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996655
Yuichi Nakamura, Koh Hosokawa, I. Kuroda, Ko Yoshikawa, T. Yoshimura
This paper describes a new hardware/software co-verification method for System-On-a-Chip, based on the integration of a C/C++ simulator and an inexpensive FPGA emulator. Communication between the simulator and emulator occurs via a flexible interface based on shared communication registers. This method enables easy debugging, rich portability, and high verification speed, at a low cost. We describe the application of this environment to the verification of three different complex commercial SoCs, supporting concurrent hardware and embedded software development. In these projects, our verification methodology was used to perform complete system verification at 0.2-1.1 MHz, while supporting full graphical interface functions such as "waveform" or "signal dump" viewers, and debugging functions such as "step" or "break".
{"title":"A fast hardware/software co-verification method for systern-on-a-chip by using a C/C++ simulator and FPGA emulator with shared register communication","authors":"Yuichi Nakamura, Koh Hosokawa, I. Kuroda, Ko Yoshikawa, T. Yoshimura","doi":"10.1145/996566.996655","DOIUrl":"https://doi.org/10.1145/996566.996655","url":null,"abstract":"This paper describes a new hardware/software co-verification method for System-On-a-Chip, based on the integration of a C/C++ simulator and an inexpensive FPGA emulator. Communication between the simulator and emulator occurs via a flexible interface based on shared communication registers. This method enables easy debugging, rich portability, and high verification speed, at a low cost. We describe the application of this environment to the verification of three different complex commercial SoCs, supporting concurrent hardware and embedded software development. In these projects, our verification methodology was used to perform complete system verification at 0.2-1.1 MHz, while supporting full graphical interface functions such as \"waveform\" or \"signal dump\" viewers, and debugging functions such as \"step\" or \"break\".","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133136033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"An algorithm for converting floating-point computations to fixed-point in MATLAB based FPGA design," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996701
Sanghamitra Roy, P. Banerjee
Most practical FPGA designs of digital signal processing applications are limited to fixed-point arithmetic owing to the cost and complexity of floating-point hardware. While mapping DSP applications onto FPGAs, a DSP algorithm designer, who often develops his applications in MATLAB, must determine the dynamic range and desired precision of input, intermediate and output signals in a design implementation to ensure that the algorithm fidelity criteria are met. The first step in a flow to map MATLAB applications into hardware is the conversion of the floating-point MATLAB algorithm into a fixed-point version. This paper describes an approach to automate this conversion for mapping to FPGAs, profiling the expected inputs to estimate errors. Our algorithm attempts to minimize the hardware resources while constraining the quantization error within a specified limit.
"Compact thermal modeling for temperature-aware design," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996800
Wei Huang, M. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, S. Velusamy
Thermal design in sub-100nm technologies is one of the major challenges to the CAD community. In this paper, we first introduce the idea of temperature-aware design. We then propose a compact thermal model which can be integrated with modern CAD tools to achieve a temperature-aware design methodology. Finally, we use the compact thermal model in a case study of microprocessor design to show the importance of using temperature as a guideline for the design. Results from our thermal model show that a temperature-aware design approach can provide more accurate estimations, and therefore better decisions and faster design convergence.
{"title":"Compact thermal modeling for temperature-aware design","authors":"Wei Huang, M. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, S. Velusamy","doi":"10.1145/996566.996800","DOIUrl":"https://doi.org/10.1145/996566.996800","url":null,"abstract":"Thermal design in sub-100nm technologies is one of the major challenges to the CAD community. In this paper, we first introduce the idea of temperature-aware design. We then propose a compact thermal model which can be integrated with modern CAD tools to achieve a temperature-aware design methodology. Finally, we use the compact thermal model in a case study of microprocessor design to show the importance of using temperature as a guideline for the design. Results from our thermal model show that a temperature-aware design approach can provide more accurate estimations, and therefore better decisions and faster design convergence.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132055884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Statistical optimization of leakage power considering process variations using dual-Vth and sizing," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996775
A. Srivastava, D. Sylvester, D. Blaauw
Increasing levels of process variability in sub-100nm CMOS design have become a critical concern for performance- and power-constrained designs. In this paper, we propose a new statistically aware dual-Vt and sizing optimization that considers both the variability in performance and leakage of a design. While extensive work has been performed in the past on statistical analysis methods, circuit optimization is still largely performed using deterministic methods. We show in this paper that deterministic optimization quickly loses effectiveness for stringent performance and leakage constraints in designs with significant variability. We then propose a statistically aware dual-Vt and sizing algorithm where both delay constraints and sensitivity computations are performed in a statistical manner. We demonstrate that using this statistically aware optimization, leakage power can be reduced by 15-35% compared to traditional deterministic analysis. The improvements increase for strict delay constraints, making statistical optimization especially important for high performance designs.
"Fast statistical timing analysis handling arbitrary delay correlations," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996664
M. Orshansky, A. Bandyopadhyay
An efficient statistical timing analysis algorithm that can handle arbitrary (spatial and structural) causes of delay correlation is described. The algorithm derives the entire cumulative distribution function of the circuit delay using a new mathematical formulation. Spatial as well as structural correlations between gate and wire delays can be taken into account. The algorithm can handle node delays described by non-Gaussian distributions. Because the analytical computation of an exact cumulative distribution function for a probabilistic graph with arbitrary distributions is infeasible, we find tight upper and lower bounds on the true cumulative distribution. An efficient algorithm to compute the bounds is based on a PERT-like single traversal of the sub-graph containing the set of N deterministically longest paths. The efficiency and accuracy of the algorithm are demonstrated on a set of ISCAS'85 benchmarks. Across all the benchmarks, the average RMS error between the exact distribution and lower bound is 0.7%, and the average maximum error at the 95th percentile is 0.6%. The computation of bounds for the largest benchmark takes 39 seconds.
"Debugging HW/SW interface for MPSoC: video encoder system design case study," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996808
M. Youssef, S. Yoo, A. Sasongko, Y. Paviot, A. Jerraya
This paper reports a case study of multiprocessor SoC (MPSoC) design of a complex video encoder, namely OpenDivX. OpenDivX is a popular MPEG-4 codec. It requires massive computation resources and deals with complex data structures to represent video streams. In this study, the initial specification is given as sequential C code that had to be parallelized for execution on four different processors. A high-level programming model, namely the Message Passing Interface (MPI), was used for inter-task communication in the parallelized C code. A four-processor hardware prototyping platform was used to debug the parallelized software before the final SoC hardware was ready. Targeting the abstract MPI-based parallel code to the multiprocessor architecture required the design of an additional hardware-dependent software layer to refine the abstract programming model. The design was carried out by a team comprising three types of designers: application software, hardware-dependent software, and hardware platform designers. The collaboration was necessary to master the whole flow from the specification to the platform. The study showed that HW/SW interface debug was the most time-consuming step. This is identified as a potential killer for application-specific MPSoC design. To further investigate ways to accelerate HW/SW interface debug, we analyzed the bugs found in the case study and the available debug environments. Finally, we propose a debug strategy that efficiently exploits existing debug environments to reduce the time spent on HW/SW interface debug.
{"title":"Debugging HW/SW interface for MPSoC: video encoder system design case study","authors":"M. Youssef, S. Yoo, A. Sasongko, Y. Paviot, A. Jerraya","doi":"10.1145/996566.996808","DOIUrl":"https://doi.org/10.1145/996566.996808","url":null,"abstract":"This paper reports a case study of multiprocessor SoC (MPSoC) design of a complex video encoder, namely OpenDivX. OpenDivX is a popular version of MPEG4. It requires massive computation resources and deals with complex data structures to represent video streams. In this study, the initial specification is given in sequential C code that had to be parallelized to be executed on four different processors. High level programming model, namely Message Passing Interface (MPI) was used to enable inter-task communication among parallelized C code. A four processor hardware prototyping platform was used to debug the parallelized software before final SoC hardware is ready. The targeting of abstract parallel code using MPI to the multiprocessor architecture required the design of an additional hardware-dependent software layer to refine the abstract programming model. The design was made by a team work of three types of designer: application software, hardware-dependent software and hardware platform designers. The collaboration was necessary to master the whole flow from the specification to the platform.The study showed that HW/SW interface debug was the most time-consuming step. This is identified as a potential killer for application-specific MPSoC design. To further investigate the ways to accelerate the HW/SW interface debug, we analyzed bugs found in the case study and the available debug environments. Finally, we address a debug strategy that exploits efficiently existing debug environments to reduce the time for HW/SW interface debug.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127238820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Performance analysis of different arbitration algorithms of the AMBA AHB bus," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996734
M. Conti, M. Caldari, G. Vece, S. Orcioni, C. Turchetti
Bus performance is extremely important in platform-based design. System-level analysis of bus performance provides important information for comparing and choosing between different architectures, driven by the functional, timing and power constraints of the System-on-Chip. This paper presents the effect of different arbitration algorithms and bus usage methodologies on AMBA AHB performance in terms of effective throughput and power dissipation. SystemC and VHDL models have been developed and simulations have been performed.
{"title":"Performance analysis of different arbitration algorithms of the AMBA AHB bus","authors":"M. Conti, M. Caldari, G. Vece, S. Orcioni, C. Turchetti","doi":"10.1145/996566.996734","DOIUrl":"https://doi.org/10.1145/996566.996734","url":null,"abstract":"Bus performances are extremely important in a platform-based design. System Level analysis of bus performances gives important information for the analysis and choice between different architectures driven by functional, timing and power constraints of the System-on-Chip. This paper presents the effect of different arbitration algorithms and bus usage methodologies on the bus AMBA AHB performances in terms of effective throughput and power dissipation. SystemC and VHDL models have been developed and simulations have been performed.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127342906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Virtual memory window for application-specific reconfigurable coprocessors," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996818
M. Vuletic, L. Pozzi, P. Ienne
The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms restrain the widespread use of reconfigurable accelerators and limit designer productivity. Furthermore, communication between the SW and HW parts of codesigned applications is typically exposed to SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access the user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, the virtualization layer improves coprocessor execution, so applications achieve better performance without any user intervention. We use two different reconfigurable systems-on-chip (SoCs) running Linux and codesigned applications to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to the virtualization is limited. Dynamic prefetching in the virtualization layer further reduces the abstraction overhead.
"Large-scale full-wave simulation," Proceedings of the 41st Design Automation Conference, 2004. doi:10.1145/996566.996782
S. Kapur, D. Long
We describe a new extraction tool, EMX (Electro-Magnetic eXtractor), for the analysis of RF, analog and high-speed digital circuits. EMX is a fast full-wave field solver. It incorporates two new techniques which make it significantly faster and more memory-efficient than previous solvers. First, it takes advantage of layout regularity in typical designs. Second, EMX uses a new method for computing the vector-potential component in the mixed potential integral equation. These techniques give a speed-up of more than a factor of ten, together with a corresponding reduction in memory.