Proceedings. 41st Design Automation Conference, 2004. Latest Articles

STAC: statistical timing analysis with correlation
Pub Date : 2004-06-07 DOI: 10.1145/996566.996665
Jiayong Le, Xin Li, L. Pileggi
Current technology trends have led to a growing impact of both inter-die and intra-die process variations on circuit performance. While it is imperative to model parameter variations for sub-100nm technologies to produce an upper-bound prediction of timing, it is equally important to consider the correlation of these variations for the bound to be useful. In this paper we present an efficient block-based statistical static timing analysis algorithm that can account for correlations from process parameters and re-converging paths. The algorithm can also accommodate dominant interconnect coupling effects to provide an accurate compilation of statistical timing information. The generality and efficiency of the proposed algorithm are obtained from a novel simplification technique derived from statistical independence theories and principal component analysis (PCA) methods. The technique significantly reduces the cost of computing the mean, variance and covariance of a set of correlated random variables.
Citations: 134
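The STAC abstract attributes its efficiency to working with statistically independent components obtained through PCA, which makes mean, variance and covariance computations cheap. A common way to illustrate that general idea is the first-order "canonical" delay form used in block-based SSTA: each delay is a mean plus a linear combination of independent unit-variance components. The Python sketch below is a generic illustration under that assumption, not the authors' implementation; the names `CanonicalDelay`, `covariance` and `clark_max` are hypothetical, and the max operation uses Clark's classic formula for the mean with a simplified tightness-weighted mix of sensitivities.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CanonicalDelay:
    """Delay = mean + sum_i sens[i] * z_i, where the z_i are independent N(0, 1)
    components (for example, principal components of correlated process parameters)."""
    mean: float
    sens: Tuple[float, ...]

    def variance(self) -> float:
        # Independent unit-variance components: variance is the squared norm.
        return sum(s * s for s in self.sens)

    def __add__(self, other: "CanonicalDelay") -> "CanonicalDelay":
        # Sum of two delays: means and sensitivities add component-wise,
        # so correlation through shared components is preserved automatically.
        return CanonicalDelay(self.mean + other.mean,
                              tuple(a + b for a, b in zip(self.sens, other.sens)))


def covariance(a: CanonicalDelay, b: CanonicalDelay) -> float:
    # Covariance reduces to a dot product of the sensitivity vectors.
    return sum(x * y for x, y in zip(a.sens, b.sens))


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(a: CanonicalDelay, b: CanonicalDelay) -> CanonicalDelay:
    """Approximate max(a, b). The mean is Clark's exact mean for two jointly
    Gaussian variables; the sensitivities are a tightness-weighted mix, a common
    simplification that does not exactly reproduce Clark's variance."""
    theta = math.sqrt(max(a.variance() + b.variance() - 2.0 * covariance(a, b), 1e-18))
    alpha = (a.mean - b.mean) / theta
    t = std_normal_cdf(alpha)  # tightness: probability that a is the larger delay
    mean = a.mean * t + b.mean * (1.0 - t) + theta * std_normal_pdf(alpha)
    sens = tuple(t * x + (1.0 - t) * y for x, y in zip(a.sens, b.sens))
    return CanonicalDelay(mean, sens)


# Two arrival times sharing two independent variation components.
a = CanonicalDelay(10.0, (1.0, 0.5))
b = CanonicalDelay(8.0, (0.5, 1.0))
print(covariance(a, b))      # prints 1.0
print((a + b).variance())    # prints 4.5
print(clark_max(a, b).mean)  # slightly above 10.0
```

The design point the sketch tries to convey is that once every delay is expressed over the same independent components, covariance is a dot product and no pairwise correlation matrix has to be stored or propagated.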
An efficient scalable and flexible data transfer architecture for multiprocessor SoC with massive distributed memory
Pub Date : 2004-06-07 DOI: 10.1145/996566.996636
Sang-Il Han, A. Baghdadi, M. Bonaciu, S. Chae, A. Jerraya
Massive data transfer encountered in emerging multimedia embedded applications requires an architecture that can handle both a highly distributed memory structure and multiprocessor computation. The key issue to be solved is then how to manage data transfers between large numbers of distributed memories. To address this issue, our paper proposes a scalable Distributed Memory Server (DMS) for multiprocessor SoC (MPSoC). The proposed DMS is composed of: (1) high-performance and flexible memory service access points (MSAPs), which execute data transfers without intervention of the processing elements, (2) a data network, and (3) a control network. It can handle direct massive data transfer between the distributed memories of an MPSoC. The scalability and flexibility of the proposed DMS are illustrated through the implementation of an MPEG4 video encoder for QCIF and CIF formats. The experiments show clearly how DMS can be adapted to accommodate different SoC configurations requiring various data transfer bandwidths. Synthesis results show that bandwidth can scale up to 28.8 GB/sec.
Citations: 43
Power minimization using simultaneous gate sizing, dual-Vdd and dual-Vth assignment
Pub Date : 2004-06-07 DOI: 10.1145/996566.996777
A. Srivastava, D. Sylvester, D. Blaauw
We develop an approach to minimize total power in a dual-Vdd and dual-Vth design. The algorithm runs in two distinct phases. The first phase relies on upsizing to create slack and maximize low-Vdd assignments in a backward topological manner. The second phase proceeds in a forward topological fashion, both sizing gates and re-assigning them to high Vdd to enable significant static power savings through high-Vth assignment. The proposed algorithm is implemented and tested on a set of combinational benchmark circuits. A comparison with traditional CVS and dual-Vth/sizing algorithms demonstrates the advantage of the algorithm over a range of activity factors, including an average power reduction of 30% (50%) at high (nominal) primary input activities.
Citations: 101
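The power-minimization abstract reports results that depend on primary-input activity. The textbook first-order CMOS power model helps explain why: switching power scales with the activity factor α and quadratically with Vdd, so low-Vdd assignment pays off most at high activity, while leakage is largely activity-independent and is what high-Vth assignment targets. The sketch below only evaluates that textbook model; the function names, capacitance, frequency and leakage values are illustrative assumptions, not the authors' algorithm or data.

```python
def dynamic_power(alpha: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """First-order switching power: P_dyn = alpha * C * Vdd^2 * f (watts)."""
    return alpha * cap_f * vdd ** 2 * freq_hz


def dynamic_fraction(alpha: float, cap_f: float, vdd: float,
                     freq_hz: float, leakage_w: float) -> float:
    """Share of total power that is switching power, for a fixed leakage (watts)."""
    p_dyn = dynamic_power(alpha, cap_f, vdd, freq_hz)
    return p_dyn / (p_dyn + leakage_w)


# Lowering Vdd from 1.2 V to 0.8 V cuts switching power to (0.8/1.2)^2, about 44%.
ratio = dynamic_power(0.1, 10e-15, 0.8, 1e9) / dynamic_power(0.1, 10e-15, 1.2, 1e9)
print(f"{ratio:.3f}")  # prints 0.444

# With a fixed (hypothetical) 0.5 uW leakage, the dynamic share grows with activity,
# which is why the relative value of low-Vdd versus high-Vth assignment shifts.
for alpha in (0.01, 0.1, 0.3):
    print(alpha, round(dynamic_fraction(alpha, 10e-15, 1.2, 1e9, 0.5e-6), 3))
```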
Design and reliability challenges in nanometer technologies
Pub Date : 2004-06-07 DOI: 10.1145/996566.996588
S. Borkar, T. Karnik, V. De
CMOS technology scaling is pushing channel lengths below the wavelength of light. Parameter variation, caused by sub-wavelength lithography, will pose a major challenge for the design and reliability of future high-performance microprocessors in nanometer technologies. In this paper, we present the impact of these variations on processor functionality, predictability and reliability. We propose design and CAD solutions for variation tolerance. We conclude this paper with soft error rate scaling trends and soft-error-tolerant circuits for reliability enhancement.
Citations: 240
Statistical optimization of leakage power considering process variations using dual-Vth and sizing
Pub Date : 2004-06-07 DOI: 10.1145/996566.996775
A. Srivastava, D. Sylvester, D. Blaauw
Increasing levels of process variability in sub-100nm CMOS design have become a critical concern for performance- and power-constrained designs. In this paper, we propose a new statistically aware dual-Vt and sizing optimization that considers the variability in both the performance and the leakage of a design. While extensive work has been performed in the past on statistical analysis methods, circuit optimization is still largely performed using deterministic methods. We show in this paper that deterministic optimization quickly loses effectiveness under stringent performance and leakage constraints in designs with significant variability. We then propose a statistically aware dual-Vt and sizing algorithm in which both delay constraints and sensitivity computations are performed in a statistical manner. We demonstrate that using this statistically aware optimization, leakage power can be reduced by 15-35% compared to traditional deterministic analysis. The improvements grow as delay constraints tighten, making statistical optimization especially important for high-performance designs.
Citations: 142
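One reason deterministic optimization can mislead under variability, as the leakage abstract argues, is that subthreshold leakage depends exponentially on Vth. If Vth is Gaussian, leakage is lognormal and its mean lies above the value at nominal Vth. The sketch below shows this with a simplified leakage model and checks the analytic lognormal mean against Monte Carlo. The model form, the slope factor n = 1.5, the thermal voltage of about 25.9 mV, Vth = 0.30 V and σ(Vth) = 20 mV are all illustrative assumptions; this is not the authors' optimization algorithm.

```python
import math
import random


def leakage(vth: float, i0: float = 1.0, n: float = 1.5, v_t: float = 0.0259) -> float:
    """Simplified subthreshold leakage model: I = i0 * exp(-vth / (n * v_t))."""
    return i0 * math.exp(-vth / (n * v_t))


def mean_leakage_analytic(vth0: float, sigma: float, i0: float = 1.0,
                          n: float = 1.5, v_t: float = 0.0259) -> float:
    # With Vth ~ N(vth0, sigma^2), I is lognormal:
    # E[I] = I(vth0) * exp((k * sigma)^2 / 2), where k = 1 / (n * v_t).
    k = 1.0 / (n * v_t)
    return leakage(vth0, i0, n, v_t) * math.exp(0.5 * (k * sigma) ** 2)


def mean_leakage_mc(vth0: float, sigma: float, samples: int = 100_000, seed: int = 1) -> float:
    # Monte Carlo estimate using the default model parameters of leakage().
    rng = random.Random(seed)
    return sum(leakage(rng.gauss(vth0, sigma)) for _ in range(samples)) / samples


nominal = leakage(0.30)                       # deterministic view: Vth at its mean
expected = mean_leakage_analytic(0.30, 0.02)  # statistical view: sigma(Vth) = 20 mV
print(round(expected / nominal, 3))           # mean leakage relative to nominal
print(round(mean_leakage_mc(0.30, 0.02) / expected, 3))  # close to 1.0
```

With these assumed numbers the mean leakage comes out roughly 14% above the nominal estimate, and the gap widens quickly as σ(Vth) grows because of the squared term in the exponent.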
Fast statistical timing analysis handling arbitrary delay correlations
Pub Date : 2004-06-07 DOI: 10.1145/996566.996664
M. Orshansky, A. Bandyopadhyay
An efficient statistical timing analysis algorithm that can handle arbitrary (spatial and structural) causes of delay correlation is described. The algorithm derives the entire cumulative distribution function of the circuit delay using a new mathematical formulation. Spatial as well as structural correlations between gate and wire delays can be taken into account. The algorithm can handle node delays described by non-Gaussian distributions. Because the analytical computation of an exact cumulative distribution function for a probabilistic graph with arbitrary distributions is infeasible, we find tight upper and lower bounds on the true cumulative distribution. An efficient algorithm to compute the bounds is based on a PERT-like single traversal of the sub-graph containing the set of N deterministically longest paths. The efficiency and accuracy of the algorithm are demonstrated on a set of ISCAS'85 benchmarks. Across all the benchmarks, the average rms error between the exact distribution and the lower bound is 0.7%, and the average maximum error at the 95th percentile is 0.6%. The computation of bounds for the largest benchmark takes 39 seconds.
Citations: 116
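The abstract above bounds the circuit-delay CDF instead of computing it exactly. To make the notion of correlation-agnostic bounds concrete, the sketch below computes two classical bounds on P(max_i D_i ≤ t) that hold for any correlation, given only each path's marginal distribution: the minimum of the marginal CDFs (upper) and Boole's inequality (lower). These are much looser textbook bounds, not the paper's PERT-like bounding algorithm; Gaussian marginals and the path values are assumptions for illustration.

```python
import math
from typing import List, Tuple


def norm_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_cdf_bounds(t: float, paths: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Bounds on P(max_i D_i <= t) that hold for ANY correlation between paths,
    given only each path's marginal (here assumed Gaussian: (mean, std))."""
    cdfs = [norm_cdf(t, mu, s) for mu, s in paths]
    # Upper bound: max <= t requires every D_i <= t; attained when all paths are
    # perfectly positively correlated through one common Gaussian variable.
    upper = min(cdfs)
    # Lower bound: Boole's inequality applied to the union of the events {D_i > t}.
    lower = max(0.0, 1.0 - sum(1.0 - f for f in cdfs))
    return lower, upper


paths = [(100.0, 5.0), (98.0, 6.0), (95.0, 4.0)]  # hypothetical path delays (ps)
for t in (100.0, 110.0, 120.0):
    lo, hi = max_cdf_bounds(t, paths)
    print(f"t={t:.0f}: {lo:.4f} <= F(t) <= {hi:.4f}")
```

The gap between the two bounds is widest near the median and closes in the upper tail, which is the region that matters for timing yield.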
Debugging HW/SW interface for MPSoC: video encoder system design case study
Pub Date : 2004-06-07 DOI: 10.1145/996566.996808
M. Youssef, S. Yoo, A. Sasongko, Y. Paviot, A. Jerraya
This paper reports a case study of the multiprocessor SoC (MPSoC) design of a complex video encoder, OpenDivX. OpenDivX is a popular version of MPEG4. It requires massive computation resources and deals with complex data structures to represent video streams. In this study, the initial specification was given as sequential C code that had to be parallelized for execution on four different processors. A high-level programming model, the Message Passing Interface (MPI), was used to enable inter-task communication among the parallelized C code. A four-processor hardware prototyping platform was used to debug the parallelized software before the final SoC hardware was ready. Targeting the abstract parallel MPI code to the multiprocessor architecture required the design of an additional hardware-dependent software layer to refine the abstract programming model. The design was carried out by a team of three types of designers: application software, hardware-dependent software and hardware platform designers. This collaboration was necessary to master the whole flow from the specification to the platform. The study showed that HW/SW interface debug was the most time-consuming step. This is identified as a potential killer for application-specific MPSoC design. To further investigate ways to accelerate HW/SW interface debug, we analyzed the bugs found in the case study and the available debug environments. Finally, we address a debug strategy that efficiently exploits existing debug environments to reduce the time for HW/SW interface debug.
Citations: 33
Performance analysis of different arbitration algorithms of the AMBA AHB bus
Pub Date : 2004-06-07 DOI: 10.1145/996566.996734
M. Conti, M. Caldari, G. Vece, S. Orcioni, C. Turchetti
Bus performance is extremely important in platform-based design. System-level analysis of bus performance provides important information for analyzing and choosing between different architectures, driven by the functional, timing and power constraints of the System-on-Chip. This paper presents the effect of different arbitration algorithms and bus usage methodologies on AMBA AHB bus performance in terms of effective throughput and power dissipation. SystemC and VHDL models have been developed and simulations have been performed.
Citations: 25
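The AHB abstract does not name the arbitration algorithms it compares, and the AMBA AHB specification does not mandate a particular scheme. As an illustration only, the sketch below models two common choices, fixed priority and round robin, at the level of one grant decision per arbitration point. It deliberately ignores burst boundaries, locked transfers and split/retry responses; the function and class names are hypothetical and this is not the authors' SystemC or VHDL model.

```python
from typing import List, Optional


def fixed_priority(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-index requesting master (index 0 = highest priority)."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None  # no master is requesting


class RoundRobin:
    """Rotating priority: the search starts just after the last granted master."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1  # so that master 0 is checked first

    def grant(self, requests: List[bool]) -> Optional[int]:
        for k in range(1, self.n + 1):
            i = (self.last + k) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


# Three masters that request continuously.
rr = RoundRobin(3)
print([fixed_priority([True] * 3) for _ in range(6)])  # prints [0, 0, 0, 0, 0, 0]
print([rr.grant([True] * 3) for _ in range(6)])        # prints [0, 1, 2, 0, 1, 2]
```

Under saturation the fixed-priority arbiter starves lower-priority masters, while round robin shares the bus evenly at the cost of delaying the most latency-critical master; throughput and power comparisons like those in the abstract quantify such trade-offs.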
Virtual memory window for application-specific reconfigurable coprocessors
Pub Date : 2004-06-07 DOI: 10.1145/996566.996818
M. Vuletic, L. Pozzi, P. Ienne
The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms restrain the widespread use of reconfigurable accelerators and limit designer productivity. Furthermore, communication between the SW and HW parts of codesigned applications is typically exposed to SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access the user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, the virtualization layer improves coprocessor execution, so applications achieve better performance without any user intervention. We use two different reconfigurable systems-on-chip (SoCs) running Linux, together with codesigned applications, to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to virtualization is limited. Dynamic prefetching in the virtualization layer further reduces the abstraction overhead.
Citations: 37
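The virtual memory window abstract mentions predicting future memory accesses and speculatively prefetching data, but does not say which predictor is used. The sketch below shows the general effect with the simplest possible predictor, a sequential next-page prefetcher in front of a small LRU window of pages local to the coprocessor. The `PageWindow` class, the 4 KiB page size and the window capacity are hypothetical assumptions chosen for illustration, not the authors' design.

```python
from collections import OrderedDict


class PageWindow:
    """Hypothetical model of a small window of virtual pages held near the
    coprocessor. Each miss stands for an expensive OS-handled page transfer."""

    def __init__(self, capacity: int, prefetch: bool):
        self.capacity = capacity
        self.prefetch = prefetch
        self.pages = OrderedDict()  # LRU order: least recently used first
        self.misses = 0

    def _load(self, page: int) -> None:
        if page in self.pages:
            self.pages.move_to_end(page)  # refresh LRU position
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = True

    def access(self, addr: int, page_size: int = 4096) -> None:
        page = addr // page_size
        if page not in self.pages:
            self.misses += 1
        self._load(page)
        if self.prefetch:
            self._load(page + 1)  # speculative next-page prefetch


# A streaming coprocessor reads 16 pages sequentially in 64-byte steps.
stream = range(0, 16 * 4096, 64)
for use_prefetch in (False, True):
    w = PageWindow(capacity=4, prefetch=use_prefetch)
    for addr in stream:
        w.access(addr)
    print(use_prefetch, w.misses)  # 16 misses without prefetch, 1 with it
```

A sequential predictor suits streaming kernels like this one; irregular access patterns would need a smarter predictor, and useless prefetches would then cost bandwidth and evict useful pages.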
Large-scale full-wave simulation
Pub Date : 2004-06-07 DOI: 10.1145/996566.996782
S. Kapur, D. Long
We describe a new extraction tool, EMX (Electro-Magnetic eXtractor), for the analysis of RF, analog and high-speed digital circuits. EMX is a fast full-wave field solver. It incorporates two new techniques which make it significantly faster and more memory-efficient than previous solvers. First, it takes advantage of layout regularity in typical designs. Second, EMX uses a new method for computing the vector-potential component in the mixed potential integral equation. These techniques give a speed-up of more than a factor of ten, together with a corresponding reduction in memory.
Citations: 36
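The EMX abstract says the solver takes advantage of layout regularity but gives no details. One general way regularity can help an integral-equation solver is translation invariance: on a uniform grid, the interaction between two panels depends only on their relative offset, so far fewer distinct kernel evaluations are needed than matrix entries. The sketch below demonstrates that counting argument with a memoized placeholder kernel, the static 1/(4πr) free-space form with a made-up self-term, rather than a real full-wave mixed-potential kernel. It is an assumption about the kind of saving involved, not a description of how EMX works.

```python
import math
from functools import lru_cache
from typing import List, Tuple


@lru_cache(maxsize=None)
def interaction(dx: int, dy: int, pitch: float = 1.0) -> float:
    """Placeholder kernel between panels at integer grid offset (dx, dy)."""
    if dx == 0 and dy == 0:
        return 1.0  # hypothetical self-term
    r = pitch * math.hypot(dx, dy)
    return 1.0 / (4.0 * math.pi * r)


def assemble(panels: List[Tuple[int, int]]) -> List[List[float]]:
    """Dense interaction matrix; entries depend only on |offset|, so the
    memoized kernel is evaluated once per distinct offset."""
    n = len(panels)
    return [[interaction(abs(panels[j][0] - panels[i][0]),
                         abs(panels[j][1] - panels[i][1]))
             for j in range(n)] for i in range(n)]


interaction.cache_clear()
grid = [(x, y) for x in range(10) for y in range(10)]  # 100 panels on a 10x10 grid
matrix = assemble(grid)
print(len(matrix) ** 2, interaction.cache_info().currsize)  # prints 10000 100
```

Here 10,000 matrix entries need only 100 kernel evaluations; for an expensive full-wave kernel that reuse, rather than the matrix fill itself, is where the time would be saved.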