
Latest articles: Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors

An efficient external-memory implementation of region query with application to area routing
S. Liao, Narendra V. Shenoy, W. Nicholls
We present the tile-cached kd-tree, an efficient external-memory (disk) implementation of two-dimensional region query for use in a detailed area router. Most researchers have heretofore focused on in-memory algorithms. However, as the need to tackle very large problems increases, conventional in-memory algorithms suffer from unpredictable caching and paging behavior, and their performance may degrade considerably. In addition, since the region-query data structure is only part of the overall system, its consumption of large memory resources affects other parts of the system as well. Our implementation takes advantage of spatial locality in the detailed-routing process. We partition the routing space into tiles, each storing the data of objects (rectangles) that lie strictly within it. Objects that cross tile boundaries are stored separately. The data within a tile are then written out to disk, and a configurable cache is used to hold the most recently visited tiles in memory. Experimental results on large real-life routing problems show that this scheme significantly reduces memory usage with a tolerable performance penalty.
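The tiling-plus-cache idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TileCache`, the integer grid with square tiles, the dictionary standing in for the disk, the least-recently-used eviction policy, and the linear scan inside each tile (where the paper uses a kd-tree) are all assumptions made for illustration.

```python
from collections import OrderedDict


class TileCache:
    """Hypothetical tile store: rectangles are (x1, y1, x2, y2) with integer coordinates."""

    def __init__(self, tile_size, capacity):
        self.tile_size = tile_size
        self.capacity = capacity      # maximum number of tiles held in memory
        self.disk = {}                # tile key -> rectangles (stand-in for disk, an assumption)
        self.cache = OrderedDict()    # in-memory tiles, least recently used first
        self.crossing = []            # rectangles that span more than one tile
        self.loads = 0                # number of simulated disk reads

    def _tile_of(self, x, y):
        return (x // self.tile_size, y // self.tile_size)

    def insert(self, rect):
        x1, y1, x2, y2 = rect
        lo, hi = self._tile_of(x1, y1), self._tile_of(x2, y2)
        if lo == hi:
            # Both corners fall in one tile, so the rectangle lies within it.
            self.disk.setdefault(lo, []).append(rect)
            self.cache.pop(lo, None)  # drop any stale in-memory copy
        else:
            self.crossing.append(rect)

    def _load(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)       # mark as most recently used
            return self.cache[key]
        self.loads += 1
        rects = list(self.disk.get(key, []))  # simulated read from disk
        self.cache[key] = rects
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used tile
        return rects

    def query(self, region):
        """Return every stored rectangle that overlaps (or touches) the region."""
        qx1, qy1, qx2, qy2 = region

        def hits(r):
            return r[0] <= qx2 and qx1 <= r[2] and r[1] <= qy2 and qy1 <= r[3]

        found = [r for r in self.crossing if hits(r)]
        tx1, ty1 = self._tile_of(qx1, qy1)
        tx2, ty2 = self._tile_of(qx2, qy2)
        for tx in range(tx1, tx2 + 1):
            for ty in range(ty1, ty2 + 1):
                found.extend(r for r in self._load((tx, ty)) if hits(r))
        return found
```

Because a router tends to issue many queries in the same neighborhood, repeated queries in this sketch touch the same few tiles and are served from `cache` without incrementing `loads`; only a move to a new area triggers a simulated disk read.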
DOI: https://doi.org/10.1109/ICCD.2002.1106744 | Published: 2002-09-16
Citations: 5
Legacy SystemC co-simulation of multi-processor systems-on-chip
L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, M. Poncino
We present a co-simulation environment for multiprocessor architectures that is based on SystemC and allows a transparent integration of instruction set simulators (ISSs) within the SystemC simulation framework. The integration is based on the well-known concept of a bus wrapper, which realizes the interface between the ISS and the simulator. The proposed solution uses an ISS-wrapper interface based on the standard gdb remote debugging interface, and implements two alternative schemes that differ in the amount of communication they require. The two approaches provide different degrees of tradeoff between simulation granularity and speed, and show significant speedup with respect to a micro-architectural, full SystemC simulation of the system description.
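For readers unfamiliar with the gdb remote debugging interface mentioned above, the following Python sketch shows the basic packet framing of the GDB Remote Serial Protocol: a payload is sent as `$payload#cc`, where `cc` is the modulo-256 sum of the payload bytes written as two hex digits. The function names are hypothetical, escaping and acknowledgements are omitted, and nothing here is taken from the paper's wrapper, whose details the abstract does not give.

```python
def rsp_checksum(payload: str) -> int:
    """Modulo-256 sum of the payload bytes, as used by the GDB Remote Serial Protocol."""
    return sum(payload.encode("ascii")) % 256


def rsp_frame(payload: str) -> str:
    """Wrap a command payload as '$payload#cc' (escaping of special characters omitted)."""
    return f"${payload}#{rsp_checksum(payload):02x}"


def rsp_unframe(packet: str) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload, received = packet[1:-3], int(packet[-2:], 16)
    if rsp_checksum(payload) != received:
        raise ValueError("checksum mismatch")
    return payload
```

In this protocol `g` requests the register contents and `m addr,length` requests a memory read, so a wrapper built on it could, for example, send `rsp_frame("m1000,4")` to fetch four bytes from address 0x1000 of the simulated processor.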
DOI: https://doi.org/10.1109/ICCD.2002.1106819 | Published: 2002-09-16
Citations: 31
Models of IP's for automotive virtual integration platforms
P. Giusto, J. Brunel, A. Ferrari, E. Fourgeau, L. Lavagno, B. O'Rourke, A. Sangiovanni-Vincentelli, Emanuele Guasto
Summary form only given. The concept of virtual integration platform plays a key role in any novel methodology that is trying to address earlier validation of distributed applications in regular and faulty conditions. The methodology must rely upon libraries that model the most important features of the commonly used IP's in the automotive segment such as FlexRay, the emerging bus protocol for safety critical applications supported by BMW, Daimler-Chrysler, Philips, Bosch, and Motorola, OSEK compliant RTOSes and protocol stacks, microprocessors such as Motorola/IBM PowerPC, Infineon 167, NEC v850, Tricore, ST 10, and Janus. We believe that tools must support the easy plug and play of the IP models in a seamless way to the user. For example, it must be possible to run a fast simulation at the token level (frames) to provide insights about the best network protocol configuration within a reasonable accuracy for the estimated frame latency. Next, it must be possible to export such a configuration to (semi-)automatically configure the downstream and more refined bus protocol models for the finer grain validation step. Both steps must rely upon interchangeable IP's with clear interfaces and trade-offs between simulation speed and accuracy of the timing estimates. In this paper, we present two examples of models of IP's that can be used at two different steps in the design exploration, the token-level/cycle approximate transaction based level and the cycle accurate level. The first example is the Universal Communication Model (UCM) that captures the main common features of the most relevant bus protocols such as topology, redundancy, arbitration, etc. The model enables quick token-level simulations. The user is able to determine the communication cycle layout and bus scheduling, k-matrix, and then export it for the configuration of downstream more refined models such as the Motorola FlexRay cycle accurate transaction based model. 
Bus delays are as important as task execution delays and RTOS switching overheads. In the second example we introduce Janus, a multi-processor micro-controller for power train applications. The cycle approximate transaction based model of Janus can be used to assess the ECU HW/SW partitioning, in particular to quickly explore different task scheduling and allocation. Then, this model is refined and exported to configure a HW/SW co-verification tool for the cycle accurate validation of the ECU HW/SW architecture. In an example scenario, an engine control ECU is providing information about the engine (e.g. engine revolution speed) to a gear control ECU over a CAN bus (the latter typically requires precise revolution speed to operate and could also require to set the engine operation condition). In this scenario, car and subsystem makers play different roles in order to provide a virtual model of the system to validate the functionality and the performance before going to implementation. The same models can then be used to march tow
DOI: https://doi.org/10.1109/ICCD.2002.1106797 | Published: 2002-09-16
Citations: 0
Adaptive balanced computing (ABC) microprocessor using reconfigurable functional caches (RFCs)
Huesung Kim, Arun Kumar Somani, A. Tyagi
A general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies like multimedia and digital signal processing applications demand ever more computing power. If the computing resources can be adapted to the needs of an application, better performance can be achieved. Adaptive Balanced Computing (ABC) performs a dynamic resource configuration of on-chip cache memory by converting the cache into a specialized computing unit. With a small amount of additional logic and a slightly modified microarchitecture, a part of the cache memory can be configured to perform specialized computations in a conventional processor. In this paper, we evaluate the ABC using RFCs in various cache organizations to see the impact of resource reconfiguration. The simulations with multimedia and DSP applications show that the resource configuration yields speedups ranging from 1.04X to 3.94X in overall applications and from 2.61X to 27.4X in the core computations.
DOI: https://doi.org/10.1109/ICCD.2002.1106761 | Published: 2002-09-16
Citations: 0
Designing an asynchronous microcontroller using Pipefitter
I. Blunno, L. Lavagno
This paper discusses how Pipefitter, a tool chain that implements a fully automated synthesis flow for asynchronous circuits, can be used to design a simple asynchronous microcontroller. The use of RTL-like Verilog HDL as the input format makes the first steps of the design flow (i.e. specification and simulation) very easy for the designer. Pipefitter directly synthesizes the control unit as a hazard-free standard cell netlist, uses a genetic algorithm to perform binding and multiplexer optimization for the data path, allows the user to manually specify the binding, and can automatically pipeline a sequential specification. It also produces a synthesizable Verilog specification for the Data Path, as well as a set of scripts driving both its synthesis and timing analysis by state-of-the-art commercial synchronous RTL and logic synthesis tools. The automated insertion of matched delays completes the logic design, and hands off the netlist to the standard cell-based layout tools. The example presented in this paper shows how Pipefitter can be effectively used for the design of asynchronous application specific integrated circuits.
DOI: https://doi.org/10.1109/ICCD.2002.1106818 | Published: 2002-09-16
Citations: 7
A stream processor development platform
B. Serebrin, John Douglas Owens, Chen H. Chen, S. Crago, U. Kapasi, P. Mattson, Jinyung Namkoong, S. Rixner, W. Dally
We describe a hardware and software platform for developing streaming applications. Programmers write stream programs in high-level languages, and a set of software tools maps these programs to code that runs on a streaming hardware system. The hardware platform includes two Imagine stream processors, together providing 32 GFLOPS peak performance, and a high-speed onboard network to carry video and other data between peripherals and the Imagine processors.
DOI: https://doi.org/10.1109/ICCD.2002.1106786 | Published: 2002-09-16
Citations: 14
On the impact of technology scaling on mixed PTL/static circuits
G. Cho, Tom Chen
We present the impact of technology scaling on mixed PTL (pass-transistor logic)/static circuits and compare the results with those of domino and conventional static CMOS. State-of-the-art 0.18 μm, 0.13 μm, and 0.1 μm technologies were used in the study, with V_dd scaled accordingly. The benchmark suite consists of 10 circuits of varying complexity; they are actual circuits used in a state-of-the-art 64-bit microprocessor in the form of either dynamic or static CMOS circuits. The objective of this work is to determine how performance and power consumption scale with technology scaling. Our experimental results show that the mixed PTL/static circuit style is a promising alternative in power and power-delay product while achieving delay comparable to that of the dynamic circuit style.
DOI: https://doi.org/10.1109/ICCD.2002.1106789 | Published: 2002-09-16
Citations: 4
Timing window applications in UltraSPARC-IIIi™ microprocessor design
Rita Yu Chen, P. Yip, G. Konstadinidis, J. N. Demas, F. Klass, Robert E. Mains, M. Schmitt, D. Bistry
This paper presents two timing window methodologies used in UltraSPARC-IIIi™ microprocessor design. They have improved the accuracy of timing and noise analysis. In timing analysis, timing windows are applied to calculate effective Miller factors of coupling nets; in noise analysis, they are applied to waive false noise violations. Results show that by using timing windows in timing analysis, 72% of the CPU-level nets have more accurate Miller factors. This reduces the number of false timing paths. During the development of this application, a simple and practical convergence rule is defined to stop the iteration. Also, the timing window application in noise analysis has identified 42% of the CPU-level noise violations that can be waived in the UltraSPARC-IIIi™ chip. This significantly improved the productivity of the design.
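As a rough illustration of how a timing window can drive the choice of Miller factor, consider the Python sketch below. It is an assumption-laden simplification, not the methodology of the paper: a window is taken to be the (earliest, latest) switching time of a net, an aggressor whose window overlaps the victim's is given a pessimistic factor of 2.0, and a non-overlapping aggressor is treated as quiet with a factor of 1.0. All function names and the two factor values are hypothetical.

```python
def windows_overlap(a, b):
    """True if two (earliest, latest) switching windows share any instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def effective_miller_factor(victim, aggressor, worst=2.0, quiet=1.0):
    """Pessimistic factor only when the aggressor can switch while the victim does."""
    return worst if windows_overlap(victim, aggressor) else quiet


def effective_capacitance(c_ground, victim, couplings):
    """Ground capacitance plus each coupling capacitance scaled by its Miller factor.

    couplings is a list of (coupling_capacitance, aggressor_window) pairs.
    """
    return c_ground + sum(
        c * effective_miller_factor(victim, window) for c, window in couplings
    )
```

For a victim window of (0.0, 1.0), a 2.0-unit coupling to an aggressor switching in (0.5, 1.5) counts double, while a 3.0-unit coupling to an aggressor switching in (2.0, 3.0) counts once. Since delays change the windows and the windows change the factors, a real flow has to iterate, which is presumably why the abstract mentions a convergence rule.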
DOI: https://doi.org/10.1109/ICCD.2002.1106764 | Published: 2002-09-16
Citations: 5
Functional verification of the IBM zSeries eServer z900 system
Joerg Walter
This paper presents an overview of how the zSeries eServer z900 system has been functionally verified. It describes the hierarchical structure of verification, starting with designer simulation and proceeding through unit simulation and chip simulation up to system simulation. For each step, the tools, methods and goals of verification are described. It also presents a description of the IT environment used at the different levels of verification, especially of dedicated simulation hardware like accelerator and emulator machines used for system simulation and hardware/software co-verification.
DOI: https://doi.org/10.1109/ICCD.2002.1106741 | Published: 2002-09-16
Citations: 2
Balancing the interconnect topology for arrays of processors between cost and power
Esther Y. Cheng, Feng Zhou, B. Yao, Chung-Kuan Cheng, R. Graham
A high-performance SoC requires nonblocking interconnections among an array of processors built on one chip. With the advent of deep sub-micron technologies, switches are becoming much cheaper while wires remain expensive. Therefore, optimization efforts should focus on wire resources. In this paper, we devise an objective function to balance the interconnect topology between routing area and power dissipation. Based on this objective function, we find the best one-dimensional and two-dimensional nonblocking interconnect architectures. Furthermore, we define a derivative benefit, devise a strategy for improving the performance of hierarchical nonblocking interconnect architectures, and derive optimized results.
DOI: 10.1109/ICCD.2002.1106767 (https://doi.org/10.1109/ICCD.2002.1106767). Published 2002-09-16 in Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
Citations: 2