An efficient external-memory implementation of region query with application to area routing
S. Liao, Narendra V. Shenoy, W. Nicholls
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106744
We present the tile-cached kd-tree, an efficient external-memory (disk) implementation of two-dimensional region query for use in a detailed area router. Most researchers have heretofore focused on in-memory algorithms. However, as the need to tackle very large problems increases, conventional in-memory algorithms suffer from unpredictable caching and paging behavior, and their performance may degrade considerably. In addition, since the region-query data structure is only part of the overall system, its consumption of large memory resources affects other parts of the system as well. Our implementation takes advantage of spatial locality in the detailed-routing process. We partition the routing space into tiles, each storing the data of objects (rectangles) that lie strictly within it. Objects that cross tile boundaries are stored separately. The data within a tile are then written out to disk, and a configurable cache holds the most recently visited tiles in memory. Experimental results on large real-life routing problems show that this scheme significantly reduces memory usage with a tolerable performance penalty.
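
The tiling and caching idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name `TileCachedStore`, its parameters, and the use of a plain dictionary in place of disk files are all assumptions made for illustration, and each tile is scanned linearly here where the paper uses a kd-tree.

```python
from collections import OrderedDict


def overlaps(a, b):
    """True if closed rectangles a and b, given as (x1, y1, x2, y2), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class TileCachedStore:
    """Hypothetical sketch: rectangles bucketed by tile, with an LRU tile cache."""

    def __init__(self, tile_size, cache_tiles):
        assert cache_tiles >= 1
        self.tile_size = tile_size
        self.cache_tiles = cache_tiles
        self.disk = {}              # stands in for per-tile files on disk
        self.cache = OrderedDict()  # tile id -> rectangles, least recent first
        self.crossing = []          # rectangles that span more than one tile
        self.loads = 0              # number of simulated disk reads

    def _tile_of(self, x, y):
        return (x // self.tile_size, y // self.tile_size)

    def _load(self, tile):
        if tile in self.cache:
            self.cache.move_to_end(tile)  # mark as most recently used
        else:
            self.loads += 1
            self.cache[tile] = self.disk.pop(tile, [])
            if len(self.cache) > self.cache_tiles:
                old, rects = self.cache.popitem(last=False)  # evict LRU tile
                self.disk[old] = rects                       # write it back
        return self.cache[tile]

    def insert(self, rect):
        x1, y1, x2, y2 = rect
        lo, hi = self._tile_of(x1, y1), self._tile_of(x2, y2)
        if lo != hi:
            self.crossing.append(rect)  # boundary-crossing objects kept apart
        else:
            self._load(lo).append(rect)

    def query(self, region):
        hits = [r for r in self.crossing if overlaps(r, region)]
        tx1, ty1 = self._tile_of(region[0], region[1])
        tx2, ty2 = self._tile_of(region[2], region[3])
        for tx in range(tx1, tx2 + 1):
            for ty in range(ty1, ty2 + 1):
                hits += [r for r in self._load((tx, ty)) if overlaps(r, region)]
        return hits
```

With a cache of one tile, repeating a query on the same tile causes no further simulated disk reads. This is the spatial-locality effect the abstract relies on.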
Legacy SystemC co-simulation of multi-processor systems-on-chip
L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, M. Poncino
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106819
We present a co-simulation environment for multiprocessor architectures that is based on SystemC and allows transparent integration of instruction set simulators (ISSs) within the SystemC simulation framework. The integration is based on the well-known concept of a bus wrapper, which implements the interface between the ISS and the simulator. The proposed solution uses an ISS-wrapper interface based on the standard gdb remote debugging interface, and implements two alternative schemes that differ in the amount of communication they require. The two approaches offer different tradeoffs between simulation granularity and speed, and show significant speedup with respect to a micro-architectural, full SystemC simulation of the system description.
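
For readers unfamiliar with the gdb remote debugging interface mentioned above, the sketch below shows how a packet of the GDB Remote Serial Protocol is framed: a `$`, the payload, a `#`, and a two-digit hexadecimal checksum equal to the sum of the payload bytes modulo 256. The helper names are hypothetical, escaping of special characters and acknowledgements are omitted, and nothing here is taken from the authors' wrapper.

```python
def rsp_checksum(payload: str) -> int:
    """Sum of the payload bytes modulo 256, as used by the gdb remote protocol."""
    return sum(payload.encode("ascii")) % 256


def rsp_frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return f"${payload}#{rsp_checksum(payload):02x}"


def rsp_unframe(packet: str) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or packet[0] != "$" or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if rsp_checksum(payload) != int(packet[-2:], 16):
        raise ValueError("checksum mismatch")
    return payload


if __name__ == "__main__":
    print(rsp_frame("g"))  # read-registers request
    print(rsp_frame("s"))  # single-step request
```

A wrapper that single-steps the ISS exchanges many more such packets per simulated cycle than one that lets it run freely between bus accesses, which is one way communication volume can vary between schemes.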
Models of IP's for automotive virtual integration platforms
P. Giusto, J. Brunel, A. Ferrari, E. Fourgeau, L. Lavagno, B. O'Rourke, A. Sangiovanni-Vincentelli, Emanuele Guasto
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106797
Summary form only given. The concept of a virtual integration platform plays a key role in any novel methodology that tries to address earlier validation of distributed applications in regular and faulty conditions. The methodology must rely upon libraries that model the most important features of the commonly used IP's in the automotive segment, such as FlexRay (the emerging bus protocol for safety-critical applications supported by BMW, Daimler-Chrysler, Philips, Bosch, and Motorola), OSEK-compliant RTOSes and protocol stacks, and microprocessors such as Motorola/IBM PowerPC, Infineon 167, NEC v850, Tricore, ST 10, and Janus. We believe that tools must support easy plug and play of the IP models in a way that is seamless to the user. For example, it must be possible to run a fast simulation at the token level (frames) to provide insights about the best network protocol configuration with reasonable accuracy for the estimated frame latency. Next, it must be possible to export such a configuration to (semi-)automatically configure the downstream, more refined bus protocol models for the finer-grain validation step. Both steps must rely upon interchangeable IP's with clear interfaces and trade-offs between simulation speed and accuracy of the timing estimates. In this paper, we present two examples of models of IP's that can be used at two different steps in the design exploration: the token-level/cycle-approximate transaction-based level and the cycle-accurate level. The first example is the Universal Communication Model (UCM), which captures the main common features of the most relevant bus protocols, such as topology, redundancy, and arbitration. The model enables quick token-level simulations. The user is able to determine the communication cycle layout and bus scheduling (k-matrix), and then export it for the configuration of downstream, more refined models such as the Motorola FlexRay cycle-accurate transaction-based model. Bus delays are as important as task execution delays and RTOS switching overheads. In the second example we introduce Janus, a multi-processor micro-controller for power train applications. The cycle-approximate transaction-based model of Janus can be used to assess the ECU HW/SW partitioning, in particular to quickly explore different task scheduling and allocation. Then, this model is refined and exported to configure a HW/SW co-verification tool for the cycle-accurate validation of the ECU HW/SW architecture. In an example scenario, an engine control ECU provides information about the engine (e.g. engine revolution speed) to a gear control ECU over a CAN bus (the latter typically requires precise revolution speed to operate and may also need to set the engine operating condition). In this scenario, car and subsystem makers play different roles in order to provide a virtual model of the system to validate the functionality and the performance before going to implementation. The same models can then be used to march tow
Adaptive balanced computing (ABC) microprocessor using reconfigurable functional caches (RFCs)
Huesung Kim, Arun Kumar Somani, A. Tyagi
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106761
A general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies such as multimedia and digital signal processing applications demand ever more computing power. If the computing resources can be adapted to the needs of an application, better performance can be achieved. Adaptive Balanced Computing (ABC) performs a dynamic resource configuration of on-chip cache memory by converting the cache into a specialized computing unit. With a small amount of additional logic and a slightly modified microarchitecture, a part of the cache memory can be configured to perform specialized computations in a conventional processor. In this paper, we evaluate ABC using RFCs in various cache organizations to see the impact of resource reconfiguration. Simulations with multimedia and DSP applications show that the resource configuration yields speedups ranging from 1.04X to 3.94X for overall applications and from 2.61X to 27.4X for the core computations.
Designing an asynchronous microcontroller using Pipefitter
I. Blunno, L. Lavagno
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106818
This paper discusses how Pipefitter, a tool chain that implements a fully automated synthesis flow for asynchronous circuits, can be used to design a simple asynchronous microcontroller. The use of RTL-like Verilog HDL as the input format makes the first steps of the design flow (i.e. specification and simulation) very easy for the designer. Pipefitter directly synthesizes the control unit as a hazard-free standard cell netlist, uses a genetic algorithm to perform binding and multiplexer optimization for the data path, allows the user to manually specify the binding, and can automatically pipeline a sequential specification. It also produces a synthesizable Verilog specification for the data path, as well as a set of scripts driving both its synthesis and timing analysis by state-of-the-art commercial synchronous RTL and logic synthesis tools. The automated insertion of matched delays completes the logic design, and hands off the netlist to the standard cell-based layout tools. The example presented in this paper shows how Pipefitter can be effectively used for the design of asynchronous application specific integrated circuits.
A stream processor development platform
B. Serebrin, John Douglas Owens, Chen H. Chen, S. Crago, U. Kapasi, P. Mattson, Jinyung Namkoong, S. Rixner, W. Dally
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106786
We describe a hardware and software platform for developing streaming applications. Programmers write stream programs in high-level languages, and a set of software tools maps these programs to code that runs on a streaming hardware system. The hardware platform includes two Imagine stream processors, together providing 32 GFLOPS peak performance, and a high-speed onboard network to carry video and other data between peripherals and the Imagine processors.
On the impact of technology scaling on mixed PTL/static circuits
G. Cho, Tom Chen
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106789
We present the impact of technology scaling on mixed PTL/static circuits and compare the results with those of domino and conventional static CMOS. The state-of-the-art technologies of 0.18 μm, 0.13 μm, and 0.1 μm were used in the study, with V_dd being scaled accordingly. The benchmark suite consists of 10 circuits of varying complexity; they are actual circuits used in a state-of-the-art 64-bit microprocessor in the form of either dynamic or static CMOS circuits. The objective of this work is to determine how performance and power consumption scale with technology scaling. Our experimental results show that the mixed PTL/static circuit style is a promising alternative in power and power-delay product while achieving delay comparable to the dynamic circuit style.
Timing window applications in UltraSPARC-IIIi™ microprocessor design
Rita Yu Chen, P. Yip, G. Konstadinidis, J. N. Demas, F. Klass, Robert E. Mains, M. Schmitt, D. Bistry
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106764
This paper presents two timing window methodologies used in the UltraSPARC-IIIi™ microprocessor design. They have improved the accuracy of timing and noise analysis. In timing analysis, timing windows are applied to calculate effective Miller factors of coupling nets; in noise analysis, they are applied to waive false noise violations. Results show that by using timing windows in timing analysis, 72% of the CPU-level nets have more accurate Miller factors, which reduces the number of false timing paths. During the development of this application, a simple and practical convergence rule was defined to stop the iteration. In addition, the timing window application in noise analysis identified 42% of the CPU-level noise violations as ones that can be waived in the UltraSPARC-IIIi™ chip. This significantly improved the productivity of the design.
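
As a rough illustration of how timing windows can refine Miller factors, the sketch below uses a common textbook simplification, not the methodology of the paper. A coupling capacitance is counted twice when the aggressor's switching window overlaps the victim's (worst-case opposite transitions) and once otherwise. All function names, default factors, and the numbers in the usage example are assumptions.

```python
def windows_overlap(a, b):
    """True if closed time intervals a = (start, end) and b overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def miller_factor(victim_window, aggressor_window, worst_case=2.0, quiet=1.0):
    """Assumed model: worst-case factor only if the nets can switch together."""
    if windows_overlap(victim_window, aggressor_window):
        return worst_case
    return quiet


def effective_capacitance(c_ground, couplings, victim_window):
    """Ground capacitance plus each coupling capacitance scaled by its factor.

    couplings is a list of (capacitance, aggressor_window) pairs.
    """
    return c_ground + sum(
        miller_factor(victim_window, window) * c for c, window in couplings
    )


if __name__ == "__main__":
    # One aggressor overlaps the victim window (4, 8); the other does not.
    print(effective_capacitance(10.0, [(2.0, (0, 5)), (3.0, (20, 30))], (4, 8)))
```

Because switching windows themselves depend on delays computed from these capacitances, such a calculation is normally iterated. The abstract's convergence rule concerns when to stop that iteration, and its details are not given here.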
Functional verification of the IBM zSeries eServer z900 system
Joerg Walter
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106741
This paper presents an overview of how the zSeries eServer z900 system has been functionally verified. It describes the hierarchical structure of verification, from designer simulation through unit simulation and chip simulation up to system simulation. For each step, the tools, methods and goals of verification are described. It also describes the IT environment used at the different levels of verification, especially dedicated simulation hardware such as accelerator and emulator machines used for system simulation and hardware/software co-verification.
Balancing the interconnect topology for arrays of processors between cost and power
Esther Y. Cheng, Feng Zhou, B. Yao, Chung-Kuan Cheng, R. Graham
Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106767
High performance SoC requires nonblocking interconnections between an array of processors built on one chip. With the advent of deep sub-micron technologies, switches are becoming much cheaper while wires are still expensive. Therefore, optimization efforts should focus on the wire resources. In this paper, we devise an objective function to balance the interconnect topology between routing area and power dissipation. Based on the objective function, we find the best one-dimensional and two-dimensional nonblocking interconnect architectures. Furthermore, we define a derivative benefit, devise a strategy for improving the performance of hierarchical nonblocking interconnect architectures, and derive optimized results.