
Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors: latest papers

Estimation of sequential circuit activity considering spatial and temporal correlations
T. Chou, K. Roy
We present an exact and an approximate method for estimating signal activity at the internal nodes of sequential logic circuits. The methodology takes spatial and temporal correlations of logic signals into consideration. Given the state transition graph (STG) of a finite state machine (FSM), we create an extended state transition graph (ESTG), in which the temporal correlations of the input signals are explicitly represented. From this graph we derive equations to calculate exact signal probabilities and activities. For large circuits, we propose an approximate method that calculates the activities by unrolling the next-state logic. Experimental results show that if temporal and spatial correlations are not considered, the switching activities of the internal nodes can be off by more than 40% compared with simulation-based techniques. The results of the proposed approximate method, however, are within 5% of logic simulation results.
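The following sketch shows why temporal correlation matters. It computes the exact switching activity of a single two-input AND gate whose inputs are modeled as lag-one Markov signals. It then compares that figure with the estimate obtained when successive output values are treated as independent. This is not the paper's ESTG construction. The Markov input model, the function names (`p_one`, `step_prob`, `and_activity_exact`, `and_activity_naive`) and the input parameters are all illustrative assumptions.

```python
from itertools import product


def p_one(p01, p10):
    """Stationary P(x = 1) of a two-state Markov signal.

    p01 = P(next = 1 | now = 0), p10 = P(next = 0 | now = 1).
    """
    return p01 / (p01 + p10)


def step_prob(now, nxt, p01, p10):
    """Probability that the signal goes from value `now` to `nxt` in one cycle."""
    if now == 0:
        return p01 if nxt == 1 else 1 - p01
    return p10 if nxt == 0 else 1 - p10


def and_activity_exact(a, b):
    """Exact switching activity P(y_t != y_t+1) of y = a AND b.

    a and b are (p01, p10) pairs. The inputs are assumed spatially
    independent of each other, but each is temporally correlated.
    """
    pa, pb = p_one(*a), p_one(*b)
    activity = 0.0
    for a0, b0, a1, b1 in product((0, 1), repeat=4):
        weight = (pa if a0 else 1 - pa) * (pb if b0 else 1 - pb)
        weight *= step_prob(a0, a1, *a) * step_prob(b0, b1, *b)
        if (a0 & b0) != (a1 & b1):
            activity += weight
    return activity


def and_activity_naive(a, b):
    """Activity if successive values of y are assumed independent: 2p(1 - p)."""
    py = p_one(*a) * p_one(*b)
    return 2 * py * (1 - py)


slow = (0.1, 0.1)  # hypothetical input: P(1) = 0.5, but it rarely toggles
print(and_activity_exact(slow, slow))  # about 0.095
print(and_activity_naive(slow, slow))  # 0.375
```

For slowly toggling inputs, the naive estimate overstates the activity by roughly a factor of four. When the inputs are memoryless (p01 = p10 = 0.5), the two estimates coincide.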
DOI: https://doi.org/10.1109/ICCD.1995.528926 (published 1995-10-02)
Citations: 20
EPNR: an energy-efficient automated layout synthesis package
G. Holt, A. Tyagi
This paper reports our experiences with incorporating energy-based (switched-capacitance-based) algorithms into an automated layout synthesis system based on standard cells. On the MCNC Logic Synthesis '93 benchmarks, our experimental results show an average saving of 18.5% in interconnect energy, at the cost of about a 6.2% area increase relative to area-minimized layouts. The basic premise is that wires with high switching activity should be made short, even if this means stretching several low-switching wires. We modified an existing layout system, VPNR, to include these techniques in the placement and global routing phases. Attempts to incorporate switching probabilities into channel routing did not produce appreciable gains. Our experiments also offer insight into the composition of the solution space for VLSI energy minimization problems.
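The premise that high-activity wires should be kept short at the expense of quiet ones can be shown with a toy one-dimensional placement. The sketch below is not EPNR or VPNR code. It exhaustively places five hypothetical cells in a row and compares a pure wirelength objective with an activity-weighted one. Activity × length stands in for switched capacitance. The netlist, the activity values and the function names are made up for illustration.

```python
from itertools import permutations


def span(net_cells, pos):
    """Length of a net in a 1-D row of cells (max x - min x)."""
    xs = [pos[c] for c in net_cells]
    return max(xs) - min(xs)


def placement_cost(order, nets):
    """Return (wirelength, energy proxy) for a left-to-right cell ordering.

    Each net is (cells, activity). Wire capacitance is taken as proportional
    to length, so the energy proxy is sum(activity * length).
    """
    pos = {cell: x for x, cell in enumerate(order)}
    wirelength = sum(span(cells, pos) for cells, _ in nets)
    energy = sum(act * span(cells, pos) for cells, act in nets)
    return wirelength, energy


def best_placement(cells, nets, minimize_energy):
    """Exhaustive search over all orderings (only viable for a few cells)."""
    def key(order):
        wl, en = placement_cost(order, nets)
        return (en, wl) if minimize_energy else (wl, en)
    return min(permutations(cells), key=key)


# Hypothetical netlist: one busy net A-E plus a quiet 3-bit bus chain A-B-C-D-E.
cells = ["A", "B", "C", "D", "E"]
quiet = [(("A", "B"), 0.05), (("B", "C"), 0.05), (("C", "D"), 0.05), (("D", "E"), 0.05)]
nets = [(("A", "E"), 0.9)] + quiet * 3

wl_opt = best_placement(cells, nets, minimize_energy=False)
en_opt = best_placement(cells, nets, minimize_energy=True)
print(wl_opt, placement_cost(wl_opt, nets))
print(en_opt, placement_cost(en_opt, nets))
```

With these made-up numbers, the energy-driven ordering places A next to E and stretches one quiet bus link across the row. It accepts a longer total wirelength (22 versus 16) in exchange for a lower energy proxy. Real placers use heuristics rather than exhaustive search.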
DOI: https://doi.org/10.1109/ICCD.1995.528814 (published 1995-10-02)
Citations: 5
A high-performance asynchronous SCSI controller
K. Yun, D. Dill
We describe the design of a high-performance asynchronous SCSI (Small Computer System Interface) controller data path and its associated control circuits. The data path is an asynchronous pipeline, and its control circuits are built from extended burst-mode machines. The design is functionally compatible with a widely used commercial SCSI controller. It simulated correctly against all applicable test vectors used for the commercial design. It is implemented in a 0.8 µm CMOS standard-cell technology. Performance is limited by the SCSI specification rather than by the design itself, and the area is competitive with that of the commercial design. By incorporating a FIFO and a distributed control scheme based on extended burst-mode state machines, the design improves data transfer throughput by up to 2.5 times over previous work.
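The controller's control circuits are extended burst-mode machines. The sketch below illustrates only the underlying burst-mode idea by interpreting a plain burst-mode specification. The machine waits until every edge of an input burst has arrived, in any order, and then emits its output burst and changes state. Extended burst mode's level-sensitive and directed-don't-care signals are not modeled. The class name, state names and handshake signal names are hypothetical.

```python
class BurstModeMachine:
    """Toy interpreter for a plain (non-extended) burst-mode specification.

    spec maps each state to a list of (input_burst, output_burst, next_state);
    a burst is a frozenset of signal edges such as "req+" or "ack-".
    """

    def __init__(self, spec, start):
        self.spec = spec
        self.state = start
        self.seen = set()

    def edge(self, e):
        """Apply one input edge; return the output burst it triggers (may be empty)."""
        seen = self.seen | {e}
        live = [t for t in self.spec[self.state] if seen <= t[0]]
        if not live:
            raise ValueError(f"edge {e!r} is not allowed in state {self.state!r}")
        self.seen = seen
        for in_burst, out_burst, nxt in live:
            if seen == in_burst:
                self.state, self.seen = nxt, set()
                return out_burst
        return frozenset()  # burst still incomplete; outputs hold their values


# Hypothetical FIFO-write handshake: wait for both a request and valid data.
spec = {
    "idle": [(frozenset({"req+", "dvalid+"}), frozenset({"ack+"}), "busy")],
    "busy": [(frozenset({"req-", "dvalid-"}), frozenset({"ack-"}), "idle")],
}
m = BurstModeMachine(spec, "idle")
print(m.edge("dvalid+"))  # frozenset()
print(m.edge("req+"))     # frozenset({'ack+'})
```

An edge that is not part of any input burst leaving the current state raises an error. This mirrors the rule that a burst-mode machine's environment must only produce specified input bursts.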
DOI: https://doi.org/10.1109/ICCD.1995.528789 (published 1995-10-02)
Citations: 33
Concurrent timing optimization of latch-based digital systems
H. Hsieh, Wentai Liu, R. Cavin, C. T. Gray
Many techniques have been proposed to optimize digital system timing. Each can be advantageous in particular applications; however, they are most often applied individually rather than concurrently. The framework presented here allows concurrent timing optimization using retiming, intentional clock skew and wave pipelining, for latch-based systems with single-phase or multi-phase clocking. The optimization is formulated as a mixed integer linear program. The integrated framework also includes a new optimization technique called resynchronization. It inserts latches on the shortest paths and thus avoids race conditions. The framework has been applied to several designs and is able to significantly reduce the clock period.
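The paper combines retiming, intentional skew and wave pipelining for latch-based designs in one mixed integer linear program. One small piece of that picture, intentional clock skew for edge-triggered registers, reduces to difference constraints that Bellman-Ford can check. The sketch below is that classic skew-scheduling check, not the authors' MILP. It leaves out latches, retiming, wave pipelining and setup/hold margins, and the delay values and function names are hypothetical.

```python
def skew_feasible(n, paths, period):
    """Check whether clock arrival times s_0..s_{n-1} exist for a given period.

    paths holds (i, j, dmin, dmax): min/max combinational delay from register
    i to register j. With setup/hold times taken as zero:
      setup: s_i + dmax <= s_j + period  ->  s_i - s_j <= period - dmax
      hold:  s_i + dmin >= s_j           ->  s_j - s_i <= dmin
    Each constraint x_v - x_u <= w becomes an edge u -> v of weight w; the
    system is feasible iff the constraint graph has no negative cycle.
    """
    edges = []
    for i, j, dmin, dmax in paths:
        edges.append((j, i, period - dmax))
        edges.append((i, j, dmin))
    dist = [0.0] * n  # implicit source with 0-weight edges to every register
    for _ in range(n + 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    return False  # still relaxing after n + 1 rounds: negative cycle


def min_period(n, paths, tol=1e-6):
    """Bisect for the smallest feasible period when skew may be chosen freely."""
    lo, hi = 0.0, max(p[3] for p in paths)  # zero skew works at hi (delays >= 0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if skew_feasible(n, paths, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical two-register loop: 0 -> 1 has delays [2, 8], 1 -> 0 has [1, 2].
paths = [(0, 1, 2.0, 8.0), (1, 0, 1.0, 2.0)]
print(round(min_period(2, paths), 3))  # 6.0
```

With zero skew, the long 0 → 1 path forces an 8-unit period. Clocking register 1 two units later brings the period down to 6. Going further is blocked by the 2-unit short path from register 0 to register 1. That race is roughly the kind of situation resynchronization addresses by inserting latches on short paths.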
DOI: https://doi.org/10.1109/ICCD.1995.528941 (published 1995-10-02)
Citations: 7
A superscalar RISC processor with pseudo vector processing feature
K. Shimamura, Shigeya Tanaka, Tetsuya Shimomura, T. Hotta, E. Kamada, H. Sawamoto, Teruhisa Shimizu, K. Nakazawa
A novel architectural extension, in which floating-point data are transferred directly from main memory to floating-point registers, has been successfully implemented in a superscalar RISC processor. The extension allows a main memory access throughput of 1.2 Gbyte/s, and effective performance reaches 267 MFLOPS (89% of peak performance) for typical floating-point applications. The processor uses a 0.3 µm CMOS technology with four metal layers and a 2.5 V power supply. It contains 3.9 million transistors on a 15.7 mm × 15.7 mm die. Only 4.5% of the die area is used for the extension. Pipeline stage optimization and a scoreboard-based dependency check allow the extension to be realized without affecting the operating frequency.
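The quoted figures imply a few numbers the abstract leaves implicit. The arithmetic below assumes that the 89% is measured against the processor's peak floating-point rate and that the die is the stated 15.7 mm square. The function name and parameter names are illustrative.

```python
def implied_figures(sustained_mflops=267.0, fraction_of_peak=0.89,
                    die_side_mm=15.7, extension_share=0.045):
    """Derive peak rate and areas from the figures quoted in the abstract."""
    peak_mflops = sustained_mflops / fraction_of_peak  # 267 / 0.89
    die_area_mm2 = die_side_mm ** 2                    # square die
    extension_mm2 = die_area_mm2 * extension_share     # 4.5% of the die
    return peak_mflops, die_area_mm2, extension_mm2


peak, die, ext = implied_figures()
print(f"peak ~ {peak:.0f} MFLOPS, die {die:.2f} mm^2, extension ~ {ext:.1f} mm^2")
# peak ~ 300 MFLOPS, die 246.49 mm^2, extension ~ 11.1 mm^2
```

Under these assumptions, the pseudo vector extension occupies roughly 11 mm² of silicon.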
DOI: https://doi.org/10.1109/ICCD.1995.528797 (published 1995-10-02)
Citations: 10
A new architectural-level fault simulation using propagation prediction of grouped fault-effects
M. Hsiao, J. Patel
A new technique is proposed for fault simulation at the architectural level. The technique bypasses the need for a complete gate-level structure and makes efficient use of architectural information. Symbolic data representing groups of stuck-at faults, known as fault effects, are propagated across the circuit using intelligent propagation prediction. Fault effects may combine and form new groups in the process. Automated behavioral simulation using only three data types propagates fault effects at the architectural level by propagation prediction. Simulation needs no additional high-level constraints and no precomputation of faulty behavior. ALFSIM (Architectural Level Fault Simulation) is not a fully deterministic algorithm. Nevertheless, its results show high accuracy compared with gate-level fault simulation.
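ALFSIM propagates symbolic groups of stuck-at fault effects at the architectural level. Its gate-level relative, deductive fault simulation, propagates explicit fault lists in a similar spirit. In that method, every signal carries the set of single stuck-at faults that would flip its value under the current input vector. The sketch below is that classic deductive method on a hypothetical three-gate netlist, not ALFSIM. Only faults on primary inputs and gate outputs are modeled (no fanout branches), and the netlist format and function name are assumptions.

```python
def deductive_simulate(netlist, inputs):
    """Deductive stuck-at fault simulation for AND/OR/NOT netlists.

    netlist: (output, kind, [input names]) tuples in topological order.
    inputs:  {primary input name: 0 or 1}.
    Returns (good values, fault lists); a fault is (signal, stuck_value).
    """
    vals = dict(inputs)
    lists = {n: {(n, 1 - v)} for n, v in inputs.items()}
    for out, kind, ins in netlist:
        iv = [vals[i] for i in ins]
        il = [lists[i] for i in ins]
        if kind == "NOT":
            good, faults = 1 - iv[0], set(il[0])
        else:
            c = 0 if kind == "AND" else 1  # controlling value of the gate
            ctrl = [k for k, v in enumerate(iv) if v == c]
            if ctrl:
                # Output flips only if every controlling input flips
                # and no non-controlling input flips.
                good = c
                faults = set.intersection(*(il[k] for k in ctrl))
                for k in range(len(iv)):
                    if k not in ctrl:
                        faults -= il[k]
            else:
                # No controlling input: any single flipped input flips the output.
                good = 1 - c
                faults = set().union(*il)
        vals[out] = good
        lists[out] = faults | {(out, 1 - good)}  # plus the gate's own output fault
    return vals, lists


netlist = [("n1", "AND", ["a", "b"]),
           ("n2", "NOT", ["c"]),
           ("y", "OR", ["n1", "n2"])]
vals, lists = deductive_simulate(netlist, {"a": 1, "b": 0, "c": 1})
print(vals["y"], sorted(lists["y"]))
# 0 [('b', 1), ('c', 0), ('n1', 1), ('n2', 1), ('y', 1)]
```

One pass over the netlist yields every single stuck-at fault detected at `y` by this vector. That is the kind of saving that grouping fault effects aims for, here at the gate level rather than the architectural level.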
DOI: https://doi.org/10.1109/ICCD.1995.528934 (published 1995-10-02)
Citations: 19
Efficient testability enhancement for combinational circuit
Yu Fang, A. Albicki
We propose a novel testability enhancement scheme based on an XOR chain structure. The structure is effective at improving both controllability and observability. Insertion points are selected by fast testability analysis and by tracking the sources of random-pattern-resistant nodes. Experiments with the ISCAS85 benchmark circuits show that the scheme is effective. The resulting hardware overhead and performance penalty are relatively low.
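The abstract does not say which fast testability measure the authors use. One common choice is COP-style signal probabilities, computed in a single topological pass under an input-independence assumption. The sketch below uses that measure to flag random-pattern-resistant nodes as candidate insertion points. The 0.05 threshold, the netlist and the function names are assumptions, and the XOR chain itself is not modeled.

```python
from math import prod


def signal_probabilities(inputs, netlist, pi_prob=0.5):
    """COP-style P(node = 1), assuming all gate inputs are independent."""
    p = {name: pi_prob for name in inputs}
    for out, kind, ins in netlist:
        q = [p[i] for i in ins]
        if kind == "AND":
            p[out] = prod(q)
        elif kind == "OR":
            p[out] = 1 - prod(1 - x for x in q)
        elif kind == "NOT":
            p[out] = 1 - q[0]
        elif kind == "XOR":
            r = q[0]
            for x in q[1:]:
                r = r * (1 - x) + x * (1 - r)  # P(odd parity so far)
            p[out] = r
        else:
            raise ValueError(f"unsupported gate {kind}")
    return p


def random_pattern_resistant(p, threshold=0.05):
    """Nodes that random patterns rarely drive to 0 or rarely drive to 1."""
    return sorted(n for n, v in p.items() if min(v, 1 - v) < threshold)


# Hypothetical circuit: an 8-input AND (decoder-like) feeding an OR.
ins = [f"x{k}" for k in range(8)] + ["y"]
net = [("wide", "AND", [f"x{k}" for k in range(8)]),
       ("out", "OR", ["wide", "y"])]
p = signal_probabilities(ins, net)
print(random_pattern_resistant(p))  # ['wide']
```

The wide AND output is 1 for only 1 in 256 random patterns, which makes it an obvious candidate for a control or observation point.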
DOI: https://doi.org/10.1109/ICCD.1995.528806 (published 1995-10-02)
Citations: 7
Implementing a STARI chip
M. Greenstreet
STARI is a high-speed signaling technique that uses both synchronous and self-timed circuits. To demonstrate STARI, a chip has been fabricated in the MOSIS 2 µm CMOS process. In a simple test fixture, it operates at data rates of 120 Mbit/s over a pair of wires. Because STARI uses both synchronous and self-timed circuits, it provides an opportunity to compare the two design methods. The synchronous circuits of the STARI chip operate at two to three times the rate of the self-timed circuits. However, the self-timed FIFO in the receiver provides robust compensation for clock skew, which could not be achieved with synchronous circuitry alone. The STARI chip thus demonstrates the advantages of combining the two design techniques.
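The receiver FIFO absorbs skew because it starts partly full. As long as the accumulated offset between write and read times stays within the buffered slack, the FIFO neither underflows nor overflows. The token-counting sketch below illustrates only that bookkeeping, not the self-timed FIFO circuit. The clock period, the alternating skew pattern, the initial fill and the function name are hypothetical.

```python
def occupancy_range(write_times, read_times, initial):
    """Track FIFO occupancy over merged write (+1) and read (-1) events.

    On a time tie the read is processed first, the pessimistic order for
    underflow. Returns the (min, max) occupancy reached.
    """
    events = sorted([(t, 1) for t in write_times] + [(t, -1) for t in read_times])
    occ = lo = hi = initial
    for _, delta in events:
        occ += delta
        lo, hi = min(lo, occ), max(hi, occ)
    return lo, hi


T = 10  # hypothetical common clock period
writes = [k * T + (3 if k % 2 else -3) for k in range(20)]  # skewed transmitter clock
reads = [k * T + (-3 if k % 2 else 3) for k in range(20)]   # skewed receiver clock
print(occupancy_range(writes, reads, initial=2))  # (1, 3): never runs empty
print(occupancy_range(writes, reads, initial=0))  # (-1, 1): would underflow
```

Under this skew pattern, a FIFO of depth 4 that starts half full stays safely between empty and full. Without the initial slack, the receiver would try to read data that has not arrived yet. The real tolerance depends on the FIFO depth and latency, which the abstract does not give.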
DOI: https://doi.org/10.1109/ICCD.1995.528788 (published 1995-10-02)
Citations: 61
Reducing data access penalty using intelligent opcode-driven cache prefetching
Chi-Hung Chi, Siu-Chung Lau
In the latest processor architectures such as IBM PowerPC and HP Precision Architecture (PA), it is found that certain important compound opcodes such as LOAD-UPDATE and LOAD-MODIFY contain accurate information about how data will be referenced in the near future. Furthermore, these opcodes have been fully utilized by the compiler in the program code generation. With the migration of data cache onto the processor chip, it is now possible for the on-chip cache controller to perform intelligent data prefetching based on the information from the instruction decode unit. In this paper, a novel hardware-driven data prefetching scheme, called the Instruction Opcode-Based Prefetching (IOBP), is proposed. Our simulation shows that this IOBP scheme is very effective in reducing processor stall time due to memory accesses, especially for array or pointer references with constant strides.
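A load-with-update instruction (for example, PowerPC's `lwzu`) computes EA = base + displacement and writes EA back into the base register. When such an instruction sits in a loop, its next access is therefore likely to be EA + displacement. The sketch below models only that idea, using a hypothetical direct-mapped cache that fills the line at EA + displacement on every LOAD-UPDATE. It is not the paper's IOBP hardware. Zero prefetch latency, the cache geometry, the trace and all names are assumptions.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache that tracks only tags (no data)."""

    def __init__(self, lines=64, line_bytes=32):
        self.lines, self.line_bytes = lines, line_bytes
        self.tags = [None] * lines

    def _slot(self, addr):
        block = addr // self.line_bytes
        return block % self.lines, block // self.lines

    def access(self, addr):
        """Demand access: return True on a hit; allocate the line on a miss."""
        idx, tag = self._slot(addr)
        hit = self.tags[idx] == tag
        self.tags[idx] = tag
        return hit

    def prefetch(self, addr):
        """Fill a line immediately (prefetch latency is ignored)."""
        idx, tag = self._slot(addr)
        self.tags[idx] = tag


def run(trace, use_iobp):
    """Count demand misses; trace entries are (opcode, base_register, displacement)."""
    cache = DirectMappedCache()
    misses = 0
    for opcode, base, disp in trace:
        ea = base + disp
        if not cache.access(ea):
            misses += 1
        if use_iobp and opcode == "LOAD-UPDATE":
            # The base register now holds ea, so the next execution of this
            # instruction will most likely touch ea + disp.
            cache.prefetch(ea + disp)
    return misses


# Hypothetical loop: walk an array with a 64-byte stride via load-with-update.
trace = [("LOAD-UPDATE", 0x1000 + k * 64, 64) for k in range(100)]
print(run(trace, use_iobp=False), run(trace, use_iobp=True))  # 100 1
```

Because the stride is read straight from the instruction, no stride-detection table is needed. Only the first access of this constant-stride walk misses, which matches the case the abstract singles out.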
DOI: https://doi.org/10.1109/ICCD.1995.528916 (published 1995-10-02)
Citations: 3
A prototype router for the massively parallel computer RWC-1
T. Yokota, H. Matsuoka, K. Okamoto, Hideo Hirono, A. Hori, S. Sakai
The RWC-1 is a massively parallel computer based on a multithreaded architecture. This architecture requires extremely high communication performance at reasonable hardware cost. In this paper, we first introduce a new class of direct interconnection networks called MDCE (Multidimensional Directed Cycles Ensemble extension). MDCE has many features desirable for RWC-1, including small degree, low latency and high throughput, and was therefore adopted for the RWC-1 network. We have designed an MDCE router and fabricated an experimental VLSI chip, and we explain the design details in this paper. The chip provides operating system support features as well as communication functions, and enables advanced resource management. A prototype chip with about 125,000 gates has been fabricated using 0.6 µm CMOS gate-array technology. Its clock runs at 50 MHz, and each communication port achieves a transmission rate of 300 Mbyte/s.
DOI: https://doi.org/10.1109/ICCD.1995.528822 (published 1995-10-02)
Citations: 14