Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528918
M. Leeser, J. O'Leary
Many modern microprocessors implement floating-point square root hardware using subtractive algorithms. Such processors include the HP PA7200, the MIPS R4400, and the Intel Pentium. The Intel Pentium division bug highlights the importance of verifying such implementations. In this paper we discuss the verification of a radix-2 square root unit similar to that used in the MIPS R4400. The verification is done by theorem proving to bridge the gap between the algorithm and the implementation. At the top level, we verify that a subtractive, non-restoring algorithm correctly calculates the square root function. We then show a series of optimizing transformations that refine the top-level algorithm into the hardware implementation. Each transformation can be verified. We show the transformation of the top-level proof to a level that is closer to the hardware implementation. The implementation is at the register-transfer level (RTL) and consists of a structural description of the hardware, including an adder/subtracter, simple combinational hardware, and some registers.
Title: Verification of a subtractive radix-2 square root algorithm and implementation. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
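To make the phrase "subtractive, non-restoring algorithm" concrete, here is a minimal Python sketch of a textbook radix-2 non-restoring integer square root. It is an illustration only, not the verified design from the paper or the MIPS R4400 hardware; the function name `nonrestoring_isqrt` and its parameters are assumptions chosen for this example. Each iteration consumes two radicand bits and performs exactly one add or one subtract, and a negative partial remainder is carried forward rather than restored.

```python
def nonrestoring_isqrt(d: int, n: int) -> tuple[int, int]:
    """Return (q, r) with q = floor(sqrt(d)) and r = d - q*q.

    d must fit in 2*n bits; q is produced one bit per iteration (radix 2).
    Hypothetical helper for illustration, not the paper's algorithm text.
    """
    if not 0 <= d < (1 << (2 * n)):
        raise ValueError("d must be a non-negative integer of at most 2*n bits")
    q = 0  # partial root
    r = 0  # signed partial remainder (may go negative: no restoring step)
    for i in range(n - 1, -1, -1):
        pair = (d >> (2 * i)) & 0b11  # next two radicand bits, MSB first
        if r >= 0:
            # Remainder non-negative: subtract the trial term 4q + 1.
            r = 4 * r + pair - (4 * q + 1)
        else:
            # Remainder negative: add 4q + 3 instead of restoring first.
            r = 4 * r + pair + (4 * q + 3)
        # The new root bit is 1 exactly when the remainder is non-negative.
        q = 2 * q + (1 if r >= 0 else 0)
    if r < 0:
        # Single final correction so that d == q*q + r with r >= 0.
        r += 2 * q + 1
    return q, r


print(nonrestoring_isqrt(10, 2))  # (3, 1), since 10 = 3*3 + 1
```

The single adder/subtracter mentioned in the abstract corresponds to the one add-or-subtract per iteration; the final correction only affects the remainder, not the root.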
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528912
A. Tsutsui, T. Miyazaki, Kazuhisa Yamada, N. Ohta
A new FPGA (Field Programmable Gate Array) is developed for high-speed digital telecommunication systems. Its architecture is based on the fundamental characteristics extracted from an analysis of actual systems. The FPGA has several unique features for realizing high-speed transport data processing. It allows us to build the high-performance components that are frequently used in transport data processing. In addition, its inter-chip connection mechanism enables us to build flexible multi-FPGA modules. Furthermore, we introduce a dedicated CAD system for the FPGA. We design several actual transport processing circuits on the FPGA using the CAD system and evaluate them. Experimental results show that the device has the potential to realize practical systems.
Title: Special purpose FPGA for high-speed digital telecommunication systems. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528813
Kai-Yuan Chao, D. F. Wong
A placement scheme that considers both electrical performance requirements and thermal behavior for high-performance multichip modules is described in this paper. Practical thermal models are used for placement of high-speed chips in multichip module packages under two different cooling environments: conduction cooling and convection cooling. Placement methods are modified to optimize conventional electrical performance and chip junction temperatures.
Title: Thermal placement for high-performance multichip modules. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528791
S. L. Coumeri, Donald E. Thomas
Our goal is to create a simulation environment for hardware-software codesign. It is important to perform simulation of the hardware/software system at various stages of the codesign process. In our environment the hardware and software are viewed as two independent processes in which the hardware is described in a hardware description language and the software is written in a programming language. The processes can be placed on separate machines and run in parallel. Analysis of the environment has shown that significant simulation speed-ups can be achieved if a high degree of parallelism exists between the hardware and software and if there is a sufficient amount of computational CPU time in the software process.
Title: A simulation environment for hardware-software codesign. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528830
A. Wolfe
A case study in low-power system-level design is presented. We detail the design of a typical low-power embedded system, a touchscreen interface device for a personal computer. This device is designed to operate only on excess power provided by unused RS232 communication lines. We focus on the design and measurement procedures used to reduce the power requirements of this system to less than 50 mW. Furthermore, we highlight opportunities to use system-level design and analysis tools for low-power design. Finally, we identify key issues in low-power system design that are not currently being explored by the design automation community.
Title: A case study in low-power system-level design. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528812
C. Roth, F. Levine, Edward H. Welbon
Performance monitors (PM) have traditionally been viewed as hardware luxuries only available to large/multichip processors. This perception is quickly changing thanks to the incorporation of monitoring instrumentation in most of the current high-volume microprocessors used in PCs and workstations. The PowerPC 604 microprocessor has raised the standard of excellence in this area. It provides a wealth of very advanced features for analyzing system hardware, software, and symmetric multiprocessor systems. These capabilities are becoming indispensable as more function is moved from the system boards to the microprocessors. Furthermore, the PowerPC 604 is enhancing the effort of porting software between various architectures. Users ranging from software vendors to system architects are currently taking advantage of these PowerPC 604 performance monitor capabilities with great success. These companies include IBM, Apple, Motorola, Groupe Bull, and Microsoft, among others.
Title: Performance monitoring on the PowerPC 604 microprocessor. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528833
S. Butner, David A. Skirmont
This paper presents the architecture of a very high performance 4-input, 4-output asynchronous transfer mode (ATM) switch that has been designed as part of the ARPA-sponsored "Thunder and Lightning" project at the University of California, Santa Barbara. This research project is focused on the design and prototype demonstration of ATM links and switches operating at or above 40 gigabits per second per TDM link, with potential scalability to 100 Gbps. Such aggressive link rates place severe requirements on switch architecture, particularly the buffering scheme. In this paper we present the ATM switch structure and justify the main design choices.
Title: Architecture and design of a 40 gigabit per second ATM switch. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528930
R. Drechsler, B. Becker
We present methods for the construction of small Ordered Kronecker Functional Decision Diagrams (OKFDDs). OKFDDs generalize both Ordered Binary Decision Diagrams (OBDDs) and Ordered Functional Decision Diagrams (OFDDs). Our approach is based on dynamic variable ordering and decomposition type choice. For changing the decomposition type we use a new method. We briefly discuss the implementation of PUMA, our OKFDD package. The quality of our methods in comparison with sifting and interleaving for OBDDs is demonstrated based on experiments performed with PUMA.
Title: Dynamic minimization of OKFDDs. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
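The "decomposition type choice" mentioned in the abstract refers to the three expansions an OKFDD node may use for its variable: Shannon (as in OBDDs), positive Davio, and negative Davio (as in OFDDs). The sketch below is not PUMA code; it is a small self-contained Python check, with hypothetical helper names, that all three expansions reconstruct the same Boolean function from its cofactors f0 (variable set to 0) and f1 (variable set to 1).

```python
from itertools import product


def cofactor(f, i, value):
    """Return f with input i fixed to `value` (0 or 1)."""
    def g(bits):
        return f(bits[:i] + (value,) + bits[i + 1:])
    return g


def shannon(f, i):
    # f = (NOT x_i) * f0  XOR  x_i * f1
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda b: ((1 - b[i]) & f0(b)) ^ (b[i] & f1(b))


def positive_davio(f, i):
    # f = f0  XOR  x_i * (f0 XOR f1)
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda b: f0(b) ^ (b[i] & (f0(b) ^ f1(b)))


def negative_davio(f, i):
    # f = f1  XOR  (NOT x_i) * (f0 XOR f1)
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda b: f1(b) ^ ((1 - b[i]) & (f0(b) ^ f1(b)))


def equivalent(f, g, n):
    """Exhaustively compare two n-input Boolean functions."""
    return all(f(b) == g(b) for b in product((0, 1), repeat=n))


def majority(b):
    """Example function: 3-input majority."""
    return int(b[0] + b[1] + b[2] >= 2)


for var in range(3):
    for expand in (shannon, positive_davio, negative_davio):
        assert equivalent(majority, expand(majority, var), 3)
```

All three expansions describe the same function, but the sub-functions stored below a node differ, so the diagram size depends on which type is chosen per variable. That is why the choice can be treated as an optimization variable alongside the variable order.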
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528940
R. Carragher, M. Fujita, Chung-Kuan Cheng
In this paper we address the fanout tree problem introduced by Berman et al., that is, using buffer fanout trees to reduce the fanout delay in a technology-mapped network. We construct two basic types of fanout trees and provide simple techniques to manipulate them for further delay reduction. These trees are inserted along critical paths throughout the network. We also perform gate transformation, that is, substitution of gates with equivalent logical functions, if the technology permits. Experimental results show improvement over Touati's LT-tree construction technique.
Title: Simple tree-construction heuristics for the fanout problem. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
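As a rough illustration of why buffer fanout trees can reduce delay, the following Python sketch compares a gate driving all sinks directly against a two-level buffer tree. Everything here is an assumption made for the example: the linear delay model (a fixed intrinsic delay plus a term proportional to the number of driven loads), the parameter values, and the function names. It does not reproduce the two tree types or the heuristics of the paper.

```python
import math


def stage_delay(fanout: int, intrinsic: float = 1.0, per_load: float = 1.0) -> float:
    """Assumed linear model: delay grows with the number of driven loads."""
    return intrinsic + per_load * fanout


def direct_delay(sinks: int) -> float:
    """Source gate drives every sink itself."""
    return stage_delay(sinks)


def two_level_tree_delay(sinks: int, sinks_per_buffer: int) -> float:
    """Source drives buffers; each buffer drives at most `sinks_per_buffer` sinks."""
    buffers = math.ceil(sinks / sinks_per_buffer)
    return stage_delay(buffers) + stage_delay(min(sinks, sinks_per_buffer))


print(direct_delay(16))             # one heavily loaded stage
print(two_level_tree_delay(16, 4))  # two lightly loaded stages
```

Under this model the tree wins once the fanout is large enough to outweigh the extra buffer stage, and loses for small fanouts. That trade-off is what makes tree construction a nontrivial optimization problem.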
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528814
G. Holt, A. Tyagi
This paper reports our experiences with incorporating energy-based (or switched-capacitance-based) algorithms into an automated layout synthesis system based on standard cells. Our experimental results show an average savings of 18.5% in interconnect energy at a cost of about 6.2% area increase relative to area-minimized layouts on MCNC Logic Synthesis '93 benchmarks. The basic premise is that wires with high switching activity should be made short, even if that involves stretching several low-switching wires. We modified an existing layout system, VPNR, to include these techniques during the placement and global routing phases. Attempts to include switching probabilities in channel routing did not produce appreciable results. Our experiments also lend insight into the composition of the solution space for VLSI energy minimization problems.
Title: EPNR: an energy-efficient automated layout synthesis package. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
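The premise that high-switching wires should be short, even at the expense of longer low-switching wires, can be sketched by weighting each net's length by its switching activity. The Python example below is a toy illustration, not EPNR or VPNR code: the net names, lengths, and activity values are made up, and the cost function assumes wire capacitance is proportional to length.

```python
def total_wirelength(lengths: dict[str, float]) -> float:
    """Conventional objective: sum of net lengths (a proxy for area)."""
    return sum(lengths.values())


def switched_cost(lengths: dict[str, float], activity: dict[str, float]) -> float:
    """Energy proxy: each net's length weighted by its switching activity."""
    return sum(activity[net] * length for net, length in lengths.items())


# Hypothetical switching activities (transitions per cycle) for three nets.
activity = {"hot": 0.9, "cold_a": 0.05, "cold_b": 0.05}

# Two hypothetical placements of the same netlist.
area_driven = {"hot": 10.0, "cold_a": 4.0, "cold_b": 4.0}
energy_driven = {"hot": 4.0, "cold_a": 8.0, "cold_b": 8.0}

for name, placement in (("area-driven", area_driven), ("energy-driven", energy_driven)):
    print(name, total_wirelength(placement), switched_cost(placement, activity))
```

In this toy case the energy-driven placement has more total wire but a lower activity-weighted cost, which mirrors the kind of trade-off the abstract reports (lower interconnect energy for a modest area increase).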