Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528817
T. Kozlowski, E. Dagless, J. Saul
Most current exclusive-OR sum-of-products (ESOP) minimization algorithms use rule-based heuristics to transform an initial circuit description into as compact a form as possible. This paper presents an enhanced minimization algorithm, MINT, which introduces new transformations, including rules that operate on three product terms at a time. These multiple-product-term transformations prove to be an efficient extension of previously defined rules operating on two product terms. Additionally, new efficient procedures for optimization using don't cares are introduced. The algorithm can simplify incompletely specified functions with multiple-valued inputs and multiple two-valued outputs.
Title: An enhanced algorithm for the minimization of exclusive-OR sum-of-products for incompletely specified functions
Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
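To make the idea of rule-based ESOP transformation concrete, here is a minimal Python sketch, assuming a cube (product term) is a dict from variable index to literal polarity. It greedily applies only three textbook two-cube identities (P ⊕ P = 0, xP ⊕ x'P = P, xP ⊕ P = x'P). It is not MINT: it has no three-cube rules and ignores don't cares, and the names `merge_pair` and `simplify` are hypothetical.

```python
from itertools import product

# A cube (product term) maps variable index -> required value (0 or 1).
# Variables absent from the dict do not appear in the term.


def eval_esop(cubes, assignment):
    """Return the XOR of all product terms satisfied by `assignment`."""
    result = 0
    for cube in cubes:
        if all(assignment[v] == val for v, val in cube.items()):
            result ^= 1
    return result


def merge_pair(a, b):
    """Try an illustrative two-cube rule; return replacement cubes or None."""
    if set(a) == set(b):
        diff = [v for v in a if a[v] != b[v]]
        if not diff:
            return []                       # P xor P = 0
        if len(diff) == 1:
            merged = dict(a)
            del merged[diff[0]]             # x*P xor x'*P = P
            return [merged]
    for big, small in ((a, b), (b, a)):
        extra = set(big) - set(small)
        if len(extra) == 1 and all(v in big and big[v] == val for v, val in small.items()):
            (x,) = extra
            merged = dict(big)
            merged[x] = 1 - big[x]          # x*P xor P = x'*P
            return [merged]
    return None


def simplify(cubes):
    """Greedily apply pairwise rules until none fires (each step removes a cube)."""
    cubes = [dict(c) for c in cubes]
    changed = True
    while changed:
        changed = False
        for i in range(len(cubes)):
            for j in range(i + 1, len(cubes)):
                replacement = merge_pair(cubes[i], cubes[j])
                if replacement is not None:
                    rest = [c for k, c in enumerate(cubes) if k not in (i, j)]
                    cubes = rest + replacement
                    changed = True
                    break
            if changed:
                break
    return cubes


# f = x0*x1 xor x0'*x1 xor x1*x2, which simplifies to x1*x2'
f = [{0: 1, 1: 1}, {0: 0, 1: 1}, {1: 1, 2: 1}]
g = simplify(f)
print(g)  # [{1: 1, 2: 0}]
```

Because every rule removes at least one cube, the loop always terminates; a real minimizer such as the one described above would also explore rules that temporarily keep or increase the cube count.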
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528828
S. W. Daniel, J. Rexford, J. Dolter, K. Shin
Modern parallel and distributed applications have a wide range of communication characteristics and performance requirements. This paper presents the programmable routing controller (PRC), a custom ASIC that supports flexible network policies to accommodate diverse application requirements. By dedicating a small programmable processor to each incoming link, the PRC can implement wormhole, virtual cut-through, and packet switching, as well as hybrid schemes, under a variety of unicast and multicast routing algorithms. By implementing multiple routing-switching microcode routines, the PRC can support several applications or traffic types simultaneously.
Title: A programmable routing controller for flexible communications in point-to-point networks
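The switching schemes the PRC supports differ mainly in when a router may forward a packet. A common contention-free latency model, sketched below in Python, makes this concrete: packet (store-and-forward) switching pays the full packet transfer time at every hop, while wormhole and virtual cut-through pay only the header time per hop. This is a textbook approximation, not the PRC's microcode; it ignores routing-decision and wire delays, and the function names and numbers are hypothetical.

```python
def store_and_forward_cycles(hops, packet_bits, channel_width_bits):
    """Packet switching: every router buffers the whole packet before forwarding."""
    packet_cycles = packet_bits / channel_width_bits
    return hops * packet_cycles


def cut_through_cycles(hops, packet_bits, header_bits, channel_width_bits):
    """Wormhole / virtual cut-through: only the header waits at each hop,
    and the rest of the packet streams behind it in a pipeline."""
    header_cycles = header_bits / channel_width_bits
    packet_cycles = packet_bits / channel_width_bits
    return hops * header_cycles + packet_cycles


# Hypothetical numbers: 4 hops, 256-bit packet, 16-bit header, 16-bit channels.
print(store_and_forward_cycles(4, 256, 16))   # 64.0
print(cut_through_cycles(4, 256, 16, 16))     # 20.0
```

Wormhole and virtual cut-through share the same unloaded latency in this model; they differ only when the header blocks, since virtual cut-through buffers the whole packet at the blocked router while wormhole leaves it stalled across several routers.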
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528801
G. Maturana, James L. Ball, J. Gee, A. Iyer, J. M. O'Connor
This paper describes Incas, a cycle-accurate model of the UltraSPARC processor. The model is written in C++ and is built on top of a powerful programming framework with a built-in message-passing mechanism and a timing discipline for simulating concurrent modules. The goal was to help verify the processor by cross-checking the RTL model at run time, as well as to provide accurate performance estimates. Because Incas runs much faster than the RTL model, it was also used to model the UltraSPARC module in RTL simulations of the full system, for compiler and library tuning, and for diagnostics development.
Title: Incas: a cycle accurate model of UltraSPARC
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528785
J. S. Wang, W. Dai
This paper describes two novel macromodels for incorporating single and coupled transmission lines with frequency-dependent losses into a simulator based on scattering-parameter (S-parameter) macromodels. The approach computes the moments of the S-parameters from the frequency-dependent parasitic functions R(f), L(f), C(f), and G(f) that characterize the single or coupled lines. These moments are then used to construct the macromodels, after which transient analysis can be performed with the S-parameter-based macromodel simulator.
Title: Transient analysis of coupled transmission lines characterized with the frequency-dependent losses using scattering-parameter based macromodel
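As a rough illustration of the frequency-domain data such a macromodel starts from, the Python sketch below evaluates S11 and S21 of a single uniform line from per-unit-length R(f), L(f), G(f), C(f), using the line's ABCD matrix and the standard ABCD-to-S conversion for equal reference impedances. It does not compute the S-parameter moments or build the paper's macromodels, and it covers only the single (uncoupled) line. The `lossy_rlgc` values, including the sqrt(f) skin-effect term, are hypothetical, and `freq` must be nonzero.

```python
import cmath
import math


def line_s_params(rlgc, length, freq, z_ref=50.0):
    """S11 and S21 of a uniform two-conductor line at one frequency.

    rlgc(freq) returns per-unit-length (R, L, G, C) in ohm/m, H/m, S/m, F/m.
    """
    R, L, G, C = rlgc(freq)
    w = 2 * math.pi * freq
    z = R + 1j * w * L                 # series impedance per metre
    y = G + 1j * w * C                 # shunt admittance per metre
    z0 = cmath.sqrt(z / y)             # characteristic impedance
    gl = cmath.sqrt(z * y) * length    # propagation constant times length
    # ABCD matrix of the line section
    a = d = cmath.cosh(gl)
    b = z0 * cmath.sinh(gl)
    c = cmath.sinh(gl) / z0
    # ABCD -> S conversion for equal real reference impedances at both ports
    denom = a + b / z_ref + c * z_ref + d
    s11 = (a + b / z_ref - c * z_ref - d) / denom
    s21 = 2 / denom
    return s11, s21


def lossy_rlgc(freq):
    # Hypothetical line: 50-ohm L/C ratio, skin-effect R(f) ~ sqrt(f), G(f) ~ f
    return 5.0 + 1e-4 * math.sqrt(freq), 250e-9, 1e-12 * freq, 100e-12


for f in (1e6, 1e8, 1e9):
    s11, s21 = line_s_params(lossy_rlgc, 0.1, f)
    print(f"{f:.0e} Hz  |S11| = {abs(s11):.4f}  |S21| = {abs(s21):.4f}")
```

A moment-based method would instead expand such S-parameters as a power series in the Laplace variable s; sampling them on a frequency grid as above is only a convenient way to inspect the same underlying quantities.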
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528799
A. Dalal, L. Lev, S. Mitra
This paper describes the design, implementation, and verification of the power distribution network for the 5.2-million-transistor UltraSPARC-I microprocessor. A novel simulation method allows rapid identification of the exact layout locations with potential electromigration or excessive voltage-drop problems. The hierarchical verification capabilities of this approach are used to design an efficient and robust distribution of Vdd and Vss across a large die, in the face of stringent IR-drop and floorplanning constraints. A comprehensive methodology for power distribution and management, together with seamless integration of power distribution into existing CAD tools throughout the design cycle, results in correct-by-construction power networks for cell libraries and functional blocks, area-efficient power interconnections, and reduced time-to-market, because all reliability failures in the power networks are corrected before mask generation.
Title: Design of an efficient power distribution network for the UltraSPARC-I microprocessor
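The two failure modes named above, excessive IR drop and electromigration, can be illustrated on the simplest possible network: one rail fed from a single pad with evenly spaced taps. This Python sketch assumes uniform segment resistance and uses a per-segment current limit as a crude stand-in for a current-density (electromigration) rule. It is not the UltraSPARC-I verification method, and the function name and all values are hypothetical.

```python
def rail_ir_drop(vdd, seg_resistance, tap_currents, max_seg_current):
    """Voltages along a single power rail fed from one end.

    Segment k connects node k-1 (node -1 is the supply pad) to tap k, and it
    carries the current drawn by tap k and every tap beyond it.
    Returns (tap voltages, indices of segments whose current exceeds the limit).
    """
    voltages, violations = [], []
    v = vdd
    for k in range(len(tap_currents)):
        seg_current = sum(tap_currents[k:])        # all downstream load current
        v -= seg_current * seg_resistance          # IR drop across segment k
        voltages.append(v)
        if seg_current > max_seg_current:          # crude stand-in for an EM rule
            violations.append(k)
    return voltages, violations


# Hypothetical rail: 3.3 V pad, 0.1 ohm per segment, four 0.5 A taps, 1.2 A limit.
volts, bad = rail_ir_drop(3.3, 0.1, [0.5, 0.5, 0.5, 0.5], 1.2)
print([round(v, 3) for v in volts])  # [3.1, 2.95, 2.85, 2.8]
print(bad)                           # [0, 1]
```

The segments nearest the pad carry the most current, so they are the first to violate the limit even though the worst voltage appears at the far end; a full-chip tool has to check both quantities on a 2-D mesh rather than a single rail.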
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528911
T. Pan, Hyon S. Kay, Y. Chun, C. Wey
The complexity of the quotient-digit selection process can be reduced significantly by using a look-up table, referred to as a quotient-digit selection table (QST). However, the table size grows rapidly with the radix, which limits this approach to low-radix implementations. This paper presents an alternative quotient decision process that reduces the table size. Instead of finding the exact quotient digit, a speculated quotient digit is estimated. The speculated digit is used to update the candidate partial remainders while it is being corrected. The process therefore has two steps, determination of the speculated quotient digit and quotient-digit correction, so two smaller tables replace one large QST. Results show that the proposed approach significantly reduces the size of the original QST.
Title: High-radix SRT division with speculation of quotient digits
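The table-size problem originates in the quotient-digit selection step. For reference, here is classic radix-2 SRT division in Python, a swapped-in baseline rather than the paper's speculative high-radix scheme. With the redundant digit set {-1, 0, 1} and a divisor normalized to [1/2, 1), each digit is chosen by comparing the shifted partial remainder against the constants ±1/2, so selection needs no divisor bits at all; at higher radices selection depends on both remainder and divisor bits, which is what makes the QST large. Exact `Fraction` arithmetic is an assumption made for clarity, and the function name is hypothetical.

```python
from fractions import Fraction


def srt_radix2(x, d, n_digits):
    """Radix-2 SRT division x / d with quotient digits in {-1, 0, 1}.

    Requires 1/2 <= d < 1 and |x| < d. Returns (q, r) with x == q*d + r * 2**-n_digits.
    """
    half = Fraction(1, 2)
    if not (half <= d < 1 and abs(x) < d):
        raise ValueError("need 1/2 <= d < 1 and |x| < d")
    q, r = Fraction(0), Fraction(x)
    for j in range(1, n_digits + 1):
        shifted = 2 * r
        # Thresholds are constants independent of d; redundancy in the digit
        # set is what lets hardware compare only a few leading remainder bits.
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        r = shifted - digit * d             # new partial remainder, |r| <= d
        q += digit * Fraction(1, 2 ** j)    # accumulate signed-digit quotient
    return q, r


x, d = Fraction(3, 8), Fraction(3, 4)
q, r = srt_radix2(x, d, 16)
print(q)  # 1/2
```

The invariant x = q·d + r·2^-j holds after every step, and because |r| stays within d the quotient error after n digits is at most 2^-n.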
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528786
J. Kudoh, Toshiro Takahashi, Yukio Umada, M. Kimura, Shigeru Yamamoto, Y. Ito
A 530k-gate array with novel GTL I/O circuits has been developed using a 0.5 μm triple-metal-layer CMOS process. The I/O circuit, consisting of a push-pull output driver and a dynamic-termination receiver, can transmit 250 Mb/s data through a long stub line connected to a terminated bus line. The differential receiver is designed for IDDQ testability without any delay-time overhead.
Title: A CMOS gate array with dynamic-termination GTL I/O circuits
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528827
M. Kadiyala, L. Bhuyan
Parallel applications suffer from significant bus traffic due to the transfer of shared data. Large block sizes exploit locality and decrease the effective memory access time. However, they also tend to group data together even though only part of a block is needed by any one processor. This is known as the false sharing problem. This research presents a dynamic sub-block coherence protocol that minimizes false sharing by trying to dynamically locate the point of false reference. Sharing traffic is minimized by maintaining coherence on the smaller blocks (sub-blocks) that are truly shared, while larger blocks are used as the basic units of transfer. The larger blocks thus exploit locality, while coherence on sub-blocks minimizes bus traffic due to shared misses. Simulation results indicate that the dynamic sub-block protocol reduces false sharing misses by 20 to 30 percent compared with the fixed sub-block scheme.
Title: A dynamic cache sub-block design to reduce false sharing
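The benefit of keeping coherence on sub-blocks while still transferring whole blocks can be seen on a tiny trace. The Python sketch below models a simplified write-invalidate protocol with a fixed sub-block size, i.e. the baseline the paper compares against rather than its dynamic protocol. It tracks only valid/invalid state, ignores write-backs and ownership, and includes cold misses in its counts; `BLOCK_WORDS`, the function name, and the trace are hypothetical.

```python
BLOCK_WORDS = 8  # hypothetical: words per cache block (the unit of transfer)


def coherence_misses(trace, sub_block_words, num_procs=2):
    """Count misses for a write-invalidate protocol whose coherence unit is a sub-block.

    trace: sequence of (processor, op, word_address) with op in {"R", "W"}.
    sub_block_words == BLOCK_WORDS models a conventional whole-block protocol.
    """
    valid = [set() for _ in range(num_procs)]  # (block, sub-block) units held per processor
    subs_per_block = BLOCK_WORDS // sub_block_words
    misses = 0
    for proc, op, addr in trace:
        block = addr // BLOCK_WORDS
        unit = (block, (addr % BLOCK_WORDS) // sub_block_words)
        if unit not in valid[proc]:
            misses += 1
            # The whole block is fetched, so all of its sub-blocks become valid.
            valid[proc].update((block, s) for s in range(subs_per_block))
        if op == "W":
            # Invalidate only the written sub-block in the other caches.
            for other in range(num_procs):
                if other != proc:
                    valid[other].discard(unit)
    return misses


# P0 repeatedly writes word 0 and P1 word 7 of the same block: pure false sharing.
trace = [(0, "W", 0), (1, "W", 7)] * 4
print(coherence_misses(trace, sub_block_words=8))  # 8 (whole-block coherence)
print(coherence_misses(trace, sub_block_words=4))  # 2 (two 4-word sub-blocks)
```

With whole-block coherence every write ping-pongs the block between the two caches; with two sub-blocks only the two cold misses remain. A fixed sub-block size only helps when the falsely shared words happen to fall in different sub-blocks, which is the limitation a dynamic scheme targets.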
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528945
W. Fang, B. Sheu, H. Venus, R. Sandau
A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and a multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons, each locally connected to its neighboring neurons and its associated active-pixel sensors. Integrating the neuroprocessor into each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance improvements for on-board real-time intelligent multisensor processing and control tasks in advanced small satellites. The operating theory, architecture, design and implementation, and system applications of the smart-pixel CNN are investigated in detail. The feasibility of VLSI implementation was demonstrated by a prototype 5 × 5 smart-pixel neuroprocessor array chip with active dimensions of 1380 μm × 746 μm in a 2-μm CMOS technology.
Title: Smart-pixel array processors based on optimal cellular neural networks for space sensor applications
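For readers unfamiliar with CNN dynamics, the sketch below integrates the standard Chua-Yang cell equation, dx/dt = -x + Σ A·y + Σ B·u + I with output y = 0.5(|x+1| - |x-1|), over a 5 × 5 grid using forward Euler and 3 × 3 templates. It models none of the chip's specific features (hardware annealing, optoelectronic inputs, on-chip programmable weights); the function names, step size, and the self-feedback template in the example are hypothetical.

```python
def cnn_output(x):
    """Standard piecewise-linear CNN output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1) - abs(x - 1))


def run_cnn(u, A, B, bias, x0, dt=0.1, steps=200):
    """Euler-integrate dx/dt = -x + A*y + B*u + bias on a grid with 3x3 templates."""
    rows, cols = len(u), len(u[0])

    def neighborhood_sum(grid, template, i, j):
        total = 0.0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:   # cells outside the array count as 0
                    total += template[di + 1][dj + 1] * grid[ni][nj]
        return total

    # The input term does not change over time, so compute it once.
    drive = [[neighborhood_sum(u, B, i, j) + bias for j in range(cols)] for i in range(rows)]
    x = [row[:] for row in x0]
    for _ in range(steps):
        y = [[cnn_output(v) for v in row] for row in x]
        x = [[x[i][j] + dt * (-x[i][j] + neighborhood_sum(y, A, i, j) + drive[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[cnn_output(v) for v in row] for row in x]


# Hypothetical 5x5 example: self-feedback 2 drives each cell to the sign of its start state.
A = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
u = [[0.6 if (i + j) % 2 == 0 else -0.2 for j in range(5)] for i in range(5)]
out = run_cnn(u, A, B, bias=0.0, x0=u)
for row in out:
    print(" ".join(f"{v:+.0f}" for v in row))
```

Each cell's update depends only on its 3 × 3 neighborhood, which is why the architecture maps naturally onto a locally connected pixel array; nonzero off-center entries in A or B would couple neighboring cells.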
Pub Date : 1995-10-02 | DOI: 10.1109/ICCD.1995.528794
D. Appenzeller, A. Kuehlmann
This paper presents the use of formal methods in the design of a PowerPC microprocessor. The chosen methodology employs two independently developed design views: a register-transfer-level specification for efficient system simulation and a transistor-level implementation geared toward maximal processor performance. A BDD-based verification tool functionally compares the two views, which effectively validates the transistor-level implementation with respect to all functional simulation and verification performed at the register-transfer level. We show that tightly integrating the verification approach into the overall design methodology allows formal verification of complex microprocessor implementations without compromising the design process or the performance of the resulting system.
Title: Formal verification of a PowerPC microprocessor
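The core of BDD-based equivalence checking is that reduced ordered BDDs are canonical: two Boolean functions built under the same variable order are equivalent exactly when they reduce to the same node. The minimal Python sketch below shows this on a hypothetical carry-out function, comparing a majority-function "RTL" view with a generate/propagate "gate" view and a deliberately buggy view. It is not the authors' tool, and it omits complement edges, garbage collection, and variable reordering.

```python
class BDD:
    """Minimal reduced ordered BDD; variable order is 0 < 1 < ... < num_vars - 1."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Node id -> (var, low, high). Ids 0 and 1 are the constant terminals;
        # giving them var == num_vars makes them sort after every real variable.
        self.nodes = {0: (num_vars, None, None), 1: (num_vars, None, None)}
        self.unique = {}   # (var, low, high) -> node id, guarantees sharing
        self.memo = {}     # (op, u, v) -> result node id

    def mk(self, var, low, high):
        if low == high:                        # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def apply(self, op, u, v):
        key = (op, u, v)
        if key in self.memo:
            return self.memo[key]
        if u <= 1 and v <= 1:                  # both terminals
            result = OPS[op](u, v)
        else:
            vu, lu, hu = self.nodes[u]
            vv, lv, hv = self.nodes[v]
            top = min(vu, vv)                  # Shannon-expand on the earliest variable
            u0, u1 = (lu, hu) if vu == top else (u, u)
            v0, v1 = (lv, hv) if vv == top else (v, v)
            result = self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))
        self.memo[key] = result
        return result


OPS = {"and": lambda a, b: a & b, "or": lambda a, b: a | b, "xor": lambda a, b: a ^ b}

bdd = BDD(3)
a, b, c = bdd.var(0), bdd.var(1), bdd.var(2)
AND = lambda p, q: bdd.apply("and", p, q)
OR = lambda p, q: bdd.apply("or", p, q)
XOR = lambda p, q: bdd.apply("xor", p, q)

spec = OR(OR(AND(a, b), AND(a, c)), AND(b, c))   # "RTL" view: carry-out as majority
impl = OR(AND(a, b), AND(c, XOR(a, b)))          # "gate" view: generate/propagate carry
buggy = OR(AND(a, b), AND(a, c))                 # view missing the b*c term
print(spec == impl, spec == buggy)               # True False
```

Once both views are built, the equivalence check itself is a single integer comparison; the real cost, and the reason industrial flows partition the design, lies in keeping the BDDs small while building them.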