Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106749
F. Aloul, I. Markov, K. Sakallah
Boolean functions are fundamental to synthesis and verification of digital logic, and compact representations of Boolean functions have great practical significance. Popular representations, such as CNF, DNF, circuits and ROBDDs [4], offer different advantages and are preferred for different tasks. Conversion between those representations is common, especially when one is used to represent the input and another speeds up relevant algorithms. Our work addresses the construction of ROBDDs that represent outputs of a given Boolean circuit. It is used in synthesis and verification. Earlier works (Fujita, Fujisawa, and Kawato, 1988. Malik et al., 1988.) proposed ordering circuit inputs and gates by graph traversals. We contribute orderings based on circuit partitioning and placement, leveraging the progress in recursive bisection and multi-level min-cut partitioning achieved in late 1990s. Our empirical results show that the proposed orderings based on circuit partitioning and placement are more successful than straightforward DFS and BFS, as well as related heuristics.
{"title":"Improving the efficiency of circuit-to-BDD conversion by gate and input ordering","authors":"F. Aloul, I. Markov, K. Sakallah","doi":"10.1109/ICCD.2002.1106749","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106749","url":null,"abstract":"Boolean functions are fundamental to synthesis and verification of digital logic, and compact representations of Boolean functions have great practical significance. Popular representations, such as CNF, DNF, circuits and ROBDDs [4], offer different advantages and are preferred for different tasks. Conversion between those representations is common, especially when one is used to represent the input and another speeds up relevant algorithms. Our work addresses the construction of ROBDDs that represent outputs of a given Boolean circuit. It is used in synthesis and verification. Earlier works (Fujita, Fujisawa, and Kawato, 1988. Malik et al., 1988.) proposed ordering circuit inputs and gates by graph traversals. We contribute orderings based on circuit partitioning and placement, leveraging the progress in recursive bisection and multi-level min-cut partitioning achieved in late 1990s. Our empirical results show that the proposed orderings based on circuit partitioning and placement are more successful than straightforward DFS and BFS, as well as related heuristics.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":" 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120828846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106748
Christoph Scholl, B. Becker
We consider the problem of checking whether an implementation which contains parts with incomplete information is equivalent to a given full specification. We study implementations which are not completely specified, but contain boxes which are associated with incompletely specified functions (called Incompletely Specified Boxes or IS-Boxes). After motivating the use of implementations with Incompletely Specified Boxes we define our notion of equivalence for this kind of implementations and present a method to solve the problem. A series of experimental results demonstrates the effectiveness and feasibility of the methods presented.
{"title":"Checking equivalence for circuits containing incompletely specified boxes","authors":"Christoph Scholl, B. Becker","doi":"10.1109/ICCD.2002.1106748","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106748","url":null,"abstract":"We consider the problem of checking whether an implementation which contains parts with incomplete information is equivalent to a given full specification. We study implementations which are not completely specified, but contain boxes which are associated with incompletely specified functions (called Incompletely Specified Boxes or IS-Boxes). After motivating the use of implementations with Incompletely Specified Boxes we define our notion of equivalence for this kind of implementations and present a method to solve the problem. A series of experimental results demonstrates the effectiveness and feasibility of the methods presented.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116092083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106793
A. Hossain, D. Pease, James S. Burns, N. Parveen
Instruction fetch mechanism is a performance bottleneck of a Superscalar Processor. The fetch performance of the processor can be improved with the aid of an instruction memory structure known as Trace Cache. This paper presents parameters and analytical expressions, which describe instruction fetch performance of a Trace Cache microarchitecture. The instruction fetch rates predicted by the expressions differ by seven percent from the simulated fetch rates for SPEC2000 benchmark programs. Presented analytical expressions are implemented in a computer program named Tulip. Tulip is used to explore parameters, and their influence on fetch performance. Tulip is also used to understand Trace Cache performance tradeoffs.
{"title":"Trace Cache performance parameters","authors":"A. Hossain, D. Pease, James S. Burns, N. Parveen","doi":"10.1109/ICCD.2002.1106793","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106793","url":null,"abstract":"Instruction fetch mechanism is a performance bottleneck of a Superscalar Processor. The fetch performance of the processor can be improved with the aid of an instruction memory structure known as Trace Cache. This paper presents parameters and analytical expressions, which describe instruction fetch performance of a Trace Cache microarchitecture. The instruction fetch rates predicted by the expressions differ by seven percent from the simulated fetch rates for SPEC2000 benchmark programs. Presented analytical expressions are implemented in a computer program named Tulip. Tulip is used to explore parameters, and their influence on fetch performance. Tulip is also used to understand Trace Cache performance tradeoffs.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127217452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106765
B. Chappell, Xinning Wang, Priyadarsan Patra, Prashant Saxena, J. Vendrell, Satyanarayan Gupta, S. Varadarajan, W. Gomes, S. Hussain, H. Krishnamurthy, M. Venkateshmurthy, S. Jain
System structure and a taped out 0.18u 2 GHz product application result are described for a domino synthesis capability that covers all aspects of domino design, from estimation to silicon-ready layout, with custom-class optimization. The described optimization flow, abstraction modes, and key cost factors deliver power-optimized, noise-correct domino performance on complex logic.
{"title":"A system-level solution to domino synthesis with 2 GHz application","authors":"B. Chappell, Xinning Wang, Priyadarsan Patra, Prashant Saxena, J. Vendrell, Satyanarayan Gupta, S. Varadarajan, W. Gomes, S. Hussain, H. Krishnamurthy, M. Venkateshmurthy, S. Jain","doi":"10.1109/ICCD.2002.1106765","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106765","url":null,"abstract":"System structure and a taped out 0.18u 2 GHz product application result are described for a domino synthesis capability that covers all aspects of domino design, from estimation to silicon-ready layout, with custom-class optimization. The described optimization flow, abstraction modes, and key cost factors deliver power-optimized, noise-correct domino performance on complex logic.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134071061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106824
Panit Watcharawitch, S. Moore
Embedded processors are increasingly deployed in applications requiring high performance with good real-time characteristics whilst being low power. Parallelism has to be extracted in order to improve the performance at an architectural level. Extracting instruction level parallelism requires extensive speculation which adds complexity and increases power consumption. Alternatively, parallelism can be provided at the thread level. Many embedded applications can be written in a threaded manner in Java which can be directly translated to use hardware-level multithreaded operations. This paper presents an architectural study of JMA, a high-performance multithreaded architecture which supports Java-multithreading and realtime scheduling whilst remaining low-power.
{"title":"JMA: the Java-multithreading architecture for embedded processors","authors":"Panit Watcharawitch, S. Moore","doi":"10.1109/ICCD.2002.1106824","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106824","url":null,"abstract":"Embedded processors are increasingly deployed in applications requiring high performance with good real-time characteristics whilst being low power. Parallelism has to be extracted in order to improve the performance at an architectural level. Extracting instruction level parallelism requires extensive speculation which adds complexity and increases power consumption. Alternatively, parallelism can be provided at the thread level. Many embedded applications can be written in a threaded manner in Java which can be directly translated to use hardware-level multithreaded operations. This paper presents an architectural study of JMA, a high-performance multithreaded architecture which supports Java-multithreading and realtime scheduling whilst remaining low-power.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":" 48","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113952600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106758
T. Thorp, D. Liu, P. Trivedi
In order for dynamic circuits to operate correctly, their inputs must be monotonically rising during evaluation. Blocking dynamic circuits satisfy this constraint by delaying evaluation until all inputs have been properly setup relative to the evaluation clock. By viewing dynamic gates as latches, we demonstrate that the optimal delay of a blocking dynamic gate may occur when the setup time is negative. With blocking dynamic circuits, cascading low-skew dynamic gates allows each dynamic gate to tolerate a degraded input level. The larger noise margin provides greater flexibility with the delay vs. noise margin trade-off (i.e. the circuit robustness vs. speed tradeoff). This paper generalizes blocking dynamic circuits and provides a systematic approach for assigning clock phases, given delay and noise margin constraints. Using this framework, one can analyze any logic network consisting of blocking dynamic circuits.
{"title":"Analysis of blocking dynamic circuits","authors":"T. Thorp, D. Liu, P. Trivedi","doi":"10.1109/ICCD.2002.1106758","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106758","url":null,"abstract":"In order for dynamic circuits to operate correctly, their inputs must be monotonically rising during evaluation. Blocking dynamic circuits satisfy this constraint by delaying evaluation until all inputs have been properly setup relative to the evaluation clock. By viewing dynamic gates as latches, we demonstrate that the optimal delay of a blocking dynamic gate may occur when the setup time is negative. With blocking dynamic circuits, cascading low-skew dynamic gates allows each dynamic gate to tolerate a degraded input level. The larger noise margin provides greater flexibility with the delay vs. noise margin trade-off (i.e. the circuit robustness vs. speed tradeoff). This paper generalizes blocking dynamic circuits and provides a systematic approach for assigning clock phases, given delay and noise margin constraints. Using this framework, one can analyze any logic network consisting of blocking dynamic circuits.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128644315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106821
S. Vlaovic, E. Davidson
Although x86 processors have been around for a long time and are the most ubiquitous processors in the world, the amount of academic research regarding details of their performance has been minimal. We introduce an x86 simulation environment, called TAXI (Trace Analysis for X86 Interpretation), and use it to present results for eight Win32 applications. In this paper, we explain the design and implementation of TAXI.
尽管x86处理器已经存在了很长时间,并且是世界上最普遍的处理器,但是关于其性能细节的学术研究却很少。我们介绍了一个x86仿真环境,称为TAXI (Trace Analysis for x86 Interpretation),并使用它来呈现八个Win32应用程序的结果。在本文中,我们解释了TAXI的设计和实现。
{"title":"TAXI: Trace Analysis for x86 Interpretation","authors":"S. Vlaovic, E. Davidson","doi":"10.1109/ICCD.2002.1106821","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106821","url":null,"abstract":"Although x86 processors have been around for a long time and are the most ubiquitous processors in the world, the amount of academic research regarding details of their performance has been minimal. We introduce an x86 simulation environment, called TAXI (Trace Analysis for X86 Interpretation), and use it to present results for eight Win32 applications. In this paper, we explain the design and implementation of TAXI.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130800408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106798
D. Duarte, N. Vijaykrishnan, M. J. Irwin, Hyun Suk Kim, G. McFarland
Power is considered to be the major limiter to the design of faster and more complex processors in the near future. In order to address this challenge, a combination of process, circuit design and micro-architectural changes are required Consequently, to focus optimization efforts in the right direction, the models proposed and studies performed in this work are a first step for understanding the relative importance of leakage and dynamic energy in future technologies. Further, we analyze the effectiveness of two energy reduction mechanisms that employ voltage scaling, namely, supply and threshold voltage selection. We consider the impact of imminent technology changes and packaging improvements while showing that neglecting the impact of temperature may lead to underestimating power savings by up to 19.5%.
{"title":"Impact of scaling on the effectiveness of dynamic power reduction schemes","authors":"D. Duarte, N. Vijaykrishnan, M. J. Irwin, Hyun Suk Kim, G. McFarland","doi":"10.1109/ICCD.2002.1106798","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106798","url":null,"abstract":"Power is considered to be the major limiter to the design of faster and more complex processors in the near future. In order to address this challenge, a combination of process, circuit design and micro-architectural changes are required Consequently, to focus optimization efforts in the right direction, the models proposed and studies performed in this work are a first step for understanding the relative importance of leakage and dynamic energy in future technologies. Further, we analyze the effectiveness of two energy reduction mechanisms that employ voltage scaling, namely, supply and threshold voltage selection. We consider the impact of imminent technology changes and packaging improvements while showing that neglecting the impact of temperature may lead to underestimating power savings by up to 19.5%.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129647927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106779
S. Ozev, A. Orailoglu
Concurrent detection of failures in analog circuits is becoming increasingly more important as safety-critical systems become more widespread. A methodology for the automatic design of concurrent failure detection circuitry for linear analog systems is discussed in this paper In contrast to previous approaches, the methodology aims at providing coverage in terms of all the circuit components while minimizing the loading overhead by reducing the number of internal circuit nodes that need to be tapped Parameter tolerances are incorporated through either statistical or mathematical analysis to determine the threshold for failure alarm. Experimental results confirm that full coverage can be attained while keeping the hardware overhead within a pre-specified budget.
{"title":"Cost-effective concurrent test hardware design for linear analog circuits","authors":"S. Ozev, A. Orailoglu","doi":"10.1109/ICCD.2002.1106779","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106779","url":null,"abstract":"Concurrent detection of failures in analog circuits is becoming increasingly more important as safety-critical systems become more widespread. A methodology for the automatic design of concurrent failure detection circuitry for linear analog systems is discussed in this paper In contrast to previous approaches, the methodology aims at providing coverage in terms of all the circuit components while minimizing the loading overhead by reducing the number of internal circuit nodes that need to be tapped Parameter tolerances are incorporated through either statistical or mathematical analysis to determine the threshold for failure alarm. Experimental results confirm that full coverage can be attained while keeping the hardware overhead within a pre-specified budget.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125618593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106746
Xiaojian Yang, Bo-Kyung Choi, M. Sarrafzadeh
In this paper we study the correlation between wirelength and routability for standard-cell placement problem, under the modern place-and-route environment. We present a placement tool named Dragon (version 2.1), and show its ability to produce good quality placement for designs with high row utilization. Compared to an industrial placer and an academic state-of-the-art placer, Dragon can produce placement with better routability and shorter total wirelength. We describe many novel algorithmic details and implementation details of this placement tool. Experimental results show that minimizing wirelength improves routability and layout quality.
{"title":"A standard-cell placement tool for designs with high row utilization","authors":"Xiaojian Yang, Bo-Kyung Choi, M. Sarrafzadeh","doi":"10.1109/ICCD.2002.1106746","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106746","url":null,"abstract":"In this paper we study the correlation between wirelength and routability for standard-cell placement problem, under the modern place-and-route environment. We present a placement tool named Dragon (version 2.1), and show its ability to produce good quality placement for designs with high row utilization. Compared to an industrial placer and an academic state-of-the-art placer, Dragon can produce placement with better routability and shorter total wirelength. We describe many novel algorithmic details and implementation details of this placement tool. Experimental results show that minimizing wirelength improves routability and layout quality.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117098726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}