A test pattern compression scheme is proposed in order to reduce test data volume and application time. The number of scan chains that can be supported by an ATE is significantly increased by utilizing an on-chip decompressor. The functionality of the ATE is kept intact by moving the decompression task to the circuit under test. While the number of virtual scan chains visible to the ATE is kept small, the number of internal scan chains driven by the decompressed pattern sequence can be significantly increased.
{"title":"Test volume and application time reduction through scan chain concealment","authors":"I. Bayraktaroglu, A. Orailoglu","doi":"10.1145/378239.378388","DOIUrl":"https://doi.org/10.1145/378239.378388","url":null,"abstract":"A test pattern compression scheme is proposed in order to reduce test data volume and application time. The number of scan chains that can be supported by an ATE is significantly increased by utilizing an on-chip decompressor. The functionality of the ATE is kept intact by moving the decompression task to the circuit under test. While the number of virtual scan chains visible to the ATE is kept small, the number of internal scan chains driven by the decompressed pattern sequence can be significantly increased.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128132620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Alpert, Jiang Hu, S. Sapatnekar, P. Villarrubia
The dominating contribution of interconnect to system performance has made it critical to plan for buffer and wiring resources in the layout. Both buffers and wires must be considered, since wire routes determine buffer requirements and buffer locations constrain wire routes. In contrast to recent buffer block planning approaches, our design methodology distributes buffer sites throughout the layout. A tile graph is used to abstract the buffer planning problem while also addressing wire planning. We present a four-stage heuristic called RABID for resource allocation and experimentally verify its effectiveness.
{"title":"A practical methodology for early buffer and wire resource allocation","authors":"C. Alpert, Jiang Hu, S. Sapatnekar, P. Villarrubia","doi":"10.1145/378239.378461","DOIUrl":"https://doi.org/10.1145/378239.378461","url":null,"abstract":"The dominating contribution of interconnect to system performance has made it critical to plan for buffer and wiring resources in the layout. Both buffers and wires must be considered, since wire routes determine buffer requirements and buffer locations constrain wire routes. In contrast to recent buffer block planning approaches, our design methodology distributes buffer sites throughout the layout. A tile graph is used to abstract the buffer planning problem while also addressing wire planning. We present a four-stage heuristic called RABID for resource allocation and experimentally verify its effectiveness.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116144575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In order to meet the high throughput requirements of applications exhibiting high ILP, VLIW ASIPs may increasingly include large numbers of functional units (FUs). Unfortunately, 'switching' data through register files shared by large numbers of FUs quickly becomes a dominant cost performance factor suggesting that clustering smaller number of FUs around local register files may be beneficial even if data transfers are required among clusters. With such machines in mind, we propose a compiler transformation, predicated switching, which enables aggressive speculation while leveraging the penalties associated with inter-cluster communication to achieve gains in performance. Based on representative benchmarks, we demonstrate that this novel technique is particularly suitable for application specific clustered machines aimed at supporting high ILP as compared to state of-the-art approaches.
{"title":"Clustered VLIW architectures with predicated switching","authors":"M. Jacome, G. Veciana, Satish Pillai","doi":"10.1145/378239.379050","DOIUrl":"https://doi.org/10.1145/378239.379050","url":null,"abstract":"In order to meet the high throughput requirements of applications exhibiting high ILP, VLIW ASIPs may increasingly include large numbers of functional units (FUs). Unfortunately, 'switching' data through register files shared by large numbers of FUs quickly becomes a dominant cost performance factor suggesting that clustering smaller number of FUs around local register files may be beneficial even if data transfers are required among clusters. With such machines in mind, we propose a compiler transformation, predicated switching, which enables aggressive speculation while leveraging the penalties associated with inter-cluster communication to achieve gains in performance. Based on representative benchmarks, we demonstrate that this novel technique is particularly suitable for application specific clustered machines aimed at supporting high ILP as compared to state of-the-art approaches.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114861935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the strategy used to verify the error logic in the Alpha 21364 microprocessor. Traditional pre-silicon strategies of focused testing or unit-level random testing yield limited results in finding complex bugs in the error handling logic of a microprocessor. This paper introduces a technique to simulate error conditions and their recovery in a global environment using random test stimulus closely approximating traffic found in a real system. A significant number of bugs were found using this technique. A majority of these bugs could not be uncovered using a simple random environment, or were counter-intuitive to focused test design.
{"title":"Pre-silicon verification of the Alpha 21364 microprocessor error handling system","authors":"Richard Lee, B. Tsien","doi":"10.1145/378239.379073","DOIUrl":"https://doi.org/10.1145/378239.379073","url":null,"abstract":"This paper presents the strategy used to verify the error logic in the Alpha 21364 microprocessor. Traditional pre-silicon strategies of focused testing or unit-level random testing yield limited results in finding complex bugs in the error handling logic of a microprocessor. This paper introduces a technique to simulate error conditions and their recovery in a global environment using random test stimulus closely approximating traffic found in a real system. A significant number of bugs were found using this technique. A majority of these bugs could not be uncovered using a simple random environment, or were counter-intuitive to focused test design.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115096995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in the FPGA technology, both in terms of device capacity and architecture, have resulted in introduction of reconfigurable computing machines, where the hardware adapts itself to the running application to gain speedup. To keep up with the ever-growing performance expectations of such systems, designers need new methodologies and tools for developing reconfigurable computing systems (RCS). This paper addresses the need for fast compilation and physical design phase to be used in application development/debugging/testing cycle for RCS. We present a high-level synthesis approach that is integrated with placement, making the compilation cycle much faster. On the average, our tool generates the VHDL code (and the corresponding placement information) from the data flow graph of a program in less than a minute. By compromising 30% in the clock frequency of the circuit, we can achieve about 10 times speedup in the Xilinx placement phase, and 2.5 times overall speedup in the Xilinx place-and-route phase, a reasonable trade-off when developing RCS applications.
{"title":"Integrating scheduling and physical design into a coherent compilation cycle for reconfigurable computing architectures","authors":"K. Bazargan, S. Memik, M. Sarrafzadeh","doi":"10.1145/378239.379038","DOIUrl":"https://doi.org/10.1145/378239.379038","url":null,"abstract":"Advances in the FPGA technology, both in terms of device capacity and architecture, have resulted in introduction of reconfigurable computing machines, where the hardware adapts itself to the running application to gain speedup. To keep up with the ever-growing performance expectations of such systems, designers need new methodologies and tools for developing reconfigurable computing systems (RCS). This paper addresses the need for fast compilation and physical design phase to be used in application development/debugging/testing cycle for RCS. We present a high-level synthesis approach that is integrated with placement, making the compilation cycle much faster. On the average, our tool generates the VHDL code (and the corresponding placement information) from the data flow graph of a program in less than a minute. By compromising 30% in the clock frequency of the circuit, we can achieve about 10 times speedup in the Xilinx placement phase, and 2.5 times overall speedup in the Xilinx place-and-route phase, a reasonable trade-off when developing RCS applications.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"333 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121600275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wave steering is a new design methodology that realizes high throughput circuits by embedding layout friendly synthesized structures in silicon. Wave steered circuits inherently utilize latches in order to guarantee the correct signal arrival times at the inputs of these synthesized structures and maintain the high throughput of operation. In this paper, we show a method of reordering signals to achieve minimum circuit latency for wave steered circuits and propose an integer linear programming (ILP) formulation for scheduling and retiming these circuits to minimize the number of latches for minimum latency. Experimental results show that in 0.25 /spl mu/m CMOS technology, as much as 33.2% reduction in latch count, at minimum latency, can be achieved over unoptimized wave steered circuits operating at 500 MHz.
{"title":"Latency and latch count minimization in wave steered circuits","authors":"A. Singh, A. Mukherjee, M. Marek-Sadowska","doi":"10.1145/378239.378529","DOIUrl":"https://doi.org/10.1145/378239.378529","url":null,"abstract":"Wave steering is a new design methodology that realizes high throughput circuits by embedding layout friendly synthesized structures in silicon. Wave steered circuits inherently utilize latches in order to guarantee the correct signal arrival times at the inputs of these synthesized structures and maintain the high throughput of operation. In this paper, we show a method of reordering signals to achieve minimum circuit latency for wave steered circuits and propose an integer linear programming (ILP) formulation for scheduling and retiming these circuits to minimize the number of latches for minimum latency. Experimental results show that in 0.25 /spl mu/m CMOS technology, as much as 33.2% reduction in latch count, at minimum latency, can be achieved over unoptimized wave steered circuits operating at 500 MHz.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134116966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the concept of flexibility-a geometric property associated with Steiner trees. Flexibility is related to the routability of the Steiner tree. We present an optimal algorithm which takes a Steiner tree and outputs a more flexible Steiner tree. Our experiments show that a net with a flexible Steiner tree increases its routability. Experiments with a global router show that congestion is improved by approximately 20%.
{"title":"Creating and exploiting flexibility in Steiner trees","authors":"E. Bozorgzadeh, R. Kastner, M. Sarrafzadeh","doi":"10.1145/378239.378462","DOIUrl":"https://doi.org/10.1145/378239.378462","url":null,"abstract":"This paper presents the concept of flexibility-a geometric property associated with Steiner trees. Flexibility is related to the routability of the Steiner tree. We present an optimal algorithm which takes a Steiner tree and outputs a more flexible Steiner tree. Our experiments show that a net with a flexible Steiner tree increases its routability. Experiments with a global router show that congestion is improved by approximately 20%.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125625669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The computation of logic-stage delays is a fundamental sub-problem for many EDA tasks. Although accurate delays can be obtained via circuit simulation, we must estimate the input assignments that will maximize the delay. With conventional methods, it is not feasible to estimate the delay for all input assignments on large sub-networks, so previous approaches have relied on heuristics. We present a symbolic algorithm that enables efficient computation of the Elmore delay under all input assignments and delay refinement using circuit-simulation. We analyze the Elmore estimate with three metrics using data extracted from symbolic timing simulations of industrial circuits.
{"title":"Computing logic-stage delays using circuit simulation and symbolic Elmore analysis","authors":"Clayton B. McDonald, R. Bryant","doi":"10.1145/378239.378486","DOIUrl":"https://doi.org/10.1145/378239.378486","url":null,"abstract":"The computation of logic-stage delays is a fundamental sub-problem for many EDA tasks. Although accurate delays can be obtained via circuit simulation, we must estimate the input assignments that will maximize the delay. With conventional methods, it is not feasible to estimate the delay for all input assignments on large sub-networks, so previous approaches have relied on heuristics. We present a symbolic algorithm that enables efficient computation of the Elmore delay under all input assignments and delay refinement using circuit-simulation. We analyze the Elmore estimate with three metrics using data extracted from symbolic timing simulations of industrial circuits.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132135442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a CAD methodology and a tool for high-level synthesis (HLS) of digital hardware for mixed analog-digital chips. In contrast to HLS for digital applications, HLS for mixed-signal systems is mainly challenged by constraints, such as digital switching noise (DSN), that are due to the analog circuits. This paper discusses an integrated approach to HLS and power net routing for effectively reducing DSN. Motivation for this research is that HLS has a high impact on DSN reduction, however, DSN evaluation is very difficult at a high level. Integrated approach also employs an original method for fast evaluation of DSN and an algorithm for power net routing and sizing. Experiments showed that our combined binding and scheduling method produces better results than traditional HLS techniques. Finally, DSN evaluation using the proposed algorithm can be significantly faster than SPICE simulation.
{"title":"Integrated high-level synthesis and power-net routing for digital design under switching noise constraints","authors":"A. Doboli, R. Vemuri","doi":"10.1145/378239.379037","DOIUrl":"https://doi.org/10.1145/378239.379037","url":null,"abstract":"This paper presents a CAD methodology and a tool for high-level synthesis (HLS) of digital hardware for mixed analog-digital chips. In contrast to HLS for digital applications, HLS for mixed-signal systems is mainly challenged by constraints, such as digital switching noise (DSN), that are due to the analog circuits. This paper discusses an integrated approach to HLS and power net routing for effectively reducing DSN. Motivation for this research is that HLS has a high impact on DSN reduction, however, DSN evaluation is very difficult at a high level. Integrated approach also employs an original method for fast evaluation of DSN and an algorithm for power net routing and sizing. Experiments showed that our combined binding and scheduling method produces better results than traditional HLS techniques. Finally, DSN evaluation using the proposed algorithm can be significantly faster than SPICE simulation.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128067201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper presents a simulation-based test algorithm generation and test scheduling methodology for multi-port memories. The purpose is to minimize the testing time while keeping the test algorithm in a simple and regular format for easy test generation, fault diagnosis, and built-in self-test (BIST) circuit implementation. Conventional functional fault models are used to generate tests covering most defects. In addition, multi-port specific defects are covered using structural fault models. Port-scheduling is introduced to take advantage of the inherent parallelism among different ports. Experimental results for commonly used multi-port memories, including dual-port, four-port, and n-read-l-write memories, have been obtained, showing that efficient test algorithms can be generated and scheduled to meet different test bandwidth constraints. Moreover, memories with more ports benefit more with respect to testing time.
{"title":"Simulation-based test algorithm generation and port scheduling for multi-port memories","authors":"Chi-Feng Wu, Chih-Tsun Huang, Kuo-Liang Cheng, Chih-Wea Wang, Cheng-Wen Wu","doi":"10.1109/DAC.2001.156155","DOIUrl":"https://doi.org/10.1109/DAC.2001.156155","url":null,"abstract":"The paper presents a simulation-based test algorithm generation and test scheduling methodology for multi-port memories. The purpose is to minimize the testing time while keeping the test algorithm in a simple and regular format for easy test generation, fault diagnosis, and built-in self-test (BIST) circuit implementation. Conventional functional fault models are used to generate tests covering most defects. In addition, multi-port specific defects are covered using structural fault models. Port-scheduling is introduced to take advantage of the inherent parallelism among different ports. Experimental results for commonly used multi-port memories, including dual-port, four-port, and n-read-l-write memories, have been obtained, showing that efficient test algorithms can be generated and scheduled to meet different test bandwidth constraints. Moreover, memories with more ports benefit more with respect to testing time.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127288687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}