A test pattern compression scheme is proposed in order to reduce test data volume and application time. The number of scan chains that can be supported by an ATE is significantly increased by utilizing an on-chip decompressor. The functionality of the ATE is kept intact by moving the decompression task to the circuit under test. While the number of virtual scan chains visible to the ATE is kept small, the number of internal scan chains driven by the decompressed pattern sequence can be significantly increased.
{"title":"Test volume and application time reduction through scan chain concealment","authors":"I. Bayraktaroglu, A. Orailoglu","doi":"10.1145/378239.378388","DOIUrl":"https://doi.org/10.1145/378239.378388","url":null,"abstract":"A test pattern compression scheme is proposed in order to reduce test data volume and application time. The number of scan chains that can be supported by an ATE is significantly increased by utilizing an on-chip decompressor. The functionality of the ATE is kept intact by moving the decompression task to the circuit under test. While the number of virtual scan chains visible to the ATE is kept small, the number of internal scan chains driven by the decompressed pattern sequence can be significantly increased.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128132620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
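The idea in the scan chain concealment abstract above, a few ATE-visible virtual scan chains driving many internal scan chains through an on-chip decompressor, can be illustrated with a small sketch. It assumes a purely combinational XOR fan-out network; the abstract does not specify the decompressor structure, so the tap table `xor_taps` and both function names are hypothetical and chosen only for illustration.

```python
def decompress_slice(virtual_bits, xor_taps):
    """Expand one scan-shift slice from virtual chains to internal chains.

    virtual_bits: one bit per ATE-visible (virtual) scan chain.
    xor_taps[i]:  indices of the virtual chains XORed together to drive
                  internal chain i (an assumed decompressor structure).
    """
    internal = []
    for taps in xor_taps:
        bit = 0
        for t in taps:
            bit ^= virtual_bits[t]
        internal.append(bit)
    return internal


def decompress_pattern(virtual_pattern, xor_taps):
    """Apply the decompressor to every shift cycle of a pattern."""
    return [decompress_slice(s, xor_taps) for s in virtual_pattern]


# Two virtual chains drive three internal chains (hypothetical tap table).
taps = [[0], [1], [0, 1]]
expanded = decompress_pattern([[1, 0], [1, 1]], taps)
```

In this sketch the ATE stores two bits per shift cycle while three internal chains are loaded, so the stored data per cycle tracks the number of virtual chains rather than the number of internal ones.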
Advances in FPGA technology, both in device capacity and architecture, have resulted in the introduction of reconfigurable computing machines, in which the hardware adapts itself to the running application to gain speedup. To keep up with the ever-growing performance expectations of such systems, designers need new methodologies and tools for developing reconfigurable computing systems (RCS). This paper addresses the need for fast compilation and physical design phases in the application development/debugging/testing cycle for RCS. We present a high-level synthesis approach that is integrated with placement, making the compilation cycle much faster. On average, our tool generates the VHDL code (and the corresponding placement information) from the data flow graph of a program in less than a minute. By compromising 30% in the clock frequency of the circuit, we can achieve about 10 times speedup in the Xilinx placement phase, and 2.5 times overall speedup in the Xilinx place-and-route phase, a reasonable trade-off when developing RCS applications.
{"title":"Integrating scheduling and physical design into a coherent compilation cycle for reconfigurable computing architectures","authors":"K. Bazargan, S. Memik, M. Sarrafzadeh","doi":"10.1145/378239.379038","DOIUrl":"https://doi.org/10.1145/378239.379038","url":null,"abstract":"Advances in the FPGA technology, both in terms of device capacity and architecture, have resulted in introduction of reconfigurable computing machines, where the hardware adapts itself to the running application to gain speedup. To keep up with the ever-growing performance expectations of such systems, designers need new methodologies and tools for developing reconfigurable computing systems (RCS). This paper addresses the need for fast compilation and physical design phase to be used in application development/debugging/testing cycle for RCS. We present a high-level synthesis approach that is integrated with placement, making the compilation cycle much faster. On the average, our tool generates the VHDL code (and the corresponding placement information) from the data flow graph of a program in less than a minute. By compromising 30% in the clock frequency of the circuit, we can achieve about 10 times speedup in the Xilinx placement phase, and 2.5 times overall speedup in the Xilinx place-and-route phase, a reasonable trade-off when developing RCS applications.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"333 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121600275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To meet the high throughput requirements of applications exhibiting high ILP, VLIW ASIPs may increasingly include large numbers of functional units (FUs). Unfortunately, 'switching' data through register files shared by large numbers of FUs quickly becomes a dominant cost/performance factor, suggesting that clustering smaller numbers of FUs around local register files may be beneficial even if data transfers are required among clusters. With such machines in mind, we propose a compiler transformation, predicated switching, which enables aggressive speculation while leveraging the penalties associated with inter-cluster communication to achieve gains in performance. Based on representative benchmarks, we demonstrate that, compared to state-of-the-art approaches, this novel technique is particularly suitable for application-specific clustered machines aimed at supporting high ILP.
{"title":"Clustered VLIW architectures with predicated switching","authors":"M. Jacome, G. Veciana, Satish Pillai","doi":"10.1145/378239.379050","DOIUrl":"https://doi.org/10.1145/378239.379050","url":null,"abstract":"In order to meet the high throughput requirements of applications exhibiting high ILP, VLIW ASIPs may increasingly include large numbers of functional units (FUs). Unfortunately, 'switching' data through register files shared by large numbers of FUs quickly becomes a dominant cost performance factor suggesting that clustering smaller number of FUs around local register files may be beneficial even if data transfers are required among clusters. With such machines in mind, we propose a compiler transformation, predicated switching, which enables aggressive speculation while leveraging the penalties associated with inter-cluster communication to achieve gains in performance. Based on representative benchmarks, we demonstrate that this novel technique is particularly suitable for application specific clustered machines aimed at supporting high ILP as compared to state of-the-art approaches.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114861935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the strategy used to verify the error logic in the Alpha 21364 microprocessor. Traditional pre-silicon strategies of focused testing or unit-level random testing yield limited results in finding complex bugs in the error handling logic of a microprocessor. This paper introduces a technique to simulate error conditions and their recovery in a global environment using random test stimulus closely approximating traffic found in a real system. A significant number of bugs were found using this technique. A majority of these bugs could not be uncovered using a simple random environment, or were counter-intuitive to focused test design.
{"title":"Pre-silicon verification of the Alpha 21364 microprocessor error handling system","authors":"Richard Lee, B. Tsien","doi":"10.1145/378239.379073","DOIUrl":"https://doi.org/10.1145/378239.379073","url":null,"abstract":"This paper presents the strategy used to verify the error logic in the Alpha 21364 microprocessor. Traditional pre-silicon strategies of focused testing or unit-level random testing yield limited results in finding complex bugs in the error handling logic of a microprocessor. This paper introduces a technique to simulate error conditions and their recovery in a global environment using random test stimulus closely approximating traffic found in a real system. A significant number of bugs were found using this technique. A majority of these bugs could not be uncovered using a simple random environment, or were counter-intuitive to focused test design.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115096995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Alpert, Jiang Hu, S. Sapatnekar, P. Villarrubia
The dominating contribution of interconnect to system performance has made it critical to plan for buffer and wiring resources in the layout. Both buffers and wires must be considered, since wire routes determine buffer requirements and buffer locations constrain wire routes. In contrast to recent buffer block planning approaches, our design methodology distributes buffer sites throughout the layout. A tile graph is used to abstract the buffer planning problem while also addressing wire planning. We present a four-stage heuristic called RABID for resource allocation and experimentally verify its effectiveness.
{"title":"A practical methodology for early buffer and wire resource allocation","authors":"C. Alpert, Jiang Hu, S. Sapatnekar, P. Villarrubia","doi":"10.1145/378239.378461","DOIUrl":"https://doi.org/10.1145/378239.378461","url":null,"abstract":"The dominating contribution of interconnect to system performance has made it critical to plan for buffer and wiring resources in the layout. Both buffers and wires must be considered, since wire routes determine buffer requirements and buffer locations constrain wire routes. In contrast to recent buffer block planning approaches, our design methodology distributes buffer sites throughout the layout. A tile graph is used to abstract the buffer planning problem while also addressing wire planning. We present a four-stage heuristic called RABID for resource allocation and experimentally verify its effectiveness.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116144575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper presents a simulation-based test algorithm generation and test scheduling methodology for multi-port memories. The purpose is to minimize the testing time while keeping the test algorithm in a simple and regular format for easy test generation, fault diagnosis, and built-in self-test (BIST) circuit implementation. Conventional functional fault models are used to generate tests covering most defects. In addition, multi-port specific defects are covered using structural fault models. Port scheduling is introduced to take advantage of the inherent parallelism among different ports. Experimental results for commonly used multi-port memories, including dual-port, four-port, and n-read-1-write memories, have been obtained, showing that efficient test algorithms can be generated and scheduled to meet different test bandwidth constraints. Moreover, memories with more ports benefit more with respect to testing time.
{"title":"Simulation-based test algorithm generation and port scheduling for multi-port memories","authors":"Chi-Feng Wu, Chih-Tsun Huang, Kuo-Liang Cheng, Chih-Wea Wang, Cheng-Wen Wu","doi":"10.1109/DAC.2001.156155","DOIUrl":"https://doi.org/10.1109/DAC.2001.156155","url":null,"abstract":"The paper presents a simulation-based test algorithm generation and test scheduling methodology for multi-port memories. The purpose is to minimize the testing time while keeping the test algorithm in a simple and regular format for easy test generation, fault diagnosis, and built-in self-test (BIST) circuit implementation. Conventional functional fault models are used to generate tests covering most defects. In addition, multi-port specific defects are covered using structural fault models. Port-scheduling is introduced to take advantage of the inherent parallelism among different ports. Experimental results for commonly used multi-port memories, including dual-port, four-port, and n-read-l-write memories, have been obtained, showing that efficient test algorithms can be generated and scheduled to meet different test bandwidth constraints. Moreover, memories with more ports benefit more with respect to testing time.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127288687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a CAD methodology and a tool for high-level synthesis (HLS) of digital hardware for mixed analog-digital chips. In contrast to HLS for digital applications, HLS for mixed-signal systems is mainly challenged by constraints, such as digital switching noise (DSN), that are due to the analog circuits. This paper discusses an integrated approach to HLS and power net routing for effectively reducing DSN. The motivation for this research is that HLS has a high impact on DSN reduction, yet DSN evaluation is very difficult at a high level. The integrated approach also employs an original method for fast evaluation of DSN and an algorithm for power net routing and sizing. Experiments showed that our combined binding and scheduling method produces better results than traditional HLS techniques. Finally, DSN evaluation using the proposed algorithm can be significantly faster than SPICE simulation.
{"title":"Integrated high-level synthesis and power-net routing for digital design under switching noise constraints","authors":"A. Doboli, R. Vemuri","doi":"10.1145/378239.379037","DOIUrl":"https://doi.org/10.1145/378239.379037","url":null,"abstract":"This paper presents a CAD methodology and a tool for high-level synthesis (HLS) of digital hardware for mixed analog-digital chips. In contrast to HLS for digital applications, HLS for mixed-signal systems is mainly challenged by constraints, such as digital switching noise (DSN), that are due to the analog circuits. This paper discusses an integrated approach to HLS and power net routing for effectively reducing DSN. Motivation for this research is that HLS has a high impact on DSN reduction, however, DSN evaluation is very difficult at a high level. Integrated approach also employs an original method for fast evaluation of DSN and an algorithm for power net routing and sizing. Experiments showed that our combined binding and scheduling method produces better results than traditional HLS techniques. Finally, DSN evaluation using the proposed algorithm can be significantly faster than SPICE simulation.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128067201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the concept of flexibility, a geometric property associated with Steiner trees. Flexibility is related to the routability of the Steiner tree. We present an optimal algorithm which takes a Steiner tree and outputs a more flexible Steiner tree. Our experiments show that using a flexible Steiner tree increases a net's routability. Experiments with a global router show that congestion is improved by approximately 20%.
{"title":"Creating and exploiting flexibility in Steiner trees","authors":"E. Bozorgzadeh, R. Kastner, M. Sarrafzadeh","doi":"10.1145/378239.378462","DOIUrl":"https://doi.org/10.1145/378239.378462","url":null,"abstract":"This paper presents the concept of flexibility-a geometric property associated with Steiner trees. Flexibility is related to the routability of the Steiner tree. We present an optimal algorithm which takes a Steiner tree and outputs a more flexible Steiner tree. Our experiments show that a net with a flexible Steiner tree increases its routability. Experiments with a global router show that congestion is improved by approximately 20%.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125625669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep submicron technology has two major ramifications on the design process: (i) critical paths are being dominated by global interconnect rather than gate delays and (ii) ultra high levels of integration mandate designs that encompass numerous intra-synchronous blocks with decreased functional granularity and increased communication demands. These factors emphasize the importance of the on-chip bus network as the crucial high-performance enabler for future systems-on-chip. By using independent functional blocks with programmable connectivity, designers are able to build systems-on-chip capable of supporting different applications with exceptional levels of resource sharing. To address challenges in this design paradigm, we have developed a methodology that enables efficient bus network design with approximate timing verification and floorplanning of multi-purpose systems-on-chip in early design stages. The design platform iterates system synthesis and floorplanning to build min-area floorplans that satisfy statistical time constraints of applications. We demonstrate the effectiveness of our bus network design approach using examples from a multimedia benchmark suite.
{"title":"Latency-driven design of multi-purpose systems-on-chip","authors":"S. Meguerdichian, M. Drinic, D. Kirovski","doi":"10.1145/378239.378258","DOIUrl":"https://doi.org/10.1145/378239.378258","url":null,"abstract":"Deep submicron technology has two major ramifications on the design process: (i) critical paths are being dominated by global interconnect rather than gate delays and (ii) ultra high levels of integration mandate designs that encompass numerous intra-synchronous blocks with decreased functional granularity and increased communication demands. These factors emphasize the importance of the on-chip bus network as the crucial high-performance enabler for future systems-on-chip. By using independent functional blocks with programmable connectivity, designers are able to build systems-on-chip capable of supporting different applications with exceptional levels of resource sharing. To address challenges in this design paradigm, we have developed a methodology that enables efficient bus network design with approximate timing verification and floorplanning of multi-purpose systems-on-chip in early design stages. The design platform iterates system synthesis and floorplanning to build min-area floorplans that satisfy statistical time constraints of applications. We demonstrate the effectiveness of our bus network design approach using examples from a multimedia benchmark suite.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131201525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The computation of logic-stage delays is a fundamental sub-problem for many EDA tasks. Although accurate delays can be obtained via circuit simulation, we must estimate the input assignments that will maximize the delay. With conventional methods, it is not feasible to estimate the delay for all input assignments on large sub-networks, so previous approaches have relied on heuristics. We present a symbolic algorithm that enables efficient computation of the Elmore delay under all input assignments, followed by delay refinement using circuit simulation. We analyze the Elmore estimate with three metrics using data extracted from symbolic timing simulations of industrial circuits.
{"title":"Computing logic-stage delays using circuit simulation and symbolic Elmore analysis","authors":"Clayton B. McDonald, R. Bryant","doi":"10.1145/378239.378486","DOIUrl":"https://doi.org/10.1145/378239.378486","url":null,"abstract":"The computation of logic-stage delays is a fundamental sub-problem for many EDA tasks. Although accurate delays can be obtained via circuit simulation, we must estimate the input assignments that will maximize the delay. With conventional methods, it is not feasible to estimate the delay for all input assignments on large sub-networks, so previous approaches have relied on heuristics. We present a symbolic algorithm that enables efficient computation of the Elmore delay under all input assignments and delay refinement using circuit-simulation. We analyze the Elmore estimate with three metrics using data extracted from symbolic timing simulations of industrial circuits.","PeriodicalId":154316,"journal":{"name":"Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132135442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
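For readers unfamiliar with the Elmore delay named in the abstract above, the following sketch computes it numerically on a fixed RC tree. It is not the symbolic algorithm of the paper, which covers all input assignments at once; it handles only one already-resolved network. The function name `elmore_delays` and the dictionary-based tree encoding are assumptions made for illustration.

```python
from collections import defaultdict


def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the root to every node of an RC tree.

    parent[n]:      parent of node n (None for the root / driver).
    resistance[n]:  resistance of the branch parent[n] -> n.
    capacitance[n]: capacitance to ground at node n.
    """
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    # Breadth-first order: every parent appears before its children.
    order = [root]
    for node in order:
        order.extend(children[node])

    # Downstream capacitance of each node, accumulated bottom-up.
    downstream = dict(capacitance)
    for node in reversed(order):
        if parent[node] is not None:
            downstream[parent[node]] += downstream[node]

    # Each branch adds R_branch * (capacitance it has to charge).
    delay = {root: 0.0}
    for node in order[1:]:
        delay[node] = delay[parent[node]] + resistance[node] * downstream[node]
    return delay


# Small illustrative tree: drv -> a, with a -> b and a -> c.
tree = {"drv": None, "a": "drv", "b": "a", "c": "a"}
R = {"a": 1, "b": 2, "c": 4}
C = {"drv": 0, "a": 1, "b": 3, "c": 5}
delays = elmore_delays(tree, R, C)
```

For this tree the delay to `b` is 1·(1+3+5) + 2·3 = 15 and the delay to `c` is 1·(1+3+5) + 4·5 = 29, in whatever units the product of R and C carries.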