Modeling switching activity using cascaded Bayesian networks for correlated input streams
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106799
S. Bhanja, N. Ranganathan
We represent switching activity in VLSI circuits using a graphical probabilistic model based on cascaded Bayesian networks (CBNs). We develop an elegant method for maintaining probabilistic consistency at the interface boundaries across the CBNs during the inference process using a tree-dependent (TD) probability distribution function. A TD distribution is an approximation of the true joint probability function over the switching variables, with the constraint that the underlying Bayesian network representation is a tree. The tree approximation of the true joint probability function can be arrived at using a maximum weight spanning tree (MWST) built from the pairwise mutual information between switchings at pairs of signal lines. Further, we also develop a TD-distribution-based method to model correlations among the primary inputs, which is critical for accuracy in Bayesian modeling of switching activity. Experimental results for ISCAS circuits are presented to illustrate the efficacy of the proposed methods.
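A minimal sketch, in Python, of the MWST step described above: edge weights are the empirical pairwise mutual information between switching variables, and a maximum-weight spanning tree is extracted with Kruskal's algorithm (the sampling setup, variable names, and union-find details are illustrative, not the paper's).

    # Sketch: build a maximum-weight spanning tree over signal lines, with edge
    # weights given by pairwise mutual information between their switching samples
    # (the construction underlying tree-dependent distributions).
    from collections import Counter
    from itertools import combinations
    from math import log

    def mutual_information(x, y):
        """Empirical mutual information between two binary switching sequences."""
        n = len(x)
        pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
        return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
                   for (a, b), c in pxy.items())

    def max_weight_spanning_tree(samples):
        """samples: {line_name: [0/1 switching samples]} -> list of tree edges."""
        lines = list(samples)
        edges = sorted(((mutual_information(samples[u], samples[v]), u, v)
                        for u, v in combinations(lines, 2)), reverse=True)
        parent = {v: v for v in lines}                    # union-find forest
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        tree = []
        for w, u, v in edges:                             # Kruskal, heaviest edge first
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v, w))
        return tree

How the resulting tree keeps the cascaded networks consistent at their boundaries, and the TD model of primary-input correlations, follow the paper itself; the sketch covers only the MWST construction.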
{"title":"Modeling switching activity using cascaded Bayesian networks for correlated input streams","authors":"S. Bhanja, N. Ranganathan","doi":"10.1109/ICCD.2002.1106799","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106799","url":null,"abstract":"We represent switching activity in VLSI circuits using a graphical probabilistic model based on cascaded Bayesian networks (CBNs). We develop an elegant method for maintaining probabilistic consistency in the interfacing boundaries across the CBNs during the inference process using a tree-dependent (TD) probability distribution function. A tree-dependent (TD) distribution is an approximation of the true joint probability function over the switching variables, with the constraint that the underlying Bayesian network representation is a tree. The tree approximation of the true joint probability function can be arrived at using a maximum weight spanning tree (MWST) built using pairwise mutual information between switchings at two signal lines. Further we also develop a TD distribution based method to model correlations among the primary inputs which is critical for accuracy in Bayesian modeling of switching activity. Experimental results for ISCAS circuits are presented to illustrate the efficacy of the proposed methods.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121301574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Environment synthesis for compositional model checking
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106750
Hong Peng, Y. Mokhtari, S. Tahar
Modeling the environment of a design module under verification is a known practical problem in compositional verification. In this paper, we propose an approach to translate an ACTL specification into such an environment. During the translation, we construct an efficient tableau for the full range of ACTL and synthesize the tableau into a Verilog HDL behavioral-level program. The synthesized program can be used to check the properties that the system's components must guarantee. We have used the proposed environment synthesis in the compositional verification of an ATM switch fabric from Nortel Networks. Experiments show that, despite the theoretical intractability limits of compositional verification, we can still verify industry-size designs.
{"title":"Environment synthesis for compositional model checking","authors":"Hong Peng, Y. Mokhtari, S. Tahar","doi":"10.1109/ICCD.2002.1106750","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106750","url":null,"abstract":"Modeling the environment of a design module under verification is a known practical problem in compositional verification. In this paper, we propose an approach to translate an ACTL specification into such an environment. Throughout the translation, we construct an efficient tableau for the full range of ACTL and synthesize the tableau into Verilog HDL behavior level program. The synthesized program can be used to check the properties that the system's components must guarantee. We have used the proposed environment synthesis in the compositional verification of an ATM switch fabric from Nortel Networks. Experiments show that given the theoretical compositional verification intractable limit, we can still manage to verify industry size designs.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122801424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the coverage of delay faults in scan designs with multiple scan chains
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106771
I. Pomeranz, S. Reddy
The use of multiple scan chains in a scan design reduces the test application time by reducing the number of clock cycles required for a scan-in/scan-out operation. In this work, we show that the use of multiple scan chains also increases the fault coverage achievable for delay faults, which require two-pattern tests, under the scan-shift test application scheme. Under this scheme, the first pattern of a two-pattern test is scanned in, and the second pattern is obtained by shifting the scan chain once more. We also demonstrate that the specific way in which scan flip-flops are partitioned into scan chains affects the delay fault coverage, even if the order of the flip-flops within the scan chains remains the same. To demonstrate this point, we describe a procedure that partitions scan flip-flops into scan chains so as to maximize the coverage of transition faults.
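A small Python sketch of how the second pattern arises under the scan-shift scheme (the bit ordering relative to the scan input and the value of the extra scanned-in bit are illustrative assumptions):

    # Pattern v1 is scanned in; the launch pattern v2 is whatever the chain holds
    # after one more shift: each flip-flop takes its neighbour's value and a fresh
    # bit enters at the scan input.
    def scan_shift_pair(v1, scan_in_bit):
        """v1: bits held by the chain, index 0 nearest the scan input."""
        v2 = [scan_in_bit] + v1[:-1]      # one additional shift through the chain
        return v1, v2

    v1, v2 = scan_shift_pair([1, 0, 1, 1], scan_in_bit=0)
    print(v1, v2)                         # [1, 0, 1, 1] [0, 1, 0, 1]

Because v2 is constrained to be a shifted version of v1, which flip-flops share a chain determines which vector pairs can ever be applied, which is why the partitioning into multiple chains changes the achievable delay fault coverage.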
{"title":"On the coverage of delay faults in scan designs with multiple scan chains","authors":"I. Pomeranz, S. Reddy","doi":"10.1109/ICCD.2002.1106771","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106771","url":null,"abstract":"The use of multiple scan chains for a scan design reduces the test application time by reducing the number of clock cycles required for a scan-in/scan-out operation. In this work, we show that the use of multiple scan chains also increases the fault coverage achievable for delay faults, requiring two-pattern tests, under the scan-shift test application scheme. Under this scheme, the first pattern of a two-pattern test is scanned in, and the second pattern is obtained by shifting the scan chain once more. We also demonstrate that the specific way in which scan flip-flops are partitioned into scan chains affects the delay fault coverage. This is true even if the order of the flip-flops in the scan chains remains the same. To demonstrate this point, we describe a procedure that partitions scan flip-flops into scan chains so as to maximize the coverage of transition faults.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128717559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power analysis of bipartition and dual-encoding architecture for pipelined circuits
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106790
S. Ruan, E. Naroska, Chia-Lin Ho, F. Lai
In this paper we propose a bipartition dual-encoding architecture for low-power pipelined circuits. Pipelined circuits consist of combinational logic blocks separated by registers, which usually consume a large amount of power. Although clock gating is a promising approach to reducing the switching activity of the pipeline registers, it is restricted by the placement of the registers and by the additional control signals that must be generated. Thus, we propose a technique for optimizing the power dissipation of a pipelined circuit that addresses registers and combinational logic blocks at the same time. Our approach modifies the registers using bipartition and encoding techniques. In our experiments, power consumption was reduced on average by 72.9% for the pipeline registers and by 30.4% for the total pipeline stage.
{"title":"Power analysis of bipartition and dual-encoding architecture for pipelined circuits","authors":"S. Ruan, E. Naroska, Chia-Lin Ho, F. Lai","doi":"10.1109/ICCD.2002.1106790","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106790","url":null,"abstract":"In this paper we propose a bipartition dual-encoding architecture for low power pipelined circuit. Pipelined circuits consist of combinational logic blocks separated by registers which usually consume a large amount of power Although the clock gated technique is a promising approach to reduce switching activities of the pipelined registers, this approach is restricted by the placement of the registers and the additional control signals that must be generated. Thus, we propose a technique for optimizing power dissipation of a pipelined circuit addressing registers and combinational logic blocks at the same time. Our approach modifies the registers using bipartition and encoding techniques. In our experiments power consumption were reduced by 72.9% for pipelined registers and 30.4% for the total pipelined stage on average.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129570329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fault dictionary size reduction through test response superposition
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106817
B. Arslan, A. Orailoglu
The exceedingly large size of fault dictionaries constitutes a fundamental obstacle to their usage. We outline a new method to significantly reduce the size of fault dictionaries. The proposed method partitions the test set and stores a combined signature for each partition. The new approach aims to provide high diagnostic resolution with a small number of combined signatures. The experimental results show a considerable decrease in the storage requirements of fault dictionaries.
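A hedged Python sketch of the storage scheme this describes: one combined signature per partition of the test set, rather than one response per test (here the per-test response words are superposed with XOR; the paper's actual superposition operator and partitioning heuristic may differ).

    # Sketch: instead of storing one response word per (fault, test) pair, store one
    # combined signature per (fault, partition), shrinking the dictionary roughly by
    # the partition size while still separating most fault pairs.
    from functools import reduce

    def build_compact_dictionary(responses, partitions):
        """responses: {fault: [per-test response words]}
           partitions: list of lists of test indices."""
        return {fault: [reduce(lambda a, b: a ^ b, (words[t] for t in part), 0)
                        for part in partitions]
                for fault, words in responses.items()}

    responses = {"f1": [0b1010, 0b0001, 0b1100],
                 "f2": [0b1010, 0b0011, 0b1100]}
    partitions = [[0, 1], [2]]            # three tests folded into two partitions
    print(build_compact_dictionary(responses, partitions))
    # {'f1': [11, 12], 'f2': [9, 12]} -- f1 and f2 remain distinguishable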
{"title":"Fault dictionary size reduction through test response superposition","authors":"B. Arslan, A. Orailoglu","doi":"10.1109/ICCD.2002.1106817","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106817","url":null,"abstract":"The exceedingly large size of fault dictionaries constitutes a fundamental obstacle to their usage. We outline a new method to reduce significantly, the size of fault dictionaries. The proposed method partitions the test set and a combined signature is stored for each partition. The new approach aims to provide high diagnostic resolution with a small number of combined signatures. The experimental results show a considerable decrease in the storage requirement of fault dictionaries.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129804507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Media processing applications on the Imagine stream processor
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106785
John Douglas Owens, S. Rixner, U. Kapasi, P. Mattson, Brian Towles, B. Serebrin, W. Dally
Media applications, such as image processing, signal processing, video, and graphics, require high computation rates and data bandwidths. The stream programming model is a natural and powerful way to describe these applications. Expressing media applications in this model allows hardware and software systems to take advantage of their concurrency and locality in order to meet their high computational demands. The Imagine stream programming system, a set of software tools and algorithms, is used to program media applications in the stream programming model. We achieve real-time performance on a variety of media processing applications with high computation rates (4-15 billion achieved operations per second) and high efficiency (84-95% occupancy on the arithmetic clusters).
{"title":"Media processing applications on the Imagine stream processor","authors":"John Douglas Owens, S. Rixner, U. Kapasi, P. Mattson, Brian Towles, B. Serebrin, W. Dally","doi":"10.1109/ICCD.2002.1106785","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106785","url":null,"abstract":"Media applications, such as image processing, signal processing, video, and graphics, require high computation rates and data bandwidths. The stream programming model is a natural and powerful way to describe these applications. Expressing media applications in this model allows hardware and software systems to take advantage of their concurrency and locality in order to meet their high computational demands. The Imagine stream programming system, a set of software tools and algorithms, is used to program media applications in the stream programming model. We achieve real-time performance on a variety of media processing applications with high computation rates (4-15 billion achieved operations per second) and high efficiency (84-95% occupancy on the arithmetic clusters).","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124128500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework for data prefetching using off-line training of Markovian predictors
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106792
Jinwoo Kim, K. Palem, W. Wong
An important technique for alleviating the memory bottleneck is data prefetching. Data prefetching solutions have been proposed ranging from purely software approaches, which insert prefetch instructions based on program analysis, to purely hardware mechanisms. The degree of success of these techniques depends on the nature of the application. The need for innovative approaches is growing rapidly with the introduction of applications, such as object-oriented applications, that exhibit dynamically changing memory access behavior. In this paper, we propose a novel framework in which data prefetchers are trained off-line using smart learning algorithms to produce prediction models that capture hidden memory access patterns. Once built, these prediction models are loaded into a data prefetching unit in the CPU at the appropriate point during runtime to drive the prefetching. Using a table of about 8 KB, we achieved a prediction accuracy of about 68% with our proposed learning method, and performance improved by about 37% on average on the benchmarks we tested. Furthermore, we believe our proposed framework is amenable to other predictors and can be implemented as a phase of a profiling-optimizing compiler.
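A minimal Python sketch of an off-line-trained Markovian predictor of this kind (first-order, address-to-address; the table layout, training trace, and top-k policy are illustrative assumptions rather than the paper's exact model).

    # Off-line phase: learn from a profiled address trace which addresses tend to
    # follow which. On-line phase: on an access to addr, prefetch the most likely
    # successors recorded in the learned table.
    from collections import Counter, defaultdict

    def train_markov_table(trace, top_k=2):
        counts = defaultdict(Counter)
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
        return {addr: [a for a, _ in c.most_common(top_k)]
                for addr, c in counts.items()}

    def prefetch_candidates(table, addr):
        return table.get(addr, [])

    trace = [0x100, 0x180, 0x100, 0x180, 0x100, 0x240, 0x180]
    table = train_markov_table(trace)
    print(prefetch_candidates(table, 0x100))    # [384, 576], i.e. 0x180 then 0x240

In the proposed framework the equivalent table would be produced during an off-line profiling phase and loaded into the prefetching unit in the CPU at run time, rather than queried in software as in this sketch.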
{"title":"A framework for data prefetching using off-line training of Markovian predictors","authors":"Jinwoo Kim, K. Palem, W. Wong","doi":"10.1109/ICCD.2002.1106792","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106792","url":null,"abstract":"An important technique for alleviating the memory bottleneck is data prefetching. Data prefetching solutions ranging from pure software approach by inserting prefetch instructions through program analysis to purely hardware mechanisms have been proposed. The degrees of success of those techniques are dependent on the nature of the applications. The need for innovative approach is rapidly growing with the introduction of applications such as object-oriented applications that show dynamically changing memory access behavior In this paper, we propose a novel framework for the use of data prefetchers that are trained off-line using smart learning algorithms to produce prediction models which captures hidden memory access patterns. Once built, those prediction models are loaded into a data prefetching unit in the CPU at the appropriate point during the runtime to drive the prefetching. On average by using table size of about 8KB size, we were able to achieve prediction accuracy of about 68% through our own proposed learning method and performance was boosted about 37% on average on the benchmarks we tested. Furthermore, we believe our proposed framework is amenable to other predictors and can be done as a phase of the profiling-optimizing-compiler.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128153762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low power mixed-mode BIST based on mask pattern generation using dual LFSR re-seeding
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106816
P. Rosinger, B. Al-Hashimi, N. Nicolici
Low-power design techniques have been employed for more than two decades; an emerging problem, however, is satisfying test power constraints to avoid destructive testing and improve yield. Our research addresses this problem by proposing a new method which maintains the benefits of mixed-mode built-in self-test (BIST), namely low test application time and high fault coverage, while reducing the excessive power dissipation associated with scan-based test. This is achieved by employing dual linear feedback shift register (LFSR) re-seeding and generating mask patterns to reduce the switching activity. Theoretical analysis and experimental results show that the proposed method consistently reduces the switching activity by 25% when compared to traditional approaches, at the expense of a limited increase in storage requirements.
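As a loose illustration only, a Python sketch of two ingredients the method combines: expanding test patterns from stored LFSR seeds, and the switching-activity measure (bit transitions between consecutive patterns) that the generated mask patterns are intended to reduce. The LFSR polynomial, the seeds, and how masks are derived and applied are placeholders here, not the paper's.

    # Patterns are expanded from a seed by a Fibonacci LFSR; switching activity is
    # the number of bit transitions between consecutive patterns, the quantity the
    # mask patterns are designed to cut down.
    def lfsr_patterns(seed, taps, width, count):
        state, patterns = seed, []
        for _ in range(count):
            patterns.append(state)
            fb = 0
            for t in taps:                            # XOR the tapped bits
                fb ^= (state >> t) & 1
            state = ((state << 1) | fb) & ((1 << width) - 1)
        return patterns

    def switching_activity(patterns):
        return sum(bin(a ^ b).count("1") for a, b in zip(patterns, patterns[1:]))

    pats = lfsr_patterns(seed=0b1001, taps=(3, 0), width=4, count=8)
    print(pats, switching_activity(pats))

Re-seeding amounts to reloading the seed with stored values so that deterministic patterns are reproduced; the dual-LFSR and mask-generation details follow the paper itself.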
{"title":"Low power mixed-mode BIST based on mask pattern generation using dual LFSR re-seeding","authors":"P. Rosinger, B. Al-Hashimi, N. Nicolici","doi":"10.1109/ICCD.2002.1106816","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106816","url":null,"abstract":"Low power design techniques have been employed for more than two decades, however an emerging problem is satisfying the test power constraints for avoiding destructive test and improving the yield. Our research addresses this problem by proposing a new method which maintains the benefits of mixed-mode built-in self-test (BIST) (low test application time and high fault coverage), and reduces the excessive power dissipation associated with scan-based test. This is achieved by employing dual linear feedback shift register (LFSR) re-seeding and generating mask patterns to reduce the switching activity. Theoretical analysis and experimental results show that the proposed method consistently reduces the switching activity by 25% when compared to the traditional approaches, at the expense of a limited increase in storage requirements.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132875979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automotive virtual integration platforms: why's, what's, and how's
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106796
P. Giusto, J. Brunel, A. Ferrari, E. Fourgeau, L. Lavagno, A. Sangiovanni-Vincentelli
In this paper, we present the new concept of a virtual integration platform for automotive electronics. The platform provides the basis for a novel methodology in which the integration of sub-systems is performed much earlier in the design cycle. As a result, cost reductions can be achieved both in the final implementation and in the design process. In addition, early and repeatable fault analysis can be performed, easing the task of proving system safety.
{"title":"Automotive virtual integration platforms: why's, what's, and how's","authors":"P. Giusto, J. Brunel, A. Ferrari, E. Fourgeau, L. Lavagno, A. Sangiovanni-Vincentelli","doi":"10.1109/ICCD.2002.1106796","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106796","url":null,"abstract":"In this paper, we present the new concept of virtual integration platform for automotive electronics. The platform provides the basis for a novel methodology in which the integration of sub-systems is performed much earlier in the design cycle. As a result, cost reduction in the final implementation and in the design process can be achieved. In addition, early and repeatable fault analysis can be performed therefore easing the task of system safety proving.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130933837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Physical planning of on-chip interconnect architectures
Pub Date: 2002-09-16 | DOI: 10.1109/ICCD.2002.1106743
Hongyu Chen, B. Yao, Feng Zhou, Chung-Kuan Cheng
Interconnect architecture plays an important role in determining the throughput of meshed communication structures. We assume a mesh structure with uniform communication demand. A multi-commodity flow (MCF) model is proposed to find the throughput of several different routing architectures. The experimental results reveal several trends: (1) The throughput is limited by the capacity of the middle row and column of the mesh; simply enlarging the congested channels cannot produce better throughput, while a flexible chip shape provides around 30% throughput improvement over a square chip of equal area. (2) A 45-degree mesh allows 17% throughput improvement over a 90-degree mesh, and a mixed 90-degree/45-degree mesh provides 30% improvement. (3) To achieve maximum throughput on a mixed Manhattan and diagonal interconnect architecture, the best ratio of the capacity of the diagonal routing layers to the capacity of the Manhattan routing layers is 5.6. (4) Incorporating a simplified via model, interleaving diagonal and Manhattan routing layers is the best way to organize the wiring directions across layers.
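To make trend (1) concrete, a small back-of-the-envelope Python sketch of why the middle cut caps throughput under uniform demand (the mesh size and channel capacity are made-up numbers, and the constant factors are rough; the paper's MCF model computes the actual value).

    # Under uniform all-to-all demand, every flow between the left and right halves
    # of an n x n mesh must cross the middle column cut, so the per-pair rate is
    # bounded by (cut capacity) / (number of crossing pairs); widening channels away
    # from the cut does not raise this bound, matching trend (1).
    def middle_cut_bound(n, channel_capacity):
        nodes = n * n
        half = nodes // 2                      # nodes on each side of the cut
        crossing_pairs = 2 * half * half       # ordered source-destination pairs
        cut_capacity = n * channel_capacity    # n channels cross the vertical cut
        return cut_capacity / crossing_pairs   # max uniform per-pair injection rate

    print(middle_cut_bound(n=8, channel_capacity=16.0))   # 0.0625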
{"title":"Physical planning of on-chip interconnect architectures","authors":"Hongyu Chen, B. Yao, Feng Zhou, Chung-Kuan Cheng","doi":"10.1109/ICCD.2002.1106743","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106743","url":null,"abstract":"Interconnect architecture plays an important role in determining the throughput of meshed communication structures. We assume a mesh structure with uniform communication demand for communication. A multi-commodity flow (MCF) model is proposed to find the throughput for several different routing architectures. The experimental results reveal several trends: 1. The throughput is limited by the capacity of the middle row and column in the mesh, simply enlarging the congested channel cannot produce better throughput. A flexible chip shape provides around 30% throughput improvement over a square chip of equal area. 2. A 45-degree mesh allows 17% throughput improvement over 90-degree mesh and a 90-degree and 45-degree mixed mesh provides 30% throughput improvement. 3. To achieve maximum throughput on a mixed Manhattan and diagonal interconnect architecture, the best ratio of the capacity for diagonal routing layers and the capacity for Manhattan routing layers is 5.6. 4. Incorporating a simplified via model, interleaving diagonal routing layers and Manhattan routing layer is the best way to organize the wiring directions on different layers.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124359926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}