Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646665
S. Chattopadhyay, P. P. Chaudhuri
A novel technique for generating efficient signatures is proposed for characterizing Boolean functions. The computed signatures are insensitive to permutations of the input variables, so such a signature can be used to find a match for a given function in a large library of Boolean functions. This paper uses the A-transform, a concept used to solve the problem of probabilistic design verification. It is proved analytically that the generated signature is unique for functions of fewer than 5 variables. For randomly generated functions of 5, 6, and 7 variables, aliasing has been observed to be within 0.5%. The basic scheme is then modified to arrive at a signature with linear space complexity. The efficiency of the modified signature in distinguishing nonequivalent Boolean functions is found to be above 0.99 for Actel-type multiplexor-based FPGAs.
{"title":"Efficient signatures with linear space complexity for detecting Boolean function equivalence","authors":"S. Chattopadhyay, P. P. Chaudhuri","doi":"10.1109/ICVD.1998.646665","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646665","url":null,"abstract":"A novel technique for generating efficient signatures has been proposed for characterizing Boolean functions. The computed signatures can be found to be insensitive to permutations of input variables. Such a signature can be used to find a match for a given function in a large library of Boolean functions. This paper utilizes the concept of A-transform used to solve the problem of probabilistic design verification. It has been proved analytically that for number of variables less than 5, the generated signature is unique. Randomly generated functions of 5, 6, and 7 variables, aliasing has been observed to be within 0.5%. This basic scheme is next modified to arrive at a signature with linear space complexity. The efficiency of the modified signature to distinguish nonequivalent Boolean functions can be found to be above 0.99 for Actel type multiplexor based FPGAs.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124305481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
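As a rough illustration of what "insensitive to permutations of input variables" means in the abstract above, the sketch below computes a simple signature for a truth table: the number of minterms plus the sorted per-variable positive-cofactor weights. This is not the A-transform signature of the paper; it is a much weaker, well-known invariant chosen only for illustration, and all function names are hypothetical.

```python
from itertools import permutations, product

def truth_table(f, n):
    """Return the truth table of f as a tuple of 0/1 over all n-bit inputs."""
    return tuple(int(bool(f(*bits))) for bits in product((0, 1), repeat=n))

def signature(table, n):
    """Illustrative signature: (minterm count, sorted positive-cofactor weights).

    Sorting the per-variable weights removes any dependence on variable order.
    """
    inputs = list(product((0, 1), repeat=n))
    weight = sum(table)
    cofactor_weights = sorted(
        sum(v for bits, v in zip(inputs, table) if bits[i] == 1)
        for i in range(n)
    )
    return (weight, tuple(cofactor_weights))

def permute_inputs(f, perm):
    """Return g with g(x_0, ..., x_{n-1}) = f(x_perm[0], ..., x_perm[n-1])."""
    return lambda *xs: f(*(xs[p] for p in perm))

# Example function (an assumption for illustration): f = (a AND b) OR c
f = lambda a, b, c: (a and b) or c
base = signature(truth_table(f, 3), 3)

# The signature is identical for every reordering of the three inputs.
same = all(
    signature(truth_table(permute_inputs(f, p), 3), 3) == base
    for p in permutations(range(3))
)
```

Two functions with different signatures cannot be permutation-equivalent, but equal signatures do not prove equivalence; that residual ambiguity is the aliasing the abstract quantifies for its own, stronger signature.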
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646644
N. Ranganathan, R. Anand, G. Chiruvolu
An Asynchronous Transfer Mode (ATM) switching network must process data at rates of 155 Mbps and 620 Mbps as per the standard. Such bandwidth requirements have necessitated the realization of efficient switch architectures. In this paper, we propose a novel architecture for the design of a non-blocking, central buffer switch for ATM networks. The central buffer switch architectures described in the literature organize the logical output queues as linked lists of data packets. Thus, dynamic memory allocation involves the manipulation of the read and write pointers of these linked lists. In the switch architecture proposed in this work, the packets are stored in the data memory and only the packet addresses are stored in a set of First In First Out (FIFO) buffers that form the logical output queues. This approach eliminates the memory accesses needed to manipulate linked lists, which significantly improves the response time. A 4×4 prototype switch of the proposed architecture was designed and verified using the Cadence design tools. The prototype was verified to operate at a frequency of 40 MHz, yielding a throughput of 12.334 Gbps.
{"title":"A VLSI ATM switch architecture for VBR traffic","authors":"N. Ranganathan, R. Anand, G. Chiruvolu","doi":"10.1109/ICVD.1998.646644","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646644","url":null,"abstract":"An Asynchronous Transfer Mode (ATM) switching network must process data at the rates of 155 Mbps and 620 Mbps as per the standard. Such bandwidth requirements have necessitated the realization of efficient switch architectures. In this paper, we propose a novel architecture for the design of a non-blocking, central buffer switch for ATM networks. The central buffer switch architectures described in the literature organize the logical output queues as linked lists of data packets. Thus, dynamic memory allocation involves the manipulation of the read and the write pointers of these linked lists. In the switch architecture proposed in this work, the packets are stored in the data memory and only the packet addresses are stored in a set of First In First Out (FIFO) buffers that form the logical output queues. This approach eliminates the need for memory accesses for the manipulation of linked lists which improves significantly the response time. A 4/spl times/4 prototype switch of the proposed architecture was designed and verified using the Cadence design tools. The prototype was verified to operate at a frequency of 40 MHz yielding a throughput of 12.334 Gbps.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122559596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
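The queue organization described in the abstract above (packets in a shared data memory, with only their addresses held in per-output FIFOs) can be sketched as a small software model. This is a toy illustration under stated assumptions, not the authors' hardware design: the class and method names are hypothetical, and the free-address FIFO used to recycle memory slots is an assumption that the abstract does not mention.

```python
from collections import deque

class CentralBufferSwitch:
    """Toy model: packets live in a shared memory; queues hold only addresses."""

    def __init__(self, num_ports, memory_slots):
        self.memory = [None] * memory_slots
        # Assumed mechanism: a FIFO of unused slot addresses.
        self.free_addresses = deque(range(memory_slots))
        # One address FIFO per output port forms the logical output queue.
        self.output_queues = [deque() for _ in range(num_ports)]

    def enqueue(self, out_port, packet):
        """Store a packet and append its address to the output port's FIFO."""
        if not self.free_addresses:
            return False  # shared buffer full; the packet is dropped in this model
        addr = self.free_addresses.popleft()
        self.memory[addr] = packet
        self.output_queues[out_port].append(addr)
        return True

    def dequeue(self, out_port):
        """Pop the oldest address for the port, read the packet, free the slot."""
        if not self.output_queues[out_port]:
            return None
        addr = self.output_queues[out_port].popleft()
        packet, self.memory[addr] = self.memory[addr], None
        self.free_addresses.append(addr)
        return packet
```

Enqueue and dequeue each touch the data memory once and never walk or relink a list, which is the property the abstract credits for the improved response time.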
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646637
V. Dabholkar, S. Chakravarty
Reliability screens are used to reduce infant mortality. The quality of the stress test set used during the screening process has a direct bearing on the effectiveness of the screen. We present a formal study of the problem of computing good-quality stress tests for gate-oxide shorts, which are the cause of many reliability problems. A method for computing stress tests that is better than the popular method of using I_DDQ vectors is presented.
{"title":"Computing stress tests for gate-oxide shorts","authors":"V. Dabholkar, S. Chakravarty","doi":"10.1109/ICVD.1998.646637","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646637","url":null,"abstract":"Reliability screens are used to reduce infant mortality. The quality of the stress test set used during the screening process has a direct bearing on the effectiveness of the screen. We present a formal study of the problem of computing good quality stress tests for gate-oxide shorts which is the cause of much of the reliability problems. A method to compute stress test which is better than the popular method of using I/sub DDQ/ vectors is presented.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124013844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646647
Zhang Yang, Rajesh K. Gupta
In this paper, we investigate the relationship between partitioning and high-level synthesis tasks, namely operation scheduling and resource allocation/binding. The interaction between partitioning and synthesis tasks is explored using IP formulations for four different design approaches representing different strategies for high-level synthesis. The results are quantified by varying three design parameters, namely the partition size bound, resource size bound and latency margin bound. Experimental results show the tradeoff between the quality of synthesis results and the computation cost for the different design approaches: performing partitioning and synthesis simultaneously gives the best results, while computational efficiency can be improved by separating scheduling from partitioning.
{"title":"A case analysis of system partitioning and its relationship to high-level synthesis tasks","authors":"Zhang Yang, Rajesh K. Gupta","doi":"10.1109/ICVD.1998.646647","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646647","url":null,"abstract":"In this paper, we investigate the relationship between partitioning and high-level synthesis tasks, namely operation scheduling and resource allocation/binding. The interaction between partitioning and synthesis tasks is explored using IP formulations for four different design approaches representing different strategies for high-level synthesis. The results are quantified by varying three design parameters, namely the partition size bound, resource size bound and latency margin bound. Experimental results show the tradeoff between the quality of synthesis results and the computation cost for different design approaches, while simultaneous partitioning and synthesis tasks gives the best results, and the computational efficiency can be improved by separating scheduling from partitioning.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114064324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646657
Rung-Bin Lin, Meng-Chiou Wu
In this paper, a new problem definition of statistical timing analysis is formulated. Two efficient methods that consider only dominant long paths are employed to approach this problem. The influence of the correlation of node delays on the probability distribution of the longest path delay is studied in detail. The experimental results show that the probability distribution of the longest path delay is greatly influenced by the correlation of node delays and by the presence of many dominant long paths. The results also show that the probability distribution obtained by our approaches closely tracks the distribution obtained by whole-circuit simulation, with much less computation time.
{"title":"A new statistical approach to timing analysis of VLSI circuits","authors":"Rung-Bin Lin, Meng-Chiou Wu","doi":"10.1109/ICVD.1998.646657","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646657","url":null,"abstract":"In this paper a new problem definition of statistical timing analysis is formulated. Two efficient methods that consider only dominant long paths are employed to approach this problem. The influence of the correlation of node delays on the probability distribution of the longest path delay is studied in detail. The experimental results show that the probability distribution of the longest path delay is greatly influenced by the correlation of nodes and by the presence of many dominant long paths. The results also show that the probability distribution obtained by our approaches is well tracked to the distribution obtained by the whole circuit simulation with much less computation time.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131265764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
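To make the quantity studied in the abstract above concrete, the following sketch estimates the distribution of the longest path delay by plain Monte Carlo sampling over a small set of paths. It is not either of the authors' two methods. The delay model is an assumption made for illustration: each node delay is Gaussian, and a single shared random component with weight `rho` (between 0 and 1) induces correlation between nodes. The function and parameter names are hypothetical.

```python
import math
import random

def longest_path_samples(paths, mean, sigma, rho, trials, seed=0):
    """Sample the longest path delay over `paths` (lists of node names).

    Assumed model: delay[n] = mean[n] + sigma * (sqrt(rho) * shared
    + sqrt(1 - rho) * local), with shared and local standard normal,
    so any two node delays have correlation rho.
    """
    rng = random.Random(seed)
    nodes = sorted({n for p in paths for n in p})
    samples = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # component common to all nodes
        delay = {
            n: mean[n] + sigma * (
                math.sqrt(rho) * shared
                + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            )
            for n in nodes
        }
        # The circuit delay in one trial is the slowest of the given paths.
        samples.append(max(sum(delay[n] for n in p) for p in paths))
    return samples

# Two hypothetical paths sharing node "a".
paths = [["a", "b", "c"], ["a", "d"]]
mean = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
uncorrelated = longest_path_samples(paths, mean, sigma=0.5, rho=0.0, trials=1000)
correlated = longest_path_samples(paths, mean, sigma=0.5, rho=0.9, trials=1000)
```

Comparing histograms of `uncorrelated` and `correlated` gives a feel for how node-delay correlation reshapes the longest-path distribution, which is the effect the abstract reports.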
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646575
M. Mehendale, S. Sherlekar, G. Venkatesh
We present extensions to programmable DSP architectures for reduced power dissipation. These extensions address power reduction in both external and internal buses, which form a major component of power dissipation in pipelined programmable processors such as DSPs. We present two techniques to reduce power dissipation in the program and data memory address buses, a technique to reduce cross-coupling-related power dissipation in the program memory data bus, and a technique for reducing power dissipation in the input buses of the ALU. We present results in terms of power savings using these techniques.
{"title":"Extensions to programmable DSP architectures for reduced power dissipation","authors":"M. Mehendale, S. Sherlekar, G. Venkatesh","doi":"10.1109/ICVD.1998.646575","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646575","url":null,"abstract":"We present extensions to the programmable DSP architectures for reduced power dissipations. These extensions address power reduction in both external and internal buses, which form a major component of power dissipation in pipelined programmable processors such as DSPs. We present two techniques to reduce power dissipation in the program and data memory address buses, a technique to reduce cross-coupling related power dissipation in the program memory data bus and a technique for reducing power dissipation in the input buses of the ALU. We present results in terms of power savings using these techniques.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121613629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
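Bus power of the kind discussed in the abstract above grows with the number of bit transitions on the bus lines. The abstract does not name the techniques it uses, so the sketch below is only an assumed example of the general idea: it counts bit toggles on an address bus for a sequential address stream, once with plain binary addresses and once with Gray-coded addresses. The helper names are hypothetical.

```python
def toggles(words):
    """Total number of bit transitions between consecutive bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def to_gray(value):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)

# Sequential instruction fetch from addresses 0..15 (an illustrative stream).
addresses = list(range(16))
binary_toggles = toggles(addresses)
gray_toggles = toggles([to_gray(a) for a in addresses])
```

For this stream the binary encoding produces 26 transitions and the Gray encoding 15, because consecutive Gray codes differ in exactly one bit.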
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646623
C. Sodini, J. Gealow, Z. A. Talib, I. Masaki
Typical low-level image processing tasks require thousands of operations per pixel for each input image. The structure of the tasks suggests employing an array of processing elements, one per pixel, sharing instructions issued by a single controller. To build pixel-parallel image processing hardware for microcomputer systems, large processing element arrays must be produced at low cost. Integrated circuit designers have had tremendous success creating dense and inexpensive semiconductor memories. They handcraft circuits to perform essential functions using very little silicon area, then replicate the circuits to form large memory arrays. This paper shows how the same technique may be applied to create a dense integrated processing element array.
{"title":"Integrated memory/logic architecture for image processing","authors":"C. Sodini, J. Gealow, Z. A. Talib, I. Masaki","doi":"10.1109/ICVD.1998.646623","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646623","url":null,"abstract":"Typical low-level image processing tasks require thousands of operations per pixel for each input image. The structure of the tasks suggests employing an array of processing elements, one per pixel, sharing instructions issued by a single controller. To build pixel-parallel image processing hardware for microcomputer systems, large processing element arrays must be produced at low cost. Integrated circuit designers have had tremendous success creating dense and inexpensive semiconductor memories. They handcraft circuits to perform essential functions using very little silicon area, then replicate the circuits to form large memory arrays. This paper shows how the same technique may be applied to create a dense integrated processing element array.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124044927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646664
M. F. Abdulla, C. Ravikumar, Anshul Kumar
The multiple on-chip signature checking architecture proposed by the authors previously is an effective BIST architecture for testing the functional units in modern VLSI circuits. It is characterized by low aliasing, low area overhead and low testing time. However, a straightforward application of this architecture to testing embedded RAMs results in excessive area overhead. In this paper the authors propose a scheme to apply this architecture to embedded static RAMs with no significant increase in area. The scheme is applicable to testing chips that have multiple embedded RAMs of various sizes (e.g., ASIC chips in telecommunication applications).
{"title":"On-chip signature checking for embedded memories","authors":"M. F. Abdulla, C. Ravikumar, Anshul Kumar","doi":"10.1109/ICVD.1998.646664","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646664","url":null,"abstract":"The multiple on-chip signature checking architecture proposed by the authors previously is an effective BIST architecture for testing the functional units in modern VLSI circuits. It is characterized by low aliasing, low area overhead and low testing time. However, a straight forward application of this architecture in testing the embedded RAMs will result in excessive area overheads. In this paper the authors propose a scheme to apply this architecture to embedded static RAMs with no significant increase in area. The scheme is applicable to testing chips that have multiple embedded RAMs of various sizes (e.g., ASIC chips in telecommunication applications).","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124245790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646571
M. Mehendale, S. Sherlekar, G. Venkatesh
We present algorithmic and architectural transforms for low power realization of Finite Impulse Response (FIR) filters implemented both in software on programmable DSPs and as hardwired macros. For the programmable DSP based implementation, these transforms address power reduction in the program memory address and data buses and in the multiplier. We also propose architectural extensions to support some of these transformations. The transforms for hardwired FIR filters aim at reducing the supply voltage while maintaining the throughput. We also present transforms that reduce the computational complexity of the FIR filter computation and thus achieve power reduction.
{"title":"Algorithmic and architectural transformations for low power realization of FIR filters","authors":"M. Mehendale, S. Sherlekar, G. Venkatesh","doi":"10.1109/ICVD.1998.646571","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646571","url":null,"abstract":"We present algorithmic and architectural transforms for low power realization of Finite Impulse Response (FIR) filters implemented both in software on programmable DSPs and as hardwired macros. For the programmable DSP based implementation, these transform address power reduction in the program memory address and data busses and also the multiplier. We also propose architectural extensions to support some of these transformations. The transforms for hardwired FIR filters aim at reducing the supply voltage while maintaining the throughput. We also present transforms that reduce the computational complexity of the FIR filter computation and thus achieve power reduction.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121732333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1998-01-04 DOI: 10.1109/ICVD.1998.646609
R. Burra, D. Bhatia
System-level design is increasingly turning towards FPGAs to take advantage of their low cost and fast prototyping. In this paper we present a timing-driven partitioning approach for an architecturally constrained multi-FPGA system. The partitioning approach uses retiming and path-based clustering based on the work by Dennis et al. (1995). The board-level architecture is based on a PCB model consisting of four Xilinx 4013 FPGAs. The proposed algorithm has been tested on large-scale real designs.
{"title":"Timing driven multi-FPGA board partitioning","authors":"R. Burra, D. Bhatia","doi":"10.1109/ICVD.1998.646609","DOIUrl":"https://doi.org/10.1109/ICVD.1998.646609","url":null,"abstract":"System level design is increasingly turning towards FPGAs to take advantage of their low cost and fast prototyping. In this paper we present a timing driven partitioning approach for an architecturally constrained multi-FPGA system. The partitioning approach uses path-based clustering based on the work by Dennis et al. (1995) and retiming. The board-level architecture is based on the PCB model consisting of four Xilinx 4013 FPGAs. The proposed algorithm has been tested on large scale real designs.","PeriodicalId":139023,"journal":{"name":"Proceedings Eleventh International Conference on VLSI Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122065899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}