P. Lotfi-Kamran, M. Massoumi, Mohammad Mirzaei, Z. Navabi
This work provides a canonical representation for manipulation of RTL designs. Work has already been done on a canonical and graph-based representation called Taylor expansion diagram (TED). Although TED can effectively be used to represent arithmetic expressions at the word-level, it is not memory efficient in representing bit-level logic expressions. In addition, TED cannot represent Boolean expressions at the word-level (vector-level). In this paper, we present modifications to TED that will improve its ability for bit-level logic representation while enhancing its robustness to represent word-level Boolean expressions. It will be shown that for bit-level logic expressions, the enhanced TED (ETED) performs the same as the BDD representation.
{"title":"Enhanced TED: A New Data Structure for RTL Verification","authors":"P. Lotfi-Kamran, M. Massoumi, Mohammad Mirzaei, Z. Navabi","doi":"10.1109/VLSI.2008.108","DOIUrl":"https://doi.org/10.1109/VLSI.2008.108","url":null,"abstract":"This work provides a canonical representation for manipulation of RTL designs. Work has already been done on a canonical and graph-based representation called Taylor expansion diagram (TED). Although TED can effectively be used to represent arithmetic expressions at the word-level, it is not memory efficient in representing bit-level logic expressions. In addition, TED cannot represent Boolean expressions at the word-level (vector-level). In this paper, we present modifications to TED that will improve its ability for bit-level logic representation while enhancing its robustness to represent word-level Boolean expressions. It will be shown that for bit-level logic expressions, the enhanced TED (ETED) performs the same as the BDD representation.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"354 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115780981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compared to subthreshold leakage, dynamic power is normally much less sensitive to the process variation due to its approximately linear relation to the process parameters. However, the average dynamic power of a circuit optimized by deterministic glitch elimination (using hazard filtering and path balancing) increases because glitches randomly start reappearing under the influence of process variation. Combining existing techniques, we propose a new statistical mixed integer linear programming (MILP) formulation, which combines glitch elimination and dual-threshold design to statistically minimize the total power in a glitch-free circuit under process variation.
{"title":"Total Power Minimization in Glitch-Free CMOS Circuits Considering Process Variation","authors":"Y. Lu, V. Agrawal","doi":"10.1109/VLSI.2008.29","DOIUrl":"https://doi.org/10.1109/VLSI.2008.29","url":null,"abstract":"Compared to subthreshold leakage, dynamic power is normally much less sensitive to the process variation due to its approximately linear relation to the process parameters. However, the average dynamic power of a circuit optimized by deterministic glitch elimination (using hazard filtering and path balancing) increases because glitches randomly start reappearing under the influence of process variation. Combining existing techniques, we propose a new statistical mixed integer linear programming (MILP) formulation, which combines glitch elimination and dual-threshold design to statistically minimize the total power in a glitch-free circuit under process variation.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114496721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of software performance on a given customized embedded processor is an important step in the design space exploration of embedded system architectures. Such evaluations help system designers in taking early design decisions regarding the hardware architecture most suitable for the target application. Simulation based performance evaluations, although very accurate, can be prohibitively slower. In this paper, we present a novel hybrid approach consisting of an initial simulation run (one time) followed by analysis of intermediate level (IR) application code by an evaluation engine. Our results show that the evaluation engine can accurately (more than 95%) estimate the excecution cycles of application or application task on a given customized embedded processor while it is at least an order of magnitude faster in terms of time taken.
{"title":"An Approach to Software Performance Evaluation on Customized Embedded Processors","authors":"Soumyajit Dey, M. Kedia, A. Basu","doi":"10.1109/VLSI.2008.42","DOIUrl":"https://doi.org/10.1109/VLSI.2008.42","url":null,"abstract":"Evaluation of software performance on a given customized embedded processor is an important step in the design space exploration of embedded system architectures. Such evaluations help system designers in taking early design decisions regarding the hardware architecture most suitable for the target application. Simulation based performance evaluations, although very accurate, can be prohibitively slower. In this paper, we present a novel hybrid approach consisting of an initial simulation run (one time) followed by analysis of intermediate level (IR) application code by an evaluation engine. Our results show that the evaluation engine can accurately (more than 95%) estimate the excecution cycles of application or application task on a given customized embedded processor while it is at least an order of magnitude faster in terms of time taken.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127201896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper presents a comprehensive behavioral model of a high precision tunneling accelerometer. Design and optimization of the silicon based tunneling has also been reported in this work. The accelerometer is CMOS compatible and has actuation voltage within CMOS bias levels. The proposed structure uniquely combines the electron tunneling based sensing and capacitive actuation. A feedback controller is designed to measure the acceleration under constant gap mode of operation. The full dynamic range of operation is 1 mug to 200 mug with a resolution in the order of nano-g. The cross-axis sensitivity is less than 1% and the shock survivability is 10 g for a 10 ms shock with 0.1 ms rise time. The Brownian noise floor of the system has also been studied and the squeeze film damping effects on the system has been shown.
{"title":"Behavioral Modeling of a CMOS Compatible High Precision MEMS Based Electron Tunneling Accelerometer","authors":"T. K. Bhattacharyya, A. Ghosh","doi":"10.1109/VLSI.2008.60","DOIUrl":"https://doi.org/10.1109/VLSI.2008.60","url":null,"abstract":"The paper presents a comprehensive behavioral model of a high precision tunneling accelerometer. Design and optimization of the silicon based tunneling has also been reported in this work. The accelerometer is CMOS compatible and has actuation voltage within CMOS bias levels. The proposed structure uniquely combines the electron tunneling based sensing and capacitive actuation. A feedback controller is designed to measure the acceleration under constant gap mode of operation. The full dynamic range of operation is 1 mug to 200 mug with a resolution in the order of nano-g. The cross-axis sensitivity is less than 1% and the shock survivability is 10 g for a 10 ms shock with 0.1 ms rise time. The Brownian noise floor of the system has also been studied and the squeeze film damping effects on the system has been shown.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126070739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Lambrechts, P. Raghavan, M. Jayapala, F. Catthoor, D. Verkest
Modern portable embedded devices provide continuously more features and need processors that are of increasingly higher performance in order to sustain very demanding multimedia and wireless applications. Larger amounts of flexibility need to be built in and the same processor needs to be used for a wide range of evolving products, while very strict energy constraints need to be met in order to provide a long battery life. Coarse Grained Reconflgurable Architectures (CGRAs) provide a mix of flexible computational resources and large amounts of programmable interconnect. However, this programmable interconnect is on average consuming about 50% of the core's energy consumpion for state of the art interconnection topologies. In this work we present an optimized interconnection implementation that selectively activates only the connections that are being used in a certain cycle, in order to reduce the energy spent in the interconnect. Using this optimization, we show the effect on the energy and performance trade-off for the ADRES CGRA. The energy cost of the optimized interconnect topologies that provide a higher performance can be reduced significantly, reducing the total energy consumption of the core with up to 40%. This will enable designers to develop more efficient architectures, tuned to a targeted application domain.
{"title":"Energy-Aware Interconnect Optimization for a Coarse Grained Reconfigurable Processor","authors":"A. Lambrechts, P. Raghavan, M. Jayapala, F. Catthoor, D. Verkest","doi":"10.1109/VLSI.2008.25","DOIUrl":"https://doi.org/10.1109/VLSI.2008.25","url":null,"abstract":"Modern portable embedded devices provide continuously more features and need processors that are of increasingly higher performance in order to sustain very demanding multimedia and wireless applications. Larger amounts of flexibility need to be built in and the same processor needs to be used for a wide range of evolving products, while very strict energy constraints need to be met in order to provide a long battery life. Coarse Grained Reconflgurable Architectures (CGRAs) provide a mix of flexible computational resources and large amounts of programmable interconnect. However, this programmable interconnect is on average consuming about 50% of the core's energy consumpion for state of the art interconnection topologies. In this work we present an optimized interconnection implementation that selectively activates only the connections that are being used in a certain cycle, in order to reduce the energy spent in the interconnect. Using this optimization, we show the effect on the energy and performance trade-off for the ADRES CGRA. The energy cost of the optimized interconnect topologies that provide a higher performance can be reduced significantly, reducing the total energy consumption of the core with up to 40%. This will enable designers to develop more efficient architectures, tuned to a targeted application domain.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127176101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing non volatile memories for tunnel oxide defects is one of the most important aspects to guarantee cell reliability. Defective tunnel oxide layer in core memory cells can result in various disturb faults. In this paper, we study various defects in the insulating layers of a IT flash cell and analyze their impact on cell performance. Further, we present a test methodology and test algorithms that enable the detection of tunnel oxide defects in an efficient manner.
{"title":"Testing Flash Memories for Tunnel Oxide Defects","authors":"M. G. Mohammad, K. Saluja","doi":"10.1109/VLSI.2008.41","DOIUrl":"https://doi.org/10.1109/VLSI.2008.41","url":null,"abstract":"Testing non volatile memories for tunnel oxide defects is one of the most important aspects to guarantee cell reliability. Defective tunnel oxide layer in core memory cells can result in various disturb faults. In this paper, we study various defects in the insulating layers of a IT flash cell and analyze their impact on cell performance. Further, we present a test methodology and test algorithms that enable the detection of tunnel oxide defects in an efficient manner.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123174289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a throughput efficient FPGA implementation of the 'Set Partitioning in Hierarchical Trees' (SPIHT) algorithm for compression of images. The SPIHT uses inherent redundancy among wavelet coefficients and suited for both gray and color images. The SPIHT algorithm uses dynamic data structures which hinders hardware realization. In our FPGA implementation we have modified basic SPIHT in two ways, one by using static (fixed) mappings which represent significant information and the other by interchanging the sorting and refinement passes. A hardware realization is done in a Xilinx XC2S30 device. Significant compression ratio and throughput is obtained for a sample image of size 128 times 128 pixels.
{"title":"Throughput Efficient Parallel Implementation of SPIHT Algorithm","authors":"A. Nandi, R. Banakar","doi":"10.1109/VLSI.2008.48","DOIUrl":"https://doi.org/10.1109/VLSI.2008.48","url":null,"abstract":"We present a throughput efficient FPGA implementation of the 'Set Partitioning in Hierarchical Trees' (SPIHT) algorithm for compression of images. The SPIHT uses inherent redundancy among wavelet coefficients and suited for both gray and color images. The SPIHT algorithm uses dynamic data structures which hinders hardware realization. In our FPGA implementation we have modified basic SPIHT in two ways, one by using static (fixed) mappings which represent significant information and the other by interchanging the sorting and refinement passes. A hardware realization is done in a Xilinx XC2S30 device. Significant compression ratio and throughput is obtained for a sample image of size 128 times 128 pixels.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"44 33","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120887916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Ramanarayanan, S. Mathew, V. Erraguntla, R. Krishnamurthy, S. Gueron
This paper describes a unified popcount/bitscanforward/bitscanreverse datapath circuit designed for 2.1GHz operation with total power consumption of 6.5 mW, targeted for 65 nm 64-bit microprocessor execution cores. The unified datapath uses a hybrid 3:2 compressor-based Wallace tree to count the number of '1's in the 64-bit input, along with a novel encoding scheme that enables reuse of the same tree to identify the bit-location of the 1st set bit when scanning the input in the forward and reverse directions. This circuit thus combines the functions of 3 separate units, enabling 26% reduction in total energy and 20% lower area, while achieving single-cycle latency & throughput.
{"title":"A 2.1GHz 6.5mW 64-bit Unified PopCount/BitScan Datapath Unit for 65nm High-Performance Microprocessor Execution Cores","authors":"R. Ramanarayanan, S. Mathew, V. Erraguntla, R. Krishnamurthy, S. Gueron","doi":"10.1109/VLSI.2008.75","DOIUrl":"https://doi.org/10.1109/VLSI.2008.75","url":null,"abstract":"This paper describes a unified popcount/bitscanforward/bitscanreverse datapath circuit designed for 2.1GHz operation with total power consumption of 6.5 mW, targeted for 65 nm 64-bit microprocessor execution cores. The unified datapath uses a hybrid 3:2 compressor-based Wallace tree to count the number of '1's in the 64-bit input, along with a novel encoding scheme that enables reuse of the same tree to identify the bit-location of the 1st set bit when scanning the input in the forward and reverse directions. This circuit thus combines the functions of 3 separate units, enabling 26% reduction in total energy and 20% lower area, while achieving single-cycle latency & throughput.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127091288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design-for-testability (DFT) approaches that allow a synchronous sequential circuit to enter states that it cannot enter during functional operation improve the fault coverage achievable for the circuit. However, nonfunctional operation during test application may result in switching activity that is significantly higher than under functional operation. This may lead to unnecessary yield loss due to supply voltage droops that slow the circuit but will not occur during functional operation. To address this issue we describe a DFT approach and a test generation procedure that improve the fault coverage by slowing down the state transitions of certain state variables relative to others. Unlike approaches that are based on holding values of state variables stable for unlimited numbers of clock cycles, the proposed approach resumes functional operation every limited number of clock cycles. This is shown to result in maximum switching activity that is in most cases lower than that obtained under the application of a functional test sequence, and never needs to exceed it.
{"title":"Design-for-Testability for Synchronous Sequential Circuits that Maintains Functional Switching Activity","authors":"I. Pomeranz, S. Reddy","doi":"10.1109/VLSI.2008.17","DOIUrl":"https://doi.org/10.1109/VLSI.2008.17","url":null,"abstract":"Design-for-testability (DFT) approaches that allow a synchronous sequential circuit to enter states that it cannot enter during functional operation improve the fault coverage achievable for the circuit. However, nonfunctional operation during test application may result in switching activity that is significantly higher than under functional operation. This may lead to unnecessary yield loss due to supply voltage droops that slow the circuit but will not occur during functional operation. To address this issue we describe a DFT approach and a test generation procedure that improve the fault coverage by slowing down the state transitions of certain state variables relative to others. Unlike approaches that are based on holding values of state variables stable for unlimited numbers of clock cycles, the proposed approach resumes functional operation every limited number of clock cycles. This is shown to result in maximum switching activity that is in most cases lower than that obtained under the application of a functional test sequence, and never needs to exceed it.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131039650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Sklyarov, I. Skliarova, B. Pimentel, M. Almeida
The paper describes novel multimedia tools and architectures for hardware/software co-simulation of reconfigurable systems. The main contributions are provided in the following three areas: 1) multimedia tools making it possible to manage animated graphical objects for virtual simulation of real world physical objects in the scope of reconfigurable system design; 2) a remotely accessible prototyping system, which is very helpful for both solving the problems of hardware design and supporting multimedia systems which can be used in vast varieties of practical applications, the most important of which are engineering training and education; 3) design methodology based on physical circuits and virtual objects. A number of illustrative examples demonstrating capabilities of the proposed approach are presented and discussed.
{"title":"Multimedia Tools and Architectures for Hardware/Software Co-Simulation of Reconfigurable Systems","authors":"V. Sklyarov, I. Skliarova, B. Pimentel, M. Almeida","doi":"10.1109/VLSI.2008.70","DOIUrl":"https://doi.org/10.1109/VLSI.2008.70","url":null,"abstract":"The paper describes novel multimedia tools and architectures for hardware/software co-simulation of reconfigurable systems. The main contributions are provided in the following three areas: 1) multimedia tools making it possible to manage animated graphical objects for virtual simulation of real world physical objects in the scope of reconfigurable system design; 2) a remotely accessible prototyping system, which is very helpful for both solving the problems of hardware design and supporting multimedia systems which can be used in vast varieties of practical applications, the most important of which are engineering training and education; 3) design methodology based on physical circuits and virtual objects. A number of illustrative examples demonstrating capabilities of the proposed approach are presented and discussed.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131222141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}