End-of-the-roadmap nanoscale CMOS is expected to suffer from significant defectivity due to manufacturing defects, random process variations, and wear-out during normal operation. To ensure acceptable yield and reliable operation of the circuit during its lifetime, future circuits must be equipped with significant defect-tolerance capabilities. Traditional defect-tolerance approaches are too expensive to be applied to general-purpose circuits. In this paper, we propose a defect-tolerant CMOS logic gate architecture that exploits the inherent functional redundancy in static CMOS. This is accomplished by reconfiguring the CMOS logic gate to a pseudo-NMOS-like gate in the presence of a defect. The resulting defect-tolerant logic architecture incurs only a modest area overhead. The proposed gate design can tolerate defects in either the pull-up or pull-down network of the gate. The architecture can tolerate multiple defects across the logic gates of a CMOS logic circuit. The effectiveness of the proposed defect-tolerance technique and its impact on circuit delay and power are studied. It is shown that the technique imposes little delay overhead (less than 6%) but incurs a power dissipation overhead (less than 20%) in the presence of defects.
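The functional redundancy that the abstract refers to can be illustrated at the logic level. The sketch below is not the authors' circuit: it assumes a 2-input NAND gate and models each network only as a Boolean "conducts or not" function, ignoring ratioed sizing, noise margins, and static power. All function names are hypothetical. It shows that either network alone, paired with an always-on weak load, reproduces the gate's truth table.

```python
from itertools import product

def pull_down_conducts(a, b):
    """Series NMOS network of a NAND2: conducts only when both inputs are 1."""
    return bool(a) and bool(b)

def pull_up_conducts(a, b):
    """Parallel PMOS network of a NAND2: conducts when either input is 0."""
    return (not a) or (not b)

def static_cmos(a, b):
    """Fault-free static CMOS: exactly one of the two networks conducts."""
    up, down = pull_up_conducts(a, b), pull_down_conducts(a, b)
    assert up != down  # the networks are complementary
    return 1 if up else 0

def pseudo_nmos(a, b):
    """Assumed fallback when the pull-up network is defective:
    an always-on weak pull-up load, overridden whenever the pull-down conducts."""
    return 0 if pull_down_conducts(a, b) else 1

def pseudo_pmos(a, b):
    """Assumed fallback when the pull-down network is defective:
    an always-on weak pull-down load, overridden whenever the pull-up conducts."""
    return 1 if pull_up_conducts(a, b) else 0

for a, b in product((0, 1), repeat=2):
    print(a, b, static_cmos(a, b), pseudo_nmos(a, b), pseudo_pmos(a, b))
```

The three output columns agree for every input pair, which is why a defect confined to one network need not change the logic function. The price of the ratioed fallback mode is static current, consistent with the power overhead the abstract reports in the presence of defects.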
{"title":"Reconfiguring CMOS as Pseudo N/PMOS for Defect Tolerance in Nano-Scale CMOS","authors":"M. Ashouei, A. Singh, A. Chatterjee","doi":"10.1109/VLSI.2008.104","DOIUrl":"https://doi.org/10.1109/VLSI.2008.104","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
In this paper, we propose a way to improve the yield of memory products by selecting the appropriate test strategy for memory Built-in Self-Test (BIST). We argue that by testing the memory through a sequence of test algorithms that differ in their fault coverage, it is possible to sort the memory into multiple yield bins and increase the yield and product revenue. Further, the test strategy must take into consideration the usage model of the memory. For example, a number of video and audio buffers are used in sequential access mode but are overtested by conventional memory test algorithms, which model a large number of defects that do not impact the operation of the buffers. We propose a binning strategy in which memory test algorithms are applied in different orders of strictness such that each bin has a specific defect/fault grade. Depending on the application, some of these bins need not be discarded but can be sold at a lower price, because the application would never exercise the fault given the way it uses the memory. We introduce the notion of a test map for the on-chip memories in an SoC and provide results of yield simulation on two specific test strategies called "Most Strict First" and "Least Strict First". Our simulations indicate that significant improvements in yield are possible through the adoption of the proposed technique. We show that the BIST controller area and run-time overheads are also reduced when information about the usage model of the memory, such as sequential access, is exploited.
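One way to read the two strategies named above is sketched below. This is an interpretation, not the authors' implementation: the test names, the fault classes, and the idea of modelling each test by the set of fault classes it detects are all assumptions made for illustration.

```python
# Hypothetical test algorithms, ordered from most to least strict.
# Each one is modelled only by the set of fault classes it detects.
TESTS = [
    ("strict", {"stuck_at", "coupling", "address_decoder"}),
    ("medium", {"stuck_at", "coupling"}),
    ("basic", {"stuck_at"}),
]

def passes(memory_faults, detected):
    """A memory passes a test if the test detects none of its faults."""
    return not (memory_faults & detected)

def most_strict_first(memory_faults):
    """Run tests from strictest to weakest; stop at the first pass.
    Returns (bin_name, number_of_tests_run)."""
    for run, (name, detected) in enumerate(TESTS, start=1):
        if passes(memory_faults, detected):
            return name, run
    return "reject", len(TESTS)

def least_strict_first(memory_faults):
    """Run tests from weakest to strictest; stop at the first failure.
    Returns (bin_name, number_of_tests_run)."""
    bin_name, run = "reject", 0
    for name, detected in reversed(TESTS):
        run += 1
        if not passes(memory_faults, detected):
            break
        bin_name = name
    return bin_name, run

for faults in (set(), {"address_decoder"}, {"stuck_at"}):
    print(sorted(faults), most_strict_first(faults), least_strict_first(faults))
```

In this toy model both orders assign a memory to the same bin and differ only in how many tests they run: fault-free parts finish after one test under Most Strict First, while badly defective parts are rejected after one test under Least Strict First. Which order is cheaper therefore depends on the defect distribution, which is the kind of trade-off a yield simulation can quantify.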
{"title":"Memory Yield Improvement through Multiple Test Sequences and Application-Aware Fault Models","authors":"A. Kokrady, C. Ravikumar, N. Chandrachoodan","doi":"10.1109/VLSI.2008.115","DOIUrl":"https://doi.org/10.1109/VLSI.2008.115","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
A model to create a simulation and synthesis framework for the design of gyroscopes is proposed. The main motivation is to have a framework for developing gyroscope models in the form of soft intellectual properties (IPs) for their subsequent integration into mainstream VLSI systems. Synthesis targeting different performance classes of gyros is based on a simple table look-up. The next level of model refinement, which involves optimizing physical aspects of the gyro such as its shape, is based on statistical design of experiments (DoE). Both FEM-based and Simulink-based models have been used to build a custom DoE framework to estimate the parameters related to a desired gyro structure. A simple gyroscope structure is modeled and analysed with both FEM-based and Simulink-based models. It is shown that the DoE-based framework can accurately capture the parameters of a gyroscope structure and that it can be easily integrated with system-level synthesis tools.
{"title":"GyroCompiler: A Soft IP Model Synthesis and Analysis Framework for Design of MEMS Based Gyroscopes","authors":"S. Jairam, N. Bhat","doi":"10.1109/VLSI.2008.10","DOIUrl":"https://doi.org/10.1109/VLSI.2008.10","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
A novel compact four-quadrant CMOS transconductance analog multiplier with wide dynamic swing and wide gain-bandwidth product using source-degeneration V-I converters is proposed. The design consists of two stages. The first stage is a voltage adder that uses two V-I converters with a diode-connected load and a source-degeneration resistor, which can provide high bandwidth. The second stage consists of two cross-connected differential pairs with source-degeneration resistors, which act as current-steering elements performing V-to-I conversion with wide dynamic swing and continuously adjustable gain. Unlike conventional multipliers, in the proposed scheme all the significant intermediate terms generated are linear, which reduces the need for non-linear term cancellation and makes the circuit power efficient. SPICE simulation results in 0.5 μm CMOS AMI technology are presented which validate the proposed work.
{"title":"Highly Linear Wide Dynamic Swing CMOS Transconductance Multiplier Using Source-Degeneration V-I Converters","authors":"S. Garimella","doi":"10.1109/VLSI.2008.91","DOIUrl":"https://doi.org/10.1109/VLSI.2008.91","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
Monjur Alam, Santosh K. Ghosh, D. R. Chowdhury, I. Sengupta
This paper presents a single-chip encryptor/decryptor core implementation of the Advanced Encryption Standard (AES-Rijndael) cryptosystem. The suggested architecture is capable of handling all possible combinations of standard bit lengths (128, 192, 256) of data and key. The fully rolled inner-pipelined architecture ensures lower hardware complexity. The architecture reuses precomputed blocks, in the sense that the same hardware is shared between encryption and decryption as much as possible. The design has been implemented on a Xilinx XCVe1000-8bg560 device. The performance of the architecture has been compared with existing results in the literature and has been found to be the most efficient (throughput/area) implementation of the AES algorithm.
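To make "all possible combinations" concrete, the sketch below enumerates the nine block/key pairings and the number of rounds Rijndael prescribes for each, using the rule Nr = max(Nb, Nk) + 6 with Nb and Nk counted in 32-bit words. The function and constant names are hypothetical, and nothing here reflects the internal organisation of the authors' core. Note that the AES standard itself fixes the block length at 128 bits; the 192-bit and 256-bit block sizes belong to the wider Rijndael specification.

```python
from itertools import product

STANDARD_LENGTHS = (128, 192, 256)  # bits, for both data block and key

def rijndael_rounds(block_bits, key_bits):
    """Number of rounds for a given block and key length in Rijndael."""
    nb = block_bits // 32  # block length in 32-bit words
    nk = key_bits // 32    # key length in 32-bit words
    return max(nb, nk) + 6

# Round count for every (block, key) pairing a fully general core must support.
ROUNDS = {
    (block, key): rijndael_rounds(block, key)
    for block, key in product(STANDARD_LENGTHS, repeat=2)
}

for (block, key), rounds in sorted(ROUNDS.items()):
    print(f"block={block:3d} key={key:3d} rounds={rounds}")
```

Because the round count varies only between 10, 12, and 14, a fully rolled design can reuse a single round datapath and change only the iteration count and key schedule per configuration.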
{"title":"Single Chip Encryptor/Decryptor Core Implementation of AES Algorithm","authors":"Monjur Alam, Santosh K. Ghosh, D. R. Chowdhury, I. Sengupta","doi":"10.1109/VLSI.2008.82","DOIUrl":"https://doi.org/10.1109/VLSI.2008.82","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
B. P. Das, Janakiraman Viraraghavan, B. Amrutur, H. S. Jamadagni, N. Arvind
We investigate the feasibility of developing comprehensive gate delay and slew models that incorporate output load, input edge slew, supply voltage, temperature, global process variations, and local process variations all in the same model. We find that the standard polynomial models cannot handle such a large heterogeneous set of input variables. We instead use neural networks, which are well known for their ability to approximate any continuous function. Our initial experiments with a small subset of standard cell gates of an industrial 65 nm library show promising results, with error in the mean of less than 1%, error in the standard deviation of less than 3%, and maximum error of less than 11% compared to SPICE, for models covering 0.9-1.1 V of supply, -40 °C to 125 °C of temperature, load, slew, and global and local process parameters. Enhancing the conventional libraries to be voltage and temperature scalable with similar accuracy requires on average 4x more SPICE characterization runs.
{"title":"Voltage and Temperature Scalable Gate Delay and Slew Models Including Intra-Gate Variations","authors":"B. P. Das, Janakiraman Viraraghavan, B. Amrutur, H. S. Jamadagni, N. Arvind","doi":"10.1109/VLSI.2008.92","DOIUrl":"https://doi.org/10.1109/VLSI.2008.92","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
D. Kannan, Aseem Gupta, Aviral Shrivastava, N. Dutt, F. Kurdahi
Simultaneous Multi-Threading (SMT) processors are becoming popular because they exploit both instruction-level and thread-level parallelism by issuing instructions from different threads in the same cycle. However, the issues of power and thermal management hinder SMT processors fabricated in nano-scale technologies. Power and thermal issues in SMT processors not only limit the achievable performance, but also have a direct impact on the cost and viability of these processors. While several simulation tools exist for exploring the performance of SMT processors early in their design phase, there is a lack of early power and performance evaluation tools for SMT processors. To this end, we have developed PTSMT: a tightly coupled power, performance, and thermal exploration tool for SMT processors. In this paper, we demonstrate that PTSMT can automatically and effectively accomplish power, performance, and thermal exploration of SMT processors at various levels of the design hierarchy: the application level, the microarchitecture level, and the physical level. Our experimental results show that, at the application level, the number of contexts into which an application is divided could affect performance by 2.2×, energy by 52%, and peak temperature by 35 °C; and that, at the microarchitecture level, context swapping during run time could reduce energy by 9% and improve performance by 8%. These observations indicate the size of the design space which can be explored using PTSMT.
{"title":"PTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors","authors":"D. Kannan, Aseem Gupta, Aviral Shrivastava, N. Dutt, F. Kurdahi","doi":"10.1109/VLSI.2008.84","DOIUrl":"https://doi.org/10.1109/VLSI.2008.84","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
Most contemporary SAT solvers use a conjunctive-normal-form (CNF) representation for logic functions due to the availability of efficient algorithms for this form, such as deduction through unit propagation and conflict-driven learning using clause resolution. The use of CNF generally entails transformation to this form from other representations such as logic circuits (Tseitin, 1970). However, this transformation results in loss of information such as direction of signal flow and observability of signals at circuit outputs (Een, 2003; Fu, 2005). This has prompted the development of various circuit-based solvers (Ganai et al., 2002), hybrid CNF+circuit-based solvers (Fu, 2005), as well as augmented CNF solvers (Een, 2003). Having the circuit available provides additional capabilities at a cost, and thus requires careful analysis to determine the viability of each approach. This paper highlights one specific capability provided by a circuit: the ability to consider reconvergent paths in unit propagation. Unit propagation is the workhorse of contemporary SAT solvers, so any improvement to it has significant practical potential. We first demonstrate that the Tseitin circuit-to-CNF transformation limits backward unit propagation, and show how additional implications can be derived when unit propagation across multiple paths is considered. Next, we show how these implications can be exploited by statically learning clauses during circuit pre-processing. The results of the practical implementation of these algorithms show that static learning can provide significant speed-up on several classes of benchmark circuits. Finally, we discuss how this work compares with other circuit-based approaches, especially those arising from the automatic-test-pattern-generation (ATPG) community (e.g. recursive learning), and with circuit-based and non-circuit-based pre-processors.
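The limitation on backward unit propagation can be seen on a very small circuit. The example below is an illustration chosen for this listing, not one taken from the paper: it assumes two OR gates, x = OR(a, b) and y = OR(a, NOT b), whose inputs fan out and reconverge. Requiring x = 1 and y = 1 forces a = 1, but unit propagation over the Tseitin clauses cannot derive it. Adding one clause of the kind a static-learning pass might produce restores the implication. The propagation routine is a plain textbook version with hypothetical names.

```python
def unit_propagate(clauses, assumptions):
    """Return the set of true literals after unit propagation,
    or None if some clause becomes empty (conflict)."""
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            # Literals of this clause that have not been falsified.
            free = [lit for lit in clause if -lit not in assigned]
            if not free:
                return None  # every literal is false: conflict
            if len(free) == 1:
                assigned.add(free[0])  # unit clause forces its last literal
                changed = True
    return assigned

A, B, X, Y = 1, 2, 3, 4  # variable numbering; a negative integer is a negated literal

tseitin = [
    [-A, X], [-B, X], [A, B, -X],  # x = OR(a, b)
    [-A, Y], [B, Y], [A, -B, -Y],  # y = OR(a, NOT b)
]
learned = [-X, -Y, A]  # (x AND y) implies a, valid because (a OR b)(a OR NOT b) = a

print(unit_propagate(tseitin, [X, Y]))
print(unit_propagate(tseitin + [learned], [X, Y]))
```

With only the Tseitin clauses, asserting x and y leaves the two binary residues (a OR b) and (a OR NOT b), neither of which is unit, so nothing further is implied. The learned clause is redundant with respect to the formula but makes the implication reachable by unit propagation alone.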
{"title":"Exploiting Circuit Reconvergence through Static Learning in CNF SAT Solvers","authors":"Yinlei Yu, C. Brien, S. Malik","doi":"10.1109/VLSI.2008.90","DOIUrl":"https://doi.org/10.1109/VLSI.2008.90","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
In this paper, we propose a new spatial and temporal encoding approach for generic on-chip global buses with repeaters that enables higher performance while reducing peak energy and average energy. The proposed encoding approach exploits the benefits of temporal encoding circuit and spatial bus-invert coding techniques to simultaneously eliminate opposite transitions on adjacent wires and reduce the number of self-transitions and coupling-transitions. In the design process of applying encoding techniques for reduced bus delay and energy, we present a repeater insertion design methodology to determine the repeater size and inter-repeater bus length which minimizes the total bus energy dissipation while satisfying target delay and slew-rate constraints. This methodology can be employed to obtain optimal energy vs. delay trade-offs under slew-rate constraint for various encoding techniques.
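For readers unfamiliar with the spatial component mentioned above, the following sketch shows classic bus-invert coding on its own. It is not the combined spatial and temporal scheme proposed in the paper, and it does not model coupling transitions or repeaters; the function names and the 8-bit example are assumptions for illustration.

```python
def bus_invert_encode(words, width):
    """Classic bus-invert coding: if sending a word would toggle more than
    half of the bus lines, send its complement and raise the invert line."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the bus lines
    encoded = []
    for word in words:
        toggles = bin((word ^ prev) & mask).count("1")
        if 2 * toggles > width:
            data, invert = ~word & mask, 1
        else:
            data, invert = word & mask, 0
        encoded.append((data, invert))
        prev = data
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words from (data, invert) pairs."""
    mask = (1 << width) - 1
    return [data ^ mask if invert else data for data, invert in encoded]

def count_self_transitions(values, width):
    """Total number of data-line toggles for a sequence of bus values."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for value in values:
        total += bin((value ^ prev) & mask).count("1")
        prev = value
    return total

words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words, 8)
print(encoded)
print(count_self_transitions(words, 8),
      count_self_transitions([d for d, _ in encoded], 8))
```

Each transfer toggles at most half of the data lines plus the invert line, which bounds the peak self-transition energy per cycle. The extra invert wire and the encoder and decoder logic are the cost, and their delay has to be weighed against the savings when sizing repeaters.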
{"title":"Delay and Energy Efficient Design of On-Chip Encoded Bus with Repeaters","authors":"Qingli Zhang, Jinxiang Wang, Y. Ye","doi":"10.1109/VLSI.2008.21","DOIUrl":"https://doi.org/10.1109/VLSI.2008.21","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}
In chip multiprocessor (CMP) systems, the various effects of technology scaling make the on-chip components more susceptible to faults. Most of the earlier schemes that address fault tolerance issues in CMPs adopt redundant-thread techniques. These techniques are mostly effective, except that they fail to detect errors resulting from faults in on-chip hardware components that commonly serve multiple cores. The cache coherence controller (CC) logic, which ensures consistency of data shared among multiple threads, is a vital common component in CMPs. A fault in the CC logic of any of the processors may lead to errors in the data states in the entire CMP system. It is observed that up to 59.6% of the memory references cause a change in cache state for SPLASH-2 applications. We propose a novel scheme with verification logic that can dynamically detect errors in the CC logic of multiple cores in a CMP system. The entire verification logic is designed with a negligible area of 0.1372 sq. mm using a TSMC 0.18 μm 4-metal-layer process technology. Even at highly aggressive fault injection rates, the logic achieves an average error coverage of more than 95% (and almost 100% for some applications).
{"title":"Dynamic Error Detection for Dependable Cache Coherency in Multicore Architectures","authors":"Hui Wang, Sandeep Baldawa, R. Sangireddy","doi":"10.1109/VLSI.2008.68","DOIUrl":"https://doi.org/10.1109/VLSI.2008.68","journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)"},"publicationDate":"2008-01-04"}