{"title":"Session details: Special Session: Emerging Computing Paradigm for Error-Tolerant Applications: Approximate Computing and Stochastic Computing","authors":"Yiran Chen","doi":"10.1145/3254023","DOIUrl":"https://doi.org/10.1145/3254023","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121758236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Donato, R. I. Bahar, W. Patterson, A. Zaslavsky
Noise analysis in nonlinear logic circuits requires models that take into account time-varying biasing conditions. When considering thermal noise, which moves the circuit away from its equilibrium point, a correct modeling approach has to go beyond the additive white Gaussian noise (AWGN) used in classical noise analysis. Even when accurate models are available, running standard Monte-Carlo simulations that will expose rare soft errors may still be computationally prohibitive. Probabilistic methods are often preferred for estimating the failure rate. However, these approaches may not provide any insight about the dynamic response to noise events. In this paper, we target both problems in the sub-threshold logic application domain. We first provide a time-domain model for fundamental, technology-independent thermal noise in sub-threshold circuits. Then, we use this model to generate noise input files for SPICE transient analysis. The effectiveness of the approach is demonstrated using 7nm FinFET predictive technology models (PTM) for an inverter and a NAND gate.
{"title":"A Simulation Framework for Analyzing Transient Effects Due to Thermal Noise in Sub-Threshold Circuits","authors":"M. Donato, R. I. Bahar, W. Patterson, A. Zaslavsky","doi":"10.1145/2742060.2742066","DOIUrl":"https://doi.org/10.1145/2742060.2742066","url":null,"abstract":"Noise analysis in nonlinear logic circuits requires models that take into account time-varying biasing conditions. When considering thermal noise, which moves the circuit away from its equilibrium point, a correct modeling approach has to go beyond the additive white Gaussian noise (AWGN) used in classical noise analysis. Even when accurate models are available, running standard Monte-Carlo simulations that will expose rare soft errors may still be computationally prohibitive. Probabilistic methods are often preferred for estimating the failure rate. However, these approaches may not provide any insight about the dynamic response to noise events. In this paper, we target both problems in the sub-threshold logic application domain. We first provide a time-domain model for fundamental, technology-independent thermal noise in sub-threshold circuits. Then, we use this model to generate noise input files for SPICE transient analysis. The effectiveness of the approach is demonstrated using 7nm FinFET predictive technology models (PTM) for an inverter and a NAND gate.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121903520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In deeply scaled CMOS technologies, device aging causes core performance parameters to degrade over time. While accurate models to efficiently assess this degradation exist for devices and circuits, no reliable model for processor cores has gained strong acceptance in the literature. In this work, we propose a methodology for deriving an NBTI aging model for embedded cores. Based on an accurate characterization of the core netlist, we were able to (1) prove the independence of the aging from the workload (i.e., executed instructions), and (2) calculate an equivalent average constant aging factor that justifies the use of the baseline model template. We derived and assessed the proposed model by using a RISC-like processor core implemented in a 45nm process technology as a reference architecture, achieving a maximum error of 2.2% against simulated data on the core netlist.
{"title":"Characterizing the Activity Factor in NBTI Aging Models for Embedded Cores","authors":"Yukai Chen, A. Calimera, E. Macii, M. Poncino","doi":"10.1145/2742060.2742111","DOIUrl":"https://doi.org/10.1145/2742060.2742111","url":null,"abstract":"In deeply scaled CMOS technologies, device aging causes core performance parameters to degrade over time. While accurate models to efficiently assess this degradation exist for devices and circuits, no reliable model for processor cores has gained strong acceptance in the literature. In this work, we propose a methodology for deriving an NBTI aging model for embedded cores. Based on an accurate characterization of the core netlist, we were able to (1) prove the independence of the aging from the workload (i.e., executed instructions), and (2) calculate an equivalent average constant aging factor that justifies the use of the baseline model template. We derived and assessed the proposed model by using a RISC-like processor core implemented in a 45nm process technology as a reference architecture, achieving a maximum error of 2.2% against simulated data on the core netlist.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117009334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mallika Rathore, Weicheng Liu, E. Salman, Can Sitik, B. Taskin
Low swing clocking is a well known technique to reduce dynamic power consumption of a clock network. A novel static D flip-flop topology is proposed that can reliably operate with a low swing clock signal (down to 50% of the VDD) despite the full swing data and output signals. The proposed topology enables low swing signals within the entire clock network, thereby maximizing the power saved by low swing operation. The proposed flip-flop is compared with existing low swing flip-flops using a 45 nm technology node at a clock frequency of 1.5 GHz. The results demonstrate an average reduction of 38.1% and 44.4% in, respectively, power consumption and power-delay product. The sensitivity of each circuit to clock swing is investigated. The robustness of the proposed topology is also demonstrated by ensuring reliable operation at various process, voltage, and temperature corners.
{"title":"A Novel Static D-Flip-Flop Topology for Low Swing Clocking","authors":"Mallika Rathore, Weicheng Liu, E. Salman, Can Sitik, B. Taskin","doi":"10.1145/2742060.2742095","DOIUrl":"https://doi.org/10.1145/2742060.2742095","url":null,"abstract":"Low swing clocking is a well known technique to reduce dynamic power consumption of a clock network. A novel static D flip-flop topology is proposed that can reliably operate with a low swing clock signal (down to 50% of the VDD) despite the full swing data and output signals. The proposed topology enables low swing signals within the entire clock network, thereby maximizing the power saved by low swing operation. The proposed flip-flop is compared with existing low swing flip-flops using a 45 nm technology node at a clock frequency of 1.5 GHz. The results demonstrate an average reduction of 38.1% and 44.4% in, respectively, power consumption and power-delay product. The sensitivity of each circuit to clock swing is investigated. The robustness of the proposed topology is also demonstrated by ensuring reliable operation at various process, voltage, and temperature corners.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123260113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote 1","authors":"Hai Helen Li","doi":"10.1145/3254005","DOIUrl":"https://doi.org/10.1145/3254005","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128619793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phase-locking in a charge pump (CP) phase lock loop (PLL) is said to be inevitable if all possible states of the CP PLL eventually converge to the equilibrium, where the input and output phases are in lock and the node voltages vanish. We verify this property for a CP PLL using deductive verification. We split this complex property into two sub-properties defined in two disjoint subsets of the state space. We deductively verify the first property using multiple Lyapunov certificates for hybrid systems, and use the Escape certificate for the verification of the second property. Construction of deductive certificates involves positivity check of polynomial inequalities (which is an NP-Hard problem), so we use the sound but incomplete Sum of Squares (SOS) relaxation algorithm to provide a numerical solution.
{"title":"Inevitability of Phase-locking in a Charge Pump Phase Lock Loop using Deductive Verification","authors":"H. Asad, Kevin D. Jones","doi":"10.1145/2742060.2742072","DOIUrl":"https://doi.org/10.1145/2742060.2742072","url":null,"abstract":"Phase-locking in a charge pump (CP) phase lock loop (PLL) is said to be inevitable if all possible states of the CP PLL eventually converge to the equilibrium, where the input and output phases are in lock and the node voltages vanish. We verify this property for a CP PLL using deductive verification. We split this complex property into two sub-properties defined in two disjoint subsets of the state space. We deductively verify the first property using multiple Lyapunov certificates for hybrid systems, and use the Escape certificate for the verification of the second property. Construction of deductive certificates involves positivity check of polynomial inequalities (which is an NP-Hard problem), so we use the sound but incomplete Sum of Squares (SOS) relaxation algorithm to provide a numerical solution.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129882911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Danilo, P. Coussy, L. Conde-Canencia, Vincent Gripon, W. Gross
Associative memories are an alternative to classical indexed memories that are capable of retrieving a message previously stored when an incomplete version of this message is presented. Recently a new model of associative memory based on binary neurons and binary links has been proposed. This model named Clustered Neural Network (CNN) offers large storage diversity (number of messages stored) and fast message retrieval when implemented in hardware. The performance of this model drops when the stored message distribution is non-uniform. In this paper, we enhance the CNN model to support non-uniform message distribution by adding features of Restricted Boltzmann Machines. In addition, we present a fully parallel hardware design of the model. The proposed implementation multiplies the performance (diversity) of Clustered Neural Networks by a factor of 3 with an increase of complexity of 40%.
{"title":"Restricted Clustered Neural Network for Storing Real Data","authors":"R. Danilo, P. Coussy, L. Conde-Canencia, Vincent Gripon, W. Gross","doi":"10.1145/2742060.2743767","DOIUrl":"https://doi.org/10.1145/2742060.2743767","url":null,"abstract":"Associative memories are an alternative to classical indexed memories that are capable of retrieving a message previously stored when an incomplete version of this message is presented. Recently a new model of associative memory based on binary neurons and binary links has been proposed. This model named Clustered Neural Network (CNN) offers large storage diversity (number of messages stored) and fast message retrieval when implemented in hardware. The performance of this model drops when the stored message distribution is non-uniform. In this paper, we enhance the CNN model to support non-uniform message distribution by adding features of Restricted Boltzmann Machines. In addition, we present a fully parallel hardware design of the model. The proposed implementation multiplies the performance (diversity) of Clustered Neural Networks by a factor of 3 with an increase of complexity of 40%.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125742529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since caches are commonly used in embedded systems, which typically have stringent design constraints imposed by physical size, battery capacity, real-time deadlines, etc., much research focuses on cache optimizations, such as improved performance and/or reduced energy consumption. Cache locking is a popular cache optimization that loads and retains/locks selected memory contents from an executing application into the cache to increase the cache's predictability. Previous work has shown that cache locking also has the potential to improve cache performance and energy consumption. In this paper, we introduce phase-based cache locking, which leverages an application's varying runtime characteristics to dynamically select the locked memory contents to optimize cache performance and energy consumption. Experimental results show that our phase-based cache locking methodology can improve the data cache's miss rates and energy consumption by an average of 24% and 20%, respectively.
{"title":"Phase-based Cache Locking for Embedded Systems","authors":"Tosiron Adegbija, A. Gordon-Ross","doi":"10.1145/2742060.2742076","DOIUrl":"https://doi.org/10.1145/2742060.2742076","url":null,"abstract":"Since caches are commonly used in embedded systems, which typically have stringent design constraints imposed by physical size, battery capacity, real-time deadlines, etc., much research focuses on cache optimizations, such as improved performance and/or reduced energy consumption. Cache locking is a popular cache optimization that loads and retains/locks selected memory contents from an executing application into the cache to increase the cache's predictability. Previous work has shown that cache locking also has the potential to improve cache performance and energy consumption. In this paper, we introduce phase-based cache locking, which leverages an application's varying runtime characteristics to dynamically select the locked memory contents to optimize cache performance and energy consumption. Experimental results show that our phase-based cache locking methodology can improve the data cache's miss rates and energy consumption by an average of 24% and 20%, respectively.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128148080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, post-silicon debugging has become a significantly difficult exercise due to the increase in the size of the electrical state of the IC being debugged, coupled with the limited fraction of this state that is visible to the debug engineer. As the number of transistors increases, the number of possible electrical states increases exponentially, while the amount of information that can be accessed grows at a much slower rate. This difficulty is compounded by the outsourcing of IP blocks, which creates more black boxes that the debug engineer must work around. As a result, when an IC fails, tracking down the cause of the failure becomes a monumental task, and debugging becomes more art than science. One source of errors in a test circuit is the fluctuation of the power supplies during a single clock cycle. These supply variations can increase or decrease the speed of a circuit and lead to errors such as hold time violations and setup time violations. This paper presents a circuit that precisely samples the power supply multiple times in a clock cycle, allowing the debug engineer to quantify the variations in the supply over a clock cycle. With this information, a better understanding of the electrical state of the test chip is made possible. The circuit presented in this paper can sample the supply voltage with a quantization of 0.291mV, and the output is linear with an R² value of 0.9987.
{"title":"An Efficient Approach to Sample On-Chip Power Supplies","authors":"Luke Murray, S. Khatri","doi":"10.1145/2742060.2742121","DOIUrl":"https://doi.org/10.1145/2742060.2742121","url":null,"abstract":"In recent years, post-silicon debugging has become a significantly difficult exercise due to the increase in the size of the electrical state of the IC being debugged, coupled with the limited fraction of this state that is visible to the debug engineer. As the number of transistors increases, the number of possible electrical states increases exponentially, while the amount of information that can be accessed grows at a much slower rate. This difficulty is compounded by the outsourcing of IP blocks, which creates more black boxes that the debug engineer must work around. As a result, when an IC fails, tracking down the cause of the failure becomes a monumental task, and debugging becomes more art than science. One source of errors in a test circuit is the fluctuation of the power supplies during a single clock cycle. These supply variations can increase or decrease the speed of a circuit and lead to errors such as hold time violations and setup time violations. This paper presents a circuit that precisely samples the power supply multiple times in a clock cycle, allowing the debug engineer to quantify the variations in the supply over a clock cycle. With this information, a better understanding of the electrical state of the test chip is made possible. The circuit presented in this paper can sample the supply voltage with a quantization of 0.291mV, and the output is linear with an R² value of 0.9987.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133815419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic computation is an unconventional computational paradigm that uses ordinary digital circuits to operate on stochastic bit streams, where signal value is encoded as the probability of ones in a stream. It is highly tolerant of soft errors and enables complex arithmetic operations to be implemented with simple circuitry. Prior research has proposed a method to synthesize stochastic computing circuits to implement arbitrary arithmetic functions by approximating them via Bernstein polynomials. However, for some functions, the method cannot find Bernstein polynomials that approximate them closely enough, thus causing a large computation error. In this work, we explore linear transformation on a target function to reduce the approximation error. We propose a method to find the optimal linear transformation parameters to minimize the overall error of the stochastic implementation. Experimental results demonstrated the effectiveness of our method in reducing the computation error and the circuit area.
{"title":"Minimizing Error of Stochastic Computation through Linear Transformation","authors":"Yi Wu, Chen Wang, Weikang Qian","doi":"10.1145/2742060.2743761","DOIUrl":"https://doi.org/10.1145/2742060.2743761","url":null,"abstract":"Stochastic computation is an unconventional computational paradigm that uses ordinary digital circuits to operate on stochastic bit streams, where signal value is encoded as the probability of ones in a stream. It is highly tolerant of soft errors and enables complex arithmetic operations to be implemented with simple circuitry. Prior research has proposed a method to synthesize stochastic computing circuits to implement arbitrary arithmetic functions by approximating them via Bernstein polynomials. However, for some functions, the method cannot find Bernstein polynomials that approximate them closely enough, thus causing a large computation error. In this work, we explore linear transformation on a target function to reduce the approximation error. We propose a method to find the optimal linear transformation parameters to minimize the overall error of the stochastic implementation. Experimental results demonstrated the effectiveness of our method in reducing the computation error and the circuit area.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134456457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
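The stochastic-computing abstract above describes values encoded as the probability of ones in a bit stream, and functions realized through Bernstein polynomials. The following is a minimal software sketch of those two ideas only; it does not implement the paper's linear-transformation method. It assumes the commonly described adder-plus-multiplexer arrangement for Bernstein evaluation, and the function names, stream length, seed, and coefficient values are hypothetical choices made for illustration.

```python
import random
from math import comb


def to_stream(p, length, rng):
    """Encode p in [0, 1] as a bit stream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stream_value(bits):
    """Decode a stream: the fraction of ones estimates the encoded value."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_bernstein(x, coeffs, length, rng):
    """Stochastically evaluate sum_k coeffs[k] * C(n,k) * x^k * (1-x)^(n-k).

    Assumed structure: n independent streams encode x; each cycle an adder
    counts their ones (k), and a multiplexer forwards the current bit of the
    stream that encodes coeffs[k]. Every coefficient must lie in [0, 1].
    """
    n = len(coeffs) - 1
    x_streams = [to_stream(x, length, rng) for _ in range(n)]
    coeff_streams = [to_stream(b, length, rng) for b in coeffs]
    out = []
    for t in range(length):
        k = sum(s[t] for s in x_streams)   # adder: ones among the x inputs
        out.append(coeff_streams[k][t])    # multiplexer: select coefficient k
    return out


def bernstein_exact(x, coeffs):
    """Reference value of the same Bernstein polynomial in floating point."""
    n = len(coeffs) - 1
    return sum(b * comb(n, k) * x**k * (1 - x)**(n - k)
               for k, b in enumerate(coeffs))


rng = random.Random(0)           # fixed seed so the run is repeatable
N = 100_000                      # hypothetical stream length
product = stream_value(sc_multiply(to_stream(0.5, N, rng),
                                   to_stream(0.3, N, rng)))
coeffs = [0.25, 0.625, 0.375, 0.75]   # hypothetical coefficients in [0, 1]
approx = stream_value(sc_bernstein(0.5, coeffs, N, rng))
exact = bernstein_exact(0.5, coeffs)
print(product, approx, exact)
```

The stochastic results only approximate the exact values, with an error that shrinks as the stream length grows. The requirement that every Bernstein coefficient lie in [0, 1] is what makes some target functions hard to realize directly, which is the gap the abstract says the linear transformation addresses.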