While the emergence of multi-core shared-memory machines offers a promising computing solution to increasingly complex chip design problems, new parallel CAD methodologies must be developed to gain the full benefit of these increasingly parallel computing systems. We present a parallel transient simulation methodology and its multi-threaded implementation for general analog and digital ICs. Our new approach, Waveform Pipelining (abbreviated as WavePipe), exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way that resembles hardware pipelining. WavePipe has two embodiments: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step by moving backwards in time, the latter performs predictive computing along the forward direction of the time axis. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach, WavePipe not only requires low parallel programming effort; more importantly, it creates new avenues to fully utilize increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solutions.
{"title":"WavePipe: Parallel transient simulation of analog and digital circuits on multi-core shared-memory machines","authors":"Wei Dong, Peng Li, Xiaoji Ye","doi":"10.1145/1391469.1391531","DOIUrl":"https://doi.org/10.1145/1391469.1391531","url":null,"abstract":"While the emergence of multi-core shared-memory machines offers a promising computing solution to ever complex chip design problems, new parallel CAD methodologies must be developed to gain the full benefit of these increasingly parallel computing systems. We present a parallel transient simulation methodology and its multi-threaded implementation for general analog and digital ICs. Our new approach, Waveform Pipelining (abbreviated as WavePipe), exploits coarsegrained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. There are two embodiments in WavePipe: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step by moving backwards in time, the latter performs predictive computing along the forward direction of the time axis. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardying convergence and accuracy. As a coarse-grained parallel approach, WavePipe not only requires low parallel programming effort, more importantly, it creates new avenues to fully utilize increasingly parallel hardware by going beyond conventional finer grained parallel device model evaluation and matrix solutions.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130051603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peter Milder, F. Franchetti, J. Hoe, Markus Püschel
We present a domain-specific approach to representing datapaths for hardware implementations of linear signal transform algorithms. We extend the tensor structure for describing linear transform algorithms, adding the ability to explicitly characterize two important dimensions of datapath architecture. This representation allows both algorithm and datapath to be specified within a single formula and gives the designer the ability to easily consider a wide space of possible datapaths at a high level of abstraction. We have constructed a formula manipulation system based on this representation and have written a compiler that can translate a formula into a hardware implementation. This enables an automatic "push button" compilation flow that produces a register transfer level hardware description from high-level datapath directives and an algorithm (written as a formula). In our experimental results, we demonstrate that this approach yields efficient designs over a large tradeoff space.
{"title":"Formal datapath representation and manipulation for implementing DSP transforms","authors":"Peter Milder, F. Franchetti, J. Hoe, Markus Püschel","doi":"10.1145/1391469.1391572","DOIUrl":"https://doi.org/10.1145/1391469.1391572","url":null,"abstract":"We present a domain-specific approach to representing datapaths for hardware implementations of linear signal transform algorithms. We extend the tensor structure for describing linear transform algorithms, adding the ability to explicitly characterize two important dimensions of datapath architecture. This representation allows both algorithm and datapath to be specified within a single formula and gives the designer the ability to easily consider a wide space of possible datapaths at a high level of abstraction. We have constructed a formula manipulation system based on this representation and have written a compiler that can translate a formula into a hardware implementation. This enables an automatic \"push button\" compilation flow that produces a register transfer level hardware description from high-level datapath directives and an algorithm (written as a formula). In our experimental results, we demonstrate that this approach yields efficient designs over a large tradeoff space.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127237893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we explore the implementation of fault simulation on a graphics processing unit (GPU). In particular, we implement a fault simulator that exploits thread-level parallelism. Fault simulation is inherently parallelizable, and the large number of threads that can be executed in parallel on a GPU makes it a natural fit for the problem of fault simulation. Our implementation fault-simulates all the gates in a particular level of a circuit, including good and faulty circuit simulations, for all patterns, in parallel. Since GPUs have an extremely large memory bandwidth, we implement each of our fault simulation threads (which execute in parallel with no data dependencies) using memory lookup. Fault injection is also done along with gate evaluation, with each thread using a different fault injection mask. All threads compute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our results, implemented on an NVIDIA GeForce GTX 8800 GPU card, indicate that our approach is on average 35X faster than a commercial fault simulation engine. With the recently announced Tesla GPU servers housing up to eight GPUs, our approach would potentially be 238 times faster. The correctness of the GPU-based fault simulator has been verified by comparing its results with those of a CPU-based fault simulator.
{"title":"Towards acceleration of fault simulation using Graphics Processing Units","authors":"Kanupriya Gulati, S. Khatri","doi":"10.1145/1391469.1391679","DOIUrl":"https://doi.org/10.1145/1391469.1391679","url":null,"abstract":"In this paper, we explore the implementation of fault simulation on a graphics processing unit (GPU). In particular, we implement a fault simulator that exploits thread level parallelism. Fault simulation is inherently parallelizable, and the large number of threads that can be computed in parallel on a GPU results in a natural fit for the problem of fault simulation. Our implementation fault- simulates all the gates in a particular level of a circuit, including good and faulty circuit simulations, for all patterns, in parallel. Since GPUs have an extremely large memory bandwidth, we implement each of our fault simulation threads (which execute in parallel with no data dependencies) using memory lookup. Fault injection is also done along with gate evaluation, with each thread using a different fault injection mask. All threads compute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our results, implemented on a NVIDIA GeForce GTX 8800 GPU card, indicate that our approach is on average 35 x faster when compared to a commercial fault simulation engine. With the recently announced Tesla GPU servers housing up to eight GPUs, our approach would be potentially 238 times faster. The correctness of the GPU based fault simulator has been verified by comparing its result with a CPU based fault simulator.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127305814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, processor power density has been increasing at an alarming rate, resulting in high on-chip temperatures. Higher temperature increases leakage current and degrades reliability. In this paper, we propose Predictive Dynamic Thermal Management (PDTM) based on an Application-based Thermal Model (ABTM) and a Core-based Thermal Model (CBTM) for multicore systems. ABTM predicts future temperature based on application-specific thermal behavior, while CBTM estimates each core's temperature pattern from its steady-state temperature and workload. Our prediction model has an average error of 1.6%, compared to at most 5% error for the model in HybDTM. Based on the temperatures predicted by ABTM and CBTM, the proposed PDTM can maintain the system temperature below a desired level by moving the running application from a potentially overheated core to the predicted coolest core (migration) and by reducing processor resources (priority scheduling) within multicore systems. PDTM enables the exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems. We implement PDTM on an Intel Quad-Core system with a dedicated device driver to access the Digital Thermal Sensor (DTS). Compared against the standard Linux scheduler, PDTM decreases the average temperature by about 10% and the peak temperature by 5°C, with a negligible performance impact of under 1%, when running a single SPEC2006 benchmark. Moreover, PDTM outperforms HRTM by reducing the average temperature by about 7% and the peak temperature by about 3°C, with a performance overhead of 0.15%, when running a single benchmark.
{"title":"Predictive dynamic thermal management for multicore systems","authors":"Inchoon Yeo, C. Liu, Eun Jung Kim","doi":"10.1145/1391469.1391658","DOIUrl":"https://doi.org/10.1145/1391469.1391658","url":null,"abstract":"Recently, processor power density has been increasing at an alarming rate resulting in high on-chip temperature. Higher temperature increases current leakage and causes poor reliability. In this paper, we propose a Predictive Dynamic Thermal Management (PDTM) based on Application-based Thermal Model (ABTM) and Core-based Thermal Model (CBTM) in the multicore systems. ABTM predicts future temperature based on the application specific thermal behavior, while CBTM estimates core temperature pattern by steady state temperature and workload. The accuracy of our prediction model is 1.6% error in average compared to the model in HybDTM, which has at most 5% error. Based on predicted temperature from ABTM and CBTM, the proposed PDTM can maintain the system temperature below a desired level by moving the running application from the possible overheated core to the future coolest core (migration) and reducing the processor resources (priority scheduling) within multicore systems. PDTM enables the exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems. We implement PDTM on Intel's Quad-Core system with a specific device driver to access Digital Thermal Sensor (DTS). Compared against Linux standard scheduler, PDTM can decrease average temperature about 10%, and peak temperature by 5degC with negligible impact of performance under 1%, while running single SPEC2006 benchmark. Moreover, our PDTM outperforms HRTM in reducing average temperature by about 7% and peak temperature by about 3degC with performance overhead by 0.15% when running single benchmark.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"290 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125758947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Boyuan Yan, Lingfei Zhou, S. Tan, Jie Chen, B. McGaughy
Model order reduction is an efficient technique for reducing system complexity while producing a good approximation of the input-output behavior. However, the efficiency of reduction degrades as the number of ports increases, which remains a long-standing problem. The reason for the degradation is that existing approaches are based on a centralized framework, in which every input-output pair is implicitly assumed to interact equally strongly and the matrix-valued transfer function has to be assumed to be fully populated. In this paper, a decentralized model order reduction scheme is proposed, in which a multi-input multi-output (MIMO) system is decoupled into a number of subsystems, each corresponding to one output and several dominant inputs. The decoupling process is based on the relative gain array (RGA), which measures the degree of interaction of each input-output pair. Our experimental results on a number of interconnect circuits show that most of the input-output interactions are usually insignificant, which can lead to extremely compact models even for systems with massive ports. The reduction scheme is also highly amenable to parallel computing, as each decoupled subsystem can be reduced independently.
{"title":"DeMOR: Decentralized model order reduction of linear networks with massive ports","authors":"Boyuan Yan, Lingfei Zhou, S. Tan, Jie Chen, B. McGaughy","doi":"10.1145/1391469.1391577","DOIUrl":"https://doi.org/10.1145/1391469.1391577","url":null,"abstract":"Model order reduction is an efficient technique to reduce the system complexity while producing a good approximation of the input-output behavior. However, the efficiency of reduction degrades as the number of ports increases, which remains a long-standing problem. The reason for the degradation is that existing approaches are based on a centralized framework, where each input-output pair is implicitly assumed to be equally interacted and the matrix-valued transfer function has to be assumed to be fully populated. In this paper, a decentralized model order reduction scheme is proposed, where a multi-input multi-output (MIMO) system is decoupled into a number of subsystems and each subsystem corresponds to one output and several dominant inputs. The decoupling process is based on the relative gain array (RGA), which measures the degree of interaction of each input-output pair. Our experimental results on a number of interconnect circuits show that most of the input- output interactions are usually insignificant, which can lead to extremely compact models even for systems with massive ports. The reduction scheme is very amenable for parallel computing as each decoupled subsystem can be reduced independently.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127829205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spin-torque transfer magnetic RAM (STT MRAM) is a promising candidate for a future universal memory. It combines the desirable attributes of current memory technologies such as SRAM, DRAM, and flash. It also addresses the key drawbacks of conventional MRAM technology: poor scalability and high write current. In this paper, we analyzed and modeled the failure probabilities of STT MRAM cells under parameter variations. Based on this model, we developed an efficient simulation tool that captures the coupled electrical/magnetic dynamics of the spintronic device, enabling effective prediction of memory yield. We also developed a statistical optimization methodology to minimize the memory failure probability. The proposed methodology can be used at an early stage of the design cycle to enhance memory yield.
{"title":"Modeling of failure probability and statistical design of Spin-Torque Transfer Magnetic Random Access Memory (STT MRAM) array for yield enhancement","authors":"Jing Li, C. Augustine, S. Salahuddin, K. Roy","doi":"10.1145/1391469.1391540","DOIUrl":"https://doi.org/10.1145/1391469.1391540","url":null,"abstract":"Spin-torque transfer magnetic RAM (STT MRAM) is a promising candidate for future universal memory. It combines the desirable attributes of current memory technologies such as SRAM, DRAM and flash memories. It also solves the key drawbacks of conventional MRAM technology: poor scalability and high write current. In this paper, we analyzed and modeled the failure probabilities of STT MRAM cells due to parameter variations. Based on the model, we developed an efficient simulation tool to capture the coupled electro/magnetic dynamics of spintronic device, leading to effective prediction for memory yield. We also developed a statistical optimization methodology to minimize the memory failure probability. The proposed methodology can be used at an early stage of the design cycle to enhance memory yield.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127837210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sumana Mandal, Praveen Bhojwani, S. Mohanty, R. Mahapatra
Battery lifetime and safety are primary concerns in the design of battery-operated systems. Lifetime management is typically supervised by the system via battery-aware task scheduling, while safety is managed on the battery side via features deployed in smart batteries. This research proposes IntellBatt, a novel multi-cell battery design built around an intelligent battery cell array that offloads battery lifetime management onto the battery itself. By deploying a battery cell array management unit, IntellBatt exploits battery-related characteristics, such as the charge recovery effect, to enhance battery lifetime and ensure safe operation. This is achieved by using real-time cell status information to select the cells that deliver the required load current, without involving a complex task scheduler on the host system. The proposed design was evaluated via simulation using accurate cell models and real experimental traces from a portable DVD player. The multi-cell design extended battery lifetime by 22% in terms of battery discharge time. Beyond standalone deployment, IntellBatt can also be combined with existing battery-aware task scheduling approaches to further enhance battery lifetime.
{"title":"IntellBatt: Towards smarter battery design","authors":"Sumana Mandal, Praveen Bhojwani, S. Mohanty, R. Mahapatra","doi":"10.1145/1391469.1391690","DOIUrl":"https://doi.org/10.1145/1391469.1391690","url":null,"abstract":"Battery lifetime and safety are primary concerns in the design of battery operated systems. Lifetime management is typically supervised by the system via battery-aware task scheduling, while safety is managed on the battery side via features deployed into smart batteries. This research proposes IntellBatt; an intelligent battery cell array based novel design of a multi-cell battery that offloads battery lifetime management onto the battery. By deploying a battery cell array management unit, IntellBatt exploits various battery related characteristics such as charge recovery effect, to enhance battery lifetime and ensure safe operation. This is achieved by using real-time cell status information to selects cells to deliver the required load current, without the involvement of a complex task scheduler on the host system. The proposed design was evaluated via simulation using accurate cell models and real experimental traces from a portable DVD player. The use of a multi-cell design enhanced battery lifetime by 22% in terms of battery discharge time. Besides a standalone deployment, IntellBatt can also be combined with existing battery-aware task scheduling approaches to further enhance battery lifetime.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127765884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current-source-based cell models are becoming a necessity for accurate timing and noise analysis at 65 nm and below. Voltage waveform shapes are increasingly difficult to represent as simple ramps due to highly resistive interconnects and Miller capacitance effects at receiver gates. Propagating complex voltage waveforms and accurately modeling nonlinear driver and receiver effects in crosstalk noise analysis require accurate cell models. A good cell model should be independent of input waveform and output load, should be easy to characterize, and should not inflate the complexity of a cell library with high-dimensional look-up tables. At the same time, it should provide high accuracy relative to SPICE for all analysis scenarios, including multiple-input switching, and for all cell types and cell arcs, including those with tall transistor stacks. It should also be easily extendable for use in statistical STA and noise analysis, and it should simulate fast enough for practical use in multi-million-gate designs. In this paper, we present a gate model built from fast transistor models (FXM) that has all of these desired properties. Along with this model, we also present a multithreaded timing traversal approach that delivers the high accuracy of the FXM at traditional STA speeds. Results are presented using a fully extracted 65 nm TSMC technology.
{"title":"Transistor level gate modeling for accurate and fast timing, noise, and power analysis","authors":"S. Raja, F. Varadi, M. Becer, J. Geada","doi":"10.1145/1391469.1391588","DOIUrl":"https://doi.org/10.1145/1391469.1391588","url":null,"abstract":"Current source based cell models are becoming a necessity for accurate timing and noise analysis at 65 nm and below. Voltage waveform shapes are increasingly more difficult to represent as simple ramps due to highly resistive interconnects and Miller cap effects at receiver gates. Propagation of complex voltage waveforms, and accurate modeling of nonlinear driver and receiver effects in crosstalk noise analysis require accurate cell models. A good cell model should be independent of input waveform and output load, should be easy to characterize and should not increase the complexity of a cell library with high-dimensional look-up tables. At the same time, it should provide high accuracy compared to SPICE for all analysis scenarios including multiple-input switching, and for all cell types and cell arcs, including those with high stacks. It should also be easily extendable for use in statistical STA and noise analysis, and one should be able to simulate it fast enough for practical use in multi-million gate designs. In this paper, we present a gate model built from fast transistor models (FXM) that has all the desired properties. Along with this model, we also present a multithreaded timing traversal approach that allows one to take advantage of the high accuracy provided by the FXM, at traditional STA speeds. Results are presented using a fully extracted 65 nm TSMC technology.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127673231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent design-for-manufacturability (DFM) CAD methodologies have naturally led to a significant increase in the number of process and layout parameters that must be taken into account in design-rule checking. Methodological consistency requires that a similar number of parameters be taken into account during layout parasitic extraction. Because of the inherent variability of these parameters, the issue of efficiently extracting deterministic parasitic sensitivities with respect to such a large number of parameters must be addressed. In this paper, we tackle this issue in the context of capacitance sensitivity extraction. In particular, we show how the adjoint sensitivity method can be efficiently integrated within a finite-difference (FD) scheme to compute the sensitivity of the capacitance with respect to a large set of BEOL parameters. If n_p is the number of parameters, the speedup of the adjoint method is shown to be a factor of n_p/2 with respect to direct FD sensitivity techniques. The proposed method has been implemented and verified on a 65 nm BEOL cross section with 10 metal layers and a total of 59 parameters. Because of its speed, the method can be advantageously used to prune from the CAD flow those BEOL parameters that yield a capacitance sensitivity below a given threshold.
{"title":"Efficient algorithm for the computation of on-chip capacitance sensitivities with respect to a large set of parameters","authors":"T. El-Moselhy, I. Elfadel, D. Widiger","doi":"10.1145/1391469.1391699","DOIUrl":"https://doi.org/10.1145/1391469.1391699","url":null,"abstract":"Recent CAD methodologies of design-for-manufacturability (DFM) have naturally led to a significant increase in the number of process and layout parameters that have to be taken into account in design-rule checking. Methodological consistency requires that a similar number of parameters be taken into account during layout parasitic extraction. Because of the inherent variability of these parameters, the issue of efficiently extracting deterministic parasitic sensitivities with respect to such a large number of parameters must be addressed. In this paper, we tackle this very issue in the context of capacitance sensitivity extraction. In particular, we show how the adjoint sensitivity method can be efficiently integrated within a finite-difference (FD) scheme to compute the sensitivity of the capacitance with respect to a large set of BEOL parameters. If np is the number of parameters, the speedup of the adjoint method is shown to be a factor of np/2 with respect to direct FD sensitivity techniques. The proposed method has been implemented and verified on a 65 nm BEOL cross section having 10 metal layers and a total number of 59 parameters. Because of its speed, the method can be advantageously used to prune out of the CAD flow those BEOL parameters that yield a capacitance sensitivity less than a given threshold.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"1993 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128619858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In modular testing of a system-on-a-chip (SoC), test access mechanisms (TAMs) are used to transport test data between the input/output pins of the SoC and the cores under test. Prior work assumes TAMs to be error-free during test data transfer. The validity of this assumption, however, is questionable with the ever-decreasing feature sizes of today's VLSI technology and ever-increasing circuit operating frequencies. In particular, when functional interconnects such as a network-on-chip (NoC) are reused as TAMs, even if they have passed manufacturing test beforehand, failures caused by electrical noise such as crosstalk and transient errors may occur during test data transfer and make good chips appear defective, leading to undesired test yield loss. To address this problem, we propose novel solutions that achieve reliable modular testing even when test data occasionally become corrupted during transmission over vulnerable TAMs, by designing a new "jitter-aware" test wrapper and a new "jitter-transparent" ATE interface. Experimental results on an industrial circuit demonstrate the effectiveness of the proposed technique.
{"title":"On reliable modular testing with vulnerable test access mechanisms","authors":"Lin Huang, F. Yuan, Q. Xu","doi":"10.1145/1391469.1391681","DOIUrl":"https://doi.org/10.1145/1391469.1391681","url":null,"abstract":"In modular testing of system-on-a-chip (SoC), test access mechanisms (TAMs) are used to transport test data between the input/output pins of the SoC and the cores under test. Prior work assumes TAMs to be error-free during test data transfer. The validity of this assumption, however, is questionable with the ever-decreasing feature size of today's VLSI technology and the ever-increasing circuit operational frequency. In particular, when functional interconnects such as network-on-chip (NoC) are reused as TAMs, even if they have passed manufacturing test beforehand, failures caused by electrical noise such as crosstalk and transient errors may happen during test data transfer and make good chips appear to be defective, thus leading to undesired test yield loss. To address the above problem, in this paper, we propose novel solutions that are able to achieve reliable modular testing even if test data may sometimes get corrupted during transmission with vulnerable TAMs, by designing a new \"jitter-aware\" test wrapper and a new \"jitter-transparent\" ATE interface. Experimental results on an industrial circuit demonstrate the effectiveness of the proposed technique.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128690533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}