Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105372
Jianlei Yang, Zuowei Li, Yici Cai, Qiang Zhou
As power grids grow in size, IR drop analysis has become more computationally challenging in both runtime and memory consumption. In this paper, we propose a linear-complexity simulator named PowerRush, which consists of an efficient SPICE Parser, a robust circuit Builder, and a linear solver. The proposed solver is a purely algebraic method that provides optimal convergence without geometric information. It is implemented as an Algebraic Multigrid Preconditioned Conjugate Gradient method, in which an aggregation-based algebraic multigrid with K-Cycle acceleration is adopted as a preconditioner to improve the robustness of the conjugate gradient iteration. In the multigrid scheme, a double pairwise aggregation technique is applied to the matrix graph in the coarsening procedure to ensure low setup cost and memory requirements. Further, the K-Cycle multigrid scheme provides Krylov subspace acceleration at each level to guarantee optimal or near-optimal convergence. Experimental results on real power grids show that PowerRush has linear complexity in runtime and memory consumption. PowerRush solves the DC analysis of a 60-million-node power grid to 0.01 mV accuracy in 170 seconds using 21.89 GB of memory.
Title: PowerRush: A linear simulator for power grid. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 482-487.
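The preconditioned conjugate gradient iteration named in the PowerRush abstract can be illustrated on a toy problem. The sketch below is not the PowerRush solver: it swaps the aggregation-based algebraic multigrid preconditioner for a plain Jacobi (diagonal) preconditioner, and the resistor-chain grid, the function names `pcg` and `chain_grid`, and all parameter values are assumptions made for illustration only.

```python
def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A*x for the zero initial guess
    z = precond(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    b_norm = sum(bi * bi for bi in b) ** 0.5 or 1.0
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def chain_grid(n, g, g_pad, vdd, load):
    """Nodal equations G v = i for a resistor chain fed from one pad (hypothetical toy rail)."""
    diag = [2.0 * g] * n
    diag[0] = g_pad + g               # pad conductance to VDD plus one neighbour
    diag[-1] = g                      # far end has a single neighbour
    rhs = [-load] * n                 # every node sinks the same load current
    rhs[0] += g_pad * vdd             # Norton equivalent of the supply pad

    def matvec(v):
        out = [diag[i] * v[i] for i in range(n)]
        for i in range(n - 1):
            out[i] -= g * v[i + 1]
            out[i + 1] -= g * v[i]
        return out

    def jacobi(r):                    # stand-in preconditioner, NOT algebraic multigrid
        return [ri / di for ri, di in zip(r, diag)]

    return matvec, rhs, jacobi


matvec, rhs, jacobi = chain_grid(n=20, g=10.0, g_pad=100.0, vdd=1.0, load=1e-3)
voltages, iterations = pcg(matvec, rhs, jacobi)
worst_drop = 1.0 - min(voltages)      # IR drop at the node farthest from the pad
```

For this chain the node voltages have a closed form (0.9998 V at the pad node and 0.9808 V at the far end), which makes the sketch easy to check. Replacing `jacobi` with a multigrid cycle is where a solver of the kind the abstract describes would differ.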
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105354
R. Kanj, Tong Li, R. Joshi, K. Agarwal, A. Sadigh, David W. Winston, S. Nassif
We propose an efficient Hermite spline-based SPICE simulation methodology for accurate statistical yield analysis. Unlike conventional methods, the spline-based transistor tables are built on demand, specific to the transient simulation requirements of the statistical experiments. Compared with traditional MOSFET table models, on-demand spline table models use ∼500X less memory. This makes Hermite spline-based table models practical for use in simulations for process variation modeling. Furthermore, we propose an efficient gate voltage offset approach to model transistor threshold voltage variation. In this scenario, evaluations of the transistor model rely on a single reference table and require one set of spline function evaluations per VT sample point, as opposed to two or more sets for VT interpolation. This method is comprehensive and the results are in excellent agreement with traditional BSIM-based simulations. The speedup of around 4X, which includes the table generation cost, could be further improved by employing other fast-SPICE techniques or parallelism. To the best of our knowledge, this is the first time such a methodology has been coupled with importance sampling techniques to study the yield of memory designs.
Title: Accelerated statistical simulation via on-demand Hermite spline interpolations. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 353-360.
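The two ideas in the abstract above, a table filled only where the simulation actually evaluates it and a threshold shift applied as a gate-voltage offset into one reference table, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the square-law device model, the constants `K` and `VT0`, the grid step, and the class name `OnDemandTable` are all hypothetical.

```python
import math


def hermite(x0, x1, f0, f1, d0, d1, x):
    """Cubic Hermite interpolation on [x0, x1] from end values f and end slopes d."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * f0 + h10 * h * d0 + h01 * f1 + h11 * h * d1


class OnDemandTable:
    """1-D table whose grid nodes are computed lazily, on first use."""

    def __init__(self, model, dmodel, step):
        self.model, self.dmodel, self.step = model, dmodel, step
        self.nodes = {}               # grid index -> (value, derivative)

    def _node(self, k):
        if k not in self.nodes:       # call the expensive model only once per node
            x = k * self.step
            self.nodes[k] = (self.model(x), self.dmodel(x))
        return self.nodes[k]

    def __call__(self, x):
        k = math.floor(x / self.step)
        f0, d0 = self._node(k)
        f1, d1 = self._node(k + 1)
        return hermite(k * self.step, (k + 1) * self.step, f0, f1, d0, d1, x)


K, VT0 = 1e-3, 0.375                  # hypothetical square-law parameters


def drain_current(vgs):               # toy stand-in for a full compact model
    vov = max(vgs - VT0, 0.0)
    return 0.5 * K * vov * vov


def transconductance(vgs):            # derivative of drain_current w.r.t. vgs
    return K * max(vgs - VT0, 0.0)


table = OnDemandTable(drain_current, transconductance, step=0.125)
nominal = table(0.73)                 # nominal device
shifted = table(0.73 - 0.05)          # VT raised by 50 mV, applied as a gate offset
```

Both evaluations fall in the same grid interval, so only two table nodes are ever built, and the shifted-threshold sample reuses the single reference table instead of interpolating between tables for different VT values.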
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105399
J. Cong, Yuhui Huang, Bo Yuan
The Network-on-Chip (NoC) interconnect of future multi-processor systems-on-a-chip (MPSoC) needs to be efficient in terms of energy and delay. In this paper, we propose a topology synthesis algorithm based on the shortest-path Steiner arborescence (hereafter called ATree). The concept of temporal merging is applied to allow communication flows that do not overlap in time to share the same network resource. For scalability and power minimization, we build a hybrid network that consists of routers and buses. We evaluate our ATree-based topology synthesis methodology by applying it to several benchmarks and comparing the results with existing NoC synthesis algorithms [1], [2]. The experimental results show a significant reduction in the power-latency product. The power-latency product of the topology synthesized by our ATree-based algorithm is 47% and 51% lower than [1], and 10% and 17% lower than [2], for the case without buses and the case with buses, respectively.
Title: ATree-based topology synthesis for on-chip network. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 651-658.
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105334
G. Prenat, B. Dieny, J. Nozieres, G. D. Pendina, K. Torki
Spintronics (or spin-electronics) is a continuously expanding area of research and development at the intersection of magnetism and electronics. It aims to take advantage of a quantum property of the electron, its spin, to create new functionalities and new devices. Spintronic devices comprise magnetic layers, which serve as spin polarizers or analyzers, separated by non-magnetic layers through which the spin-polarized electrons are transmitted. Typically, they rely on magnetoresistive (MR) effects, in which the electrical resistance depends on the magnetic configuration. These devices can be used to build innovative non-volatile memories, high-performance logic circuits, RF oscillators, or field/current sensors. This paper describes a full Magnetic Process Design Kit (MPDK) that enables efficient design of such CMOS/magnetic hybrid circuits. These hybrid circuits can help circumvent some of the limits of CMOS-only microelectronics.
Title: Hybrid CMOS/Magnetic Process Design Kit and application to the design of high-performances non-volatile logic circuits. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 240-245.
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105414
A. DeOrio, D. Khudia, V. Bertacco
The complexity of modern chips intensifies verification challenges, and an increasing share of this verification effort is shouldered by post-silicon validation. Focusing on the first silicon prototypes, post-silicon validation poses critical new challenges such as intermittent failures, where multiple executions of the same test do not yield a consistent outcome. These are often due to on-chip asynchronous events and electrical effects, leading to extremely time-consuming, if not unachievable, bug diagnosis and debugging processes. In this work, we propose a methodology called BPS (Bug Positioning System) to support the automatic diagnosis of these difficult bugs. During post-silicon validation, lightweight BPS hardware logs a compact encoding of observed signal activity over multiple executions of the same test: some passing, some failing. Leveraging a novel post-analysis algorithm, BPS uses the logged activity to diagnose the bug, identifying the approximate manifestation time and critical design signals. We found experimentally that BPS can localize most bugs down to the exact root signal and within about 1,000 clock cycles of their occurrence.
Title: Post-silicon bug diagnosis with inconsistent executions. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 755-761.
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105298
Xiaoping Tang, Minsik Cho
Double patterning technology (DPT) is regarded as the most practical solution for sub-22nm lithography. DPT decomposes a single layout into two masks and applies double exposure to print the shapes in the layout. DPT requires accurate overlay control. Thus, the primary objective in DPT decomposition is to minimize the number of stitches (overlay) between the shapes in the two masks. The problem of minimizing the number of stitches in DPT decomposition is conjectured to be NP-hard. Existing approaches either apply Integer Linear Programming (ILP) or use heuristics. In this paper, we show that the problem is actually in P and present a method that decomposes a layout for DPT with the optimal (minimum) number of stitches. The complexity of the method is O(n^1.5 log n). Experimental results show that the method is even faster than the fast heuristics.
Title: Optimal layout decomposition for double patterning technology. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 9-13.
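The starting point of DPT decomposition, assigning shapes to two masks so that conflicting shapes land on different masks, amounts to two-coloring a conflict graph. The sketch below shows only that baseline step with a breadth-first search. It is not the stitch-minimization method of the paper, and the function name `assign_masks` and the edge-list input format are assumptions for illustration.

```python
from collections import deque


def assign_masks(num_shapes, conflicts):
    """Two-color a conflict graph; return a mask index (0 or 1) per shape.

    conflicts is a list of (a, b) pairs of shapes that must go on different masks.
    Returns None when an odd cycle makes a stitch-free assignment impossible.
    """
    adj = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue                  # already colored as part of an earlier component
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None       # odd cycle: some shape would need a stitch
    return mask


row_of_wires = assign_masks(4, [(0, 1), (1, 2), (2, 3)])
triangle = assign_masks(3, [(0, 1), (1, 2), (0, 2)])
```

A row of four mutually adjacent wires alternates between the two masks, while three shapes in mutual conflict cannot be separated without splitting a shape, which is the situation where stitches, and hence the optimization studied in the paper, come into play.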
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105300
Xuebei Yang, K. Mohanram
In this paper, we introduce unequal-error-protection error correcting codes (UEPECCs) to improve SRAM reliability at low supply voltages for mobile multimedia applications. The fundamental premise of our work is that in multimedia applications, different bits in the same SRAM word are usually not equally significant, and hence deserve different protection levels. The key innovations in our work are (i) a novel metric, word mean squared error, to measure the reliability of an SRAM word when different bits are not equally significant and (ii) an optimization algorithm based on dynamic programming to construct the UEPECC that assigns different protection levels to bits according to their significance. The advantage of the UEPECC over the traditional equal-error-protection ECC is demonstrated using two representative multimedia applications. For the same area, power, and encoding/decoding latency, SRAMs with UEPECC increase the peak signal-to-noise ratio by 8 dB in image processing and incur 60% fewer errors on average in optical flow (motion vector) computation.
Title: Unequal-error-protection codes in SRAMs for mobile multimedia applications. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 21-27.
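The premise that bits of one word are not equally significant can be made concrete with a small calculation. The abstract does not give the definition of word mean squared error, so the formula below is an assumption: for an unsigned word with independent bit flips and uniformly random stored data, the expected squared numeric error is the sum over bits of p_i * 4^i, where p_i is the residual flip probability of bit i. The function name `word_mse` and the probabilities are hypothetical.

```python
def word_mse(bit_error_probs):
    """Expected squared error of an unsigned word (assumed definition, see lead-in).

    bit_error_probs[i] is the residual flip probability of bit i, with bit 0 the LSB.
    A flip of bit i changes the value by 2**i, so it contributes p_i * 4**i.
    """
    return sum(p * 4 ** i for i, p in enumerate(bit_error_probs))


# 4-bit word, every bit equally unreliable
unprotected = word_mse([0.1, 0.1, 0.1, 0.1])
# same protection budget spent on the most significant bit only
protect_msb = word_mse([0.1, 0.1, 0.1, 0.0])
# same protection budget spent on the least significant bit only
protect_lsb = word_mse([0.0, 0.1, 0.1, 0.1])
```

Under these assumptions, protecting the MSB cuts the metric from 8.5 to 2.1, while protecting the LSB only reaches 8.4, which is the kind of asymmetry an unequal-error-protection code is meant to exploit.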
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105368
Yehua Su, Wenjing Rao
Crossbar-based architectures are promising for future nanoelectronic systems. However, due to their inherent unreliability, defect tolerance schemes are necessary to guarantee the successful implementation of any logic function. Most existing approaches are based on logic mapping, which exploits the freedom of choosing which variables/products (in a logic function) to map to which of the vertical/horizontal wires (in a crossbar). In this paper, we propose a new defect tolerance approach, namely logic morphing, which exploits the various equivalent forms of a logic function. This approach explores a new dimension of freedom in achieving defect tolerance, and is compatible with the existing mapping-based approaches. We propose an integrated algorithmic framework, which employs both mapping and morphing simultaneously, and efficiently searches for a successful logic implementation in the combined solution space. Simulation results show that the proposed scheme boosts defect tolerance capability significantly, with many-fold yield improvement, while incurring no extra runtime compared with the existing approach of performing mapping alone.
Title: Defect-tolerant logic implementation onto nanocrossbars by exploiting mapping and morphing simultaneously. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 456-462.
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105380
Zheng Zhang, I. Elfadel, L. Daniel
This paper presents an approach for the model order reduction of fully parameterized linear dynamic systems. In a fully parameterized system, not only the state matrices but also the input/output matrices can be parameterized. The algorithm presented in this paper is based on neither conventional moment-matching nor balanced-truncation ideas. Instead, it uses “optimal (block) vectors” to construct the projection matrix, such that the system errors in the whole parameter space are minimized. This minimization problem is formulated as a recursive least square (RLS) optimization and then solved at a low cost. Our algorithm is tested on a set of multi-port multi-parameter cases with both intermediate and large parameter variations. The numerical results show that high accuracy is guaranteed, and that very compact models can be obtained for multi-parameter models because the ROM size is independent of the number of parameters in our approach.
Title: Model order reduction of fully parameterized systems by recursive least square optimization. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 523-530.
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105323
Hongbo Zhang, Tan Yan, Martin D. F. Wong, Sanjay J. Patel
Aerial image simulation is a fundamental problem in modern VLSI design. It requires a huge amount of numerical computation. The recent advancement of general-purpose GPU computing provides an excellent opportunity to parallelize aerial image simulation and achieve great speedup. In this paper, we present and discuss two GPU-based aerial image simulation algorithms. We show through experiments that the fastest algorithm we propose can achieve 50X to 60X speedup over the CPU-based serial algorithm. The error of our approach is shown to be insignificant.
Title: Accelerating aerial image simulation with GPU. Venue: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 178-184.