
2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): latest papers

PowerRush: A linear simulator for power grid
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105372
Jianlei Yang, Zuowei Li, Yici Cai, Qiang Zhou
As power grids grow in size, IR drop analysis has become more computationally challenging in both runtime and memory consumption. In this paper, we propose a linear-complexity simulator named PowerRush, which consists of an efficient SPICE parser, a robust circuit builder, and a linear solver. The proposed solver is a purely algebraic method that provides optimal convergence without geometric information. It is implemented as an algebraic multigrid preconditioned conjugate gradient method, in which an aggregation-based algebraic multigrid with K-cycle acceleration is adopted as the preconditioner to improve the robustness of the conjugate gradient iteration. In the multigrid scheme, a double pairwise aggregation technique is applied to the matrix graph during coarsening to keep setup cost and memory requirements low. Further, a K-cycle multigrid scheme provides Krylov subspace acceleration at each level to guarantee optimal or near-optimal convergence. Experimental results on real power grids show that PowerRush has linear complexity in runtime and memory consumption. PowerRush solves the DC analysis of a 60-million-node power grid to 0.01 mV accuracy in 170 seconds using 21.89 GB of memory.
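The preconditioned conjugate gradient structure described in this abstract can be sketched as follows. This is a minimal illustration, not PowerRush itself: a simple Jacobi (diagonal) preconditioner stands in for the aggregation-based algebraic multigrid with K-cycle, and the `chain_grid` test circuit, the function names, and the tolerance are all assumptions made for the example.

```python
def matvec(A, x):
    """Multiply a sparse matrix (one {column: value} dict per row) by a vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in A]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jacobi(A):
    """Diagonal preconditioner; a stand-in for the multigrid preconditioner."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(A, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A x for the start x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0  # avoid dividing by zero for b = 0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def chain_grid(n, g=1.0, g_pad=1.0):
    """Conductance matrix of an n-node resistor chain; node 0 connects to the
    ideal supply through conductance g_pad (hypothetical toy circuit)."""
    A = [{i: 0.0} for i in range(n)]
    A[0][0] += g_pad
    for i in range(n - 1):
        A[i][i] += g
        A[i + 1][i + 1] += g
        A[i][i + 1] = -g
        A[i + 1][i] = -g
    return A


# Solve G * drop = load for the IR drop at each node, with a 1 A load at the far end.
G = chain_grid(5)
drop = pcg(G, [0.0, 0.0, 0.0, 0.0, 1.0], jacobi(G))
```

For this toy chain the drop at node k is the load current times the resistance between the supply and node k, so the solver can be checked against a closed-form answer. Swapping `jacobi` for a multigrid cycle changes only the `precond` callable, which is the role the preconditioner plays in the method the abstract describes.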
Citations: 25
Accelerated statistical simulation via on-demand Hermite spline interpolations
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105354
R. Kanj, Tong Li, R. Joshi, K. Agarwal, A. Sadigh, David W. Winston, S. Nassif
We propose an efficient Hermite spline-based SPICE simulation methodology for accurate statistical yield analysis. Unlike conventional methods, the spline-based transistor tables are built on-demand specific to the transient simulation requirements of the statistical experiments. Compared with traditional MOSFET table models, on-demand spline table models use ∼500X less memory. This makes Hermite spline-based table models practical for use in simulations for process variation modeling. Furthermore, we propose an efficient gate voltage offset approach to model transistor threshold voltage variation. In this scenario, evaluations of the transistor model rely on a single reference table and require one set of spline function evaluations per VT sample point as opposed to two or more sets for VT interpolation. This method is comprehensive and the results are in excellent agreement with traditional BSIM-based simulations. Around 4X improvement in speed, which includes the table generation cost, could be further improved by employing other fast-SPICE techniques or parallelism. To the best of our knowledge, this is the first time such a methodology has been coupled with importance sampling techniques to study the yield of memory designs.
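The idea of a Hermite spline table that is filled only where the simulation actually queries it can be sketched as below. This is an assumption-laden illustration, not the authors' implementation: the class name `OnDemandHermiteTable`, the uniform one-dimensional grid, and the `model` callable (which returns a value and its derivative) are hypothetical.

```python
import math


class OnDemandHermiteTable:
    """Piecewise cubic Hermite interpolation on a uniform grid whose knots
    are evaluated lazily, the first time a query lands next to them."""

    def __init__(self, model, step):
        self.model = model  # hypothetical callable: v -> (value, derivative)
        self.step = step    # grid spacing
        self.knots = {}     # knot index -> (value, derivative), filled on demand

    def _knot(self, k):
        if k not in self.knots:
            self.knots[k] = self.model(k * self.step)
        return self.knots[k]

    def __call__(self, v):
        k = math.floor(v / self.step)
        y0, d0 = self._knot(k)
        y1, d1 = self._knot(k + 1)
        h = self.step
        t = v / h - k  # position inside the interval, in [0, 1)
        # Standard cubic Hermite basis functions.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * y0 + h10 * h * d0 + h01 * y1 + h11 * h * d1


# A cubic test function is reproduced exactly by cubic Hermite interpolation.
table = OnDemandHermiteTable(lambda v: (v**3 - 2 * v, 3 * v**2 - 2), step=0.5)
value = table(0.7)
```

Only the two knots that bracket a query are evaluated, so the table holds just the region the experiments visit. That is the intuition behind the memory saving the abstract reports, although the real transistor tables are multi-dimensional.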
Citations: 6
ATree-based topology synthesis for on-chip network
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105399
J. Cong, Yuhui Huang, Bo Yuan
The Network-on-Chip (NoC) interconnect of future multi-processor systems-on-chip (MPSoCs) needs to be efficient in terms of energy and delay. In this paper, we propose a topology synthesis algorithm based on the shortest-path Steiner arborescence (hereafter called ATree). The concept of temporal merging is applied to allow communication flows that do not overlap in time to share the same network resource. For scalability and power minimization, we build a hybrid network that consists of routers and buses. We evaluate our ATree-based topology synthesis methodology by applying it to several benchmarks and comparing the results with some existing NoC synthesis algorithms [1], [2]. The experimental results show a significant reduction in the power-latency product. The power-latency product of the topology synthesized by our ATree-based algorithm is 47% and 51% lower than [1], and 10% and 17% lower than [2], for the cases without and with buses, respectively.
Citations: 20
Hybrid CMOS/Magnetic Process Design Kit and application to the design of high-performances non-volatile logic circuits
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105334
G. Prenat, B. Dieny, J. Nozieres, G. D. Pendina, K. Torki
Spintronics (or spin-electronics) is a continuously expanding area of research and development at the intersection of magnetism and electronics. It aims to take advantage of a quantum characteristic of electrons, namely their spin, to create new functionalities and new devices. Spintronic devices comprise magnetic layers, which serve as spin polarizers or analyzers, separated by non-magnetic layers through which the spin-polarized electrons are transmitted. Typically, they rely on magnetoresistive (MR) effects, in which the electrical resistance depends on the magnetic configuration. These devices can be used to build innovative non-volatile memories, high-performance logic circuits, RF oscillators, or field/current sensors. This paper describes a full Magnetic Process Design Kit (MPDK) that allows such hybrid CMOS/magnetic circuits to be designed efficiently. These circuits can help circumvent some of the limits of CMOS-only microelectronics.
Citations: 6
Post-silicon bug diagnosis with inconsistent executions
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105414
A. DeOrio, D. Khudia, V. Bertacco
The complexity of modern chips intensifies verification challenges, and an increasing share of this verification effort is shouldered by post-silicon validation. Focusing on the first silicon prototypes, post-silicon validation poses critical new challenges such as intermittent failures, where multiple executions of the same test do not yield a consistent outcome. These are often due to on-chip asynchronous events and electrical effects, leading to extremely time-consuming, if not unachievable, bug diagnosis and debugging processes. In this work, we propose a methodology called BPS (Bug Positioning System) to support the automatic diagnosis of these difficult bugs. During post-silicon validation, lightweight BPS hardware logs a compact encoding of observed signal activity over multiple executions of the same test: some passing, some failing. Leveraging a novel post-analysis algorithm, BPS uses the logged activity to diagnose the bug, identifying the approximate manifestation time and critical design signals. We found experimentally that BPS can localize most bugs down to the exact root signal and within about 1,000 clock cycles of their occurrence.
Citations: 23
Optimal layout decomposition for double patterning technology
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105298
Xiaoping Tang, Minsik Cho
Double patterning technology (DPT) is regarded as the most practical solution for the sub-22nm lithography technology. DPT decomposes a single layout into two masks and applies double exposure to print the shapes in the layout. DPT requires accurate overlay control. Thus, the primary objective in DPT decomposition is to minimize the number of stitches (overlay) between the shapes in the two masks. The problem of minimizing the number of stitches in DPT decomposition is conjectured to be NP-hard. Existing approaches either apply Integer Linear Programming (ILP) or use heuristics. In this paper, we show that the problem is actually in P and present a method to decompose a layout for DPT and minimize the number of stitches optimally. The complexity of the method is O(n^1.5 log n). Experimental results show that the method is even faster than the fast heuristics.
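The basic step of splitting a layout into two masks can be illustrated with a two-coloring of a conflict graph. This sketch is not the paper's stitch-minimization method. It only checks whether a stitch-free decomposition exists, under the assumption that shapes are graph nodes and an edge joins two shapes that are too close to share a mask. The function name and input format are hypothetical.

```python
from collections import deque


def two_color(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1 so that conflicting shapes differ.

    conflicts is a list of (a, b) index pairs. Returns the list of mask
    assignments, or None when an odd conflict cycle makes a stitch-free
    decomposition impossible.
    """
    adj = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search over one connected component
            u = queue.popleft()
            for v in adj[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None  # odd cycle: some shape would have to be split
    return mask


chain = two_color(4, [(0, 1), (1, 2), (2, 3)])
triangle = two_color(3, [(0, 1), (1, 2), (0, 2)])
```

A `None` result marks the cases where a shape must be cut and stitched across both masks, which is where the stitch-minimization problem addressed by the paper begins.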
Citations: 48
Unequal-error-protection codes in SRAMs for mobile multimedia applications
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105300
Xuebei Yang, K. Mohanram
In this paper, we introduce unequal-error-protection error correcting codes (UEPECCs) to improve SRAM reliability at low supply voltages for mobile multimedia applications. The fundamental premise for our work is that in multimedia applications, different bits in the same SRAM word are usually not equally significant, and hence deserve different protection levels. The key innovations in our work are (i) a novel metric, word mean squared error, to measure the reliability of an SRAM word when different bits are not equally significant and (ii) an optimization algorithm based on dynamic programming to construct the UEPECC that assigns different protection levels to bits according to their significance. The advantage of the UEPECC over the traditional equal-error-protection ECC is demonstrated using two representative multimedia applications. For the same area, power, and encoding/decoding latency, SRAMs with UEPECC increase the peak signal-to-noise ratio by 8 dB in image processing and incur 60% fewer errors on average in optical flow (motion vector) computation.
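Why unequal bit significance favors unequal protection can be seen with a small calculation. The abstract does not give the definition of word mean squared error, so the formula below is an assumption made for illustration: for an unsigned word whose stored bits are uniformly random and whose bit errors are independent, the expected squared numeric error is the sum over bits of 4^i times the residual error probability of bit i. The function name is hypothetical.

```python
def word_mse(bit_error_probs):
    """Expected squared error of an unsigned word (assumed definition).

    bit_error_probs[i] is the probability that bit i (0 = least significant)
    is wrong after decoding. A flip of bit i changes the value by 2**i, so it
    contributes (2**i)**2 = 4**i to the squared error.
    """
    return sum((4 ** i) * p for i, p in enumerate(bit_error_probs))


# Same raw error rate on all 8 bits versus protecting only the top 4 bits.
equal = word_mse([1e-3] * 8)
top_protected = word_mse([1e-3] * 4 + [0.0] * 4)
```

Under this assumed metric, removing errors from the four most significant bits cuts the expected squared error by more than two orders of magnitude, while the four least significant bits contribute very little. That asymmetry is the premise the abstract states for assigning protection by significance.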
Citations: 22
Defect-tolerant logic implementation onto nanocrossbars by exploiting mapping and morphing simultaneously
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105368
Yehua Su, Wenjing Rao
Crossbar-based architectures are promising for future nanoelectronic systems. However, due to their inherent unreliability, defect tolerance schemes are necessary to guarantee the successful implementation of any logic function. Most of the existing approaches have been based on logic mapping, which exploits the freedom of choosing which variables/products (in a logic function) to map to which of the vertical/horizontal wires (in a crossbar). In this paper, we propose a new defect tolerance approach, namely logic morphing, which exploits the various equivalent forms of a logic function. This approach explores a new dimension of freedom in achieving defect tolerance, and is compatible with the existing mapping-based approaches. We propose an integrated algorithmic framework, which employs both mapping and morphing simultaneously, and efficiently searches for a successful logic implementation in the combined solution space. Simulation results show that the proposed scheme boosts defect tolerance capability significantly with many-fold yield improvement, while having no extra runtime over the existing approach of performing mapping alone.
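The mapping half of the problem can be sketched as a bipartite matching between product terms and horizontal wires. This is a simplified illustration and not the paper's framework: it assumes the variable-to-column assignment is already fixed, that the only defects are unusable crosspoints, and that a product fits a row when every crosspoint it needs on that row is usable. The function name and data layout are hypothetical.

```python
def map_products(products, good):
    """Map each product term to a distinct crossbar row.

    products[p] is the set of column indices that product p needs programmed.
    good[r][c] is True when the crosspoint at row r, column c is usable.
    Returns row_of, where row_of[p] is the row chosen for product p, or None
    when no defect-free assignment exists.
    """
    n_rows = len(good)
    owner = [None] * n_rows  # owner[r] = product currently placed on row r

    def fits(p, r):
        return all(good[r][c] for c in products[p])

    def assign(p, seen):
        # Augmenting-path search: try a free row, or displace another product.
        for r in range(n_rows):
            if r not in seen and fits(p, r):
                seen.add(r)
                if owner[r] is None or assign(owner[r], seen):
                    owner[r] = p
                    return True
        return False

    for p in range(len(products)):
        if not assign(p, set()):
            return None

    row_of = [None] * len(products)
    for r, p in enumerate(owner):
        if p is not None:
            row_of[p] = r
    return row_of


products = [{0, 1}, {1, 2}]
placement = map_products(products, [[True, True, True], [True, True, False]])
```

In this picture, morphing would amount to calling `map_products` on each equivalent product list of the same function and accepting the first one that returns a placement, although the paper searches the combined space jointly instead of one form at a time.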
Citations: 8
Model order reduction of fully parameterized systems by recursive least square optimization
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105380
Zheng Zhang, I. Elfadel, L. Daniel
This paper presents an approach for the model order reduction of fully parameterized linear dynamic systems. In a fully parameterized system, not only the state matrices but also the input/output matrices can be parameterized. The algorithm presented in this paper is based on neither conventional moment-matching nor balanced-truncation ideas. Instead, it uses “optimal (block) vectors” to construct the projection matrix, such that the system errors in the whole parameter space are minimized. This minimization problem is formulated as a recursive least square (RLS) optimization and then solved at a low cost. Our algorithm is tested on a set of multi-port multi-parameter cases with both intermediate and large parameter variations. The numerical results show that high accuracy is guaranteed, and that very compact models can be obtained for multi-parameter systems because the size of the reduced-order model (ROM) is independent of the number of parameters in our approach.
Citations: 5
Accelerating aerial image simulation with GPU
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105323
Hongbo Zhang, Tan Yan, Martin D. F. Wong, Sanjay J. Patel
Aerial image simulation is a fundamental problem for modern VLSI design. It requires a huge amount of numerical computation. The recent advancement of general-purpose GPU computing provides an excellent opportunity to parallelize aerial image simulation and achieve a large speedup. In this paper, we present and discuss two GPU-based aerial image simulation algorithms. We show through experiments that the fastest algorithm we propose achieves a 50X to 60X speedup over the CPU-based serial algorithm. The error of our approach is shown to be insignificant.
Citations: 3