
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Articles

AVF-driven parity optimization for MBU protection of in-core memory arrays
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.301
M. Maniatakos, M. Michael, Y. Makris
We propose an AVF-driven parity selection method for protecting modern microprocessor in-core memory arrays against MBUs. As MBUs constitute more than 50% of the upsets in latest technologies, error correcting codes or physical interleaving are typically employed to effectively protect out-of-core memory structures, such as caches. However, such methods are not applicable to high-performance in-core arrays, due to computational complexity, high delay and area overhead. To this end, we revisit parity as an effective mechanism to detect errors and we resort to pipeline flushing and checkpointing for correction. We demonstrate that optimal parity tree construction for MBU detection is a computationally complex problem, which we then formulate as an integer-linear-program (ILP). Experimental results on Alpha 21264 and Intel P6 in-core memory arrays demonstrate that optimal parity tree selection can achieve great vulnerability reduction, even when a small number of bits are added to the parity trees, compared to simple heuristics. Furthermore, the ILP formulation allows us to find better solutions by effectively exploring the solution space in the presence of multiple parity trees; results show that the presence of 2 parity trees offers a vulnerability reduction of more than 50% over a single parity tree.
Citations: 6
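The abstract above relies on parity to detect multi-bit upsets (MBUs). As a minimal sketch of why the assignment of bits to parity trees matters, the Python below compares one parity tree covering a whole word with two interleaved trees (even and odd bit positions). The 8-bit word, the interleaved assignment, and all function names are illustrative assumptions; this is not the paper's AVF-driven ILP formulation.

```python
from typing import List, Sequence


def parity(word: int, positions: Sequence[int]) -> int:
    """XOR of the bits of `word` at the given bit positions."""
    p = 0
    for i in positions:
        p ^= (word >> i) & 1
    return p


def detects(word: int, flipped: Sequence[int], trees: List[Sequence[int]]) -> bool:
    """True if at least one parity tree changes value after the bit flips."""
    corrupted = word
    for i in flipped:
        corrupted ^= 1 << i
    return any(parity(word, t) != parity(corrupted, t) for t in trees)


WIDTH = 8  # hypothetical word width
single_tree = [list(range(WIDTH))]
interleaved = [list(range(0, WIDTH, 2)), list(range(1, WIDTH, 2))]

word = 0b10110010
adjacent_double_flip = [3, 4]  # a 2-bit MBU hitting neighbouring cells

# One tree sees two flips, so its parity is unchanged and the upset is missed.
single_detects = detects(word, adjacent_double_flip, single_tree)
# With interleaving, each tree sees exactly one flip and both parities change.
interleaved_detects = detects(word, adjacent_double_flip, interleaved)
```

In this toy setting the interleaved pair catches the adjacent double flip that a single tree misses. Choosing which bits to cover under a budget, weighted by vulnerability, is the optimization the abstract formulates as an ILP.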
Efficient importance sampling for high-sigma yield analysis with adaptive online surrogate modeling
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.267
Jian Yao, Zuochang Ye, Yan Wang
Massively repeated structures such as SRAM cells usually require an extremely low failure rate. This poses a challenge for Monte Carlo based statistical yield analysis, as a huge number of samples has to be drawn in order to observe a single failure. Fast Monte Carlo methods, e.g. importance sampling, are still quite expensive because the anticipated failure rate is very low. In this paper, a new method is proposed to tackle this issue. The key idea is to improve the traditional importance sampling method with an efficient online surrogate model. The proposed method improves the performance of both stages of importance sampling, i.e. finding the distorted probability density function and sampling from it. Experimental results show that the proposed method is 1e2X to 1e5X faster than the standard Monte Carlo approach and achieves a 5X to 22X speedup over existing state-of-the-art techniques without sacrificing estimation accuracy.
Citations: 15
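The importance sampling idea mentioned in the abstract above can be sketched on a toy problem: estimating the tail probability P(X > t) of a standard normal variable by sampling from a density shifted to the failure region and reweighting. The mean-shifted Gaussian, the threshold, and the function names are assumptions made for illustration; the paper's online surrogate model is not reproduced here.

```python
import math
import random


def exact_tail(t: float) -> float:
    """Exact P(X > t) for X ~ N(0, 1), used only as a reference value."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def importance_sampling_tail(t: float, n: int, seed: int = 0) -> float:
    """Estimate P(X > t) by sampling from the shifted density N(t, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)  # draw from the distorted density q = N(t, 1)
        if x > t:
            # Likelihood ratio p(x) / q(x) for p = N(0, 1) and q = N(t, 1).
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


estimate = importance_sampling_tail(4.0, 20000)
reference = exact_tail(4.0)
```

Plain Monte Carlo would need on the order of 1/P samples to observe a single failure at this threshold, whereas the shifted density places roughly half of its samples in the failure region.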
Future of GPGPU micro-architectural parameters
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.089
C. Nugteren, Gert-Jan van den Braak, H. Corporaal
As graphics processing units (GPUs) are becoming increasingly popular for general purpose workloads (GPGPU), the question arises how such processors will evolve architecturally in the near future. In this work, we identify and discuss trade-offs for three GPU architecture parameters: active thread count, compute-memory ratio, and cluster and warp sizing. For each parameter, we propose changes to improve GPU design, keeping in mind trends such as dark silicon and the increasing popularity of GPGPU architectures. A key-enabler is dynamism and workload-adaptiveness, enabling among others: dynamic register file sizing, latency aware scheduling, roofline-aware DVFS, run-time cluster fusion, and dynamic warp sizing.
Citations: 6
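The abstract above mentions the compute-memory ratio and roofline-aware DVFS. As background, the standard roofline bound caps attainable throughput at the smaller of peak compute and memory bandwidth times operational intensity. The sketch below evaluates that bound; the peak and bandwidth figures are hypothetical, and how the paper's DVFS policy uses the bound is not stated in the abstract.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def is_memory_bound(peak_gflops: float, bandwidth_gbs: float,
                    intensity_flop_per_byte: float) -> bool:
    """True when the memory roof lies below the compute roof for this kernel."""
    return bandwidth_gbs * intensity_flop_per_byte < peak_gflops


# Hypothetical device: 1000 GFLOP/s peak compute, 200 GB/s memory bandwidth.
low_intensity = attainable_gflops(1000.0, 200.0, 2.0)    # memory-bound kernel
high_intensity = attainable_gflops(1000.0, 200.0, 10.0)  # compute-bound kernel
```

One plausible reading, offered here only as an assumption, is that a memory-bound kernel gains little from a high core clock, which is the kind of situation a roofline-aware frequency policy could exploit.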
Leveraging sensitivity analysis for fast, accurate estimation of SRAM dynamic write VMIN
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.364
James Boley, V. Chandra, R. Aitken, B. Calhoun
Circuit reliability in the presence of variability is a major concern for SRAM designers. With memory sizes ever increasing, Monte Carlo simulations have become too time-consuming for margining and yield evaluation. In addition, dynamic write-ability metrics have an advantage over static metrics because they take timing constraints into account. However, these metrics are much more expensive in terms of runtime. Statistical blockade is one method that reduces the number of simulations by filtering out non-tail samples; however, the total number of simulations required still remains relatively large. In this paper, we present a method that uses sensitivity analysis to provide a total speedup of ∼112X compared with recursive statistical blockade, with only a 3% average loss in accuracy. In addition, we show how this method can be used to calculate dynamic VMIN and to evaluate several write assist methods.
Citations: 11
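The filtering step that the abstract above attributes to statistical blockade can be sketched as follows: a cheap predictor screens each sample, and only samples predicted to lie near the tail are passed to the expensive simulation. The two-parameter metric, the linear predictor, the threshold, and the safety margin are all hypothetical stand-ins chosen for illustration; this is not the paper's sensitivity-based method.

```python
import random
from typing import List, Tuple


def expensive_metric(x: float, y: float) -> float:
    """Stand-in for a costly circuit simulation (hypothetical model)."""
    return x + 0.5 * y + 0.05 * x * y


def cheap_predictor(x: float, y: float) -> float:
    """Cheap linear approximation of the metric (hypothetical)."""
    return x + 0.5 * y


def blockade(samples: List[Tuple[float, float]], tail_threshold: float,
             margin: float) -> List[float]:
    """Simulate only the samples the cheap predictor places near the tail."""
    results = []
    for x, y in samples:
        if cheap_predictor(x, y) >= tail_threshold - margin:
            results.append(expensive_metric(x, y))
    return results


rng = random.Random(1)
samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
simulated = blockade(samples, tail_threshold=3.0, margin=1.0)
tail_found = sum(1 for v in simulated if v >= 3.0)
```

The margin relaxes the filter so that predictor error does not discard true tail samples; the cost is a few extra simulations of samples that turn out not to be in the tail.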
Efficient and scalable OpenMP-based system-level design
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.206
A. Cilardo, L. Gallo, A. Mazzeo, N. Mazzocca
In this work we present an experimental environment for electronic system-level design based on the OpenMP programming paradigm. Fully compliant with the OpenMP standard, the environment allows the generation of heterogeneous hardware/software systems exhibiting good scalability with respect to the number of threads and limited performance overheads. Based on well-established OpenMP benchmarks, the paper also presents some comparisons with high-performance software implementations as well as with previous proposals oriented to pure hardware translation. The results confirm that the proposed approach achieves improved results in terms of both efficiency and scalability.
Citations: 34
Defect-tolerant logic hardening for crossbar-based nanosystems
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.361
Yehua Su, Wenjing Rao
Crossbar-based architectures are promising for future nanoelectronic systems. However, due to the inherent unreliability of nanoscale devices, the implementation of any logic function relies on aggressive defect-tolerant schemes applied at the post-manufacturing stage. Most such defect-tolerant approaches explore mapping choices between logic variables/products and crossbar vertical/horizontal wires. In this paper, we develop a new approach, namely fine-grained logic hardening, based on the idea of adding redundancies into a logic function so as to boost the success rate of logic implementation. We propose an analytical framework to evaluate and fine-tune the amount and location of redundancy to be added for a given logic function. Furthermore, we devise a method to optimally harden the logic function so as to maximize the defect tolerance capability. Simulation results show that the proposed logic hardening scheme significantly boosts defect tolerance capability and improves yield, compared to mapping-only schemes with the same hardware cost.
Citations: 3
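The mapping-only baseline mentioned in the abstract above (assigning product terms to crossbar wires around defects) can be sketched as a bipartite matching problem. In this simplified model, which is an assumption made for illustration, each product term is the set of columns it must connect to, each crossbar row is the set of columns whose crosspoints still work, and a term fits a row when its columns are a subset of the working ones. The paper's logic hardening scheme itself is not reproduced.

```python
from typing import Dict, List, Optional, Set


def map_products(products: List[Set[int]],
                 rows: List[Set[int]]) -> Optional[Dict[int, int]]:
    """Assign each product term to a distinct compatible row, or return None.

    Uses augmenting paths (Kuhn's algorithm) for bipartite matching.
    """
    row_owner: Dict[int, int] = {}  # row index -> product index

    def try_assign(p: int, visited: Set[int]) -> bool:
        for r, working in enumerate(rows):
            if r in visited or not products[p] <= working:
                continue
            visited.add(r)
            # Take a free row, or move the current owner to another row.
            if r not in row_owner or try_assign(row_owner[r], visited):
                row_owner[r] = p
                return True
        return False

    for p in range(len(products)):
        if not try_assign(p, set()):
            return None
    return {p: r for r, p in row_owner.items()}


# Hypothetical 3-row crossbar: the third row has only column 2 working.
products = [{0, 1}, {2}, {1, 2}]
rows = [{0, 1, 2}, {0, 1, 2}, {2}]
mapping = map_products(products, rows)
```

When no complete matching exists the function returns `None`, which is the failure case that added redundancy is meant to make less likely.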
Fast and optimized task allocation method for low vertical link density 3-Dimensional Networks-on-Chip based many core systems
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.357
Haoyuan Ying, T. Hollstein, K. Hofmann
The advantages of moving from 2-Dimensional Networks-on-Chip (NoCs) to 3-Dimensional NoCs for any application must be justified by the improvements in performance, power, latency and the overall system costs, especially the cost of Through-Silicon-Vias (TSVs). The trade-off between the number of TSVs and the 3D NoC system performance becomes one of the most critical design issues. In this paper, we present a fast and optimized task allocation method for many core systems based on 3D NoCs with low vertical link density (TSV number). Compared with classic methods such as Genetic Algorithm (GA) and Simulated Annealing (SA), our method saves considerable design time. We use several state-of-the-art benchmarks and the generic scalable pseudo application (GSPA) at different network scales to simulate the designs produced by our method; compared with the designs produced by GA and SA, our technique achieves better performance and lower cost. All experiments were carried out in the GSNOC framework (written in SystemC-RTL), which provides cycle accuracy and good flexibility.
Citations: 6
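A task allocation method such as the one in the abstract above needs some way to score a candidate placement. A common choice, assumed here rather than taken from the paper, is communication volume weighted by hop count in the mesh. The sketch below computes that cost for a 3D mesh with fully connected layers; with low vertical link density real routes may be longer, which this simplification ignores. Task names, coordinates, and traffic volumes are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]  # (x, y, layer)


def hops(a: Coord, b: Coord) -> int:
    """Manhattan distance between two routers in a full 3D mesh."""
    return sum(abs(p - q) for p, q in zip(a, b))


def communication_cost(traffic: Dict[Tuple[str, str], float],
                       placement: Dict[str, Coord]) -> float:
    """Sum of traffic volume times hop count over all communicating pairs."""
    return sum(volume * hops(placement[src], placement[dst])
               for (src, dst), volume in traffic.items())


traffic = {("a", "b"): 10.0, ("b", "c"): 1.0}
near = {"a": (0, 0, 0), "b": (0, 0, 1), "c": (1, 1, 1)}
far = {"a": (0, 0, 0), "b": (1, 1, 1), "c": (0, 0, 1)}

near_cost = communication_cost(traffic, near)
far_cost = communication_cost(traffic, far)
```

Placing the heavily communicating pair on adjacent routers lowers the cost, and an allocation heuristic, GA, or SA would search over placements to minimize a function of this kind.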
Minimization of P-circuits using boolean relations
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.208
A. Bernasconi, V. Ciriani, G. Trucco, T. Villa
In this paper, we investigate how to use the complete flexibility of P-circuits, which realize a Boolean function by projecting it onto overlapping subsets given by a generalized Shannon decomposition. It is known how to compute the complete flexibility of P-circuits, but the algorithms proposed so far for its exploitation do not guarantee to find the best implementation, because they cast the problem as the minimization of an incompletely specified function. Instead, here we show that to explore all solutions we must set up the problem as the minimization of a Boolean relation, because there are don't care conditions that cannot be expressed by single cubes. In the experiments we report major improvements with respect to the previously published results.
Citations: 3
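The abstract above builds on a generalized Shannon decomposition. As background, the classical Shannon expansion f = x'·f|x=0 + x·f|x=1 can be checked exhaustively on a small function, as in the sketch below. This covers only the classical special case; the generalized decomposition with overlapping subsets and the Boolean-relation minimization are not reproduced. The majority function and all helper names are illustrative assumptions.

```python
from itertools import product
from typing import Callable


def cofactor(f: Callable[..., int], index: int, value: int) -> Callable[..., int]:
    """Return f with the variable at `index` fixed to `value`."""
    def g(*bits: int) -> int:
        fixed = list(bits)
        fixed[index] = value
        return f(*fixed)
    return g


def shannon_holds(f: Callable[..., int], n: int, index: int) -> bool:
    """Check f == x'*f0 + x*f1 on all 2**n input assignments."""
    f0 = cofactor(f, index, 0)
    f1 = cofactor(f, index, 1)
    for bits in product((0, 1), repeat=n):
        x = bits[index]
        expanded = ((1 - x) & f0(*bits)) | (x & f1(*bits))
        if expanded != f(*bits):
            return False
    return True


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority function used as a test case."""
    return (a & b) | (a & c) | (b & c)


expansion_ok = all(shannon_holds(majority, 3, i) for i in range(3))
```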
Slack budgeting and slack to length converting for multi-bit flip-flop merging
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.367
Chia-Chieh Lu, Rung-Bin Lin
In this paper we propose a flexible slack budgeting approach for post-placement multi-bit flip-flop (MBFF) merging. Our approach considers existing wiring topology and flip-flop delay changes for achieving more accurate slack budgeting. Besides, we propose a slack-to-length converting approach to translating timing slack into equivalent wire length for simplifying a merging process. We also develop a merging method to evaluate our slack budgeting approach. Our slack budgeting and MBFF merging programs are fully integrated into an industrial design flow. Experimental results show that our approach on average achieves 3.4% area saving, 50% clock tree power saving, and 5.3% total power saving.
Citations: 2
A parallel fast transform-based preconditioning approach for electrical-thermal co-simulation of power delivery networks
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.341
Konstantis Daloukas, Alexia Marnari, N. Evmorfopoulos, P. Tsompanopoulou, G. Stamoulis
Efficient analysis of massive on-chip power delivery networks is among the most challenging problems facing the EDA industry today. Due to Joule heating effect and the temperature dependence of resistivity, temperature is one of the most important factors that affect IR drop and must be taken into account in power grid analysis. However, the sheer size of modern power delivery networks (comprising several thousands or millions of nodes) usually forces designers to neglect thermal effects during IR drop analysis in order to simplify and accelerate simulation. As a result, the absence of accurate estimates of Joule heating effect on IR drop analysis introduces significant uncertainty in the evaluation of circuit functionality. This work presents a new approach for fast electrical-thermal co-simulation of large-scale power grids found in contemporary nanometer-scale ICs. A state-of-the-art iterative method is combined with an efficient and extremely parallel preconditioning mechanism, which enables harnessing the computational resources of massively parallel architectures, such as graphics processing units (GPUs). Experimental results demonstrate that the proposed method achieves a speedup of 66.1X for a 3.1M-node design over a state-of-the-art direct method and a speedup of 22.2X for a 20.9M-node design over a state-of-the-art iterative method when GPUs are utilized.
Citations: 4
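The abstract above combines an iterative solver with a preconditioner. A minimal sketch of that pattern is preconditioned conjugate gradient (PCG) on a small symmetric positive definite system. Note the substitution: a simple Jacobi (diagonal) preconditioner stands in for the paper's fast transform-based preconditioner, which is not reproduced, and the 50-node resistive chain is a hypothetical toy grid.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pcg(matvec: Callable[[Vector], Vector], b: Vector,
        apply_preconditioner: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A with preconditioned CG."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


N = 50          # hypothetical chain of 50 grid nodes
DIAG = 2.1      # two unit conductances to neighbours plus 0.1 to ground


def chain_matvec(v: Vector) -> Vector:
    """Multiply by the tridiagonal conductance matrix of the chain."""
    out = []
    for i in range(N):
        s = DIAG * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < N - 1:
            s -= v[i + 1]
        out.append(s)
    return out


def jacobi(r: Vector) -> Vector:
    """Jacobi preconditioner: divide by the (constant) diagonal."""
    return [ri / DIAG for ri in r]


currents = [1.0] * N
voltages = pcg(chain_matvec, currents, jacobi)
```

Each step uses only matrix-vector products, dot products, and one preconditioner application, which is the kind of work that maps well onto massively parallel hardware such as GPUs.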