2009 IEEE International Conference on Computer Design: Latest Papers

WHOLE: A low energy I-Cache with separate way history
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413162
Zichao Xie, Dong Tong, Xu Cheng
Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WHOLE-Cache) design for single-issue, in-order processors. The WHOLE-Cache design substantially reduces the dynamic energy dissipation of a set-associative instruction cache and incurs no additional cycle penalties. Tag comparison results are stored in either the Branch Target Buffer (BTB) or the Instruction Cache (I-Cache) to avoid tag checks and unnecessary way activation on subsequent accesses to visited cache lines. The extended BTB holds way history bits for branch instructions, while the I-Cache extension bits are used when fetching consecutive instructions that reside in different cache lines. A valid flag is associated with each stored tag comparison result to indicate whether the instruction to be fetched still resides in the recorded location. A simple invalidation scheme is implemented in the cache miss replacement operation: whenever a cache line is replaced, the pointers to it, which reside in the BTB or other I-Cache lines, are invalidated accordingly. We model the WHOLE-Cache design in Verilog. Deriving basic parameters from TSMC 65nm technology, we use the Wattch simulator to evaluate the performance and energy reduction of the WHOLE-Cache in the instruction fetch stage, with SPEC2000 and MediaBench as benchmarks. Compared with a conventional 4-way set-associative I-Cache, the WHOLE-Cache reduces energy consumption by 65% without any performance penalty.
Citations: 1
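The way-history idea in the abstract above can be illustrated with a toy model. This is a minimal sketch, not the authors' Verilog design: it covers only the I-Cache extension bits for sequential fetches (the BTB path for branches is left out), and the class names, the round-robin replacement policy, and the counters are assumptions made for illustration. It also assumes at least two sets, so that a line and its sequential successor never share a set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    tag: Optional[int] = None
    next_way: Optional[int] = None  # recorded way of the sequentially next line
    next_valid: bool = False        # valid flag guarding that record


class WayHistoryCache:
    """Toy set-associative I-cache with per-line sequential way history."""

    def __init__(self, num_sets=4, num_ways=4):
        self.num_sets, self.num_ways = num_sets, num_ways
        self.sets = [[Line() for _ in range(num_ways)] for _ in range(num_sets)]
        self.victim = [0] * num_sets  # round-robin replacement (assumption)
        self.tag_checks = 0           # full lookups: all ways activated
        self.skipped = 0              # fetches served from way history

    def _lookup(self, line_addr):
        """Full lookup: compare tags in every way, fill on a miss."""
        self.tag_checks += 1
        idx, tag = line_addr % self.num_sets, line_addr // self.num_sets
        for way, line in enumerate(self.sets[idx]):
            if line.tag == tag:
                return way
        way = self.victim[idx]
        self.victim[idx] = (way + 1) % self.num_ways
        self._invalidate_pointers_to(idx, way)
        self.sets[idx][way] = Line(tag=tag)
        return way

    def _invalidate_pointers_to(self, idx, way):
        # Sequential predecessors of lines in set idx live in set idx - 1.
        for line in self.sets[(idx - 1) % self.num_sets]:
            if line.next_valid and line.next_way == way:
                line.next_valid = False

    def fetch(self, line_addrs):
        """Fetch cache-line addresses in order, reusing way history if valid."""
        prev = None  # (set index, way, line address) of the previous fetch
        for addr in line_addrs:
            holder = None
            if prev is not None and addr == prev[2] + 1:
                holder = self.sets[prev[0]][prev[1]]
            if holder is not None and holder.next_valid:
                way = holder.next_way      # no tag check, one way activated
                self.skipped += 1
            else:
                way = self._lookup(addr)
                if holder is not None:     # record the tag comparison result
                    holder.next_way, holder.next_valid = way, True
            prev = (addr % self.num_sets, way, addr)
```

In this toy model, calling `fetch([0, 1, 2, 3])` three times on a fresh 4-set, 4-way cache performs six full lookups and serves six fetches from way history. Evicting line 1 afterwards clears the valid flag stored in line 0, so the next sequential fetch falls back to a full lookup.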
A fast routability- and performance-driven droplet routing algorithm for digital microfluidic biochips
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413119
Tsung-Wei Huang, Tsung-Yi Ho
As microfluidic technology advances, the design complexity of digital microfluidic biochips (DMFBs) is expected to explode in the near future. One of the most critical challenges in DMFB design is the droplet routing problem, which schedules the movement of each droplet in a time-multiplexed manner. In this paper, we propose a fast routability- and performance-driven droplet router for DMFBs. The main contributions of our work are: (1) a global moving vector analysis for constructing preferred routing tracks to minimize the number of used unit cells; (2) an entropy-based equation to determine the routing order of droplets for better routability; and (3) a routing compaction technique based on dynamic programming to minimize the latest arrival time of droplets. Experimental results show that our algorithm achieves 100% routing completion for all test cases on three benchmark suites, which previous algorithms do not. Beyond routability, compared with the state-of-the-art high-performance router on Benchmark Suite I [3], our algorithm improves runtime by 40%, reduces the latest arrival time by 21%, and reduces the number of used unit cells by 10%. Experimental results on Benchmark Suites II and III are also very promising. Across the three benchmark suites, our algorithm handles complex droplet routing problems more efficiently and robustly than existing algorithms.
Citations: 89
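The two quality metrics named in the abstract above, latest arrival time and number of used unit cells, can be computed from a set of routes as follows. This is only a sketch of the metrics, not of the routing algorithm itself. The route representation (one grid cell per time step, with repeated cells meaning a stall) and the function names are assumptions for illustration.

```python
def latest_arrival_time(routes):
    """Time step at which the last droplet reaches its target.

    Each route lists the droplet's cell at t = 0, 1, 2, ... and ends at
    the step where the droplet arrives (assumed representation).
    """
    return max(len(path) - 1 for path in routes.values())


def used_unit_cells(routes):
    """Number of distinct unit cells touched by any droplet."""
    return len({cell for path in routes.values() for cell in path})


# Two hypothetical droplets on separate rows; d2 stalls for one step.
routes = {
    "d1": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "d2": [(0, 4), (1, 4), (1, 4), (2, 4), (3, 4)],
}
print(latest_arrival_time(routes), used_unit_cells(routes))
```

A router that minimizes the first number finishes the bioassay step sooner; minimizing the second leaves more cells free for other operations.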
Statistical timing analysis based on simulation of lithographic process
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413181
Aswin Sreedhar, S. Kundu
The length of a poly-gate printed on silicon depends on exposure dose, depth of focus, photo-resist thickness and planarity of the surface. In sub-wavelength lithography, poly-gate length also varies with layout topology. Poly-gate length determines the effective channel length of a transistor, which in turn determines its performance. Since the sources of error are hard to control, statistical analysis can be used to measure their impact on circuit timing characteristics. Typical lithography-aware methodologies consider only systematic variation such as across-chip linewidth variation (ACLV). In this paper we propose a statistical technique for timing yield prediction, based on variational lithography modeling of the physical circuit layout. By statistically varying lithographic process parameters we estimate the difference in the predicted timing yield of a design. Our simulation results show that if manufacturing process parameters follow a Gaussian distribution, the resulting transistors follow a skewed normal distribution, in which a greater number of them have shorter channel lengths. This led us to investigate whether Statistical Static Timing Analysis (SSTA) is overly pessimistic. The baseline delay model assumed for SSTA in our approach is a Gaussian delay model fitted to the skew-normal distribution data obtained from statistical litho simulation. Our experiments show that even after re-centering the Gaussian delay model to fit the channel length data with minimum error, it is still overly pessimistic and significantly underestimates circuit performance.
Citations: 2
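The observation that Gaussian process parameters can yield a skewed gate-length distribution can be reproduced with a small Monte Carlo experiment. The sketch below does not use the authors' lithography simulator. It assumes a hypothetical response in which printed length falls off quadratically with defocus and shifts linearly with a dose term; every coefficient and the 65 nm nominal length are invented for illustration.

```python
import random
import statistics


def sample_gate_lengths(n, nominal_nm=65.0, focus_sigma=1.0, dose_sigma=1.0,
                        focus_coeff=2.0, seed=0):
    """Draw n printed gate lengths from Gaussian focus and dose variation.

    Hypothetical model: L = nominal - focus_coeff * defocus**2 + dose_shift.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        defocus = rng.gauss(0.0, focus_sigma)    # Gaussian process parameter
        dose_shift = rng.gauss(0.0, dose_sigma)  # Gaussian process parameter
        lengths.append(nominal_nm - focus_coeff * defocus ** 2 + dose_shift)
    return lengths


def skewness(xs):
    """Population skewness: mean of cubed standardized deviations."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


lengths = sample_gate_lengths(20000)
shorter = sum(x < 65.0 for x in lengths) / len(lengths)
print(f"skewness={skewness(lengths):.2f}, fraction below nominal={shorter:.2f}")
```

Because the quadratic term can only shorten the gate, the samples have a long tail toward short lengths even though both inputs are symmetric. A Gaussian fitted to such data places probability mass where the samples have none, which is the kind of mismatch the abstract attributes to a Gaussian delay model.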
Rapid early-stage microarchitecture design using predictive models
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413141
Christophe Dubach, Timothy M. Jones, M. O’Boyle
The early-stage design of a new microprocessor involves the evaluation of a wide range of benchmarks across a large number of architectural configurations. Several methods are used to cut down on the required simulation time. Typically, however, existing approaches fail to capture true program behaviour accurately and require a non-negligible number of training simulations to be run. We address these problems by developing a machine learning model that predicts the mean of any given metric, e.g. cycles or energy, across a range of programs, for any microarchitectural configuration. It works by combining only the most representative programs from the benchmark suite based on their behaviour in the design space under consideration. We use our model to predict the mean performance, energy, energy-delay (ED) and energy-delay-squared (EDD) of the SPEC CPU 2000 and MiBench benchmark suites within our design space. We achieve the same level of accuracy as two state-of-the-art prediction techniques but require five times fewer training simulations. Furthermore, our technique is scalable and we show that, asymptotically, it requires an order of magnitude fewer simulations than these existing approaches.
Citations: 3
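For reference, the four target metrics named in the abstract above can be computed per program and averaged over a suite as sketched here. The predictive model itself is not reproduced. The sketch assumes that delay is measured in cycles at a fixed clock and that "mean" denotes the arithmetic mean; the function names and sample numbers are hypothetical.

```python
def energy_delay(energy, delay):
    """ED: energy multiplied by delay."""
    return energy * delay


def energy_delay_squared(energy, delay):
    """EDD: energy multiplied by delay squared."""
    return energy * delay ** 2


def suite_means(results):
    """Average each metric over programs; results maps name -> (cycles, energy)."""
    n = len(results)
    return {
        "cycles": sum(c for c, _ in results.values()) / n,
        "energy": sum(e for _, e in results.values()) / n,
        "ED": sum(energy_delay(e, c) for c, e in results.values()) / n,
        "EDD": sum(energy_delay_squared(e, c) for c, e in results.values()) / n,
    }


# Hypothetical simulation results for one microarchitectural configuration.
print(suite_means({"prog_a": (2.0, 3.0), "prog_b": (4.0, 1.0)}))
```

A model of the kind the abstract describes would predict these suite-level means for an unseen configuration instead of simulating every program.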
Impact analysis of performance faults in modern microprocessors
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413171
Naghmeh Karimi, M. Maniatakos, C. Tirumurti, A. Jas, Y. Makris
Towards improving performance, modern microprocessors incorporate a variety of architectural features, such as branch prediction and speculative execution, which are not critical to the correctness of their operation. While faults in the corresponding hardware may not necessarily affect functional correctness, they may, nevertheless, adversely impact performance. In this paper, we investigate quantitatively the performance impact of such faults using a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. We provide extensive fault simulation-based experimental results and we discuss how this information may guide the inclusion of additional hardware for performance loss recovery and yield enhancement.
Citations: 8
Mid-range wireless energy transfer using inductive resonance for wireless sensors
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413106
S. J. Mazlouman, A. Mahanfar, B. Kaminska
Methods are suggested and tested to measure and optimize the wireless energy transfer efficiency of mid-range (10–100 cm), relatively low-profile inductive coils using magnetic resonance. These coils can be used to provide energy for wireless sensors and battery-operated devices. It is shown that for every system, a resonance frequency can be identified at which the wireless energy transfer efficiency is optimal. Several prototypes are developed and tested to validate the proposed technique. It is also shown that by tuning to the optimum resonant frequency and designing proper matching circuitry, an efficiency of about 25% can be achieved for moderate profiles.
Citations: 45
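As background for the resonance tuning mentioned in the abstract above, the natural frequency of an ideal LC tank is f0 = 1 / (2π√(LC)). The sketch below applies that textbook formula. It is only a starting point under the ideal-circuit assumption: the abstract describes finding the optimum frequency experimentally for each system, and the component values used here are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Natural frequency of an ideal LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def tuning_capacitance_f(inductance_h, target_hz):
    """Capacitance that resonates a given coil at the target frequency."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * inductance_h)


def transfer_efficiency(power_out_w, power_in_w):
    """Fraction of transmitted power delivered to the load."""
    return power_out_w / power_in_w


# Hypothetical 10 uH coil with a 100 pF capacitor.
f0 = resonant_frequency_hz(10e-6, 100e-12)
print(f"f0 = {f0 / 1e6:.3f} MHz")
```

Transmitter and receiver coils would both be tuned to the same frequency, for example by choosing each capacitor with `tuning_capacitance_f`.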
FinFET-based dynamic power management of on-chip interconnection networks through adaptive back-gate biasing
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413133
Chun-Yi Lee, N. Jha
On-chip interconnection networks are fast becoming significant power consumers in high-performance chip multiprocessors (CMPs). Increased power consumption leads to more heat, degrades system reliability, and may increase the cost of cooling IC packages. This situation becomes even worse as bulk CMOS scales further into the nanometer regime because of excessive leakage power due to short-channel effects. In this paper, we explore the use of FinFETs, which are promising substitutes for bulk CMOS at the 32nm node and beyond, to design on-chip network routers. We present a detailed design of a variable pipeline stage router (VPSR) targeted at FinFET technology. We employ a dynamic power management scheme, which we call adaptive back-gate biasing (ABGB), for FinFET implementations. We evaluate VPSR and ABGB on a simulation platform specifically designed for power and performance simulations of FinFET-based interconnection networks. The results show that VPSR successfully adapts its power consumption to incoming traffic, yielding a 20% reduction in power with almost no impact on latency.
Citations: 23
Defect-based test optimization for analog/RF circuits for near-zero DPPM applications
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413139
E. Yilmaz, S. Ozev
Analog circuits are often tested based on their specifications. While specification-based testing ensures the initial product quality, full testing is often not possible in high-volume production. Moreover, even full specification-based testing cannot guarantee that the circuit does not contain any physical defects. Some application domains require near-zero defect levels independent of whether the specifications are met. In this work, we present a defect-based test optimization method focusing on minimizing defective parts per million (DPPM). We extract potential defects through inductive fault analysis (IFA) and reduce the number of tests without degrading the test quality. To achieve near-zero DPPM, we employ outlier analysis to identify defective circuits that cannot be identified using specification-based methods. Simulation results on an LNA show that the proposed method reduces DPPM to 0 at a cost of 0.2% yield loss.
Citations: 7
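The trade-off between DPPM and yield loss in the abstract above can be made concrete with a small outlier screen. This is a sketch under stated assumptions, not the authors' method: the abstract does not specify which outlier analysis is used, so a median/MAD rule stands in for it here, and the measurement values, threshold `k`, and function names are hypothetical.

```python
import statistics


def mad_outliers(values, k=3.0):
    """Flag values more than k median-absolute-deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [abs(v - med) > k * mad for v in values]


def dppm(defective, rejected):
    """Defective parts per million among the parts that are shipped."""
    shipped = [d for d, r in zip(defective, rejected) if not r]
    return 1e6 * sum(shipped) / len(shipped)


def yield_loss(defective, rejected):
    """Fraction of all parts that are good but rejected anyway."""
    good_rejected = sum(r and not d for d, r in zip(defective, rejected))
    return good_rejected / len(defective)


# Hypothetical measurement (e.g. a gain reading) for eight parts that all
# pass their specifications; the seventh part hides a physical defect.
readings = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.45, 1.06]
defective = [False] * 6 + [True, False]

no_screen = [False] * len(readings)
screened = mad_outliers(readings)
print(dppm(defective, no_screen), dppm(defective, screened),
      yield_loss(defective, screened))
```

In this example the screen catches the defective part but also rejects one good part whose reading drifts from the population, which is the yield cost of pushing DPPM toward zero.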
Reusing cached schedules in an out-of-order processor with in-order issue logic
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413146
Oscar Palomar, Toni Juan, J. Navarro
The complex and powerful out-of-order issue logic ignores the repetitive nature of code, unlike caches or branch predictors. We show that in 90% of the cycles, the group of instructions selected by the issue logic belongs to just 13% of all the different groups issued: the issue logic of an out-of-order processor is constantly re-discovering what it has already found. To benefit from the repetitive nature of instruction issue, we move the scheduling logic after the commit stage, out of the critical path of execution. The schedules created there are cached and reused to feed a simple in-order issue logic, which could result in a higher-frequency design. We present the complete design of our ReLaSch processor, which achieves the same average IPC as a conventional out-of-order processor and a speed-up of 1.56 over the IPC of an in-order processor. We actually surpass the out-of-order IPC in 23 out of 40 SPEC benchmarks, mainly because the broader view of the code after the commit stage allows better schedules to be created.
Citations: 3
Extending data prefetching to cope with context switch misses
Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413144
Hanyu Cui, S. Sair
Among the various costs of a context switch, its impact on the performance of L2 caches is the most significant because of the resulting high miss penalty. To reduce the impact of frequent context switches, we propose restoring a program's locality by prefetching into the L2 cache the data a program was using before it was swapped out. A Global History List is used to record a process's L2 read accesses in LRU order. These accesses are saved along with the process's context when the process is swapped out, and are loaded to guide prefetching when it is swapped in. We also propose a feedback mechanism that greatly reduces memory traffic incurred by our prefetching scheme. Experiments show significant speedup over baseline architectures with and without traditional prefetching in the presence of frequent context switches.
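The save-and-restore flow described in the abstract can be sketched in software as follows. This is a minimal model under stated assumptions, not the paper's hardware design: the class and function names, the fixed `capacity`, the block-address granularity, and the choice to prefetch most recently used blocks first are all hypothetical. The feedback mechanism mentioned in the abstract is not modeled.

```python
from collections import OrderedDict
from typing import Dict, List


class GlobalHistoryList:
    """Hypothetical model of a list of L2 read accesses kept in LRU order."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are block addresses; insertion order runs from LRU to MRU.
        self._entries: "OrderedDict[int, None]" = OrderedDict()

    def record_read(self, block_addr: int) -> None:
        """Record an L2 read, moving the block to the MRU position."""
        if block_addr in self._entries:
            self._entries.move_to_end(block_addr)
            return
        self._entries[block_addr] = None
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the LRU entry

    def snapshot(self) -> List[int]:
        """Return recorded block addresses, most recently used first."""
        return list(reversed(self._entries))

    def clear(self) -> None:
        self._entries.clear()


# Assumed stand-in for state saved alongside each process context.
saved_histories: Dict[int, List[int]] = {}


def swap_out(pid: int, ghl: GlobalHistoryList) -> None:
    """Save the history with the process context and reset the list."""
    saved_histories[pid] = ghl.snapshot()
    ghl.clear()


def swap_in(pid: int) -> List[int]:
    """Return the block addresses to prefetch into L2, MRU first (assumed order)."""
    return saved_histories.pop(pid, [])


ghl = GlobalHistoryList(capacity=3)
for addr in (0x100, 0x140, 0x100, 0x180, 0x1C0):
    ghl.record_read(addr)
swap_out(7, ghl)
print([hex(a) for a in swap_in(7)])
```

With a capacity of three, the oldest block (0x140) falls off the list before the switch, so only the three most recently read blocks are offered for prefetching when process 7 is swapped back in.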
Citations: 7