Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413162
Zichao Xie, Dong Tong, Xu Cheng
Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WHOLE-Cache) design for single-issue, in-order processors. The WHOLE-Cache design achieves a significant energy reduction by cutting the dynamic energy dissipation of the set-associative instruction cache, and it incurs no additional cycle penalties. Tag comparison results are stored in either the Branch Target Buffer (BTB) or the Instruction Cache (I-Cache) to avoid tag checks and unnecessary way activation for subsequent accesses to visited cache lines. The extended BTB uses way history bits for branch instructions, while the I-Cache extension bits are used when fetching consecutive instructions that reside in different cache lines. A valid flag is associated with each stored tag comparison result to indicate whether the instruction to be fetched resides in the recorded location. A simple invalidation scheme is implemented in the cache miss replacement operation: whenever a cache line is replaced, the pointers to it, which reside in the BTB or other I-Cache lines, are invalidated accordingly. We model the WHOLE-Cache design in Verilog. Deriving basic parameters from TSMC 65nm technology, we use the Wattch simulator to evaluate the performance and energy reduction of the WHOLE-Cache in the instruction fetch stage. We use SPEC2000 and Mediabench as benchmarks. Compared with a conventional 4-way set-associative I-Cache, the energy consumption of the WHOLE-Cache is reduced by 65% without any performance penalty.
Title: WHOLE: A low energy I-Cache with separate way history (2009 IEEE International Conference on Computer Design)
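The way-history idea in the WHOLE-Cache abstract can be illustrated with a toy model. The Python sketch below is not the authors' Verilog design: the class name `WayHistoryCache`, the round-robin replacement policy, the `source` identifiers, and the use of an activated-way count as a stand-in for energy are all assumptions made for illustration. It also assumes that each source (a branch or a preceding cache line) always leads to the same target line.

```python
class WayHistoryCache:
    """Toy set-associative I-cache that remembers which way a fetch hit in."""

    def __init__(self, num_sets=4, num_ways=4):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # tags[s][w] is the line address held in set s, way w (None if empty).
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Round-robin victim pointer per set (an assumed replacement policy).
        self.next_victim = [0] * num_sets
        # Valid way-history entries only: source id -> (set, way).
        self.links = {}
        # Crude energy proxy: number of ways activated so far.
        self.ways_activated = 0

    def fetch(self, line_addr, source):
        """Fetch line_addr, reached from `source`; return the way used."""
        s = line_addr % self.num_sets
        if source in self.links:
            # Valid history: activate only the recorded way, skip tag checks.
            _, way = self.links[source]
            self.ways_activated += 1
            return way
        # No valid history: activate every way and compare tags.
        self.ways_activated += self.num_ways
        if line_addr in self.tags[s]:
            way = self.tags[s].index(line_addr)
        else:
            way = self._replace(s, line_addr)
        self.links[source] = (s, way)
        return way

    def _replace(self, s, line_addr):
        """Fill a miss and invalidate every link that pointed at the victim."""
        way = self.next_victim[s]
        self.next_victim[s] = (way + 1) % self.num_ways
        self.links = {src: loc for src, loc in self.links.items()
                      if loc != (s, way)}
        self.tags[s][way] = line_addr
        return way


cache = WayHistoryCache(num_sets=2, num_ways=2)
cache.fetch(0, "branch_A")   # first visit: both ways activated
cache.fetch(0, "branch_A")   # revisit: only the recorded way activated
print(cache.ways_activated)  # 3
```

Dropping an entry from `links` plays the role of clearing the valid flag: once the line it pointed to is replaced, the next fetch from that source falls back to a full lookup.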
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413119
Tsung-Wei Huang, Tsung-Yi Ho
As microfluidic technology advances, the design complexity of digital microfluidic biochips (DMFBs) is expected to explode in the near future. One of the most critical challenges for DMFB design is the droplet routing problem, which schedules the movement of each droplet in a time-multiplexed manner. In this paper, we propose a fast routability- and performance-driven droplet router for DMFBs. The main contributions of our work are: (1) a global moving vector analysis for constructing preferred routing tracks to minimize the number of used unit cells; (2) an entropy-based equation to determine the routing order of droplets for better routability; (3) a routing compaction technique based on dynamic programming to minimize the latest arrival time of droplets. Experimental results show that our algorithm achieves 100% routing completion for all test cases on three benchmark suites, whereas previous algorithms do not. In addition to routability, compared with the state-of-the-art high-performance router on Benchmark Suite I [3], our algorithm improves runtime by 40%, reduces the latest arrival time by 21%, and reduces the number of used unit cells by 10%. Furthermore, experimental results on Benchmark Suites II and III are also very promising. Across the three benchmark suites, our algorithm handles complex droplet routing problems more efficiently and robustly than the existing algorithms.
Title: A fast routability- and performance-driven droplet routing algorithm for digital microfluidic biochips
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413181
Aswin Sreedhar, S. Kundu
The length of a poly-gate printed on silicon depends on exposure dose, depth of focus, photo-resist thickness and planarity of the surface. In sub-wavelength lithography, poly-gate length also varies with layout topology. Poly-gate length determines the effective channel length of a transistor, which determines its performance. Since the sources of error are hard to control, statistical analysis can be used to measure the impact on circuit timing characteristics. Typical lithography-aware methodologies consider only systematic variation such as across-chip linewidth variation (ACLV). In this paper we propose a statistical technique for timing yield prediction, based on variational lithography modeling of the physical circuit layout. By statistically varying lithographic process parameters we estimate the difference in the timing yield estimate of a design. Our simulation results show that if manufacturing process parameters follow a Gaussian distribution, the channel lengths of the resulting transistors follow a skewed normal distribution, in which a greater number of them have shorter channel lengths. This led us to investigate whether Statistical Static Timing Analysis (SSTA) is overly pessimistic. The baseline delay model assumed for SSTA in our approach is a Gaussian delay model fitted to skew normal distribution data obtained from statistical litho simulation. Our experiments showed that even after re-centering the Gaussian delay model to fit the channel length data with minimum error, it is still overly pessimistic and significantly underestimates circuit performance.
Title: Statistical timing analysis based on simulation of lithographic process
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413141
Christophe Dubach, Timothy M. Jones, M. O’Boyle
The early-stage design of a new microprocessor involves the evaluation of a wide range of benchmarks across a large number of architectural configurations. Several methods are used to cut down on the required simulation time. Typically, however, existing approaches fail to capture true program behaviour accurately and require a non-negligible number of training simulations to be run. We address these problems by developing a machine learning model that predicts the mean of any given metric, e.g. cycles or energy, across a range of programs, for any microarchitectural configuration. It works by combining only the most representative programs from the benchmark suite based on their behaviour in the design space under consideration. We use our model to predict the mean performance, energy, energy-delay (ED) and energy-delay-squared (EDD) of the SPEC CPU 2000 and MiBench benchmark suites within our design space. We achieve the same level of accuracy as two state-of-the-art prediction techniques but require five times fewer training simulations. Furthermore, our technique is scalable and we show that, asymptotically, it requires an order of magnitude fewer simulations than these existing approaches.
Title: Rapid early-stage microarchitecture design using predictive models
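For readers unfamiliar with the metrics named in the abstract on predictive models, the sketch below shows how energy-delay (ED) and energy-delay-squared (EDD) are conventionally computed and averaged over a suite. It only computes the quantities that the model is said to predict; it is not the machine learning model itself. The function names, the use of an arithmetic mean, and the sample numbers are assumptions for illustration and do not come from the paper.

```python
from statistics import mean


def energy_delay(energy, delay):
    """Energy-delay product (ED): energy x delay."""
    return energy * delay


def energy_delay_squared(energy, delay):
    """Energy-delay-squared product (EDD): energy x delay^2."""
    return energy * delay ** 2


def suite_means(results):
    """Average each metric over a benchmark suite.

    results maps a program name to an (energy, delay) pair measured
    on one microarchitectural configuration.
    """
    pairs = list(results.values())
    return {
        "energy": mean(e for e, _ in pairs),
        "delay": mean(d for _, d in pairs),
        "ED": mean(energy_delay(e, d) for e, d in pairs),
        "EDD": mean(energy_delay_squared(e, d) for e, d in pairs),
    }


# Hypothetical measurements for one configuration (not from the paper).
example = {"prog_a": (2.0, 1.0), "prog_b": (4.0, 3.0)}
print(suite_means(example))
```

Delay can be expressed in cycles or seconds, as long as the same unit is used for every program. EDD weights delay more heavily than ED, so it favours faster configurations.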
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413171
Naghmeh Karimi, M. Maniatakos, C. Tirumurti, A. Jas, Y. Makris
Towards improving performance, modern microprocessors incorporate a variety of architectural features, such as branch prediction and speculative execution, which are not critical to the correctness of their operation. While faults in the corresponding hardware may not necessarily affect functional correctness, they may, nevertheless, adversely impact performance. In this paper, we investigate quantitatively the performance impact of such faults using a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. We provide extensive fault simulation-based experimental results and we discuss how this information may guide the inclusion of additional hardware for performance loss recovery and yield enhancement.
Title: Impact analysis of performance faults in modern microprocessors
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413106
S. J. Mazlouman, A. Mahanfar, B. Kaminska
Methods are suggested and tested to measure and optimize the wireless energy transfer efficiency for mid-range (10–100cm) inductive coils with relatively low profile using magnetic resonance. These coils can be used to provide energy for wireless sensors and battery-operated devices. It is shown that for every system, a resonance frequency can be identified where the wireless energy transfer efficiency is optimal. Several prototypes are developed and tested as a proof of validity of the proposed technique. It is also shown that by tuning to the optimum resonant frequency and designing proper matching circuitry, an efficiency of about 25% for moderate profiles can be achieved.
Title: Mid-range wireless energy transfer using inductive resonance for wireless sensors
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413133
Chun-Yi Lee, N. Jha
On-chip interconnection networks are fast becoming significant power-consumers in high-performance chip multiprocessors (CMPs). Increased power consumption leads to more heat, adversely degrades system reliability, and may increase the cost of cooling IC packages. This situation becomes even worse as bulk CMOS scales further into the nanometer regime because of excessive leakage power due to short-channel effects. In this paper, we explore the use of FinFETs, which are promising substitutes for bulk CMOS at the 32nm node and beyond, to design on-chip network routers. We present a detailed design of a variable pipeline stage router (VPSR) targeted at FinFET technology. We employ a dynamic power management scheme, which we call adaptive back-gate biasing (ABGB), for FinFET implementations. We evaluate VPSR and ABGB on a simulation platform specifically designed for power and performance simulations for FinFET-based interconnection networks. The results show that VPSR is able to successfully adapt its power consumption to incoming traffic, with a resultant 20% reduction in power at almost no impact on latency.
Title: FinFET-based dynamic power management of on-chip interconnection networks through adaptive back-gate biasing
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413139
E. Yilmaz, S. Ozev
Analog circuits are often tested based on their specifications. While specification-based testing ensures the initial product quality, full testing is often not possible in high volume production. Moreover, even full specification-based testing cannot guarantee that the circuit does not contain any physical defects. Some application domains require near-zero defect levels independent of whether the specifications are met. In this work, we present a defect based test optimization method focusing on defective parts per million (DPPM) minimization. We extract potential defects through inductive fault analysis (IFA) and reduce the number of tests without degrading the test quality. In order to achieve near zero DPPM, we employ outlier analysis to identify defective circuits that cannot be identified using specification based methods. Simulation results on an LNA show that DPPM is reduced down to 0 at a cost of 0.2% yield loss with the proposed method.
Title: Defect-based test optimization for analog/RF circuits for near-zero DPPM applications
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413146
Oscar Palomar, Toni Juan, J. Navarro
Complex and powerful out-of-order issue logic ignores the repetitive nature of code, unlike caches or branch predictors. We show that in 90% of the cycles, the group of instructions selected by the issue logic belongs to just 13% of the total distinct groups issued: the issue logic of an out-of-order processor is constantly rediscovering what it has already found. To benefit from the repetitive nature of instruction issue, we move the scheduling logic after the commit stage, out of the critical path of execution. The schedules created there are cached and reused to feed a simple in-order issue logic, which could result in a higher-frequency design. We present the complete design of our ReLaSch processor, which achieves the same average IPC as a conventional out-of-order processor and a 1.56 speed-up over the IPC of an in-order processor. We actually surpass the out-of-order IPC in 23 out of 40 SPEC benchmarks, mainly because the broader view of the code after the commit stage allows creating better schedules.
Title: Reusing cached schedules in an out-of-order processor with in-order issue logic
Pub Date: 2009-10-04 DOI: 10.1109/ICCD.2009.5413144
Hanyu Cui, S. Sair
Among the various costs of a context switch, its impact on the performance of L2 caches is the most significant because of the resulting high miss penalty. To reduce the impact of frequent context switches, we propose restoring a program's locality by prefetching into the L2 cache the data a program was using before it was swapped out. A Global History List is used to record a process' L2 read accesses in LRU order. These accesses are saved along with the process' context when the process is swapped out and loaded to guide prefetching when it is swapped in. We also propose a feedback mechanism that greatly reduces memory traffic incurred by our prefetching scheme. Experiments show significant speedup over baseline architectures with and without traditional prefetching in the presence of frequent context switches.
Title: Extending data prefetching to cope with context switch misses
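The save-and-restore scheme in the context-switch prefetching abstract can be sketched in a few lines. The Python model below is an illustration under stated assumptions, not the authors' hardware: the class name `GlobalHistoryList`, the fixed `capacity`, the most-recent-first prefetch order, the `prefetch_budget` parameter, and the use of a Python set to stand in for the L2 cache are all hypothetical. The feedback mechanism that limits memory traffic is not modelled.

```python
from collections import OrderedDict


class GlobalHistoryList:
    """Bounded record of a process's L2 read accesses, kept in LRU order."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        # Oldest entry first, most recently used entry last.
        self.entries = OrderedDict()

    def record_read(self, block_addr):
        """Note an L2 read; a repeated block moves to the MRU position."""
        self.entries.pop(block_addr, None)
        self.entries[block_addr] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used

    def save(self):
        """Snapshot stored with the process context at swap-out (MRU first)."""
        return list(reversed(self.entries))


def swap_in(saved_history, l2_cache, prefetch_budget):
    """Prefetch the most recently used blocks when the process resumes."""
    for block_addr in saved_history[:prefetch_budget]:
        l2_cache.add(block_addr)  # stands in for issuing a prefetch
    return l2_cache


ghl = GlobalHistoryList(capacity=3)
for addr in [1, 2, 3, 1, 4]:
    ghl.record_read(addr)
snapshot = ghl.save()                      # taken when the process is swapped out
print(snapshot)                            # [4, 1, 3]
print(sorted(swap_in(snapshot, set(), 2)))  # [1, 4]
```

Prefetching the most recently used blocks first is a design choice of this sketch; it reflects the expectation that those blocks are the likeliest to be touched again soon after the process resumes.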