Computational bit-width allocation for operations in vector calculus
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413121
A. Kinsman, N. Nicolici
Automated bit-width allocation is a key step in the design of hardware accelerators. The application of computational methods based on SAT-Modulo Theory (SMT) to the problem of finite-precision bit-width allocation has recently been shown to overcome challenges faced by the known art, particularly in the scientific computing domain. However, many such real-life applications are specified in terms of vectors and matrices, and expanding them into scalar equations renders the problem infeasible. This paper proposes a framework that incorporates operations from vector calculus, thus enabling the approach to tackle applications of practically relevant complexity.
{"title":"Computational bit-width allocation for operations in vector calculus","authors":"A. Kinsman, N. Nicolici","doi":"10.1109/ICCD.2009.5413121","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413121","url":null,"abstract":"Automated bit-width allocation is a key step required for the design of hardware accelerators. The use of computational methods based on SAT-Modulo Theory to the problem of finite-precision bit-width allocation has recently been shown to overcome challenges faced by the known-art, particularly in the scientific computing domain. However, many such real-life applications are specified in terms of vectors and matrices and they are rendered infeasible by expansion into scalar equations. This paper proposes a framework to include operations from vector calculus and thus it enables tackling applications of practically relevant complexity.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131475792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using checksum to reduce power consumption of display systems for low-motion content
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413176
Kyungtae Han, Zhen Fang, Paul Diefenbaugh, Richard Forand, R. Iyer, D. Newell
Power consumption of the display subsystem has been relatively less explored than that of other components of a mobile device, such as the computing, storage, and networking units, even though the display often constitutes one of the most power-hungry portions of the system. Typical applications on a mobile device, such as web browsing and text editing, tend to have rather static image content: each frame hardly changes from the previous one. Efficiently detecting and handling no-motion scenarios is thus critical to extending battery life. This paper focuses on image change detection. We propose using checksums to detect image changes. Specifically, CRC hardware is used to optimize the power consumption of 1) refresh of a local display and 2) data compression for a wireless remote display. Compared with a traditional pixel-by-pixel comparison approach, using a checksum for image change detection is not only fast but also reduces accesses to the frame buffer, resulting in significant power savings. We have built an FPGA prototype to verify that CRC can capture image changes well enough to ensure “visually lossless” quality.
{"title":"Using checksum to reduce power consumption of display systems for low-motion content","authors":"Kyungtae Han, Zhen Fang, Paul Diefenbaugh, Richard Forand, R. Iyer, D. Newell","doi":"10.1109/ICCD.2009.5413176","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413176","url":null,"abstract":"Power consumption of the display subsytem has been a relatively less explored area compared to other components of a mobile device including computing, storage, and networking units, although the former often constitutes one of the most power-hungry portions of the system. Typical applications on a mobile device such as web browsing and text editing tend to have rather static image content; each frame hardly changes from the previous one. Efficiently detecting and handling no-motion scenarios is thus critical to extend the battery life. This paper focuses on image change detection. We propose to use checksum to detect image changes. Specifically, CRC hardware is used to optimize the power consumption of 1) refresh of a local display and 2) data compression for wireless remote display. Compared with a traditional, pixel-by-pixel comparison approach, using checksum for image change detection is not only fast, but also reduces accesses to the frame buffer, resulting in significant power savings. We have built a FPGA prototype to verify that CRC can capture image changes well enough to ensure a “visually lossless” quality.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133932528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality improvement and cost reduction using statistical outlier methods
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413175
A. Nahar, K. Butler, J. Carulli, Charles Weinberger
Quality improvement and cost reduction in the overall IC manufacturing and test processes are continuously sought. Outlier screening methods can address both of these needs. As technology scales, it has become increasingly difficult to screen outliers without excessive Type I or Type II errors. Hundreds of parameters are collected at wafer probe, but a systematic way of selecting outlier screens has been lacking. In this paper we describe a statistical approach to both identify outliers and select beneficial screening parameters more effectively. Results from applying the approach to reduce burn-in failures on a 90 nm design are described.
{"title":"Quality improvement and cost reduction using statistical outlier methods","authors":"A. Nahar, K. Butler, J. Carulli, Charles Weinberger","doi":"10.1109/ICCD.2009.5413175","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413175","url":null,"abstract":"Quality improvement and cost reduction in the overall IC manufacturing and test processes are being continuously sought. Outlier screening methods can address both of these needs. As technology scales, it has become increasingly difficult to screen outliers without excessive Type I or II errors. Hundreds of parameters are collected at wafer probe, but there lacks a systematic way of selecting outlier screens. In this paper we describe a statistical approach to both identify outliers and select beneficial screening parameters more effectively. Results on a 90nm design to reduce the burn-in fails are described.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132737819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413143
Jiayuan Meng, K. Skadron
Without high-bandwidth broadcast, large numbers of cores require a scalable point-to-point interconnect and a directory protocol. In such cases, a shared, inclusive last-level cache (LLC) can improve data sharing and avoid three-way communication for shared reads. However, if inclusion encompasses thread-private data, two problems arise with the shared LLC. First, current memory allocators align stack bases on page boundaries, which emerges as a source of severe conflict misses for large numbers of threads in data-parallel applications. Second, correctness does not require the private data to reside in the shared directory or the LLC. This paper advocates stack-base randomization, which eliminates the major source of conflict misses for large numbers of threads. However, when capacity becomes a limitation for the directory or last-level cache, this is not sufficient. We then propose a non-inclusive, semi-coherent cache organization (NISC) that removes the requirement for inclusion of private data and reduces capacity misses. Our data-parallel benchmarks show that these limitations prevent scaling beyond 8 cores, while our techniques allow scaling to at least 32 cores for most benchmarks. At 8 cores, stack-base randomization provides a mean speedup of 1.2X, and at 32 cores it gives a speedup of 2.7X over the best baseline configuration. Compared to conventional performance with a 2 MB LLC, our technique achieves similar performance with a 256 KB LLC, suggesting that LLCs may be typically overprovisioned. When very limited LLC resources are available, NISC can further improve system performance by 1.8X.
{"title":"Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling","authors":"Jiayuan Meng, K. Skadron","doi":"10.1109/ICCD.2009.5413143","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413143","url":null,"abstract":"Without high-bandwidth broadcast, large numbers of cores require a scalable point-to-point interconnect and a directory protocol. In such cases, a shared, inclusive last level cache (LLC) can improve data sharing and avoid three-way communication for shared reads. However, if inclusion encompasses thread-private data, two problems arise with the shared LLC. First, current memory allocators align stack bases on page boundaries, which emerges as a source of severe conflict misses for large numbers of threads on data-parallel applications. Second, correctness does not require the private data to reside in the shared directory or the LLC. This paper advocates stack-base randomization that eliminates the major source of conflict misses for large numbers of threads. However, when capacity becomes a limitation for the directory or last-level cache, this is not sufficient. We then propose non-inclusive, semi-coherent cache organization (NISC) that removes the requirement for inclusion of private data and reduces capacity misses. Our data-parallel benchmarks show that these limitations prevent scaling beyond 8 cores, while our techniques allow scaling to at least 32 cores for most benchmarks. At 8 cores, stack randomization provides a mean speedup of 1.2X, but stack randomization with 32 cores gives a speedup of 2.7X over the best baseline configuration. Comparing to conventional performance with a 2 MB LLC, our technique achieves similar performance with a 256 KB LLC, suggesting LLCs may be typically overprovisioned. When very limited LLC resources are available, NISC can further improve system performance by 1.8X.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114554482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic synthesis of computation interference constraints for relative timing verification
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413183
Yang Xu, K. Stevens
Asynchronous sequential circuit or protocol design requires formal verification to ensure correct behavior under all operating conditions. However, most asynchronous circuits or protocols cannot be proven conformant to a specification without adding timing assumptions. Relative Timing (RT) is an approach to modeling and verifying circuits and protocols that require timing assumptions to operate correctly. The process of creating path-based RT constraints has previously been done by hand with the aid of a formal verification engine. This time-consuming and error-prone method vastly restricts the application of RT and the capability to implement circuits and protocols. This paper describes an algorithm for automatic generation of RT constraints based on signal traces produced by a formal verification (FV) engine that supports relative timing constraints. The algorithm has been implemented in a CAD tool called Automatic Relative Timing Identifier based on Signal Traces (ARTIST), which has been embedded into the FV engine. A set of asynchronous and clocked designs and protocols has been verified and proven hazard-free with the RT constraints generated by ARTIST, a task that would have taken months to perform by hand. A comparison between hand-generated and ARTIST-generated RT constraints in terms of efficiency and quality is also presented.
{"title":"Automatic synthesis of computation interference constraints for relative timing verification","authors":"Yang Xu, K. Stevens","doi":"10.1109/ICCD.2009.5413183","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413183","url":null,"abstract":"Asynchronous sequential circuit or protocol design requires formal verification to ensure correct behavior under all operating conditions. However, most asynchronous circuits or protocols cannot be proven conformant to a specification without adding timing assumptions. Relative Timing (RT) is an approach to model and verify circuits and protocols that require timing assumptions to operate correctly. The process of creating path-based RT constraints has previously been done by hand with the aid of a formal verification engine. This time consuming and error prone method vastly restricts the application of RT and the capability to implement circuits and protocols. This paper describes an algorithm for automatic generation of RT constraints based on signal traces generated from a formal verification (FV) engine that supports relative timing constraints. This algorithm has been implemented in a CAD tool called Automatic Relative Timing Identifier based on Signal Traces (ARTIST) which has been embedded into the FV engine. A set of asynchronous and clocked designs and protocols have been verified and proven to be hazard-free with the RT constraints generated by ARTIST which would have taken months to perform by hand. A comparison of RT constraints between hand-generated and ARTIST generated constraints is also described in terms of efficiency and quality.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124600209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413142
Javier Lira, Carlos Molina, Antonio González
The increasing speed gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. Non-uniform cache architectures (NUCA) have been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces, or banks, allowing nearer banks to have better access latencies than farther banks. Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache could boost performance, particularly if set associativity is adopted. Unfortunately, traditional replacement policies do not behave properly, as they were designed for single-processor systems. This paper focuses on bank replacement. This policy involves three key decisions when there is a miss: where to place a data block within the cache set, which data to evict from the cache set, and finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions. The technique is based on the observation that some types of data are less commonly accessed depending on which bank they reside in. We call this technique LRU-PEA (Least Recently Used with a Priority Eviction Approach). We show that the proposed technique significantly reduces requests to the off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and an energy-per-instruction (EPI) reduction of 5%.
{"title":"LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors","authors":"Javier Lira, Carlos Molina, Antonio González","doi":"10.1109/ICCD.2009.5413142","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413142","url":null,"abstract":"The increasing speed-gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. non uniform cache architectures (NUCA) have been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces or banks allowing nearer banks to have better access latencies than further banks. Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache could boost performance, particularly if set associativity is adopted. Unfortunately, traditional replacement policies do not behave properly as they were designed for single-processors. This paper focuses on bank replacement. This policy involves three key decisions when there is a miss: where to place a data block within the cache set, which data to evict from the cache set and finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions to be taken. This technique is based on the observation that some types of data are less commonly accessed depending on which bank they reside in. We call this technique LRU-PEA (least recently used with a priority eviction approach). We show that the proposed technique significantly reduces the requests to the off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and into an Energy per Instruction (EPI) reduction of 5%.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123332224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D simulation and analysis of the radiation tolerance of voltage scaled digital circuit
Pub Date: 2009-10-04 | DOI: 10.1109/ICCD.2009.5413111
Rajesh Garg, S. Khatri
In recent times, dynamic supply voltage scaling (DVS) has been extensively employed to minimize the power and energy of VLSI systems. Sub-threshold circuits are also becoming more popular. At the same time, the reliability of VLSI systems under Single Event Upsets (SEUs) has become a major concern; SEUs are problematic even for circuits operating at nominal voltages. With the increasing demand for low-power, reliable systems, it is therefore necessary to harden DVS and sub-threshold circuits efficiently. In this paper, we perform 3D simulations of radiation particle strikes in an inverter implemented using DVS and sub-threshold design. We analyze the sensitivity of the inverter to radiation particle strikes by varying the inverter size, the inverter load, the supply voltage (VDD), and the energy of the radiation particles. From these 3D simulations, we make several observations that are important to consider during radiation hardening of DVS and sub-threshold circuits. Based on these observations, we propose several guidelines for radiation hardening of DVS and sub-threshold circuit designs. These guidelines suggest that traditional radiation hardening approaches need to be revisited for DVS and sub-threshold designs. We also propose a charge collection model for DVS circuits. Our model accurately estimates (with an average error of 6.3%) the charge collected at the output of a gate for different supply voltages and different gate sizes under medium- and high-energy particle strikes. The parameters of our charge collection model can be included in the SPICE model cards of transistors to improve the accuracy of SPICE-based radiation simulations for DVS circuits.
{"title":"3D simulation and analysis of the radiation tolerance of voltage scaled digital circuit","authors":"Rajesh Garg, S. Khatri","doi":"10.1109/ICCD.2009.5413111","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413111","url":null,"abstract":"In recent times, dynamic supply voltage scaling (DVS) has been extensively employed to minimize the power and energy of VLSI systems. Also, sub-threshold circuits are becoming more popular. At the same time, the reliability of VLSI systems has become a major concern under Single Event Upsets (SEUs). SEUs are very problematic even for circuits operating at nominal voltages. With the increasing demand for low power reliable systems, it is therefore necessary to harden DVS and sub-threshold circuits efficiently. In this paper, we perform 3D simulations of radiation particle strikes in an inverter implemented using DVS and sub-threshold design. We analyze the sensitivity of the inverter to radiation particle strikes by varying the inverter size, the inverter load, the supply voltage (VDD) and the energy of the radiation particles. From these 3D simulations, we make several observations which are important to consider during radiation hardening of DVS and sub-threshold circuits. Based on these observations, we propose several guidelines for radiation hardening of DVS and sub-threshold circuit designs. These guidelines suggest that the traditional radiation hardening approaches need to be revisited for DVS and sub-threshold designs. We also propose a charge collection model for DVS circuits. Our model can accurately estimate (with an average error of 6.3%) the charge collected at the output of a gate for different supply voltages and different gate sizes for medium and high energy particle strikes. The parameters of our charge collection model can be included in SPICE model cards of transistors, to improve the accuracy of SPICE based radiation simulations for DVS circuits.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126286039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On improving the algorithmic robustness of a low-power FIR filter
Pub Date: 2009-10-01 | DOI: 10.1109/ICCD.2009.5413126
Sourabh Khire, S. Mukhopadhyay
Voltage scaling is a promising approach to reducing the power consumption of signal processing circuits. However, aggressive voltage scaling can introduce errors in the output signal, degrading the algorithmic performance of the circuit. We consider the specific case of the finite impulse response (FIR) filter and identify two sources of errors arising from voltage scaling: (a) errors introduced by increased delay along the logic path and (b) errors caused by failures in the memory due to process variations. We design an FIR filter that uses a simple feedback-based approach to reduce the memory errors and a linear predictor structure to correct the logic errors. The proposed filter is more robust to both logic and memory errors caused by voltage scaling. The results show a considerable improvement in the output signal-to-noise ratio (at least around 10 dB) for a probability of error (Perr) as high as 0.5. We also apply the proposed technique to an image filtering application and observe a considerable improvement in the visual quality of the output image, along with an improvement of over 10 dB in the peak signal-to-noise ratio for Perr as high as 0.5.
{"title":"On improving the algorithmic robustness of a low-power FIR filter","authors":"Sourabh Khire, S. Mukhopadhyay","doi":"10.1109/ICCD.2009.5413126","DOIUrl":"https://doi.org/10.1109/ICCD.2009.5413126","url":null,"abstract":"Voltage scaling is a promising approach to reduce the power consumption in signal processing circuits. However aggressive voltage scaling can introduce errors in the output signal, thus degrading the algorithmic performance of the circuit. We consider the specific case of the finite impulse response (FIR) filter, and identify two different sources of errors occurring due to voltage scaling: (a) errors introduced because of increased delay along the logic path and (b) errors caused by failures in the memory due to process variations. We design a FIR filter by using a simple feedback based approach to reduce the memory errors and a linear predictor structure for correcting the logic errors. The proposed filter is more robust to both logic and memory errors caused by voltage scaling. The results show a considerable improvement in the output Signal to Noise ratio (at least around 10 dB) for a probability of error (Perr) even as high as 0.5. We also utilize the proposed technique for an image filtering application and observe a considerable improvement in the visual quality of the output image along with an improvement of over 10 dB in the Peak Signal to Noise ratio for Perr as high as 0.5.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130385411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}