2007 25th International Conference on Computer Design: Latest Papers

Cache replacement based on reuse-distance prediction
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601909
G. Keramidas, Pavlos Petoumenos, S. Kaxiras
Several cache management techniques have been proposed that indirectly try to base their decisions on cacheline reuse-distance. One example is Cache Decay, which is a postdiction of reuse-distances: if a cacheline has not been accessed for some "decay interval", we know that its reuse-distance is at least as large as this decay interval. In this work, we propose to directly predict reuse-distances via instruction-based (PC) prediction and use this information for cache-level optimizations. We choose the replacement policy of the L2 cache as our optimization target, because the gap between LRU and the theoretical optimal replacement algorithm is comparatively large for L2 caches. This indicates that, in many situations, there is ample room for improvement. We evaluate our reuse-distance-based replacement policy using a subset of the most memory-intensive SPEC2000 benchmarks, and our results show significant benefits across the board.
Citations: 128
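The idea in the abstract above (predict a line's reuse-distance from the PC that touched it, then evict the line whose next use is predicted to be farthest away) can be illustrated with a minimal sketch. This is not the authors' design: the fully associative organization, the "last observed distance" predictor, the `NO_REUSE_SEEN` default, and all class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass

NO_REUSE_SEEN = 10**9  # assumed default: a PC with no history is treated as "reused far away"


@dataclass
class Line:
    tag: int
    last_access: int      # access count at the most recent touch
    last_pc: int          # PC of the instruction that made that touch
    predicted_reuse: int  # predicted distance (in accesses) to the next touch


class ReuseDistanceCache:
    """Tiny fully associative cache with a PC-indexed reuse-distance predictor."""

    def __init__(self, num_lines, default_distance=NO_REUSE_SEEN):
        self.num_lines = num_lines
        self.default_distance = default_distance
        self.lines = {}      # tag -> Line
        self.predictor = {}  # pc -> last reuse distance observed after that pc
        self.now = 0

    def access(self, pc, tag):
        """Simulate one access; return True on a hit, False on a miss."""
        self.now += 1
        line = self.lines.get(tag)
        hit = line is not None
        if hit:
            # Train the predictor with the distance since the previous touch,
            # attributed to the PC that made that previous touch.
            self.predictor[line.last_pc] = self.now - line.last_access
        else:
            if len(self.lines) >= self.num_lines:
                self._evict()
            line = Line(tag, self.now, pc, 0)
            self.lines[tag] = line
        line.last_access = self.now
        line.last_pc = pc
        line.predicted_reuse = self.predictor.get(pc, self.default_distance)
        return hit

    def _evict(self):
        # Victim: the line whose predicted next use lies farthest in the future.
        victim = max(self.lines.values(),
                     key=lambda l: l.last_access + l.predicted_reuse)
        del self.lines[victim.tag]


# PC 10 reuses tag 0 quickly; PC 20 streams through tags it never touches again.
demo = ReuseDistanceCache(num_lines=2)
outcomes = [demo.access(pc, tag)
            for pc, tag in [(10, 0), (10, 0), (20, 100), (20, 101), (10, 0)]]
print(outcomes)  # [False, True, False, False, True]
```

In this trace the streaming line (tag 100) is evicted instead of tag 0, so the final access hits. Plain LRU would have evicted tag 0 at the fourth access and missed on the fifth.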
SCAFFI: An intrachip FPGA asynchronous interface based on hard macros
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601950
Julian J. H. Pontes, R. Soares, Ewerson Carvalho, F. Moraes, Ney Laert Vilar Calazans
Building fully synchronous VLSI circuits is becoming less viable as circuit geometries evolve. However, before the adoption of purely asynchronous strategies in VLSI design, globally asynchronous, locally synchronous (GALS) design approaches should take over. The design of circuits using complex field-programmable components like state-of-the-art FPGAs follows this same trend. In GALS design, a critical step is the definition of asynchronous interfaces between synchronous regions. This paper proposes SCAFFI, a new asynchronous interface to interconnect modules inside FPGAs. The interface is based on clock-stretching techniques to avoid metastability. Unlike other interfaces, it can use both logic levels for stretching and does not require arbiters. Also, compactness of the implementation is enhanced by the use of dedicated FPGA hard macros. A GALS implementation of an RSA cryptography core demonstrates the use of SCAFFI.
Citations: 38
Accurate modeling and fault simulation of Byzantine resistive bridges
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601923
H. Cheung, S. Gupta
Many recent studies show that a resistive bridging fault may cause intermediate voltages at the bridging fault site. Since the gates in the fanout of the fault site may have distinct and multiple logic threshold voltages, namely VIL and VIH, these gates may interpret the intermediate voltage as logic '1', logic '0', or logically indeterminate. Such fault behavior is described as the bridging fault Byzantine general problem (T. Nanya et al., Nov. 1989). None of the existing models of bridging faults used by bridging fault simulators accurately captures the indeterminate logic behavior of such bridges. We present a resistive bridging fault model that accurately yet efficiently captures indeterminate logic values. We also describe an efficient PPSFP bridging fault simulator and show that all previous approaches seriously overestimate bridging fault coverage.
Citations: 4
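The "Byzantine" behavior described in the abstract above, where fanout gates with different VIL and VIH thresholds read the same intermediate bridge voltage differently, can be shown with a small sketch. The gate names and voltage values below are hypothetical, and the three-valued `'0'`/`'1'`/`'X'` encoding is an assumption for illustration, not the paper's fault model.

```python
def interpret(voltage, v_il, v_ih):
    """Logic value one gate input sees: '0', '1', or 'X' (indeterminate)."""
    if voltage <= v_il:
        return "0"
    if voltage >= v_ih:
        return "1"
    return "X"  # between VIL and VIH: neither a valid low nor a valid high


def fanout_view(voltage, thresholds):
    """Map each fanout gate to its interpretation of the bridge voltage.

    thresholds: dict of gate name -> (v_il, v_ih), in volts.
    """
    return {gate: interpret(voltage, v_il, v_ih)
            for gate, (v_il, v_ih) in thresholds.items()}


# Hypothetical fanout gates with different input thresholds (volts).
gates = {"g1": (0.6, 0.8), "g2": (1.0, 1.2), "g3": (0.7, 1.1)}
print(fanout_view(0.9, gates))  # {'g1': '1', 'g2': '0', 'g3': 'X'}
```

A single 0.9 V node is read as 1, 0, and indeterminate by the three gates, which is the disagreement a bridging fault simulator has to track.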
VIZOR: Virtually zero margin adaptive RF for ultra low power wireless communication
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601956
R. Senguttuvan, Shreyas Sen, A. Chatterjee
Modern wireless transceiver systems are often overdesigned to meet the requirements of low bit error rate values at high data rates under worst-case channel operating conditions (interference, noise, multi-path effects). This results in circuits being designed with "sufficient" margins, leading to lower efficiency and high power consumption. In this paper, we develop an adaptive power management strategy for RF systems that optimally trades off power vs. performance for the RF front-end to maintain operation at or below a specified maximum bit error rate (BER) across temporally changing operating conditions. As the communication channel degrades, more power is consumed by the RF front end, and vice versa. Since the maximum bit-error rate specification is not violated, minimum voice or video quality through the wireless channel is always guaranteed.
Citations: 25
Negative-skewed shadow registers for at-speed delay variation characterization
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601924
Jie Li, J. Lach
The increased process, voltage, and temperature (PVT) variability that comes with integrated circuit (IC) technology scaling has become a major problem in the semiconductor industry. In order to refine manufacturing processes and develop circuit design techniques to cope with variability, we must be able to accurately and precisely characterize the variations that occur. In this paper, we introduce a technique for characterizing combinational path delay variations by measuring a designer-controlled number of register-to-register delays in manufactured ICs with negative-skewed shadow registers. This technique enables delay measurements to be performed with at-speed tests that are run in parallel with and are orthogonal to other testing techniques, and therefore does not add combinatorial complexity to the testing process. This technique can be implemented cost-effectively on a large number of otherwise unobservable internal combinational paths to get accurate, precise data about delay variability.
Citations: 24
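One way to picture how a negative-skewed shadow register can measure a register-to-register delay is the behavioral sketch below. It rests on assumptions the abstract above does not spell out: the shadow register samples the same data `skew` earlier than the main register, a mismatch between the two means the data arrived inside that window, and the skew is swept over a set of values. The function names and the picosecond numbers are hypothetical.

```python
def shadow_mismatch(path_delay, period, skew):
    """True if the early-clocked shadow register misses data the main register catches.

    The main register samples at `period`; the shadow samples at `period - skew`.
    """
    return period - skew < path_delay <= period


def bracket_delay(path_delay, period, skews):
    """Sweep increasing skews and bound the path delay from the first mismatch.

    Returns (low, high) with low < path_delay <= high, or None if no skew
    in the sweep produces a mismatch.
    """
    previous = 0
    for skew in sorted(skews):
        if shadow_mismatch(path_delay, period, skew):
            return (period - skew, period - previous)
        previous = skew
    return None


# Hypothetical numbers in picoseconds: 1000 ps clock, true delay 870 ps.
print(bracket_delay(870, 1000, [50, 100, 150, 200]))  # (850, 900)
```

The resolution of the bound is set by the spacing of the skew steps, which in this sketch stands in for the designer-controlled part of the measurement.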
A radix-10 SRT divider based on alternative BCD codings
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601914
Álvaro Vázquez, E. Antelo, P. Montuschi
In this paper we present the algorithm and architecture of a radix-10 floating-point divider based on an SRT non-restoring digit-by-digit algorithm. The algorithm uses conventional techniques developed to speed up radix-2^k division, such as a signed-digit (SD) redundant quotient and digit selection by constant comparison using a carry-save estimate of the partial remainder. To optimize area and latency for decimal, we include novel features such as the use of alternative BCD codings to represent decimal operands, estimates by truncation at any binary position inside a decimal digit, a single customized fast carry-propagate decimal adder for partial remainder computation, initial odd multiple generation and final normalization with rounding, and register placement to exploit advanced high fan-in mux-latch circuits. Rough area-delay estimates show that the proposed divider has a similar latency but lower hardware complexity (1.3 area ratio) than a recently published high-performance digit-by-digit implementation.
Citations: 29
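The digit-by-digit recurrence with a signed-digit quotient that the abstract above builds on can be modeled in software as follows. This is only a sketch of the generic radix-10 recurrence w[j+1] = 10·w[j] − q[j+1]·d. The digit set {−5, …, 5}, the exact rounding used for digit selection, and the operand precondition are assumptions; the paper's selection by constant comparison on a carry-save estimate and its BCD codings are not modeled.

```python
from fractions import Fraction


def sd_divide_radix10(x, d, num_digits):
    """Radix-10 digit-recurrence division with quotient digits in {-5, ..., 5}.

    Requires d > 0 and |x| <= d/2 so that every selected digit fits the set.
    Returns (digits, quotient, remainder) with
        x == d * quotient + remainder / 10**num_digits.
    """
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or 2 * abs(x) > d:
        raise ValueError("need d > 0 and |x| <= d/2")
    w = x  # partial remainder
    digits = []
    for _ in range(num_digits):
        shifted = 10 * w
        q = round(shifted / d)  # exact selection; hardware would use an estimate
        w = shifted - q * d     # |w| <= d/2 is preserved, so |q| <= 5 next step
        digits.append(q)
    quotient = sum(Fraction(q, 10 ** (i + 1)) for i, q in enumerate(digits))
    return digits, quotient, w


digits, quotient, remainder = sd_divide_radix10(2, 7, 4)
print(digits)    # [3, -1, -4, -3]
print(quotient)  # 2857/10000
```

The negative digits show the redundancy at work: 0.3 − 0.01 − 0.004 − 0.0003 = 0.2857, the first four decimals of 2/7, and no digit ever needs correcting after it is selected.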
Exploring the interplay of yield, area, and performance in processor caches
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601905
Hyunjin Lee, Sangyeun Cho, B. Childers
The deployment of future deep submicron technology calls for a careful review of existing cache organizations and design practices in terms of yield and performance. This paper presents a cache design flow that enables processor architects to consider yield, area, and performance (YAP) together in a unified framework. Since there is a complex, changing trade-off between these metrics depending on the technology, the cache organization, and the yield enhancement scheme employed, such a design flow becomes invaluable to processor architects when they assess a design and explore the design space quickly at an early stage. We develop a complete set of tools supporting the proposed design flow, from injecting defects into a wafer to evaluating program performance of individual processors in the wafer. A case study is presented to demonstrate the effectiveness of the proposed design flow and developed tools.
Citations: 14
FPGA routing architecture analysis under variations
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601894
S. Srinivasan, P. Mangalagiri, Yuan Xie, N. Vijaykrishnan
Systems with the combined features of ASICs and field programmable gate arrays (FPGAs) are increasingly being considered as technology forerunners because of their extraordinary benefits. This drags FPGAs into the technology scaling race along with ASICs, exposing the FPGA industry to the problems associated with scaling. Extensive process variation is one such issue, which directly impacts the profit margins of hardware design beyond 65 nm gate length technology. Since the resources in FPGAs are primarily dominated by the interconnect fabric, variations in the interconnect that impact the critical path timing and leakage yield need rigorous analysis. In this work we provide a statistical model of the individual routing components in an FPGA, followed by a statistical methodology to analyze the timing and leakage distribution. This statistical model is incorporated into the routing algorithm to create a new statistically intelligent routing algorithm (SIRA), which simultaneously optimizes the leakage and timing yield of the FPGA device. We demonstrate an average leakage yield increase of 9% and a timing yield increase of 11% using our final algorithm.
Citations: 1
Power reduction of chip multi-processors using shared resource control cooperating with DVFS
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601961
Ryoma Watanabe, Masaaki Kondo, Hiroshi Nakamura, T. Nanya
This paper presents a novel power reduction method for chip multi-processors (CMPs) under real-time constraints. While the power consumption of processing units (PUs) on CMPs can be reduced by dynamic voltage and frequency scaling (DVFS) without violating real-time constraints, the clock frequency of each PU cannot be determined independently because of the performance impact of conflicts over shared resources. To minimize power consumption in this situation, we first derive an analytical model that provides the optimal priority and clock frequency setting, and then propose a method of controlling the priority of shared resource accesses in cooperation with DVFS. From the analytical model, we show that in dual-core CMPs the total power consumption is minimized when the clock frequencies of the two PUs are the same. An experiment with a synthetic benchmark supports the validity of the analytical model, and evaluation results with real applications show that the proposed method reduces power consumption by up to 15%, and by 6.7% on average, compared with a conventional DVFS technique.
Citations: 8
Constraint satisfaction in incremental placement with application to performance optimization under power constraints
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601910
Huan Ren, S. Dutt
We present new techniques for explicit constraint satisfaction in the incremental placement process. Our algorithm employs a Lagrangian relaxation (LR) type approach in the analytical global placement stage to solve the constrained optimization problem. We establish theoretical results that prove the optimality of this stage. In the detailed placement stage, we develop a constraint-monitoring and satisfaction mechanism in a recently proposed network (n/w) flow based detailed placement framework, and empirically show its near-optimality. We establish the effectiveness of our general constraint-satisfaction methods by applying them to the problem of timing-driven optimization under power constraints. We overlay our algorithms on a recently developed unconstrained timing-driven incremental placement method, flow-place. On a large number of benchmarks with up to 210K cells, our constraint satisfaction algorithms obtain an average timing improvement of 12.4% under a 3% power increase limit (the actual average power increase incurred is only 2.1%), while the original unconstrained method gives an average power increase of 8.4% for a timing improvement of 17.3%. Our techniques thus yield a tradeoff of 75% power improvement to 28% timing deterioration for the given constraint. Our constraint-satisfying incremental placer is also quite fast, e.g., its run time for the 210K-cell circuit ibm18 is only 1541 seconds.
Citations: 2
Journal
2007 25th International Conference on Computer Design