2007 25th International Conference on Computer Design: Latest Papers

Distributed voting for fault-tolerant nanoscale systems
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601954
A. Namazi, M. Nourani
In this paper, we propose a distributed voting strategy to design a robust NMR (N-modular redundancy) system. We show that using inexpensive current-based drivers and buffers, we can completely eliminate the centralized voter unit and do the majority voting among N modules in a distributed fashion. Our strategy achieves the high reliability that is vital for future nano systems, in which a high defect rate is expected. Experimental results are also reported to verify the concept, clarify the design procedure and measure the system's reliability.
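The majority function that the N modules implement can be sketched in a few lines. This is only a logical model, assuming NMR denotes N-modular redundancy; it does not represent the current-based drivers and buffers described in the abstract. The names `majority_vote` and `nmr_reliability` are hypothetical, and the reliability formula is the textbook binomial expression under the assumption of independent module failures and a fault-free voting mechanism.

```python
from collections import Counter
from math import comb


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules, or None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the modules to agree.
    return value if count > len(outputs) // 2 else None


def nmr_reliability(n, r):
    """Probability that a strict majority of n independent modules is correct.

    Assumption: each module is correct with probability r, failures are
    independent, and the voting mechanism itself never fails.
    """
    need = n // 2 + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(need, n + 1))
```

For example, `majority_vote([1, 1, 0])` yields the majority value, while a two-way tie such as `[1, 0]` yields `None`.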
Citations: 7
Improving cache efficiency via resizing + remapping
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601879
Subramanian Ramaswamy, S. Yalamanchili
In this paper we propose techniques to dynamically downsize or upsize a cache accompanied by cache set/line shutdown to produce efficient caches. Unlike previous approaches, resizing is accompanied by a non-uniform remapping of memory into the resized cache, thus avoiding misses to sets/lines that are shut off. The paper first provides an analysis of the causes of energy inefficiencies, revealing a simple model for improving efficiency. Based on this model we propose the concept of "folding" - memory regions mapping to disjoint cache resources are combined to share cache sets, producing a new placement function. Folding enables powering down cache sets at the expense of possibly increasing conflict misses. Effective folding heuristics can substantially increase energy efficiency at the expense of an acceptable increase in execution time. We target the L2 cache because of its larger size and greater energy consumption. Our techniques increase cache energy efficiency by 20%, and reduce the EDP (energy delay product) by up to 45% with an IPC degradation of less than 4%. The results also indicate opportunity for improving cache efficiencies further via cooperative compiler interactions.
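The idea of a folded placement function can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the 64-byte line size, the 1024 sets, and the `fold_map` table (a hypothetical mapping from a powered-down set to the active set that absorbs its traffic). The abstract does not describe how the paper's heuristics choose which sets to fold.

```python
def set_index(addr, line_size=64, num_sets=1024):
    """Conventional placement: drop the line offset, take the index modulo the set count."""
    return (addr // line_size) % num_sets


def folded_index(addr, fold_map, line_size=64, num_sets=1024):
    """Placement after folding.

    fold_map is a hypothetical table mapping each powered-down set to the
    active set it shares; sets absent from the table keep their own index.
    """
    idx = set_index(addr, line_size, num_sets)
    return fold_map.get(idx, idx)
```

With `fold_map = {1: 0}`, set 1 can be powered down, but addresses that used to land in sets 0 and 1 now compete for set 0, which is the conflict-miss cost the abstract mentions.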
Citations: 18
Transparent mode flip-flops for collapsible pipelines
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601952
Eric L. Hill, Mikko H. Lipasti
Prior work has shown that collapsible pipelining techniques have the potential to significantly reduce clocking activity, which can consume up to 70% of the dynamic power in modern high performance microprocessors. Previous collapsible pipeline proposals either rely on single phase clocking (by forcing latches into transparent state) or do not discuss the mechanisms by which stages are merged. In this work two flip-flop designs featuring an additional transparent state suitable for collapsing stages are presented. Transparency is achieved either by decoupling the master and slave clocks to keep both latches transparent, or by using a bypass mux that routes around the flip-flop. Both of these designs are evaluated in the context of transparently gated pipelines, an ad-hoc collapsible pipelining technique. Detailed analysis shows that the decoupled clock flip-flop is the most attractive in terms of energy and delay.
Citations: 4
Benchmarks and performance analysis of decimal floating-point applications
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601896
Liang-Kai Wang, C. Tsen, M. Schulte, Divya Jhalani
The IEEE P754 draft standard for floating-point arithmetic provides specifications for decimal floating-point (DFP) formats and operations. Based on this standard, many developers will provide support for DFP calculations. We present a benchmark suite for DFP applications and use this suite to evaluate the performance of hardware and software DFP solutions. Our benchmarks include banking, commerce, risk-management, tax, and telephone billing applications organized into a suite of five macro benchmarks. In addition to developing our own applications, we leverage open-source projects and academic financial analysis applications. The benchmarks are modular, making them easy to adapt for different DFP solutions. We use the benchmarks to evaluate the performance of the decNumber DFP library and an extended version of the SimpleScalar PISA architecture with hardware and instruction set support for DFP operations. Our analysis shows that providing processor support for high-speed DFP operations significantly improves the performance of DFP applications.
Citations: 39
Continual hashing for efficient fine-grain state inconsistency detection
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601877
Jae W. Lee, Myron King, K. Asanović
Transaction-level modeling (TLM) allows a designer to save functional verification effort during the modular refinement of an SoC by reusing the prior implementation of a module as a golden model for state inconsistency detection. One problem in simulation-based verification is the performance and bandwidth overhead of state dump and comparison between two models. In this paper, we propose an efficient fine-grain state inconsistency detection technique that checks the consistency of two states of arbitrary size at sub-transaction (tick) granularity using incremental hashes. At each tick, the hash generates a signature of the entire state, which can be efficiently updated and compared. We evaluate the proposed signature scheme with a FIR filter and a Vorbis decoder and show that very fine-grain state consistency checking is feasible. The hash signature checking increases execution time of Bluespec RTL simulation by 1.2% for the FIR filter and by 2.2% for the Vorbis decoder while correctly detecting any injected state inconsistency.
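One common way to build a signature that can be updated incrementally is to XOR together a hash of every (index, value) pair, so that a single write only needs to remove the old pair's hash and add the new one. The sketch below uses that construction as a stand-in; the abstract does not specify the hash function used in the paper. The class `HashedState` and the helpers `_h` and `full_signature` are hypothetical names, and values are assumed to be non-negative integers below 2**64.

```python
import hashlib


def _h(index, value):
    """64-bit hash of one (index, value) pair; both must fit in 8 bytes."""
    data = index.to_bytes(8, "little") + value.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def full_signature(words):
    """Recompute the signature from scratch: cost proportional to the state size."""
    sig = 0
    for i, v in enumerate(words):
        sig ^= _h(i, v)
    return sig


class HashedState:
    """State array whose signature is maintained on every write."""

    def __init__(self, size):
        self.words = [0] * size
        self.signature = full_signature(self.words)

    def write(self, index, value):
        # XOR out the old pair and XOR in the new one: constant work per write.
        self.signature ^= _h(index, self.words[index]) ^ _h(index, value)
        self.words[index] = value
```

Two models can then be compared at every tick by checking a single 64-bit value instead of dumping and diffing the whole state; as with any hash, equal signatures imply equal states only with high probability.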
Citations: 1
An efficient gate delay model for VLSI design
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601938
T. Chiang, C. Y. Chen, Weiyu Chen
Accurate estimation of gate delays is essential for timing-related CAD tools. CAD researchers tend to use the Elmore delay model for estimating gate delays. Since the Elmore delay model was primarily developed for estimating interconnect delay, applying it to gate delay estimation leads to significant inaccuracy. In this paper, by embedding concepts of electronic theories into switch-level analysis, a simple and efficient delay model for gates of general types (such as NAND, NOR, and complex gates) is proposed. Experimental data show that the proposed gate delay model consistently achieves high accuracy (typically within around 2% of SPICE simulations).
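For reference, the baseline Elmore delay of an RC ladder (the model the abstract describes as inaccurate for gates, not the model the paper proposes) is the sum over each resistor of its resistance times all capacitance downstream of it. A minimal sketch, assuming a simple chain in which resistor i connects node i-1 to node i and capacitor i sits at node i; the function name `elmore_delay` is hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay from the driver to the last node of an RC ladder.

    delay = sum_i R_i * (C_i + C_{i+1} + ... + C_n)
    Units are whatever the inputs use (ohms * farads gives seconds).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = 0.0
    # Walk from the far end so the downstream capacitance accumulates once.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream += c
        delay += r * downstream
    return delay
```

For a two-stage ladder with R = [1, 2] and C = [3, 4], the first resistor sees both capacitors (1 * 7) and the second sees only the last one (2 * 4).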
Citations: 5
An automated runtime power-gating scheme
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601928
M. Hamada, T. Kitahara, N. Kawabe, Hironori Sato, T. Nishikawa, T. Shimazawa, T. Yamashita, H. Hara, Y. Oowaki
An automated runtime power-gating scheme to reduce the leakage power in the active mode is presented in this paper. We propose a circuit that generates a sleep control signal from a clock-gating control signal automatically. Combining a selective MT-CMOS scheme, the generated sleep control signal, and a novel flip-flop circuit with an additional latch function enables a zero-wait transition from a sleep mode to an active mode. The additional latch function required for the zero-wait transition is achieved by only 6 transistors in addition to a conventional flip-flop. With this scheme, any design that uses clock gating can be transformed automatically to a power-gated design while keeping the system operation the same in terms of cycle accuracy. The scheme is applied to an MPEG4/H.264 audio/video codec, and a 21% power saving is achieved in the active mode while keeping the area overhead to only 16% in a 90 nm CMOS design.
Citations: 0
Whitespace redistribution for thermal via insertion in 3D stacked ICs
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601912
E. Wong, S. Lim
One of the biggest challenges in 3D stacked IC design is heat dissipation. Incorporating thermal vias is a promising method for reducing the temperatures of 3D ICs. The bonding styles between device layers impose certain restrictions on where thermal vias may be inserted. This paper presents a whitespace redistribution algorithm that takes bonding style into consideration to improve thermal via placement, which in turn reduces temperature.
Citations: 15
A position-insensitive finished store buffer
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601888
Erika Gunadi, Mikko H. Lipasti
This paper presents the finished store buffer (or FSB), an alternative and position-insensitive approach for building a scalable store buffer for an out-of-order processor. Exploiting the fact that only a small portion of in-flight stores are done executing (i.e. finished) and waiting for retirement, we are able to build a much smaller and more scalable store buffer. Our study shows that we need at most half of the number of entries in a conventional store queue if we buffer only the stores that have finished execution. Entries in the store buffer are allocated at issue and deallocated on retirement. A clever encoder circuit is used to provide positional searches without an explicitly positional queue structure. While reducing the access latency and power consumption significantly, our technique has virtually no detrimental effect on per-cycle performance (IPC).
Citations: 7
An efficient routing method for pseudo-exhaustive built-in self-testing of high-speed interconnects
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601925
Jianxun Liu, W. Jone
This paper presents a powerful routing method for pseudo-exhaustive built-in self-testing of high-speed interconnects with both capacitive and inductive crosstalk effects. Based on the concepts of test cone and cut-off locality, the routing method can generate an interconnect structure such that all nets can be tested by pseudo-exhaustive patterns. The test pattern generation method is simple and efficient. Experimental results obtained by simulating a set of MCNC benchmarks demonstrate the feasibility of the proposed pseudo-exhaustive test approach and the efficiency of the proposed routing method.
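The general appeal of pseudo-exhaustive testing is that each small group of signals is exercised with every input combination, so the pattern count grows with the size of the largest group rather than with the whole design. The sketch below illustrates only that counting argument. It assumes a test cone can be treated as a plain list of net names, which is an interpretation rather than the paper's definition, and the function name `cone_patterns` is hypothetical.

```python
from itertools import product


def cone_patterns(cone_nets):
    """Yield every 0/1 assignment to the nets of one cone (2**k patterns for k nets)."""
    for bits in product((0, 1), repeat=len(cone_nets)):
        yield dict(zip(cone_nets, bits))
```

A cone of 3 nets needs 8 patterns, whereas exhaustively exercising, say, 32 nets at once would need 2**32, which is why bounding the cone size during routing matters.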
Citations: 0