
2012 IEEE 30th International Conference on Computer Design (ICCD): Latest Papers

Improving inclusive cache performance with two-level eviction priority
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378668
Lingda Li, Dong Tong, Zichao Xie, Junlin Lu, Xu Cheng
Inclusive cache hierarchies are widely adopted in modern processors because they simplify the implementation of cache coherence. However, they sacrifice some performance to guarantee inclusion. Many recent intelligent management policies have been proposed to improve last-level cache (LLC) performance by evicting blocks with poor locality earlier. Unfortunately, they are inapplicable to inclusive LLCs. In this paper, we propose the Two-level Eviction Priority (TEP) policy. Besides the eviction priority provided by the baseline replacement policy, TEP attaches an additional, high level of eviction priority to LLC blocks, which is decided at insertion time and cannot be changed during their lifetime in the LLC. When blocks with high eviction priority are no longer in the inner caches, they are evicted from the LLC preferentially. Thus, the LLC can retain more useful blocks to improve performance. TEP cooperates well with various baseline replacement policies. Our evaluation shows that TEP with NRU can improve the performance of inclusive LLCs significantly while requiring negligible extra storage. It also outperforms other recent proposals including QBS, DIP, and DRRIP.
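The victim-selection idea described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Block` fields, the `select_victim` function, and the particular NRU convention (bit 1 means "not recently used") are hypothetical names and choices, and the abstract does not say how the high-priority flag is decided at insertion time.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    high_priority: bool   # assumed: fixed at insertion time, never updated afterwards
    in_inner_cache: bool  # assumed: True while an inner cache still holds the block
    nru_bit: int = 0      # assumed NRU convention: 1 = not recently used


def select_victim(cache_set):
    """Return the index of the block to evict from one LLC set (hypothetical helper)."""
    # Level 1: a high-priority block that no inner cache holds is evicted first.
    for i, blk in enumerate(cache_set):
        if blk.high_priority and not blk.in_inner_cache:
            return i
    # Level 2: otherwise fall back to the baseline policy (NRU in this sketch).
    for i, blk in enumerate(cache_set):
        if blk.nru_bit == 1:
            return i
    # No NRU candidate: mark every block as a candidate and evict the first one.
    for blk in cache_set:
        blk.nru_bit = 1
    return 0
```

Keeping the second level as a plain fallback reflects the abstract's statement that TEP works alongside a baseline policy; swapping the NRU loop for another policy would not change the first level.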
Citations: 2
Locating faults in application-dependent interconnects of SRAM based FPGAs
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378678
T. N. Kumar, H. Almurib, F. Lombardi
This paper presents a new method for locating multiple faults in an interconnect following application testing of an FPGA. The method exploits conditions related to the interconnect structure, in particular the presence of paths of nets that are either disjoint or joint between the primary input and at least one primary output. These conditions lead to an adaptive approach in which faults are hierarchically located using the walking-1 test set. The proposed method does not depend on net ordering and is capable of locating multiple stuck-at and pairwise bridging faults. The process requires 1 + log2(k) test configurations to locate multiple stuck-at faults and 2 + 2 log2(k) additional test configurations to locate more than one pairwise bridging fault (where k denotes the maximum combinational depth). As validated by simulation on benchmark circuits (implemented on the Xilinx Virtex4), the proposed method results in a significant reduction in the number of configurations.
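The configuration counts quoted in this abstract can be evaluated with a small sketch. The function names are hypothetical, and rounding log2(k) up for depths that are not powers of two is an assumption, since the abstract gives only the closed forms. The `walking_one` helper shows the generic walking-1 pattern set (one asserted bit per vector), not the paper's specific test procedure.

```python
def ceil_log2(k: int) -> int:
    """Integer ceil(log2(k)) for k >= 1, computed without floating point."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (k - 1).bit_length()


def stuck_at_configs(k: int) -> int:
    """1 + log2(k) configurations for multiple stuck-at location (rounding assumed)."""
    return 1 + ceil_log2(k)


def bridging_configs(k: int) -> int:
    """2 + 2*log2(k) additional configurations for pairwise bridging faults."""
    return 2 + 2 * ceil_log2(k)


def walking_one(n: int):
    """Generic walking-1 test set: n vectors of width n, each with a single 1."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Example: with a maximum combinational depth of k = 8, the formulas give
# 4 configurations for stuck-at location and 8 additional ones for bridging.
depth = 8
counts = (stuck_at_configs(depth), bridging_configs(depth))
```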
Citations: 3
Maximizing crosstalk-induced slowdown during path delay test
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378635
Dibakar Gope, D. Walker
In this paper, we present a timing-driven test generator to sensitize multiple aligned aggressors coupled to a delay-sensitive victim path to detect the combination of a delay spot defect and crosstalk-induced slowdown. The framework uses parasitic capacitance information, timing windows and crosstalk-induced delay estimates to screen out unaligned or ineffective aggressors coupled to a victim path, speeding up crosstalk pattern generation. In order to induce maximum crosstalk slowdown along a path, aggressors are prioritized based on their potential delay increase and timing alignment. The test generation engine introduces the concept of alignment-driven path sensitization to generate paths from inputs to coupled aggressor nets that meet timing alignment and direction requirements. In addition, two new crosstalk-driven dynamic test compaction algorithms are developed to control the increase in test pattern count. The proposed test generation algorithm is applied to ISCAS85 and ISCAS89 benchmark circuits. SPICE simulation results demonstrate the ability of the alignment-driven test generator to increase crosstalk-induced delays along victim paths.
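The screening and prioritization steps mentioned in this abstract might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Aggressor` record, the interval-overlap test used as a stand-in for "timing alignment", and the `min_delay` threshold are hypothetical, and the sketch ignores the transition-direction requirement and the path sensitization that the actual test generator handles.

```python
from dataclasses import dataclass


@dataclass
class Aggressor:
    name: str
    window: tuple          # assumed (earliest, latest) switching time of the net
    delay_increase: float  # assumed estimate of crosstalk-induced delay on the victim


def overlaps(a, b):
    """True when two closed timing windows share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def rank_aggressors(victim_window, aggressors, min_delay=0.0):
    """Drop unaligned or ineffective aggressors, then sort by potential delay increase."""
    kept = [
        a for a in aggressors
        if overlaps(a.window, victim_window) and a.delay_increase > min_delay
    ]
    return sorted(kept, key=lambda a: a.delay_increase, reverse=True)
```

Filtering before sorting mirrors the abstract's ordering: unaligned or ineffective aggressors are screened out first so that later pattern generation only considers candidates that can actually slow the victim path.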
Citations: 8
Ring oscillator physical unclonable function with multi level supply voltages
Pub Date : 2012-07-17 DOI: 10.1109/ICCD.2012.6378703
S. Mansouri, E. Dubrova
In this paper we introduce a new type of Ring Oscillator PUF (RO-PUF) in which the inverters composing the ring oscillators can be supplied by independent voltages. This new RO-PUF can improve the reliability of the PUF in case of temperature variations.
Citations: 50
Xpipes: A latency insensitive parameterized network-on-chip architecture for multi-processor SoCs
Pub Date : 2003-10-13 DOI: 10.1109/ICCD.2012.6378615
M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, L. Benini
The growing complexity of customizable embedded multi-processor architectures for digital media processing will soon require highly scalable network-on-chip based communication infrastructures. In this paper, we propose xpipes, a scalable and high-performance NoC architecture for multi-processor SoCs, consisting of soft macros that can be turned into instance-specific network components at instantiation time. The flexibility of its components allows our NoC to support both homogeneous and heterogeneous architectures. The interface with IP cores at the periphery of the network is standardized (OCP-based). Links can be pipelined with a flexible number of stages to decouple data introduction speed from worst-case link delay. Switches are lightweight and support reliable communication for arbitrary link pipeline depths (latency insensitive operation). xpipes has been described in synthesizable SystemC, at the cycle-accurate and signal-accurate level.
Citations: 94
Exploiting microarchitectural redundancy for defect tolerance
Pub Date : 2003-10-13 DOI: 10.1109/ICCD.2012.6378613
P. Shivakumar, S. Keckler, C. R. Moore, D. Burger
The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Methods based on defect detection and reduction may not offer a scalable solution because of the cost of eliminating contaminants in the manufacturing process and increasing chip complexity. This paper proposes to use the inherent redundancy available in existing and future chip microarchitectures to improve yield and enable graceful performance degradation in fail-in-place systems. We introduce a new yield metric called performance averaged yield (Ypav), which accounts for both fully functional chips and those that exhibit some performance degradation. Our results indicate that at 250nm we are able to increase the Ypav of a uniprocessor with only redundant rows in its caches from a base value of 85% to 98% using microarchitectural redundancy. Given constant chip area, shrinking feature sizes increase fault susceptibility and reduce the base Ypav to 60% at 50nm, which microarchitectural redundancy then raises to 99.6%.
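One plausible reading of a performance-averaged yield is a sum over chip outcomes, each weighted by its relative performance. The sketch below assumes that formulation; the abstract does not give the exact definition, so the function name, the input layout, and the sample numbers are all hypothetical and are not results from the paper.

```python
def performance_averaged_yield(outcomes):
    """Weighted yield: sum of (fraction of chips) * (relative performance).

    `outcomes` is an iterable of (fraction, relative_performance) pairs, where
    relative_performance is 1.0 for a fully functional chip, between 0 and 1
    for a degraded but usable chip, and 0.0 for an unusable chip.
    """
    outcomes = list(outcomes)
    total = sum(fraction for fraction, _ in outcomes)
    if total > 1.0 + 1e-9:
        raise ValueError("chip fractions must not sum to more than 1")
    return sum(fraction * perf for fraction, perf in outcomes)


# Made-up illustration: 85% of chips fully functional, 10% running at 90% of
# nominal performance, 5% unusable. Counting the degraded chips lifts the
# metric above the 0.85 that a conventional yield figure would report.
example = performance_averaged_yield([(0.85, 1.0), (0.10, 0.9), (0.05, 0.0)])
```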
Citations: 85
Power-sensitive multithreaded architecture
Pub Date : 2000-09-17 DOI: 10.1109/ICCD.2012.6378610
J. Seng, D. Tullsen, George Z. N. Cai
The power consumption of microprocessors is becoming increasingly important in design decisions, not only in mobile processors but now also in high-performance processors. Power-conscious design must therefore not only go beyond technology and low-level design but also change the way modern processors are architected. A multithreading processor is attractive in the context of low-power or power-constrained devices for many of the same reasons that enable its high throughput. Primarily, it supplies extra parallelism via multiple threads, allowing the processor to rely much less heavily on speculation. We show that a simultaneous multithreading processor uses up to 22% less energy per instruction than a single-threaded architecture. We also explore other power optimizations that are particular to multithreaded architectures, either because they are unavailable to or unreasonable for single-thread architectures.
Citations: 15
FlexRAM: Toward an advanced Intelligent Memory system
Pub Date : 1999-10-10 DOI: 10.1109/ICCD.2012.6378608
Y. Kang, Wei Huang, Seung-Moon Yoo, D. Franklin, Zhenzhou Ge, V. Lam, P. Pattnaik, J. Torrellas
Major advances in Merged Logic DRAM (MLD) technology coupled with the popularization of memory-intensive applications provide fertile ground for architectures based on Intelligent Memory (IRAM) or Processors-in-Memory (PIM). The contribution of this paper is to explore one way to use the current state-of-the-art MLD technology for general-purpose computers. To satisfy the requirements of general-purpose use and low programming cost, we place the PIM chips in the memory system and let them default to plain DRAM if the application is not enabled for intelligent memory. Since wide usability is crucial, we identify and analyze a range of real applications for PIM. Based on the requirements of these applications and current technological constraints, we design a PIM chip and a PIM-based memory system. We call the chip FlexRAM. We describe FlexRAM's design and floorplan, and the resulting memory system. Evaluation of the system through simulations shows that 4 FlexRAM chips often allow a workstation to run 25-40 times faster.
Citations: 160