2012 IEEE 30th International Conference on Computer Design (ICCD): Latest Papers

Maximizing crosstalk-induced slowdown during path delay test
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378635
Dibakar Gope, D. Walker
In this paper, we present a timing-driven test generator to sensitize multiple aligned aggressors coupled to a delay-sensitive victim path to detect the combination of a delay spot defect and crosstalk-induced slowdown. The framework uses parasitic capacitance information, timing windows and crosstalk-induced delay estimates to screen out unaligned or ineffective aggressors coupled to a victim path, speeding up crosstalk pattern generation. In order to induce maximum crosstalk slowdown along a path, aggressors are prioritized based on their potential delay increase and timing alignment. The test generation engine introduces the concept of alignment-driven path sensitization to generate paths from inputs to coupled aggressor nets that meet timing alignment and direction requirements. In addition, two new crosstalk-driven dynamic test compaction algorithms are developed to control the increase in test pattern count. The proposed test generation algorithm is applied to ISCAS85 and ISCAS89 benchmark circuits. SPICE simulation results demonstrate the ability of the alignment-driven test generator to increase crosstalk-induced delays along victim paths.
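The screening and prioritization steps described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Aggressor` record, the interval-overlap test for alignment, the `min_delay` threshold, and the sort key are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: (earliest, latest) transition time of a net.
Window = Tuple[float, float]


@dataclass(frozen=True)
class Aggressor:
    name: str
    window: Window          # timing window of the aggressor net
    delay_increase: float   # estimated crosstalk-induced delay on the victim


def overlap(a: Window, b: Window) -> float:
    """Length of the intersection of two intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def rank_aggressors(victim_window: Window,
                    aggressors: List[Aggressor],
                    min_delay: float = 0.0) -> List[Aggressor]:
    """Drop unaligned or ineffective aggressors, then sort the rest.

    Assumed policy: an aggressor is "aligned" if its window overlaps the
    victim's, and "effective" if its delay estimate exceeds min_delay.
    Survivors are ordered by delay increase, then by amount of overlap.
    """
    kept = [a for a in aggressors
            if overlap(victim_window, a.window) > 0.0
            and a.delay_increase > min_delay]
    return sorted(kept,
                  key=lambda a: (a.delay_increase,
                                 overlap(victim_window, a.window)),
                  reverse=True)
```

Screening before pattern generation keeps the test generator from spending effort sensitizing aggressors that cannot slow the victim, which is the speed-up the abstract refers to.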
Citations: 8
A high-performance, low-overhead microarchitecture for secure program execution
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378624
A. Kanuparthi, R. Karri, Gaston Ormazabal, Sateesh Addepalli
High performance and low power consumption have traditionally been the primary design goals for computer architects. With computer systems facing a wave of attacks that disrupt their normal execution or leak sensitive data, computer security is no longer an afterthought. Dynamic integrity checking has emerged as a possible solution to protect computer systems by thwarting various attacks. Dynamic integrity checking involves calculation of hashes of the instructions in the code being executed and comparing these hashes against corresponding precomputed hashes at runtime. The processor pipeline is stalled and the instructions are not allowed to commit until the integrity check is complete. Such an approach has severe performance implications as it stalls the pipeline for several cycles. In this paper, we propose a hardware-based dynamic integrity checking approach that does not stall the processor pipeline. We permit the instructions to commit before the integrity check is complete, and allow them to make changes to the register file, but not the data cache. The system is rolled back to a known state if the checker deems the instructions as modified. Our experiments show an average performance overhead of 1.66%, area overhead of 4.25%, and a power overhead of 2.45% over a baseline processor.
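As a rough software analogy of the non-stalling scheme in the abstract, the Python sketch below lets a block of instructions update registers before its hash is checked, holds stores back from the data cache, and rolls back on a mismatch. Everything here is an assumption for illustration: the `Core` class, the per-block granularity, the `(kind, destination, value)` operation format, and the use of SHA-256 (the abstract does not name a hash function).

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical operation format: ("reg" | "mem", destination, value).
Op = Tuple[str, object, int]


def block_hash(code: bytes) -> str:
    """Hash of a block's instruction bytes (SHA-256 is a stand-in choice)."""
    return hashlib.sha256(code).hexdigest()


class Core:
    """Toy model of a core with a register file and a data cache."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {}
        self.dcache: Dict[int, int] = {}

    def run_block(self, code: bytes, ops: List[Op], expected: str) -> bool:
        """Execute ops without waiting for the integrity check.

        Register writes take effect immediately; stores are buffered.
        If the hash of `code` differs from `expected`, registers are
        restored from the checkpoint and buffered stores are discarded.
        """
        checkpoint = dict(self.regs)          # known-good register state
        pending_stores: Dict[int, int] = {}   # stores kept out of the cache
        for kind, dst, value in ops:
            if kind == "reg":
                self.regs[dst] = value
            else:
                pending_stores[dst] = value
        if block_hash(code) != expected:
            self.regs = checkpoint            # roll back to the known state
            return False
        self.dcache.update(pending_stores)    # check passed: release stores
        return True
```

Keeping stores out of the data cache until the check passes means a rollback only has to restore the register file, which is the distinction the abstract draws between the register file and the data cache.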
Citations: 18
Providing cost-effective on-chip network bandwidth in GPGPUs
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378671
H. Kim, John Kim, Woong Seo, Yeon-Gon Cho, Soojung Ryu
Network-on-chip (NoC) bandwidth has a significant impact on overall performance in throughput-oriented processors such as GPGPUs. Although it has been commonly assumed that high NoC bandwidth can be provided through abundant on-chip wires, we show that increasing NoC router frequency results in a more cost-effective NoC. However, the router arbitration critical path can limit the NoC router frequency. Thus, we propose a direct all-to-all network overlaid on mesh (DA2mesh) NoC architecture that exploits the traffic characteristics of GPGPUs and removes arbitration from the router pipeline. DA2mesh simplifies the router pipeline, improving performance by 36% while reducing NoC energy by 15%.
Citations: 35
Ring oscillator physical unclonable function with multi level supply voltages
Pub Date : 2012-07-17 DOI: 10.1109/ICCD.2012.6378703
S. Mansouri, E. Dubrova
In this paper we introduce a new type of Ring Oscillator PUF (RO-PUF) in which the inverters composing the ring oscillators can be supplied by independent voltages. This new RO-PUF can improve the reliability of the PUF in case of temperature variations.
Citations: 50
Xpipes: A latency insensitive parameterized network-on-chip architecture for multi-processor SoCs
Pub Date : 2003-10-13 DOI: 10.1109/ICCD.2012.6378615
M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, L. Benini
The growing complexity of customizable embedded multi-processor architectures for digital media processing will soon require highly scalable network-on-chip based communication infrastructures. In this paper, we propose xpipes, a scalable and high-performance NoC architecture for multi-processor SoCs, consisting of soft macros that can be turned into instance-specific network components at instantiation time. The flexibility of its components allows our NoC to support both homogeneous and heterogeneous architectures. The interface with IP cores at the periphery of the network is standardized (OCP-based). Links can be pipelined with a flexible number of stages to decouple data introduction speed from worst-case link delay. Switches are lightweight and support reliable communication for arbitrary link pipeline depths (latency insensitive operation). xpipes has been described in synthesizable SystemC, at the cycle-accurate and signal-accurate level.
Citations: 94
Exploiting microarchitectural redundancy for defect tolerance
Pub Date : 2003-10-13 DOI: 10.1109/ICCD.2012.6378613
P. Shivakumar, S. Keckler, C. R. Moore, D. Burger
The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Methods based on defect detection and reduction may not offer a scalable solution due to the cost of eliminating contaminants in the manufacturing process and increasing chip complexity. This paper proposes to use the inherent redundancy available in existing and future chip microarchitectures to improve yield and enable graceful performance degradation in fail-in-place systems. We introduce a new yield metric called performance averaged yield (Ypav), which accounts both for fully functional chips and for those that exhibit some performance degradation. Our results indicate that at 250nm we are able to increase the Ypav of a uniprocessor with only redundant rows in its caches from a base value of 85% to 98% using microarchitectural redundancy. Given constant chip area, shrinking feature sizes increases fault susceptibility and reduces the base Ypav to 60% at 50nm; exploiting microarchitectural redundancy then raises it to 99.6%.
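One plausible reading of a yield metric that credits degraded chips is a performance-weighted sum over chip bins. The Python sketch below assumes that form; the abstract does not give the formula, so the function name, the bin representation, and the example numbers are illustrative assumptions, not values from the paper.

```python
from typing import Iterable, Tuple


def performance_averaged_yield(bins: Iterable[Tuple[float, float]]) -> float:
    """Assumed form: sum of (fraction of chips) * (relative performance).

    Each bin is (fraction, relative_performance), where relative performance
    is 1.0 for a fully functional chip, between 0 and 1 for a degraded but
    working chip, and 0.0 for a dead chip.
    """
    bins = list(bins)
    if sum(fraction for fraction, _ in bins) > 1.0 + 1e-9:
        raise ValueError("bin fractions must not sum to more than 1")
    return sum(fraction * perf for fraction, perf in bins)


# Made-up example: 70% fully functional, 20% working at 90% of nominal
# performance, 10% unusable.
example = performance_averaged_yield([(0.70, 1.0), (0.20, 0.9), (0.10, 0.0)])
```

Under this reading, a conventional yield figure would count only the first bin (0.70 here), while the weighted sum also credits the degraded chips and comes to about 0.88.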
Citations: 85
Power-sensitive multithreaded architecture
Pub Date : 2000-09-17 DOI: 10.1109/ICCD.2012.6378610
J. Seng, D. Tullsen, George Z. N. Cai
The power consumption of microprocessors is becoming increasingly important in design decisions, not only in mobile processors, but also now in high-performance processors. Power-conscious design must therefore not only go beyond technology and low-level design, but also change the way modern processors are architected. A multithreading processor is attractive in the context of low-power or power-constrained devices for many of the same reasons that enable its high throughput. Primarily, it supplies extra parallelism via multiple threads, allowing the processor to rely much less heavily on speculation. We show that a simultaneous multithreading processor utilizes up to 22% less energy per instruction than a single-threaded architecture. We also explore other power optimizations that are particular to multithreaded architectures, either because they are unavailable to or unreasonable for single-thread architectures.
Citations: 15
FlexRAM: Toward an advanced Intelligent Memory system
Pub Date : 1999-10-10 DOI: 10.1109/ICCD.2012.6378608
Y. Kang, Wei Huang, Seung-Moon Yoo, D. Franklin, Zhenzhou Ge, V. Lam, P. Pattnaik, J. Torrellas
Major advances in Merged Logic DRAM (MLD) technology coupled with the popularization of memory-intensive applications provide fertile ground for architectures based on Intelligent Memory (IRAM) or Processors-in-Memory (PIM). The contribution of this paper is to explore one way to use the current state-of-the-art MLD technology for general-purpose computers. To satisfy requirements of general purpose and low programming cost, we place the PIM chips in the memory system and let them default to plain DRAM if the application is not enabled for intelligent memory. Since wide usability is crucial, we identify and analyze a range of real applications for PIM. Based on the requirements of these applications and current technological constraints, we design a PIM chip and a PIM-based memory system. We call the chip FlexRAM. We describe FlexRAM's design and floorplan, and the resulting memory system. Evaluation of the system through simulations shows that 4 FlexRAM chips often allow a workstation to run 25-40 times faster.
Citations: 160