
Proceedings of the 40th Annual International Symposium on Computer Architecture: Latest Publications

Utility-based acceleration of multithreaded applications on asymmetric CMPs
Pub Date: 2013-06-23 DOI: 10.1145/2485922.2485936
José A. Joao, M. A. Suleman, O. Mutlu, Y. Patt
Asymmetric Chip Multiprocessors (ACMPs) are becoming a reality. ACMPs can speed up parallel applications if they can identify and accelerate code segments that are critical for performance. Proposals already exist for using coarse-grained thread scheduling and fine-grained bottleneck acceleration. Unfortunately, there have been no proposals offered thus far to decide which code segments to accelerate in cases where both coarse-grained thread scheduling and fine-grained bottleneck acceleration could have value. This paper proposes Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs (UBA), a cooperative software/hardware mechanism for identifying and accelerating the most likely critical code segments from a set of multithreaded applications running on an ACMP. The key idea is a new Utility of Acceleration metric that quantifies the performance benefit of accelerating a bottleneck or a thread by taking into account both the criticality and the expected speedup. UBA outperforms the best of two state-of-the-art mechanisms by 11% for single application workloads and by 7% for two-application workloads on an ACMP with 52 small cores and 3 large cores.
Citations: 93
Navigating big data with high-throughput, energy-efficient data partitioning
Pub Date: 2013-06-23 DOI: 10.1145/2485922.2485944
Lisa Wu, R. J. Barker, Martha A. Kim, K. A. Ross
The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries. To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
Citations: 103
Tri-level-cell phase change memory: toward an efficient and reliable memory system
Pub Date: 2013-06-23 DOI: 10.1145/2485922.2485960
N. Seong, Sungkap Yeo, H. Lee
There are several emerging memory technologies looming on the horizon to compensate for the physical scaling challenges of DRAM. Phase change memory (PCM) is one such candidate, proposed as part of the main memory in computing systems. One salient feature of PCM is its multi-level-cell (MLC) property, which can be used to multiply memory capacity at the cell level. However, because the value written to a PCM cell can drift over time, PCM is prone to a unique type of soft error, posing a great challenge for practical deployment. This paper first quantitatively studies the state of the art for MLC PCM in dealing with the resistance drift problem and shows that previously proposed techniques such as scrubbing and error-correction mechanisms have significant reliability challenges to overcome. We then propose tri-level-cell PCM and demonstrate its ability to achieve a 10^5x lower soft error rate than four-level-cell PCM and 1.33x higher information density than single-level-cell PCM. According to our findings, tri-level-cell PCM shows a 36.4% performance improvement over four-level-cell PCM while achieving the soft error rate of DRAM.
Citations: 89
Efficient virtual memory for big memory servers
Pub Date: 2013-06-23 DOI: 10.1145/2485922.2485943
Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, M. Hill, M. Swift
Our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory. They consume as much as 10% of execution cycles on TLB misses, even using large pages. On the other hand, we find that these workloads use read-write permission on most pages, are provisioned not to swap, and rarely benefit from the full flexibility of page-based virtual memory. To remove the TLB miss overhead for big-memory workloads, we propose mapping part of a process's linear virtual address space with a direct segment, while page mapping the rest of the virtual address space. Direct segments use minimal hardware---base, limit and offset registers per core---to map contiguous virtual memory regions directly to contiguous physical memory. They eliminate the possibility of TLB misses for key data structures such as database buffer pools and in-memory key-value stores. Memory mapped by a direct segment may be converted back to paging when needed. We prototype direct-segment software support for x86-64 in Linux and emulate direct-segment hardware. For our workloads, direct segments eliminate almost all TLB misses and reduce the execution time wasted on TLB misses to less than 0.5%.
Citations: 319
Design space exploration and optimization of path oblivious RAM in secure processors
Pub Date: 2013-06-01 DOI: 10.1145/2485922.2485971
Ling Ren, Xiangyao Yu, Christopher W. Fletcher, Marten van Dijk, S. Devadas
Keeping user data private is a huge problem both in cloud computing and computation outsourcing. One paradigm to achieve data privacy is to use tamper-resistant processors, inside which users' private data is decrypted and computed upon. These processors need to interact with untrusted external memory. Even if we encrypt all data that leaves the trusted processor, however, the address sequence that goes off-chip may still leak information. To prevent this address leakage, the security community has proposed ORAM (Oblivious RAM). ORAM has mainly been explored in server/file settings which assume a vastly different computation model than secure processors. Not surprisingly, naïvely applying ORAM to a secure processor setting incurs large performance overheads. In this paper, a recent proposal called Path ORAM is studied. We demonstrate techniques to make Path ORAM practical in a secure processor setting. We introduce background eviction schemes to prevent Path ORAM failure and allow for a performance-driven design space exploration. We propose a concept called super blocks to further improve Path ORAM's performance, and also show an efficient integrity verification scheme for Path ORAM. With our optimizations, Path ORAM overhead drops by 41.8%, and SPEC benchmark execution time improves by 52.4% in relation to a baseline configuration. Our work can be used to improve the security level of previous secure processors.
Citations: 149
The locality-aware adaptive cache coherence protocol
Pub Date: 2013-06-01 DOI: 10.1145/2485922.2485967
George Kurian, O. Khan, S. Devadas
Next generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impact memory access latency and consume power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.
Citations: 43
Zombie memory: Extending memory lifetime by reviving dead blocks
Pub Date: 2013-04-01 DOI: 10.1145/2485922.2485961
R. Azevedo, John D. Davis, K. Strauss, Parikshit Gopalan, M. Manasse, S. Yekhanin
Zombie is an endurance management framework that enables a variety of error correction mechanisms to extend the lifetimes of memories that suffer from bit failures caused by wearout, such as phase-change memory (PCM). Zombie supports both single-level cell (SLC) and multi-level cell (MLC) variants. It extends the lifetime of blocks in working memory pages (primary blocks) by pairing them with spare blocks, i.e., working blocks in pages that have been disabled due to exhaustion of a single block's error correction resources, which would be 'dead' otherwise. Spare blocks adaptively provide error correction resources to primary blocks as failures accumulate over time. This reduces the waste caused by early block failures, making working blocks in discarded pages a useful resource. Even though we use PCM as the target technology, Zombie applies to any memory technology that suffers stuck-at cell failures. This paper describes the Zombie framework, a combination of two new error correction mechanisms (ZombieXOR for SLC and ZombieMLC for MLC) and the extension of two previously proposed SLC mechanisms (ZombieECP and ZombieERC). The result is a 58% to 92% improvement in endurance for Zombie SLC memory and an even more impressive 11x to 17x improvement for ZombieMLC, both with performance overheads of only 0.1% when memories using prior error correction mechanisms reach end of life.
Citations: 56