The Concurrent Learned Indexes for Multicore Data Storage
Zhaoguo Wang, Haibo Chen, Youyun Wang, Chuzhe Tang, Huan Wang
ACM Transactions on Storage, 2022-01-29. DOI: https://doi.org/10.1145/3478289
We present XIndex, a concurrent index library designed for fast queries. It includes a concurrent ordered index (XIndex-R) and a concurrent hash index (XIndex-H). Like the recently proposed learned index, the indexes in XIndex use learned models to optimize index efficiency. In contrast to the learned index, XIndex-R handles concurrent writes effectively and adapts its structure according to runtime workload characteristics, while XIndex-H avoids blocking concurrent writes during resize operations. Furthermore, the indexes in XIndex can index string keys much more efficiently than the learned index. We demonstrate the advantages of XIndex with YCSB, TPC-C (KV), a TPC-C-inspired benchmark for key-value stores, and micro-benchmarks. Compared with the ordered indexes Masstree and Wormhole, XIndex-R achieves up to 3.2× and 4.4× performance improvement, respectively, on a 24-core machine. Compared with the Intel TBB HashMap hash index, XIndex-H achieves up to 3.1× speedup. The performance further improves by 91% after adding the optimizations for indexing string keys. The library is open-sourced.
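To make the learned-index idea above concrete, here is a minimal sketch of the lookup path XIndex builds on: a model predicts a key's position in a sorted array, and a bounded local search corrects the prediction. The class name, the single linear model, and the error-bound handling are illustrative simplifications, not XIndex's actual group structure or concurrency scheme.

```python
# Minimal learned-index lookup sketch (illustrative; not XIndex's implementation).
import bisect

class LearnedSortedArray:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~= slope * key + intercept with simple least squares.
        xs, ys = self.keys, range(n)
        mean_x = sum(xs) / n
        mean_y = (n - 1) / 2
        var_x = sum((x - mean_x) ** 2 for x in xs) or 1.0
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = cov / var_x
        self.intercept = mean_y - self.slope * mean_x
        # The maximum prediction error defines the local search window.
        self.err = max(abs(self._predict(x) - y) for x, y in zip(xs, ys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

index = LearnedSortedArray([2, 3, 5, 7, 11, 13, 17, 19, 23])
assert index.lookup(13) is not None and index.lookup(4) is None
```

XIndex's contribution is making this style of index safe and adaptive under concurrent writes, which the sketch does not attempt to show.
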
Reprogramming 3D TLC Flash Memory based Solid State Drives
Congming Gao, Min Ye, C. Xue, Youtao Zhang, Liang Shi, J. Shu, Jun Yang
ACM Transactions on Storage, 2022-01-29. DOI: https://doi.org/10.1145/3487064
NAND flash memory-based SSDs have been widely adopted. SSD scaling has evolved from planar (2D) to 3D stacking. For reliability and other reasons, the technology node in 3D NAND SSDs is larger than in 2D, but data density can be increased by raising the number of bits per cell. In this work, we develop a novel reprogramming scheme for TLCs in 3D NAND SSDs, such that a cell can be programmed and reprogrammed several times before it is erased. Such reprogramming can improve the endurance of a cell and the speed of programming, and increase the number of bits written in a cell per program/erase cycle, i.e., the effective capacity. Our work is the first to test a real 3D NAND SSD to validate the feasibility of the reprogram operation. From the collected data, we derive the restrictions on performing reprogramming that stem from reliability challenges. Furthermore, we design a reprogrammable SSD (ReSSD) to structure reprogram operations. ReSSD is evaluated in a case study in a RAID 5 system (RSS-RAID). Experimental results show that RSS-RAID can improve endurance by 35.7%, boost write performance by 15.9%, and increase effective capacity by 7.71%, with negligible overhead compared with a conventional 3D SSD-based RAID 5 system.
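As a rough, purely illustrative aid to the notion of effective capacity, the snippet below computes how many extra bits a block could accept between erases if some of its pages tolerate additional program operations. The page count, eligible fraction, and reprogram count are made-up placeholders, not the restrictions the authors derive from their real-device tests.

```python
# Back-of-the-envelope sketch of "effective capacity" under reprogramming.
# All parameters are illustrative placeholders.

def effective_bits_per_pe_cycle(pages_per_block, bits_per_page,
                                reprogrammable_fraction, extra_programs):
    """Bits that can be written into one block between two erases."""
    baseline = pages_per_block * bits_per_page
    # Eligible pages accept `extra_programs` additional writes before erase.
    extra = int(pages_per_block * reprogrammable_fraction) * bits_per_page * extra_programs
    return baseline, baseline + extra

base, with_reprogram = effective_bits_per_pe_cycle(
    pages_per_block=576, bits_per_page=16 * 1024 * 8,
    reprogrammable_fraction=0.05, extra_programs=1)
print(f"effective-capacity gain: {(with_reprogram / base - 1) * 100:.2f}%")
```
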
RAIL: Predictable, Low Tail Latency for NVMe Flash
Heiner Litz, Javier González, Ana Klimovic, C. Kozyrakis
ACM Transactions on Storage, 2022-01-29. DOI: https://doi.org/10.1145/3465406
Flash-based storage is replacing disk for an increasing number of data center applications, providing orders of magnitude higher throughput and lower average latency. However, applications also require predictable storage latency. Existing Flash devices fail to provide low tail read latency in the presence of write operations. We propose two novel techniques to address SSD read tail latency, including Redundant Array of Independent LUNs (RAIL) which avoids serialization of reads behind user writes as well as latency-aware hot-cold separation (HC) which improves write throughput while maintaining low tail latency. RAIL leverages the internal parallelism of modern Flash devices and allocates data and parity pages to avoid reads getting stuck behind writes. We implement RAIL in the Linux Kernel as part of the LightNVM Flash translation layer and show that it can reduce read tail latency by 7× at the 99.99th percentile, while reducing relative bandwidth by only 33%.
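The sketch below illustrates the core RAIL mechanism described above: data and parity are striped across LUNs so that a read whose target LUN is busy serving a write can be reconstructed by XOR-ing the remaining chunks of the stripe. The stripe layout and the busy-LUN bookkeeping are simplifications; the real implementation lives in the LightNVM flash translation layer.

```python
# Sketch of the RAIL read path: reconstruct a chunk from parity when its LUN is busy.
from functools import reduce

def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

class Stripe:
    def __init__(self, chunks):
        self.chunks = list(chunks)           # one chunk per data LUN
        self.parity = xor(self.chunks)       # stored on a dedicated parity LUN

    def read(self, lun, busy_luns):
        if lun not in busy_luns:
            return self.chunks[lun]          # fast path: the LUN is idle
        # Avoid waiting behind a write: rebuild from the other LUNs plus parity.
        others = [c for i, c in enumerate(self.chunks) if i != lun]
        return xor(others + [self.parity])

stripe = Stripe([b"aaaa", b"bbbb", b"cccc"])
assert stripe.read(1, busy_luns={1}) == b"bbbb"
```
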
Exploration and Exploitation for Buffer-Controlled HDD-Writes for SSD-HDD Hybrid Storage Server
Shucheng Wang, Ziyi Lu, Q. Cao, Hong Jiang, Jie Yao, Yuanyuan Dong, Puyuan Yang, Changsheng Xie
ACM Transactions on Storage, 2022-01-29. DOI: https://doi.org/10.1145/3465410
Hybrid storage servers combining solid-state drives (SSDs) and hard disk drives (HDDs) provide cost-effectiveness and μs-level responsiveness for applications. However, observations from the cloud storage system Pangu show that HDDs are often underutilized while SSDs are overused, especially under intensive writes, which leads to fast wear-out and high tail latency for SSDs. On the other hand, our experimental study reveals that a series of sequential and continuous writes to HDDs exhibits a periodic, staircase-shaped pattern of write latency, i.e., low (e.g., 35 μs), middle (e.g., 55 μs), and high latency (e.g., 12 ms), resulting from buffered writes within the HDD's controller. This inspires us to explore and exploit the potential μs-level IO delay of HDDs to absorb excessive SSD writes without performance degradation. We first build an HDD writing model that describes the staircase behavior and design a profiling process to initialize and dynamically recalibrate the model parameters. Then, we propose a Buffer-Controlled Write approach (BCW) to proactively control buffered writes so that low- and mid-latency periods are scheduled with application data and high-latency periods are filled with padded data. Leveraging BCW, we design a mixed IO scheduler (MIOS) to adaptively steer incoming data to SSDs and HDDs. A multi-HDD scheduling scheme is further designed to minimize HDD-write latency. We perform extensive evaluations under production workloads and benchmarks. The results show that MIOS removes up to 93% of the data written to SSDs and reduces the average and 99th-percentile latencies of the hybrid server by 65% and 85%, respectively.
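The following sketch captures the scheduling idea under the assumption of a fixed staircase cycle: user writes go to the HDD during predicted low- and mid-latency phases and are redirected to the SSD during high-latency phases (the real system also issues padded writes to push the HDD past its slow phase). The phase lengths are invented parameters; BCW calibrates them per drive through profiling.

```python
# Illustrative buffer-controlled write (BCW) dispatcher with made-up phase lengths.
LOW, MID, HIGH = "low", "mid", "high"

class BCWScheduler:
    def __init__(self, low_writes=64, mid_writes=16, high_writes=2):
        # One staircase cycle of buffered HDD writes, as a sequence of phases.
        self.cycle = [LOW] * low_writes + [MID] * mid_writes + [HIGH] * high_writes
        self.pos = 0

    def next_phase(self):
        phase = self.cycle[self.pos]
        self.pos = (self.pos + 1) % len(self.cycle)
        return phase

    def dispatch(self, request):
        phase = self.next_phase()
        if phase in (LOW, MID):
            return ("hdd", request)   # fast buffered write: absorb it on the HDD
        return ("ssd", request)       # predicted slow write: serve from the SSD

sched = BCWScheduler()
targets = [sched.dispatch(f"req-{i}")[0] for i in range(100)]
print(targets.count("hdd"), "HDD writes,", targets.count("ssd"), "SSD writes")
```
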
Pattern-Based Prefetching with Adaptive Cache Management Inside of Solid-State Drives
Jun Li, Xiaofei Xu, Zhigang Cai, Jianwei Liao, Kenli Li, Balazs Gerofi, Y. Ishikawa
ACM Transactions on Storage, 2022-01-29. DOI: https://doi.org/10.1145/3474393
This article proposes a pattern-based prefetching scheme with the support of adaptive cache management at the flash translation layer of solid-state drives (SSDs). It works inside the SSD, independent of the operating system and transparent to its users. Specifically, it first mines frequent block access patterns that reflect the correlation among I/O requests. Then, it compares the requests in the current time window with the identified patterns to direct the prefetching of data into the SSD cache. More importantly, to maximize cache use efficiency, we build a mathematical model that adaptively determines the cache partition on the basis of I/O workload characteristics, so as to separately buffer the prefetched data and the written data. Experimental results show that our proposal can improve average read latency by 1.8%–36.5% without noticeably increasing write latency, compared with conventional SSD-internal prefetching schemes.
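A minimal sketch of the pattern-matching step follows: the tail of the current request window is compared with previously mined patterns, and the continuation of a matching pattern becomes the prefetch candidate set. The pattern mining itself and the adaptive cache partitioning are omitted, and the patterns here are hand-written placeholders.

```python
# Sketch of matching the recent access window against mined block patterns.
from collections import deque

class PatternPrefetcher:
    def __init__(self, patterns, window=4):
        self.patterns = [tuple(p) for p in patterns]   # frequent block sequences
        self.window = deque(maxlen=window)             # recent block accesses

    def on_access(self, block):
        self.window.append(block)
        recent = tuple(self.window)
        to_prefetch = []
        for pattern in self.patterns:
            # Try the longest suffix of the window that matches a pattern prefix.
            for k in range(min(len(recent), len(pattern) - 1), 0, -1):
                if recent[-k:] == pattern[:k]:
                    to_prefetch.extend(pattern[k:])    # the rest is the candidate set
                    break
        return to_prefetch

pf = PatternPrefetcher(patterns=[[10, 11, 12, 13], [7, 20, 21]])
pf.on_access(10)
print(pf.on_access(11))   # -> [12, 13]
```
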
Can Applications Recover from fsync Failures?
Anthony Rebello, Yuvraj Patel, R. Alagappan, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
ACM Transactions on Storage, 2021-06-15. DOI: https://doi.org/10.1145/3450338
We analyze how file systems and modern data-intensive applications react to fsync failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability) as well as differences (page content and failure reporting is varied). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle fsync failures. Our findings show that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
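To illustrate the practical takeaway, here is a hedged sketch of a conservative application-side reaction to an fsync error: do not silently retry, because the kernel may already have marked the affected pages clean and may report the error only once; instead, surface the failure so higher-level recovery (e.g., from a write-ahead log) can take over. This is one of the strategies the study discusses, not a complete durability recipe.

```python
# Sketch: treat a failed fsync() as "on-disk state unknown" rather than retrying.
import os

def durable_append(path, payload: bytes):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        try:
            os.fsync(fd)
        except OSError as e:
            # Do NOT loop on fsync(): the write may be silently lost even if a
            # later fsync() returns success. Surface the error so the caller can
            # rebuild state from its own log or fail stop.
            raise RuntimeError(f"fsync failed ({e}); on-disk state is unknown") from e
    finally:
        os.close(fd)

durable_append("fsync-demo.log", b"record-1\n")
```
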
Performance Modeling and Practical Use Cases for Black-Box SSDs
Joonsung Kim, Kanghyun Choi, Wonsik Lee, Jangwoo Kim
ACM Transactions on Storage, 2021-06-08. DOI: https://doi.org/10.1145/3440022
Modern servers are actively deploying Solid-State Drives (SSDs) thanks to their high throughput and low latency. However, current server architects cannot achieve the full performance potential of commodity SSDs, as SSDs are complex devices designed for specific goals (e.g., latency, throughput, endurance, cost) with their internal mechanisms undisclosed to users. In this article, we propose SSDcheck, a novel SSD performance model to extract various internal mechanisms and predict the latency of the next access to commodity black-box SSDs. We identify key performance-critical features (e.g., garbage collection, write buffering) and find their parameters (i.e., size, threshold) for each SSD by using our novel diagnosis code snippets. Then, SSDcheck constructs a performance model for a target SSD and dynamically manages the model to predict the latency of the next access. In addition, SSDcheck extracts and exposes other useful internal mechanisms (e.g., the fetch unit in multi-queue SSDs, the idle-time interval that triggers background tasks) so that the storage system can fully exploit SSDs. Using those features and the performance model, we propose multiple practical use cases. Our evaluations show that SSDcheck's performance model is highly accurate, and the proposed use cases achieve significant performance improvements in various scenarios.
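The toy model below suggests the flavor of such a performance model: inferred internal state (write-buffer occupancy, a garbage-collection trigger) is tracked to predict whether the next write will be fast or slow. All parameters are placeholders standing in for the values SSDcheck would extract with its diagnosis code snippets.

```python
# Toy next-write latency predictor for a black-box SSD (placeholder parameters).
class BlackBoxSSDModel:
    def __init__(self, buffer_slots=128, gc_every_n_writes=4096,
                 fast_us=25, buffered_flush_us=400, gc_us=3000):
        self.buffer_slots = buffer_slots
        self.gc_every_n_writes = gc_every_n_writes
        self.fast_us, self.flush_us, self.gc_us = fast_us, buffered_flush_us, gc_us
        self.buffered = 0
        self.total_writes = 0

    def predict_next_write_us(self):
        if (self.total_writes + 1) % self.gc_every_n_writes == 0:
            return self.gc_us        # model predicts garbage collection will kick in
        if self.buffered + 1 > self.buffer_slots:
            return self.flush_us     # write buffer must be drained first
        return self.fast_us          # absorbed by the write buffer

    def observe_write(self):
        self.total_writes += 1
        self.buffered = 0 if self.buffered + 1 > self.buffer_slots else self.buffered + 1

model = BlackBoxSSDModel()
latencies = []
for _ in range(5000):
    latencies.append(model.predict_next_write_us())
    model.observe_write()
print(max(latencies), "us worst predicted write latency")
```

A host-side scheduler could use such predictions, for example, to defer non-urgent writes when a slow access is expected, which is the spirit of the paper's use cases.
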
Twizzler: A Data-centric OS for Non-volatile Memory
Daniel Bittman, P. Alvaro, P. Mehra, D. Long, E. Miller
ACM Transactions on Storage, 2021-06-07. DOI: https://doi.org/10.1145/3454129
Byte-addressable, non-volatile memory (NVM) presents an opportunity to rethink the entire system stack. We present Twizzler, an operating system redesign for this near future. Twizzler removes the kernel from the I/O path, provides programs with memory-style access to persistent data using small (64-bit), object-relative cross-object pointers, and enables simple and efficient long-term sharing of data both between applications and between runs of an application. Twizzler provides a clean-slate programming model for persistent data, realizing the vision of Unix in a world of persistent RAM. We show that Twizzler is simpler, more extensible, and more secure than existing I/O models and implementations by building software for Twizzler and evaluating it on NVM DIMMs. Most persistent pointer operations in Twizzler impose less than 0.5 ns added latency. Twizzler operations are up to faster than Unix, and SQLite queries are up to faster than on PMDK. YCSB workloads ran 1.1– faster on Twizzler than on native and NVM-optimized SQLite backends.
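The sketch below illustrates the cross-object pointer encoding described above: the 64-bit value splits into an index into a per-object foreign object table (FOT) and an offset within the target object, so pointers remain valid regardless of where objects are mapped. The 20/44-bit split and the table representation are illustrative choices, not Twizzler's exact layout.

```python
# Sketch of object-relative cross-object pointers (illustrative bit split).
OFFSET_BITS = 44
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def encode_ptr(fot_index: int, offset: int) -> int:
    assert 0 <= offset <= OFFSET_MASK and fot_index < (1 << (64 - OFFSET_BITS))
    return (fot_index << OFFSET_BITS) | offset

def decode_ptr(ptr: int, foreign_object_table):
    fot_index, offset = ptr >> OFFSET_BITS, ptr & OFFSET_MASK
    target_object_id = foreign_object_table[fot_index]   # resolved per source object
    return target_object_id, offset

fot = {1: "object-6f3a", 2: "object-91c0"}                # hypothetical per-object FOT
p = encode_ptr(2, 0x1000)
print(decode_ptr(p, fot))    # -> ('object-91c0', 4096)
```
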
Design of LSM-tree-based Key-value SSDs with Bounded Tails
Junsu Im, Jinwook Bae, Chanwoo Chung, Arvind, Sungjin Lee
ACM Transactions on Storage, 2021-05-28. DOI: https://doi.org/10.1145/3452846
A key-value store based on a log-structured merge-tree (LSM-tree) is preferable to a hash-based key-value store, because an LSM-tree can support a wider variety of operations and shows better performance, especially for writes. However, an LSM-tree is difficult to implement in the resource-constrained environment of a key-value SSD (KV-SSD), and, consequently, KV-SSDs typically use hash-based schemes. We present PinK, a design and implementation of an LSM-tree-based KV-SSD, which, compared to a hash-based KV-SSD, reduces 99th-percentile tail latency by 73%, improves average read latency by 42%, and shows 37% higher throughput. The key idea for improving the performance of an LSM-tree in a resource-constrained environment is to avoid the use of Bloom filters and, instead, use a small amount of DRAM to keep/pin the top levels of the LSM-tree. We also find that PinK provides a flexible design space for a wide range of KV workloads by leveraging the read-write tradeoff in LSM-trees.
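Here is a simplified sketch of the lookup path implied by the pinning idea: the small top levels live entirely in DRAM, so only the lower levels can cost flash reads, and each of those costs at most one simulated page read located through in-memory metadata rather than Bloom filters. The data layout is an abstraction, not the KV-SSD's on-flash format.

```python
# Sketch of an LSM-tree lookup with the top levels pinned in DRAM.
import bisect

class PinKLikeTree:
    def __init__(self, pinned_levels, flash_levels):
        self.pinned = pinned_levels          # small top levels, kept as dicts in DRAM
        self.flash = flash_levels            # lower levels: sorted (key, value) runs

    def get(self, key):
        for level in self.pinned:            # DRAM only: no flash access
            if key in level:
                return level[key], 0
        reads = 0
        for run in self.flash:               # at most one "flash read" per level
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            reads += 1                        # simulated page read
            if i < len(run) and run[i][0] == key:
                return run[i][1], reads
        return None, reads

tree = PinKLikeTree(
    pinned_levels=[{"k42": "v-new"}],
    flash_levels=[[("k10", "v10"), ("k42", "v-old")], [("k07", "v07")]])
print(tree.get("k42"))    # -> ('v-new', 0): served from the pinned level
print(tree.get("k07"))    # -> ('v07', 2)
```
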
Penalty- and Locality-aware Memory Allocation in Redis Using Enhanced AET
Cheng Pan, Xiaolin Wang, Yingwei Luo, Zhenlin Wang
ACM Transactions on Storage, 2021-05-28. DOI: https://doi.org/10.1145/3447573
Due to the large data volumes and low latency requirements of modern web services, the use of an in-memory key-value (KV) cache often becomes an inevitable choice (e.g., Redis and Memcached). The in-memory cache holds hot data, reduces request latency, and alleviates the load on background databases. Inheriting from traditional hardware cache design, many existing KV cache systems still use recency-based cache replacement algorithms, e.g., least recently used (LRU) or its approximations. However, the diversity of miss penalties distinguishes a KV cache from a hardware cache: inadequate consideration of penalty can substantially compromise space utilization and request service time. KV accesses also exhibit locality, which needs to be coordinated with miss penalty to guide cache management. In this article, we first discuss how to enhance an existing cache model, the Average Eviction Time (AET) model, so that it can model a KV cache. We then apply the model to Redis and propose pRedis, Penalty- and Locality-aware Memory Allocation in Redis, which synthesizes data locality and miss penalty in a quantitative manner to guide memory allocation and replacement in Redis. At the same time, we also explore the diurnal behavior of a KV store and exploit long-term reuse. We replace the original passive eviction mechanism with an automatic dump/load mechanism to smooth the transition between access peaks and valleys. Our evaluation shows that pRedis effectively reduces average and tail access latency with minimal time and space overhead. For both real-world and synthetic workloads, our approach delivers an average of 14.0%∼52.3% latency reduction over a state-of-the-art penalty-aware cache management scheme, Hyperbolic Caching (HC), and shows more quantitative predictability of performance. Moreover, we can obtain even lower average latency (1.1%∼5.5%) when dynamically switching policies between pRedis and HC.
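As a final illustration, the sketch below shows one way to make eviction penalty-aware: the victim is the entry with the smallest expected cost of a future miss, i.e., its re-fetch penalty weighted by a crude reuse estimate. The recency/frequency proxy stands in for the AET-based locality model that pRedis actually uses, and the class and field names are invented for the example.

```python
# Sketch of penalty-aware eviction: evict the entry whose future miss would cost least.
import time

class PenaltyAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}        # key -> (value, miss_penalty_ms, hits, last_access)

    def _evict_one(self):
        now = time.monotonic()
        def expected_miss_cost(item):
            _, (_, penalty, hits, last) = item
            reuse_likelihood = hits / (1.0 + (now - last))   # crude locality proxy
            return penalty * reuse_likelihood
        victim = min(self.data.items(), key=expected_miss_cost)[0]
        del self.data[victim]

    def put(self, key, value, miss_penalty_ms):
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict_one()
        self.data[key] = (value, miss_penalty_ms, 1, time.monotonic())

    def get(self, key):
        if key not in self.data:
            return None
        value, penalty, hits, _ = self.data[key]
        self.data[key] = (value, penalty, hits + 1, time.monotonic())
        return value

cache = PenaltyAwareCache(capacity=2)
cache.put("cheap", "a", miss_penalty_ms=1)     # fast to re-fetch on a miss
cache.put("costly", "b", miss_penalty_ms=50)   # expensive backend query
cache.put("new", "c", miss_penalty_ms=5)       # forces an eviction: "cheap" goes
print(sorted(cache.data))                       # -> ['costly', 'new']
```
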