面向RDMA的高性能分解存储系统学习键值存储

IF 2.1 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE ACM Transactions on Storage Pub Date : 2023-09-05 DOI:10.1145/3620674

Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, Jiajie Sheng

{"title":"面向RDMA的高性能分解存储系统学习键值存储","authors":"Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, Jiajie Sheng","doi":"10.1145/3620674","DOIUrl":null,"url":null,"abstract":"Disaggregated memory systems separate monolithic servers into different components, including compute and memory nodes, to enjoy the benefits of high resource utilization, flexible hardware scalability, and efficient data sharing. By exploiting the high-performance RDMA (Remote Direct Memory Access), the compute nodes directly access the remote memory pool without involving remote CPUs. Hence, the ordered key-value (KV) stores (e.g., B-trees and learned indexes) keep all data sorted to provide rang query service via the high-performance network. However, existing ordered KVs fail to work well on the disaggregated memory systems, due to either consuming multiple network roundtrips to search the remote data or heavily relying on the memory nodes equipped with insufficient computing resources to process data modifications. In this paper, we propose a scalable RDMA-oriented KV store with learned indexes, called ROLEX, to coalesce the ordered KV store in the disaggregated systems for efficient data storage and retrieval. ROLEX leverages a retraining-decoupled learned index scheme to dissociate the model retraining from data modification operations via adding a bias and some data-movement constraints to learned models. Based on the operation decoupling, data modifications are directly executed in compute nodes via one-sided RDMA verbs with high scalability. The model retraining is hence removed from the critical path of data modification and asynchronously executed in memory nodes by using dedicated computing resources. ROLEX efficiently alleviates the fragmentation and garbage collection issues, due to allocating and reclaiming space via fixed-size leaves that are accessed via the atomic-size leaf numbers. Our experimental results on YCSB and real-world workloads demonstrate that ROLEX achieves competitive performance on the static workloads, as well as significantly improving the performance on dynamic workloads by up to 2.2 × than state-of-the-art schemes on the disaggregated memory systems. We have released the open-source codes for public use in GitHub.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2023-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A High-Performance RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems\",\"authors\":\"Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, Jiajie Sheng\",\"doi\":\"10.1145/3620674\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Disaggregated memory systems separate monolithic servers into different components, including compute and memory nodes, to enjoy the benefits of high resource utilization, flexible hardware scalability, and efficient data sharing. By exploiting the high-performance RDMA (Remote Direct Memory Access), the compute nodes directly access the remote memory pool without involving remote CPUs. Hence, the ordered key-value (KV) stores (e.g., B-trees and learned indexes) keep all data sorted to provide rang query service via the high-performance network. However, existing ordered KVs fail to work well on the disaggregated memory systems, due to either consuming multiple network roundtrips to search the remote data or heavily relying on the memory nodes equipped with insufficient computing resources to process data modifications. In this paper, we propose a scalable RDMA-oriented KV store with learned indexes, called ROLEX, to coalesce the ordered KV store in the disaggregated systems for efficient data storage and retrieval. ROLEX leverages a retraining-decoupled learned index scheme to dissociate the model retraining from data modification operations via adding a bias and some data-movement constraints to learned models. Based on the operation decoupling, data modifications are directly executed in compute nodes via one-sided RDMA verbs with high scalability. The model retraining is hence removed from the critical path of data modification and asynchronously executed in memory nodes by using dedicated computing resources. ROLEX efficiently alleviates the fragmentation and garbage collection issues, due to allocating and reclaiming space via fixed-size leaves that are accessed via the atomic-size leaf numbers. Our experimental results on YCSB and real-world workloads demonstrate that ROLEX achieves competitive performance on the static workloads, as well as significantly improving the performance on dynamic workloads by up to 2.2 × than state-of-the-art schemes on the disaggregated memory systems. We have released the open-source codes for public use in GitHub.\",\"PeriodicalId\":49113,\"journal\":{\"name\":\"ACM Transactions on Storage\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2023-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Storage\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3620674\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Storage","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3620674","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

分解内存系统将单片服务器分离为不同的组件，包括计算和内存节点，以享受高资源利用率、灵活的硬件可扩展性和高效数据共享的好处。通过利用高性能的RDMA（远程直接内存访问），计算节点可以直接访问远程内存池，而无需涉及远程CPU。因此，有序键值（KV）存储（例如，B树和学习索引）保持所有数据的排序，以通过高性能网络提供范围查询服务。然而，由于消耗多个网络往返来搜索远程数据，或者严重依赖于配备有不足计算资源的存储器节点来处理数据修改，现有的有序KV在分解的存储器系统上不能很好地工作。在本文中，我们提出了一种具有学习索引的面向RDMA的可扩展KV存储，称为ROLEX，以在分解系统中合并有序的KV存储，从而实现高效的数据存储和检索。ROLEX利用再训练解耦学习索引方案，通过向学习模型添加偏差和一些数据移动约束，将模型再训练与数据修改操作分离。基于操作解耦，数据修改通过具有高可伸缩性的单边RDMA谓词直接在计算节点中执行。因此，模型再训练从数据修改的关键路径中删除，并通过使用专用计算资源在内存节点中异步执行。ROLEX通过固定大小的叶分配和回收空间，有效地缓解了碎片和垃圾收集问题，这些叶是通过原子大小的叶编号访问的。我们在YCSB和真实世界工作负载上的实验结果表明，ROLEX在静态工作负载上实现了有竞争力的性能，并且在动态工作负载上比在分解内存系统上的最先进方案显著提高了2.2倍。我们已经在GitHub中发布了供公众使用的开源代码。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A High-Performance RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems

Disaggregated memory systems separate monolithic servers into different components, including compute and memory nodes, to enjoy the benefits of high resource utilization, flexible hardware scalability, and efficient data sharing. By exploiting the high-performance RDMA (Remote Direct Memory Access), the compute nodes directly access the remote memory pool without involving remote CPUs. Hence, the ordered key-value (KV) stores (e.g., B-trees and learned indexes) keep all data sorted to provide rang query service via the high-performance network. However, existing ordered KVs fail to work well on the disaggregated memory systems, due to either consuming multiple network roundtrips to search the remote data or heavily relying on the memory nodes equipped with insufficient computing resources to process data modifications. In this paper, we propose a scalable RDMA-oriented KV store with learned indexes, called ROLEX, to coalesce the ordered KV store in the disaggregated systems for efficient data storage and retrieval. ROLEX leverages a retraining-decoupled learned index scheme to dissociate the model retraining from data modification operations via adding a bias and some data-movement constraints to learned models. Based on the operation decoupling, data modifications are directly executed in compute nodes via one-sided RDMA verbs with high scalability. The model retraining is hence removed from the critical path of data modification and asynchronously executed in memory nodes by using dedicated computing resources. ROLEX efficiently alleviates the fragmentation and garbage collection issues, due to allocating and reclaiming space via fixed-size leaves that are accessed via the atomic-size leaf numbers. Our experimental results on YCSB and real-world workloads demonstrate that ROLEX achieves competitive performance on the static workloads, as well as significantly improving the performance on dynamic workloads by up to 2.2 × than state-of-the-art schemes on the disaggregated memory systems. We have released the open-source codes for public use in GitHub.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Storage COMPUTER SCIENCE, HARDWARE & ARCHITECTURE-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

4.20

自引率

5.90%

发文量

审稿时长

>12 weeks

期刊介绍： The ACM Transactions on Storage (TOS) is a new journal with an intent to publish original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS will tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises of network protocols, resource management, data backup, replication, recovery, devices, security, and theory of data coding, densities, and low-power. Potential synergies among these fields are expected to open up new research directions.