WIPE: a Write-Optimized Learned Index for Persistent Memory
Zhonghua Wang, Chen Ding, Fengguang Song, Kai Lu, Jiguang Wan, Zhihu Tan, Changsheng Xie, Guokuan Li
ACM Transactions on Architecture and Code Optimization, published 2023-11-28. DOI: 10.1145/3634915 (https://doi.org/10.1145/3634915)
Abstract
Learned indexes, which use machine learning models to accelerate locating positions in sorted data, have gained increasing attention in many big data scenarios. With efficient learned models, learned indexes build large nodes and flat structures, which greatly improves performance. However, most state-of-the-art learned indexes are designed for DRAM, so there is an urgent need for high-performance learned indexes on emerging Non-Volatile Memory (NVM). In this paper, we first evaluate and analyze the performance of existing learned indexes on NVM. We find that they suffer severe write amplification and degraded write performance because they must maintain large sorted or semi-sorted data nodes. To tackle these problems, we propose WIPE, a write-optimized persistent learned index with a novel three-tiered architecture that adopts unsorted, fine-grained data nodes to achieve high write performance on NVM. Within this design, we devise a new root node construction algorithm to accelerate searches across numerous small data nodes. The algorithm keeps the structure stably flat and read performance high on large datasets by introducing an intermediate layer (i.e., index nodes) and accurately predicting index node positions from the root node. Our extensive experiments on Intel DCPMM show that WIPE improves write throughput and read throughput by up to 3.9× and 7×, respectively, compared to state-of-the-art learned indexes. WIPE can also recover from a system crash in about 18 ms. WIPE is freely available as an open-source software package.
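The three tiers described in the abstract (a root that predicts an index node, index nodes that route to data nodes, and small unsorted data nodes) can be illustrated with a minimal in-memory sketch. This is not WIPE's implementation: the class names, the two-point linear model in the root, the `capacity` and `fanout` parameters, and the split policy are all assumptions made for illustration, and nothing here models persistence, crash recovery, or NVM write behavior.

```python
import bisect


class DataNode:
    """Small unsorted node: inserts append, lookups scan (hypothetical)."""

    def __init__(self, slots=None):
        self.slots = list(slots or [])  # (key, value) pairs, arrival order

    def put(self, key, value):
        for i, (k, _) in enumerate(self.slots):
            if k == key:
                self.slots[i] = (key, value)  # overwrite in place
                return
        self.slots.append((key, value))  # append, no element shifting

    def get(self, key):
        for k, v in self.slots:
            if k == key:
                return v
        return None


class IndexNode:
    """Intermediate layer: sorted pivots routing to data nodes."""

    def __init__(self, pivots, nodes):
        self.pivots = list(pivots)  # lower-bound key of each data node
        self.nodes = list(nodes)

    def locate(self, key):
        return max(bisect.bisect_right(self.pivots, key) - 1, 0)

    def put(self, key, value, capacity):
        i = self.locate(key)
        node = self.nodes[i]
        node.put(key, value)
        if len(node.slots) > capacity:  # split only on overflow
            ordered = sorted(node.slots, key=lambda kv: kv[0])
            mid = len(ordered) // 2
            right = DataNode(ordered[mid:])
            self.nodes[i:i + 1] = [DataNode(ordered[:mid]), right]
            self.pivots.insert(i + 1, right.slots[0][0])


class ThreeTierIndex:
    """Root with an assumed linear model over index-node boundary keys."""

    def __init__(self, sorted_items, capacity=4, fanout=4):
        self.capacity = capacity
        data = [DataNode(sorted_items[s:s + capacity])
                for s in range(0, len(sorted_items), capacity)]
        pivots = [n.slots[0][0] for n in data]
        self.index_nodes = [IndexNode(pivots[s:s + fanout], data[s:s + fanout])
                            for s in range(0, len(data), fanout)]
        self.bounds = [n.pivots[0] for n in self.index_nodes]
        self.lo, hi = self.bounds[0], self.bounds[-1]
        # Line through the first and last boundary: position ~ slope * (key - lo)
        self.slope = (len(self.bounds) - 1) / (hi - self.lo) if hi > self.lo else 0.0

    def _index_node(self, key):
        n = len(self.bounds)
        i = min(max(int(self.slope * (key - self.lo)), 0), n - 1)
        while i > 0 and key < self.bounds[i]:  # correct model error
            i -= 1
        while i + 1 < n and key >= self.bounds[i + 1]:
            i += 1
        return self.index_nodes[i]

    def put(self, key, value):
        self._index_node(key).put(key, value, self.capacity)

    def get(self, key):
        inode = self._index_node(key)
        return inode.nodes[inode.locate(key)].get(key)


if __name__ == "__main__":
    index = ThreeTierIndex([(k, k * 10) for k in range(0, 100, 5)])
    index.put(1, "a")
    print(index.get(1), index.get(95), index.get(42))
```

In this sketch an insert touches only one small data node and appends to it, which is the intuition behind avoiding the shifting that large sorted nodes require; the root's prediction is corrected by a short local walk over the boundary keys.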
About the journal:
ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.