On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing

Proceedings of machine learning research, pp. 40669-40680. Pub Date: 2023-06-19. DOI: 10.48550/arXiv.2306.10651

Sepanta Zeighami, C. Shahabi


Citations: 1

Abstract

A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes have been used extensively to solve this problem: they learn a model to predict the location of the items in the array. They have been shown empirically to outperform non-learned methods (e.g., B-trees or binary search, which answer queries in O(log n) time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same query time of O(log n), but with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on the data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on the data distribution, and with the same space complexity as non-learned methods, learned indexes can answer queries in O(log log n) expected query time. We also show that, allowing for slightly larger but still near-linear space overhead, a learned index can achieve O(1) expected query time. Our results prove that learned indexes are orders of magnitude faster than non-learned methods, theoretically grounding their empirical success.
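To make the idea of "a model that predicts the location of an item" concrete, here is a minimal Python sketch. It is not the construction analyzed in the paper and carries none of its guarantees. It assumes a non-empty list of numeric keys, uses a single straight line through the smallest and largest key as the "model", and corrects the prediction with an exponential search followed by a binary search. The class name `LinearLearnedIndex` and the method `lookup` are hypothetical, chosen only for illustration.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: one linear model plus a local correction search.

    Illustrative assumption only; not the index studied in the paper.
    """

    def __init__(self, keys):
        # Assumes `keys` is non-empty and contains numbers.
        self.keys = sorted(keys)
        n = len(self.keys)
        self.lo_key = self.keys[0]
        span = self.keys[-1] - self.keys[0]
        # "Model": map the key range linearly onto positions 0..n-1.
        self.scale = (n - 1) / span if span > 0 else 0.0

    def _predict(self, q):
        # Predicted position, clamped to a valid array index.
        pos = int((q - self.lo_key) * self.scale)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, q):
        """Return the first index i with keys[i] >= q (len(keys) if none)."""
        keys, n = self.keys, len(self.keys)
        pos = self._predict(q)
        step = 1
        if keys[pos] < q:
            # Answer lies to the right: grow the window until keys[hi] >= q.
            lo, hi = pos, pos + step
            while hi < n and keys[hi] < q:
                lo = hi
                step *= 2
                hi = lo + step
            hi = min(hi, n)
        else:
            # Answer lies at or left of pos: grow until keys[lo] < q.
            hi, lo = pos, pos - step
            while lo >= 0 and keys[lo] >= q:
                hi = lo
                step *= 2
                lo = hi - step
            lo = max(lo, -1)
        # Binary search inside the bracket (lo, hi].
        return bisect.bisect_left(keys, q, lo + 1, hi)
```

In this sketch the cost of a lookup depends on how far the predicted position is from the true one, so a model that fits the data distribution well leaves only a short correction search. This is the intuition behind distribution-dependent query time, although the bounds stated in the abstract rely on the paper's own constructions and assumptions.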
