On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing

Sepanta Zeighami, Cyrus Shahabi
Proceedings of Machine Learning Research, volume 202, pages 40669-40680. Published 2023-07-01.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627073/pdf/
Citations: 0

Abstract

A fundamental problem in data management is finding the elements in an array that match a query. Learned indexes are now widely used to solve this problem: they learn a model that predicts the location of items in the array. Empirically, they outperform non-learned methods (e.g., B-trees or binary search, which answer queries in O(log n) time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same O(log n) query time, with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on the data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on the data distribution, and with the same space complexity as non-learned methods, learned indexes can answer queries in O(log log n) expected time. We also show that, allowing for slightly larger but still near-linear space overhead, a learned index can achieve O(1) expected query time. Our results theoretically prove that learned indexes are orders of magnitude faster than non-learned methods, grounding their empirical success in theory.
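The idea of "learning a model to predict the location of the items in the array" can be illustrated with a minimal Python sketch. It is not the construction analyzed in the paper: the single least-squares line, the exponential search used to correct the prediction, and the names `fit_linear` and `LearnedIndex` are all assumptions chosen for illustration.

```python
import bisect


def fit_linear(keys):
    """Least-squares line mapping a key to its position in the sorted array.

    Hypothetical helper: a single linear model is an illustrative assumption.
    """
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var = sum((x - mean_x) ** 2 for x in keys)
    if var == 0:  # all keys equal: predict the middle position
        return 0.0, mean_y
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(keys))
    slope = cov / var
    return slope, mean_y - slope * mean_x


class LearnedIndex:
    """Toy learned index: predict a position, then search near the prediction."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        if self.keys:
            self.slope, self.intercept = fit_linear(self.keys)

    def predict(self, q):
        """Model's guess for the position of q, clamped to a valid index."""
        p = int(round(self.slope * q + self.intercept))
        return min(max(p, 0), len(self.keys) - 1)

    def lookup(self, q):
        """Return the leftmost index of q in the sorted array, or -1 if absent."""
        keys, n = self.keys, len(self.keys)
        if n == 0:
            return -1
        pos = self.predict(q)
        step = 1
        if keys[pos] < q:
            # Answer lies to the right: grow the bracket exponentially.
            lo, hi = pos + 1, min(pos + step, n)
            while hi < n and keys[hi] < q:
                lo = hi + 1
                step *= 2
                hi = min(pos + step, n)
        else:
            # Answer lies at or to the left of pos.
            hi, lo = pos, max(pos - step, 0)
            while lo > 0 and keys[lo] >= q:
                hi = lo
                step *= 2
                lo = max(pos - step, 0)
        # Binary search only inside the bracket found around the prediction.
        i = bisect.bisect_left(keys, q, lo, hi)
        return i if i < n and keys[i] == q else -1


index = LearnedIndex([i * i for i in range(200)])
print(index.lookup(49 * 49))  # position of the key 2401 in the sorted array
```

In this sketch the cost of the correction step grows with the logarithm of the model's prediction error rather than of the array size, which gives an informal intuition for why an accurate model can beat binary search over the whole array. The paper's bounds rely on its own construction and distributional assumptions, which this example does not reproduce.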
