基于非线性谱隙的数据相关哈希

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing Pub Date : 2018-06-20 DOI:10.1145/3188745.3188846

Alexandr Andoni, A. Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, Erik Waingarten

{"title":"基于非线性谱隙的数据相关哈希","authors":"Alexandr Andoni, A. Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, Erik Waingarten","doi":"10.1145/3188745.3188846","DOIUrl":null,"url":null,"abstract":"We establish a generic reduction from _nonlinear spectral gaps_ of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: * For _general_ d-dimensional normed spaces and n-point datasets, we obtain a _cell-probe_ ANN data structure with approximation O(logd/ε2), space dO(1) n1+ε, and dO(1)nε cell probes per query, for any ε>0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ2. * For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query _time_. For ℓp, this improves upon the previous best approximation 2O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a ”tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":"{\"title\":\"Data-dependent hashing via nonlinear spectral gaps\",\"authors\":\"Alexandr Andoni, A. Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, Erik Waingarten\",\"doi\":\"10.1145/3188745.3188846\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We establish a generic reduction from _nonlinear spectral gaps_ of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: * For _general_ d-dimensional normed spaces and n-point datasets, we obtain a _cell-probe_ ANN data structure with approximation O(logd/ε2), space dO(1) n1+ε, and dO(1)nε cell probes per query, for any ε>0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ2. * For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query _time_. For ℓp, this improves upon the previous best approximation 2O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a ”tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.\",\"PeriodicalId\":20593,\"journal\":{\"name\":\"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3188745.3188846\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3188745.3188846","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 28

摘要

我们建立了从度量空间的非线性谱隙到数据相关的位置敏感哈希的一般约简，给出了一种解决各种距离函数下的高维近似近邻搜索问题的新方法。*对于一般d维归范空间和n点数据集，我们得到了近似为O(logd/ε2)的_cell-probe_ ANN数据结构，对于任意ε>，每次查询的空间dO(1) n1+ε和dO(1)nε细胞探针。在此一般性中，除了将一般范数嵌入到l2中所得到的O(√d)界之外，没有已知的非平凡近似。*对于p和schattenp范数，我们进一步改进了数据结构，得到近似O(p)和次线性查询_time_。对于p，这改进了之前的最佳近似2O(p)(它需要多项式，而不是n空间中的近线性)。对于schattenp范数，在此工作之前没有已知的非平凡ANN数据结构。以前的人工神经网络问题的方法要么利用度量的低维，在维度上需要空间指数，要么通过将度量嵌入到“可处理”的空间(如1)来规避维度的诅咒。我们的新通用约简使用一种新的划分方法，与这两种方法不同。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Data-dependent hashing via nonlinear spectral gaps

We establish a generic reduction from _nonlinear spectral gaps_ of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results: * For _general_ d-dimensional normed spaces and n-point datasets, we obtain a _cell-probe_ ANN data structure with approximation O(logd/ε2), space dO(1) n1+ε, and dO(1)nε cell probes per query, for any ε>0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ2. * For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query _time_. For ℓp, this improves upon the previous best approximation 2O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work. Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a ”tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing

自引率

0.00%

发文量

期刊最新文献

Data-dependent hashing via nonlinear spectral gaps Interactive compression to external information The query complexity of graph isomorphism: bypassing distribution testing lower bounds Collusion resistant traitor tracing from learning with errors Explicit binary tree codes with polylogarithmic size alphabet