Data-dependent hashing via nonlinear spectral gaps
Alexandr Andoni, A. Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, Erik Waingarten
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018), published 2018-06-20. DOI: 10.1145/3188745.3188846
We establish a generic reduction from _nonlinear spectral gaps_ of metric spaces to data-dependent Locality-Sensitive Hashing (LSH). This yields a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results:

* For _general_ $d$-dimensional normed spaces and $n$-point datasets, we obtain a _cell-probe_ ANN data structure with approximation $O(\log d/\varepsilon^2)$, space $d^{O(1)} n^{1+\varepsilon}$, and $d^{O(1)} n^{\varepsilon}$ cell probes per query, for any $\varepsilon > 0$. No non-trivial approximation was previously known in this generality other than the $O(\sqrt{d})$ bound, which follows from embedding a general norm into $\ell_2$.
* For $\ell_p$ and Schatten-$p$ norms, we improve the data structure further, obtaining approximation $O(p)$ and sublinear query _time_. For $\ell_p$, this improves upon the previous best approximation of $2^{O(p)}$, which required space polynomial in $n$ rather than near-linear. For the Schatten-$p$ norm, no non-trivial ANN data structure was known before this work.

Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding the metric into a "tractable" space, such as $\ell_1$. Our new generic reduction departs from both approaches and instead relies on a novel partitioning method.
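To make the central notion concrete, here is a small illustration under one common normalization, which we assume here. For a symmetric, doubly stochastic matrix $P$ on $n$ vertices (for example, the random walk on a regular graph) and a metric space $(X, d_X)$, let $\gamma(P, d_X^2)$ be the smallest $\gamma$ such that every map $f: V \to X$ satisfies $\frac{1}{n^2}\sum_{u,v} d_X(f(u),f(v))^2 \le \frac{\gamma}{n}\sum_{u,v} P_{uv}\, d_X(f(u),f(v))^2$. The sketch below handles only Euclidean-valued maps. In that case the constant reduces to the linear quantity $1/(1-\lambda_2(P))$. The norms targeted above (general norms, $\ell_p$, Schatten-$p$) require genuinely nonlinear bounds, and this code does not compute them. The helper names `hypercube_walk`, `pair_energy`, `edge_energy` and `euclidean_gamma` are hypothetical, and the hypercube graph is only an illustrative choice. This is not the paper's partitioning algorithm.

```python
import numpy as np


def hypercube_walk(k: int) -> np.ndarray:
    """Transition matrix P = A / k of the simple random walk on {0,1}^k (illustrative graph)."""
    n = 1 << k
    P = np.zeros((n, n))
    for u in range(n):
        for b in range(k):
            # Neighbours of u differ from u in exactly one bit.
            P[u, u ^ (1 << b)] = 1.0 / k
    return P  # symmetric and doubly stochastic


def pair_energy(f: np.ndarray) -> float:
    """(1/n^2) * sum over all ordered pairs (u, v) of ||f(u) - f(v)||_2^2; f has shape (n, m)."""
    diffs = f[:, None, :] - f[None, :, :]
    return float(np.sum(diffs ** 2) / f.shape[0] ** 2)


def edge_energy(P: np.ndarray, f: np.ndarray) -> float:
    """(1/n) * sum over (u, v) of P[u, v] * ||f(u) - f(v)||_2^2."""
    diffs = f[:, None, :] - f[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return float(np.sum(P * sq_dists) / f.shape[0])


def euclidean_gamma(P: np.ndarray) -> float:
    """Poincare constant 1 / (1 - lambda_2) for Euclidean-valued maps and symmetric, doubly stochastic P."""
    eigs = np.sort(np.linalg.eigvalsh(P))[::-1]  # descending order; eigs[0] == 1
    return 1.0 / (1.0 - eigs[1])  # lambda_2 is the second-largest eigenvalue


if __name__ == "__main__":
    P = hypercube_walk(4)
    gamma = euclidean_gamma(P)
    rng = np.random.default_rng(0)
    f = rng.normal(size=(16, 3))  # a random map from the 16 vertices into R^3
    print(gamma, pair_energy(f) <= gamma * edge_energy(P, f) + 1e-9)
```

On the 4-dimensional hypercube, $\lambda_2(P) = 1/2$, so $\gamma = 2$. Random maps into $\mathbb{R}^3$ satisfy the inequality. The map sending each vertex to $\pm 1$ according to its first bit attains equality: pair energy $2$ and edge energy $1$. Informally, a small $\gamma$ means that a map which barely stretches edges cannot spread the vertex set far apart. A partitioning argument can exploit statements of this kind.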