Optimal Data-Dependent Hashing for Approximate Near Neighbors
Alexandr Andoni, Ilya P. Razenshteyn
Proceedings of the forty-seventh annual ACM symposium on Theory of Computing. Published 2015-01-05.
DOI: 10.1145/2746539.2746553 (https://doi.org/10.1145/2746539.2746553)
Citations: 263
Abstract
We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point dataset in a $d$-dimensional space our data structure achieves query time $O(d \cdot n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + d \cdot n)$, where $\rho = 1/(2c^2-1)$ for the Euclidean space and approximation $c > 1$. For the Hamming space, we obtain an exponent of $\rho = 1/(2c-1)$. Our result completes the direction set forth in (Andoni, Indyk, Nguyen, Razenshteyn 2014) who gave a proof-of-concept that data-dependent hashing can outperform classic Locality Sensitive Hashing (LSH). In contrast to (Andoni, Indyk, Nguyen, Razenshteyn 2014), the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures (Indyk, Motwani 1998) (Andoni, Indyk 2006) for all approximation factors c>1. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
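To make the comparison with classic LSH concrete, the exponents can be evaluated numerically. The following is a minimal sketch, not code from the paper. The data-dependent exponents are the ones stated in the abstract. The classic LSH exponents (1/c² for Euclidean space, 1/c for Hamming space) are not given in the abstract and are filled in here as an assumption, taken from the standard bounds usually attributed to the cited LSH works. All function names are hypothetical, and the cost estimate ignores the o(1) term and constant factors.

```python
def _check_c(c: float) -> None:
    # The abstract states the bounds for approximation factors c > 1.
    if c <= 1.0:
        raise ValueError("approximation factor c must exceed 1")


def rho_data_dependent_euclidean(c: float) -> float:
    """Exponent from the abstract for Euclidean space: 1 / (2c^2 - 1)."""
    _check_c(c)
    return 1.0 / (2.0 * c * c - 1.0)


def rho_data_dependent_hamming(c: float) -> float:
    """Exponent from the abstract for Hamming space: 1 / (2c - 1)."""
    _check_c(c)
    return 1.0 / (2.0 * c - 1.0)


def rho_classic_lsh_euclidean(c: float) -> float:
    """ASSUMPTION (not in the abstract): classic Euclidean LSH exponent 1 / c^2."""
    _check_c(c)
    return 1.0 / (c * c)


def rho_classic_lsh_hamming(c: float) -> float:
    """ASSUMPTION (not in the abstract): classic Hamming LSH exponent 1 / c."""
    _check_c(c)
    return 1.0 / c


def query_cost_estimate(n: int, d: int, rho: float) -> float:
    """Rough d * n^rho query cost, ignoring the o(1) term and constants."""
    return d * n ** rho


if __name__ == "__main__":
    for c in (1.5, 2.0, 3.0):
        print(
            f"c={c}: Euclidean {rho_data_dependent_euclidean(c):.4f} "
            f"vs classic {rho_classic_lsh_euclidean(c):.4f}; "
            f"Hamming {rho_data_dependent_hamming(c):.4f} "
            f"vs classic {rho_classic_lsh_hamming(c):.4f}"
        )
```

Under these assumptions the data-dependent exponent is strictly smaller for every c > 1, since 2c² - 1 > c² and 2c - 1 > c exactly when c > 1, which matches the abstract's claim of an improvement for all approximation factors.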