Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng
{"title":"一种通用高效的哈希学习查询方法","authors":"Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng","doi":"10.1145/3183713.3183750","DOIUrl":null,"url":null,"abstract":"As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"57 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"A General and Efficient Querying Method for Learning to Hash\",\"authors\":\"Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng\",\"doi\":\"10.1145/3183713.3183750\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.\",\"PeriodicalId\":20430,\"journal\":{\"name\":\"Proceedings of the 2018 International Conference on Management of Data\",\"volume\":\"57 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3183713.3183750\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3183713.3183750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A General and Efficient Querying Method for Learning to Hash
As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.