一种通用高效的哈希学习查询方法

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI:10.1145/3183713.3183750

Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng

{"title":"一种通用高效的哈希学习查询方法","authors":"Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng","doi":"10.1145/3183713.3183750","DOIUrl":null,"url":null,"abstract":"As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"57 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"A General and Efficient Querying Method for Learning to Hash\",\"authors\":\"Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng\",\"doi\":\"10.1145/3183713.3183750\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.\",\"PeriodicalId\":20430,\"journal\":{\"name\":\"Proceedings of the 2018 International Conference on Management of Data\",\"volume\":\"57 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3183713.3183750\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3183713.3183750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

作为近似最近邻(ANN)搜索问题的有效解决方案，学习哈希(L2H)能够学习为给定数据集定制的保持相似性的哈希函数。然而，现有的L2H研究主要侧重于通过学习好的哈希函数来提高查询性能，而默认的查询方法是Hamming ranking (HR)。通过分析和实验表明，HR中使用的相似度指标Hamming距离过于粗粒度，从而限制了查询处理的性能。我们提出了一种新的细粒度相似度指标，量化距离(QD)，它提供了查询与桶中项目之间相似度的更多信息。然后，我们开发了两种基于QD的高效查询方法，其查询性能明显优于HR。我们的方法是通用的，可以与各种L2H算法一起工作。我们的实验表明，一个简单而优雅的查询方法可以产生相当于高级和复杂的学习算法的性能增益。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A General and Efficient Querying Method for Learning to Hash

As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2018 International Conference on Management of Data

自引率

0.00%

发文量

期刊最新文献

Meta-Dataflows: Efficient Exploratory Dataflow Jobs Columnstore and B+ tree - Are Hybrid Physical Designs Important? Demonstration of VerdictDB, the Platform-Independent AQP System Efficient Selection of Geospatial Data on Maps for Interactive and Visualized Exploration Session details: Keynote1