{"title":"qwLSH","authors":"Omid Jafari, John Ossorgin, P. Nagarkar","doi":"10.1145/3323873.3325048","DOIUrl":null,"url":null,"abstract":"Similarity search queries in high-dimensional spaces are an important type of query in many domains, such as image processing and machine learning. Since exact similarity search indexing techniques suffer from the well-known curse of dimensionality in high-dimensional spaces, approximate search techniques are often used instead. Locality Sensitive Hashing (LSH) has been shown to be an effective approximate search method for solving similarity search queries in high-dimensional spaces. In real-world settings, queries often arrive as part of a query workload. LSH and its variants, however, are designed to solve single queries effectively. They suffer from one major drawback when executing query workloads: their index structures are designed without taking into consideration important data characteristics for effective cache utilization. In this paper, we present qwLSH, an index structure for efficiently processing similarity search query workloads in high-dimensional spaces that intelligently divides a given cache during the processing of a query workload by using novel cost models. Experimental results show that, given a query workload, qwLSH performs faster than existing techniques due to its unique cost models and strategies to reduce cache misses. We further present different caching strategies for efficiently processing similarity search query workloads. We evaluate the proposed design and cost models of qwLSH on real datasets against state-of-the-art LSH-based techniques.","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3323873.3325048","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 5
Abstract
Similarity search queries in high-dimensional spaces are an important type of query in many domains, such as image processing and machine learning. Since exact similarity search indexing techniques suffer from the well-known curse of dimensionality in high-dimensional spaces, approximate search techniques are often used instead. Locality Sensitive Hashing (LSH) has been shown to be an effective approximate search method for solving similarity search queries in high-dimensional spaces. In real-world settings, queries often arrive as part of a query workload. LSH and its variants, however, are designed to solve single queries effectively. They suffer from one major drawback when executing query workloads: their index structures are designed without taking into consideration important data characteristics for effective cache utilization. In this paper, we present qwLSH, an index structure for efficiently processing similarity search query workloads in high-dimensional spaces that intelligently divides a given cache during the processing of a query workload by using novel cost models. Experimental results show that, given a query workload, qwLSH performs faster than existing techniques due to its unique cost models and strategies to reduce cache misses. We further present different caching strategies for efficiently processing similarity search query workloads. We evaluate the proposed design and cost models of qwLSH on real datasets against state-of-the-art LSH-based techniques.
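
For readers unfamiliar with the building block the abstract refers to, the following is a minimal sketch of generic Euclidean LSH with quantized random projections. It is not the qwLSH index and does not model its cost models or cache division, which the abstract does not specify. The class name `SimpleLSH` and the parameters `num_tables`, `num_hashes`, and `bucket_width` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


class SimpleLSH:
    """Toy Euclidean LSH index (illustrative only, not qwLSH).

    Each of num_tables hash tables is keyed by num_hashes quantized
    random projections of the input vector.
    """

    def __init__(self, dim, num_tables=4, num_hashes=3, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        # One (direction, offset) pair per hash function, per table.
        self.projections = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)],
              rng.uniform(0.0, bucket_width))
             for _ in range(num_hashes)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = {}

    def _key(self, table_idx, point):
        # h(v) = floor((a . v + b) / w), concatenated over the table's hashes.
        return tuple(
            math.floor(
                (sum(a_i * x_i for a_i, x_i in zip(a, point)) + b)
                / self.bucket_width
            )
            for a, b in self.projections[table_idx]
        )

    def insert(self, point_id, point):
        self.points[point_id] = point
        for t, table in enumerate(self.tables):
            table[self._key(t, point)].append(point_id)

    def query(self, point, k=1):
        # Gather ids that collide with the query in at least one table,
        # then rank only those candidates by exact Euclidean distance.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, point), []))
        ranked = sorted(
            candidates, key=lambda pid: math.dist(point, self.points[pid])
        )
        return ranked[:k]


index = SimpleLSH(dim=3)
index.insert("a", [0.0, 0.0, 0.0])
index.insert("b", [0.1, 0.0, 0.1])
index.insert("c", [50.0, 50.0, 50.0])
nearest = index.query([0.0, 0.0, 0.0], k=1)
```

Each query in this sketch touches one bucket per table, so a workload of many queries reads many buckets. Which of those buckets are worth keeping in a limited cache is the kind of question the abstract says qwLSH addresses with its cost models.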