A Transformation-based Framework for KNN Set Similarity Search (Extended Abstract)

2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 2040-2041. Pub Date: 2020-04-01. DOI: 10.1109/ICDE48307.2020.00239

Yong Zhang, Jiacheng Wu, Jin Wang, Chunxiao Xing

Citation count: 1

Abstract

Set similarity search is a fundamental operation in a variety of applications [3], [5], [2], and there is a long stream of research on the problem. Given a collection of set records, a query, and a similarity function, a set similarity search algorithm returns all set records that are similar to the query. There are many metrics for measuring the similarity between two sets, such as Overlap, Jaccard, Cosine, and Dice. In this paper we use the widely applied Jaccard similarity to quantify the similarity between two sets, but our proposed techniques can be easily extended to other set-based similarity functions. Previous approaches require users to specify a similarity threshold. However, in many scenarios it is rather difficult to specify such a threshold. For example, when users type keywords into a search engine, they pay more attention to the top-ranked results, say the top five. In this case, if we use threshold-based search instead of KNN similarity search, it is difficult to find the results that are most attractive to users.
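The abstract contrasts threshold-based search with KNN search but does not describe the transformation-based algorithm itself. As a point of reference only, the following is a minimal brute-force baseline in Python, not the authors' method: it scores every record against the query and either keeps those with similarity at least a threshold tau, or keeps the k most similar ones. Jaccard is computed as |r ∩ q| / |r ∪ q|, and Overlap, Cosine, and Dice use their standard set definitions. The function names, the tie-breaking rule (smaller record id wins), and the choice to return 0.0 for empty sets are assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, Hashable, List, Set, Tuple

# Standard set similarity functions (empty-set handling is an assumption).
def overlap(a: Set[Hashable], b: Set[Hashable]) -> float:
    return float(len(a & b))

def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cosine(a: Set[Hashable], b: Set[Hashable]) -> float:
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def dice(a: Set[Hashable], b: Set[Hashable]) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

SimFn = Callable[[Set[Hashable], Set[Hashable]], float]

def threshold_search(records: List[Set[Hashable]], query: Set[Hashable],
                     tau: float, sim: SimFn = jaccard) -> List[int]:
    """Ids of all records whose similarity to the query is >= tau (linear scan)."""
    return [i for i, r in enumerate(records) if sim(r, query) >= tau]

def knn_search(records: List[Set[Hashable]], query: Set[Hashable],
               k: int, sim: SimFn = jaccard) -> List[Tuple[int, float]]:
    """The k (id, similarity) pairs with the highest similarity (linear scan).

    Keys are (similarity, -id), so among equal similarities the smaller id ranks first.
    """
    scored = ((sim(r, query), -i) for i, r in enumerate(records))
    return [(-neg_i, s) for s, neg_i in heapq.nlargest(k, scored)]

if __name__ == "__main__":
    records = [{"a", "b", "c"}, {"a", "b"}, {"x", "y"}, {"a", "b", "c", "d"}]
    query = {"a", "b", "c"}
    print(knn_search(records, query, 2))          # [(0, 1.0), (3, 0.75)]
    print(threshold_search(records, query, 0.8))  # [0]
```

In the usage example a threshold of 0.8 returns a single record, while KNN with k = 2 always returns two ranked results; this illustrates why choosing a threshold in advance can be hard. Passing `sim=dice` (or `cosine`) reuses the same search code for another set-based similarity function. A linear scan like this is only a correctness baseline, since it evaluates the similarity for every record.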
