Parameter-free and domain-independent similarity search with diversity

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management
Pub Date: 2013-07-29
DOI: 10.1145/2484838.2484854

Lúcio F. D. Santos, Willian D. Oliveira, Mônica Ribeiro Porto Ferreira, A. Traina, C. Traina

Pages: 5:1-5:12

Citations: 18

Abstract

There is growing demand for new operators that execute similarity-based queries over multimedia data stored in Database Management Systems. However, when searching very large datasets, the basic operators often return elements that are too similar both to the query center and to one another, which reduces the usefulness of the answer. In this paper, we tackle the problem of adding diversity to similarity query results, and define techniques that ensure each element in the result set is sufficiently different from the others. Existing techniques require the user to define either a parameter that trades off similarity against diversity or a minimum similarity between result elements. In contrast, our approach diversifies similarity queries using the concept of influence, which automatically estimates the inherent diversity among the result set elements and requires no user-defined parameters. Furthermore, our technique can be applied to any data represented in a metric space, so it is both parameter-free and independent of the application domain. The "Better Results with Influence Diversification" (BRID) technique is the basis for the k-Diverse Nearest Neighbor (BRIDk) and Range Diverse (BRIDr) algorithms, which execute k-nearest neighbor and range queries with diversification, showing that the technique can be applied to diversify any type of similarity query. We also define a way to measure the degree of diversification in a result set. Through a detailed experimental evaluation, we show that BRID outperforms existing methods in both diversification quality and execution time, being at least two orders of magnitude faster than the best existing approaches.
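The abstract does not spell out how influence is computed, so the following is only an illustrative sketch of parameter-free diversification in a metric space, not the authors' implementation. It assumes that influence is inversely related to distance, and that a candidate is "covered" by an already selected element when it lies at least as close to that element as to the query, and at least as close to it as the element lies to the query. The names `is_influenced`, `diverse_knn`, and `diverse_range` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def is_influenced(cand: Point, sel: Point, query: Point, dist: Metric) -> bool:
    """Assumed coverage rule (not taken from the abstract): `cand` is covered
    by `sel` when cand is at least as close to sel as it is to the query, and
    at least as close to sel as sel is to the query."""
    d_sel_cand = dist(sel, cand)
    return d_sel_cand <= dist(query, cand) and d_sel_cand <= dist(sel, query)


def diverse_knn(data: Sequence[Point], query: Point, k: int,
                dist: Metric = math.dist) -> List[Point]:
    """Greedy k-NN with diversification: scan candidates from nearest to
    farthest and keep only those not covered by an already kept element."""
    ranked = sorted(data, key=lambda x: dist(query, x))
    result: List[Point] = []
    for cand in ranked:
        if len(result) == k:
            break
        if not any(is_influenced(cand, s, query, dist) for s in result):
            result.append(cand)
    return result


def diverse_range(data: Sequence[Point], query: Point, radius: float,
                  dist: Metric = math.dist) -> List[Point]:
    """Range query with diversification: restrict to elements within
    `radius` of the query, then apply the same greedy filter."""
    in_range = [x for x in data if dist(query, x) <= radius]
    return diverse_knn(in_range, query, len(in_range), dist)


if __name__ == "__main__":
    q = (0.0, 0.0)
    pts = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (-3.0, 0.0), (1.05, 0.1)]
    # A plain 3-NN would return the three near-duplicates around (1, 0).
    print(diverse_knn(pts, q, 3))
    print(diverse_range(pts, q, 2.5))
```

Note that the only inputs are the data, the query, the query limit (`k` or `radius`), and a distance function; there is no trade-off weight or minimum-separation threshold to tune, which is the property the abstract emphasizes. This sketch sorts the whole dataset and compares each candidate against every kept element, so it makes no attempt to reproduce the execution-time results reported above.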
