Lúcio F. D. Santos, Willian D. Oliveira, Mônica Ribeiro Porto Ferreira, A. Traina, C. Traina
Published in: Scientific and Statistical Database Management: International Conference, SSDBM, proceedings, pp. 5:1-5:12
Publication date: 2013-07-29
DOI: 10.1145/2484838.2484854 (https://doi.org/10.1145/2484838.2484854)
Parameter-free and domain-independent similarity search with diversity
New operators to execute similarity-based queries over multimedia data stored in Database Management Systems are increasingly in demand. However, when searching very large datasets, the basic operators often return elements that are too similar both to the query center and to each other, reducing the answer's utility. In this paper, we tackle the problem of adding diversity to similarity query results, and define techniques to ensure that each element in the result set is sufficiently different from the others. Existing techniques compel the user to define either a parameter that trades off similarity against diversity or a minimum similarity between result elements. In contrast, our approach provides similarity queries with diversification using the concept of influence, which automatically estimates the inherent diversity among the result set elements and requires no user-defined parameters. Furthermore, our technique can be applied to any data represented in a metric space, so it is both parameter-free and application-domain independent. The "Better Results with Influence Diversification" (BRID) technique is the basis of the k-Diverse Nearest Neighbor (BRIDk) and the Range Diverse (BRIDr) algorithms, which execute k-nearest neighbor and range queries with diversification, showing that the technique can be applied to diversify any type of similarity query. We also define a way to measure the degree of diversification in a result set. Through a detailed experimental evaluation, we show that BRID outperforms existing methods in both query diversification quality and execution time, being at least two orders of magnitude faster than the best existing approaches.
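The abstract does not spell out how influence is computed, so the following is only a rough sketch of what a parameter-free diverse k-nearest-neighbor query might look like. It assumes, purely for illustration, that influence is the inverse of distance and that a candidate is skipped when it lies closer to an already selected element than either of them lies to the query center. The function names `diverse_knn` and `euclidean` and the skip rule itself are hypothetical and are not taken from the source.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """One possible metric; any function satisfying the metric axioms could be used."""
    return math.dist(a, b)


def diverse_knn(
    query: Point,
    data: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[Point]:
    """Return up to k elements that are near the query yet not redundant.

    ASSUMPTION (illustrative only): a candidate is redundant with respect to
    a selected element s when its distance to s is no larger than both the
    distance from s to the query and its own distance to the query.
    No user-defined trade-off parameter is involved.
    """
    # Visit candidates from nearest to farthest from the query center.
    ranked = sorted(data, key=lambda x: dist(query, x))
    result: List[Point] = []
    for cand in ranked:
        d_query = dist(query, cand)
        redundant = any(
            dist(s, cand) <= dist(s, query) and dist(s, cand) <= d_query
            for s in result
        )
        if not redundant:
            result.append(cand)
            if len(result) == k:
                break
    return result


if __name__ == "__main__":
    points = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (-3.0, 0.0), (1.0, 0.1)]
    print(diverse_knn((0.0, 0.0), points, k=3))
```

With the sample points, a plain 3-nearest-neighbor query would return the three elements clustered around (1, 0), whereas this sketch keeps only one of them and fills the remaining slots with elements from other regions. The selection depends only on the distance function, so the same code applies to any data for which a metric is available.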