{"title":"DivIDE: efficient diversification for interactive data exploration","authors":"Hina A. Khan, M. Sharaf, Abdullah M. Albarrak","doi":"10.1145/2618243.2618253","DOIUrl":null,"url":null,"abstract":"Today, Interactive Data Exploration (IDE) has become a main constituent of many discovery-oriented applications, in which users repeatedly submit exploratory queries to identify interesting subspaces in large data sets. Returning relevant yet diverse results to such queries provides users with quick insights into a rather large data space. Meanwhile, search results diversification adds additional cost to an already computationally expensive exploration process. To address this challenge, in this paper, we propose a novel diversification scheme called DivIDE, which targets the problem of efficiently diversifying the results of queries posed during data exploration sessions. In particular, our scheme exploits the properties of data diversification functions while leveraging the natural overlap occurring between the results of different queries so that to provide significant reductions in processing costs. Our extensive experimental evaluation on both synthetic and real data sets shows the significant benefits provided by our scheme as compared to existing methods.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"31 1","pages":"15:1-15:12"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2618243.2618253","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28
Abstract
Today, Interactive Data Exploration (IDE) has become a main constituent of many discovery-oriented applications, in which users repeatedly submit exploratory queries to identify interesting subspaces in large data sets. Returning relevant yet diverse results to such queries provides users with quick insights into a rather large data space. Meanwhile, search results diversification adds additional cost to an already computationally expensive exploration process. To address this challenge, in this paper, we propose a novel diversification scheme called DivIDE, which targets the problem of efficiently diversifying the results of queries posed during data exploration sessions. In particular, our scheme exploits the properties of data diversification functions while leveraging the natural overlap occurring between the results of different queries so that to provide significant reductions in processing costs. Our extensive experimental evaluation on both synthetic and real data sets shows the significant benefits provided by our scheme as compared to existing methods.