Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web: Latest Papers

CrowdLink: An Error-Tolerant Model for Linking Complex Records
C. Zhang, Rui Meng, Lei Chen, Feida Zhu
Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, databases), which is a long-standing challenge in database management. Algorithmic approaches have been proposed to improve RL quality, but remain far from perfect. Crowdsourcing offers a more accurate but expensive (and slow) way to bring human insight into the process. In this paper, we propose a new probabilistic model, namely CrowdLink, to tackle the above limitations. In particular, our model gracefully handles crowd error and the correlation among different pairs, and lets us decompose the records into small pieces (i.e., attributes) that crowdsourcing workers can easily verify. Further, we develop efficient and effective algorithms to select the most valuable questions, in order to reduce the monetary cost of crowdsourcing. We conducted extensive experiments on both synthetic and real-world datasets. The experimental results verified the effectiveness and the applicability of our model.
DOI: https://doi.org/10.1145/2795218.2795222
Published: 2015-05-31
Citations: 6
Preferential Diversity
Xiaoyu Ge, Panos K. Chrysanthis, Alexandros Labrinidis
The ever increasing supply of data is bringing a renewed attention to query personalization. Query personalization is a technique that utilizes user preferences with the goal of providing relevant results to the users. Along with preferences, diversity is another important aspect of query personalization especially useful during data exploration. The goal of result diversification is to reduce the amount of redundant information included in the results. Most previous approaches of result diversification focus solely on generating the most diverse results, which do not take user preferences into account. In this paper, we propose a novel framework called Preferential Diversity (PrefDiv) that aims to support both relevancy and diversity of user query results. PrefDiv utilizes user preference models that return ranked results and reduces the redundancy of results in an efficient and flexible way. PrefDiv maintains the balance between relevancy and diversity of the query results by providing users with the ability to control the trade-off between the two. We describe an implementation of PrefDiv on top of the HYPRE preference model, which allows users to specify both qualitative and quantitative preferences and unifies them using the concept of preference intensities. We experimentally evaluate its performance by comparing with state-of-the-art diversification techniques; our results indicate that PrefDiv achieves significantly better balance between diversity and relevance.
DOI: https://doi.org/10.1145/2795218.2795224
Published: 2015-05-31
Citations: 11
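The relevance/diversity trade-off described in the Preferential Diversity abstract can be illustrated with a small sketch. This is not the PrefDiv algorithm and does not use the HYPRE model; it is a generic greedy, maximal-marginal-relevance-style heuristic, and the item tuples, the category-based `similarity` function, and the `trade_off` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical item layout: (name, relevance score in [0, 1], category).
Item = Tuple[str, float, str]


def similarity(a: Item, b: Item) -> float:
    """Toy similarity (an assumption): 1.0 if same category, else 0.0."""
    return 1.0 if a[2] == b[2] else 0.0


def greedy_select(items: List[Item], k: int, trade_off: float) -> List[Item]:
    """Pick up to k items, balancing relevance against redundancy.

    trade_off = 1.0 ranks purely by relevance; lower values penalize
    items that resemble something already selected.
    """
    selected: List[Item] = []
    candidates = list(items)
    while candidates and len(selected) < k:

        def score(c: Item) -> float:
            # Redundancy is the highest similarity to any selected item.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return trade_off * c[1] - (1.0 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


results = [
    ("a1", 0.9, "x"),
    ("a2", 0.8, "x"),
    ("b1", 0.6, "y"),
    ("c1", 0.5, "z"),
]

relevance_only = [name for name, _, _ in greedy_select(results, 3, 1.0)]
balanced = [name for name, _, _ in greedy_select(results, 3, 0.5)]
```

With `trade_off` at 1.0 the two items from category `x` both appear in the top three; at 0.5 the second `x` item is displaced by less relevant items from other categories, which is the kind of user-controlled balance the abstract describes.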
Method of Complex Event Processing over XML Streams
Tatsuki Matsuda, Yuki Uchida, Satoru Fujita
This paper describes a query processing engine for multiple continuous XML data streams with correlated data as a notification mechanism for navigating data exploration. Stream processing, including formal models for stream filtering, union, activation, decomposition, and partition, is formulated in algebraic expressions. In addition, a query language, called QLMXS, over XML streams for complex event processing is described. QLMXS supports all functions of the algebraic expressions in a SQL-like form. QLMXS queries are converted into a visibly pushdown automaton (VPA) that analyzes complex event data from the XML streams. The VPA engine concurrently processes multiple XML data on multiple levels; therefore, it is very important to tune the performance of the engine. Four optimization methods are proposed to improve performance by utilizing VPA and XML features: VPA-state reduction, VPA unification, delayed evaluation, and elimination of unnecessary XML processing. Experimental results demonstrate that VPA unification increases the processing speed of the VPA engine 1.6 times, and the overall processing speed is increased 2.6 times.
DOI: https://doi.org/10.1145/2795218.2795220
Published: 2015-05-31
Citations: 0
Principled Optimization Frameworks for Query Reformulation of Database Queries
Gautam Das
Databases have traditionally supported the Boolean retrieval model, where a query returns all tuples that match the selection conditions specified -- no more and no less. Such a query model is often inconvenient for naive users conducting searches that are exploratory in nature, since the user may not have a complete idea, or a firm opinion, of what she may be looking for. This is especially relevant in the context of the Deep Web, which offers a plethora of searchable data sources such as electronic products, transportation choices, apparel, investment options, etc. Users often encounter two types of problems: (a) they may under-specify the items of interest, and find too many items satisfying the given conditions (the many answers problem), or (b) they may over-specify the items of interest, and find no item in the source satisfying all the provided conditions (the empty answer problem). In this talk, I discuss our recent efforts in developing techniques for iterative "query reformulation" by which the system guides the user in a systematic way through several small steps, where each step suggests slight query modifications, until the query reaches a form that generates desirable answers. Our proposed approaches for suggesting query reformulations are driven by novel probabilistic frameworks based on optimizing a wide variety of application-dependent objective functions.
DOI: https://doi.org/10.1145/2795218.2795227
Published: 2015-05-31
Citations: 0
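The many-answers and empty-answer problems named in the query reformulation abstract can be shown on a toy table. The sketch below is only an illustration of Boolean retrieval and of one naive reformulation step (dropping a single condition); it is not the probabilistic framework from the talk, and the `laptops` data, the condition names, and both helper functions are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def boolean_select(rows: List[Row], conditions: Dict[str, Predicate]) -> List[Row]:
    """Boolean retrieval: return exactly the rows satisfying every condition."""
    return [r for r in rows if all(p(r) for p in conditions.values())]


def single_relaxations(rows: List[Row], conditions: Dict[str, Predicate]) -> Dict[str, int]:
    """For each condition, count the answers obtained if only it is dropped."""
    counts = {}
    for name in conditions:
        relaxed = {k: v for k, v in conditions.items() if k != name}
        counts[name] = len(boolean_select(rows, relaxed))
    return counts


# Hypothetical product catalog.
laptops = [
    {"brand": "A", "price": 900, "ram_gb": 8},
    {"brand": "A", "price": 1500, "ram_gb": 16},
    {"brand": "B", "price": 700, "ram_gb": 8},
    {"brand": "B", "price": 1200, "ram_gb": 16},
]

# Under-specified query: one weak condition matches everything.
under_specified = {"ram": lambda r: r["ram_gb"] >= 8}

# Over-specified query: no single row satisfies all three conditions.
over_specified = {
    "brand": lambda r: r["brand"] == "A",
    "price": lambda r: r["price"] <= 800,
    "ram": lambda r: r["ram_gb"] >= 16,
}

many = boolean_select(laptops, under_specified)
empty = boolean_select(laptops, over_specified)
relaxed_counts = single_relaxations(laptops, over_specified)
```

Here `many` contains every row, `empty` contains none, and `relaxed_counts` shows that dropping the price condition is the only single-step relaxation that yields an answer. A real system would rank such suggestions with an objective function rather than a raw count.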
Unifying Qualitative and Quantitative Database Preferences to Enhance Query Personalization
Roxana Gheorghiu, Alexandros Labrinidis, Panos K. Chrysanthis
Query personalization can be an effective technique in dealing with the data scalability challenge, primarily from the human point of view, i.e., making big data easier to use. In order to customize their query results, users need to express their preferences in a simple and user-friendly manner. In this paper, we present a graph-based theoretical framework and a prototype system that unify qualitative and quantitative preferences, while eliminating their disadvantages. Our integrated system allows for (1) the specification of database preferences and the creation of user preference profiles in a user-friendly manner, (2) the manipulation of preferences of individuals or groups of users and (3) total ordering of the tuples in the database, matching both qualitative and quantitative preferences, hence significantly increasing the number of tuples covered by the user preferences. We confirmed the latter experimentally by comparing our preference selection algorithm with Fagin's TA algorithm.
DOI: https://doi.org/10.1145/2795218.2795223
Published: 2015-05-31
Citations: 6
Data Like This: Ranked Search of Genomic Data Vision Paper
V. M. Megler, D. Maier, D. Bottomly, Libbey White, S. McWeeney, B. Wilmot
High-throughput genetic sequencing produces the ultimate "big data": a human genome sequence contains more than 3B base pairs, and more and more characteristics, or annotations, are being recorded at the base-pair level. Locating areas of interest within the genome is a challenge for researchers, limiting their investigations. We describe our vision of adapting "big data" ranked search to the problem of searching the genome. Our goal is to make searching for data as easy for scientists as searching the Internet.
DOI: https://doi.org/10.1145/2795218.2795221
Published: 2015-05-31
Citations: 2
Explore-By-Example: A New Database Service for Interactive Data Exploration
Y. Diao
Traditional DBMSs are suited for applications in which the structure, meaning and contents of the database, as well as the questions (queries) to be asked, are all well-understood. However, this is no longer true when the volume and diversity of data grow at an unprecedented rate, while the user's ability to comprehend data remains as limited as before. To address the increasing disparity in the "big data - same humans" problem, our project explores a new approach of system-aided exploration of a big data space and automatic learning of the user interest in order to retrieve all objects that match the user interest -- we call this new service "interactive data exploration", which complements the traditional querying interface of a database system. In this talk, I introduce a new framework for interactive data exploration, called "Explore-by-Example", which iteratively seeks user relevance feedback on database samples and uses such feedback to finally predict a query that retrieves all objects of interest to the user. The goal is to make such exploration converge quickly to the true user interest model, while minimizing the user labeling effort and providing interactive performance in each iteration. I discuss a range of techniques and optimizations to do so for linear patterns and complex non-linear patterns. Our user study indicates that our approach can significantly reduce the user effort and the total exploration time, compared with the common practice of manual exploration. I finally conclude the talk by pointing out a host of new challenges, ranging from application of active learning theory, to database optimizations, to visualization.
DOI: https://doi.org/10.1145/2795218.2795226
Published: 2015-05-31
Citations: 0
Diversifying with Few Regrets, But too Few to Mention
Zaeem Hussain, Hina A. Khan, M. Sharaf
Representative data provide users with a concise overview of their potentially large query results. Recently, diversity maximization has been adopted as one technique to generate representative data with high coverage and low redundancy. Orthogonally, regret minimization has emerged as another technique to generate representative data with high utility that satisfy the user's preference. In reality, however, users typically have some pre-specified preferences over some dimensions of the data, while expecting good coverage over the other dimensions. Motivated by that need, in this work we propose a novel scheme called ReDi, which aims to generate representative data that balance the tradeoff between regret minimization and diversity maximization. ReDi is based on a hybrid objective function that combines both regret and diversity. Additionally, it employs several algorithms that are designed to maximize that objective function. We perform extensive experimental evaluation to measure the tradeoff between the effectiveness and efficiency provided by the different ReDi algorithms.
DOI: https://doi.org/10.1145/2795218.2795225
Published: 2015-05-31
Citations: 10
Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web
DOI: https://doi.org/10.1145/2795218
Citations: 0