
Proceedings of the 2018 International Conference on Management of Data: Latest Papers

Modern Recommender Systems: from Computing Matrices to Thinking with Neurons
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3197389
G. Koutrika
Starting with the Netflix Prize, which fueled much recent progress in collaborative filtering, recent years have witnessed the rapid development of new recommendation algorithms and increasingly complex systems that differ greatly from early content-based and collaborative filtering systems. Modern recommender systems leverage several novel algorithmic approaches, from matrix factorization methods and multi-armed bandits to deep neural networks. In this tutorial, we will cover recent algorithmic advances in recommender systems and highlight their capabilities and impact. We will give many examples of industrial-scale recommender systems that define the future of the area. We will discuss related evaluation issues and outline future research directions. The ultimate goal of the tutorial is to encourage the application of novel recommendation approaches to problems that go beyond user consumption and to further promote research at the intersection of recommender systems and databases.
Citations: 13
Session details: Research 1: Data Integration & Cleaning
Pub Date : 2018-05-27 DOI: 10.1145/3258004
E. Rahm
Citations: 0
SSD as SQLite Engine
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183720
Soyee Choi
As a proof of concept for the vision “SSD as SQL Engine” (SaS for short), we demonstrate that SQLite [4], a popular mobile database engine, can run in its entirety inside a real SSD development platform. By turning the storage device into a database engine, SaS allows applications to interact directly with a full SQL database server running inside the storage device. In SaS, the SQL language itself, not the traditional dummy block interface, is provided as the new interface between applications and the storage device. In addition, since SaS serves as a unified platform for both the database computing node and the storage node, the host and the storage no longer need to be segregated as separate physical computing components.
Citations: 2
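The interface change described in the SaS abstract above (applications send SQL rather than block reads and writes) can be sketched on an ordinary host. This is only an illustration under stated assumptions: `SqlDevice` is a hypothetical class, and an in-memory SQLite database stands in for the engine that the abstract says runs inside the SSD. The abstract does not describe the actual device-side API.

```python
import sqlite3


class SqlDevice:
    """Hypothetical stand-in for a storage device that accepts SQL text."""

    def __init__(self):
        # Assumption: an in-memory SQLite database plays the role of the
        # engine running inside the device.
        self._conn = sqlite3.connect(":memory:")

    def execute(self, sql, params=()):
        """The only interface the host sees: send SQL, get rows back."""
        cursor = self._conn.execute(sql, params)
        rows = cursor.fetchall()
        self._conn.commit()
        return rows


device = SqlDevice()
device.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
device.execute("INSERT INTO kv VALUES (?, ?)", ("a", 1))
device.execute("INSERT INTO kv VALUES (?, ?)", ("b", 2))

# The aggregation is evaluated by the engine behind the interface;
# only the result row crosses back to the caller.
total = device.execute("SELECT SUM(v) FROM kv")[0][0]
```

The caller never addresses blocks or pages here; every interaction is a SQL statement, which is the contrast with a block interface that the abstract draws.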
Carousel: Low-Latency Transaction Processing for Globally-Distributed Data
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196912
Xinan Yan, Linguan Yang, Hongbo Zhang, X. Lin, B. Wong, K. Salem, Tim Brecht
The trend towards global applications and services has created an increasing demand for transaction processing on globally-distributed data. Many database systems, such as Spanner and CockroachDB, support distributed transactions but require a large number of wide-area network round trips to commit each transaction and ensure the transaction's state is durably replicated across multiple datacenters. This can significantly increase transaction completion time, leading developers to replace database-level transactions with their own error-prone application-level solutions. This paper introduces Carousel, a distributed database system that provides low-latency transaction processing for multi-partition globally-distributed transactions. Carousel shortens transaction processing time by reducing the number of sequential wide-area network round trips required to commit a transaction and replicate its results while maintaining serializability. This is possible in part by using information about a transaction's potential write set to enable transaction processing, including any necessary remote read operations, to overlap with 2PC and state replication. Carousel further reduces transaction completion time by introducing a consensus protocol that can perform state replication in parallel with 2PC. For a multi-partition 2-round Fixed-set Interactive (2FI) transaction, Carousel requires at most two wide-area network round trips to commit the transaction when there are no failures, and only one round trip in the common case if local replicas are available.
Citations: 34
Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196937
Alexandre Verbitski, Anurag Gupta, D. Saha, James Corey, K. Gupta, Murali Brahmadesam, Raman Mittal, S. Krishnamurthy, Sandor Maurice, T. Kharatishvili, Xiaofeng Bao
Amazon Aurora is a high-throughput cloud-native relational database offered as part of Amazon Web Services (AWS). One of the more novel differences between Aurora and other relational databases is how it pushes redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. Doing so reduces network traffic, avoids checkpoints and crash recovery, enables failovers to replicas without loss of data, and enables fault-tolerant storage that heals without database involvement. Traditional implementations that leverage distributed storage would use distributed consensus algorithms for commits, reads, replication, and membership changes, and would amplify the cost of the underlying storage. In this paper, we describe how Aurora avoids distributed consensus under most circumstances by establishing invariants and leveraging local transient state. Doing so improves performance, reduces variability, and lowers costs.
Citations: 38
Session details: Research 7: Tuning, Monitoring & Query Optimization
Pub Date : 2018-05-27 DOI: 10.1145/3258013
Sudipto Das
Citations: 0
EPUI: Experimental Platform for Urban Informatics
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193560
Xiaoyu Ge, Panos K. Chrysanthis, K. Pelechrinis, D. Zeinalipour-Yazti
Recent studies in urban navigation have revealed new demands (e.g., diversity, safety, happiness, serendipity) on navigation services that are critical to providing useful recommendations to travelers. This exposes the need to design next-generation navigation services that accommodate these newly emerging aspects. In this paper, we present a prototype system, namely, EPUI (an Experimental Platform of Urban Informatics), which provides a testbed for exploring and evaluating venue and route recommendation solutions that balance different objectives (i.e., demands), including the newly discovered ones. In addition, EPUI incorporates a modularized design, enabling researchers to upload their own algorithms and compare them to well-known algorithms using different performance metrics. Its user interface makes it easily usable by both end users and experienced researchers.
Citations: 4
SQuID: Semantic Similarity-Aware Query Intent Discovery
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193548
Anna Fariha, Sheikh Muhammad Sarwar, A. Meliou
The recent expansion of database technology demands a convenient framework for non-expert users to explore datasets. Several approaches assist these non-expert users by letting them express their query intent through example tuples of their intended query output. However, these approaches treat the structural similarity among the example tuples as the only factor specifying query intent and ignore the richer context present in the data. In this demo, we present SQuID, a system for Semantic similarity-aware Query Intent Discovery. SQuID takes a few example tuples from the user as input, through a simple interface, and consults the database to discover deeper associations among these examples. These data-driven associations reveal the semantic context of the provided examples, allowing SQuID to infer the user's intended query precisely and effectively. SQuID further explains its inference by displaying the discovered semantic context to the user, who can then provide feedback and tune the result. We demonstrate how SQuID can capture even esoteric and complex semantic contexts, alleviating the need for constructing complex SQL queries, while not requiring the user to have any schema or query language knowledge.
Citations: 16
Maverick: Discovering Exceptional Facts from Knowledge Graphs
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183730
Gensheng Zhang, Damian Jimenez, Chengkai Li
We present Maverick, a general, extensible framework that discovers exceptional facts about entities in knowledge graphs. To the best of our knowledge, there has been no previous study of the problem. We model an exceptional fact about an entity of interest as a context-subspace pair, in which a subspace is a set of attributes and a context is defined by a graph query pattern of which the entity is a match. The entity is exceptional among the entities in the context, with regard to the subspace. The search spaces of both patterns and subspaces are exponentially large. Maverick conducts a beam search over the patterns, using a match-based pattern construction method to avoid evaluating invalid patterns. It applies two heuristics to select promising patterns to form the beam in each iteration. Maverick traverses and prunes the subspaces, organized as a set enumeration tree, by exploiting the upper bound properties of exceptionality scoring functions. Results of experiments and user studies using real-world datasets demonstrated substantial performance improvement of the proposed framework over the baselines as well as its effectiveness in discovering exceptional facts.
Citations: 13
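The Maverick abstract above mentions beam search with a beam re-formed in each iteration. The following is a minimal sketch of generic beam search only; it does not reproduce Maverick's match-based pattern construction, its two heuristics, or its exceptionality scoring. The names `beam_search`, `expand`, `score`, and the toy `VALUES` table are hypothetical and chosen purely for illustration.

```python
import heapq


def beam_search(start, expand, score, beam_width, iterations):
    """Generic beam search: keep the best `beam_width` candidates per iteration."""
    beam = [start]
    best = start
    for _ in range(iterations):
        # Expand every node in the beam; a set removes duplicate children.
        candidates = {child for node in beam for child in expand(node)}
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


# Toy search space (an assumption): nodes are sorted tuples of attribute names.
VALUES = {"a": 3.0, "b": 1.0, "c": 2.0}


def expand(node):
    """Children add one attribute that the node does not contain yet."""
    return [tuple(sorted(node + (x,))) for x in VALUES if x not in node]


def score(node):
    """Hypothetical score: total value minus a penalty that grows with size."""
    return sum(VALUES[x] for x in node) - 0.5 * len(node) ** 2


result = beam_search((), expand, score, beam_width=2, iterations=3)
```

With a beam width of 2, only the two highest-scoring candidates survive each round, so most of the space is never scored. That is the trade-off beam search makes: less work in exchange for no guarantee of finding the global optimum.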
Random Sampling over Joins Revisited
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183739
Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, K. Yi
Joins are expensive, especially on large data and/or multiple relations. One promising approach to mitigating their high costs is to return just a simple random sample of the full join results, which is sufficient for many tasks. Indeed, as early as 1999, Chaudhuri et al. posed the problem of sampling over joins as a fundamental challenge in large database systems. They also pointed out a fundamental barrier for this problem: the sampling operator cannot be pushed through a join, i.e., $\mathrm{sample}(R \bowtie S) \neq \mathrm{sample}(R) \bowtie \mathrm{sample}(S)$. To overcome this barrier, they used precomputed statistics to guide the sampling process, but only showed how this works for two-relation joins. This paper revisits this classic problem for both acyclic and cyclic multi-way joins. We build upon the idea of Chaudhuri et al., but extend it in several nontrivial directions. First, we propose a general framework for random sampling over multi-way joins, which includes the algorithm of Chaudhuri et al. as a special case. Second, we explore several ways to instantiate this framework, depending on what prior information is available about the underlying data, and offer different tradeoffs between sample generation latency and throughput. We analyze the properties of different instantiations and evaluate them against the baseline methods; the results clearly demonstrate the superiority of our new techniques.
Citations: 93
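The sampling barrier described in the Random Sampling over Joins abstract above can be seen on a toy two-relation join. The sketch below is illustrative and is not the paper's framework: the relations `R` and `S` and the helpers `build_index`, `sample_join`, `naive_sample_join`, and `frequencies` are hypothetical. It weights each `R` tuple by the number of `S` tuples it joins with (a precomputed statistic, in the spirit of the two-relation approach the abstract attributes to Chaudhuri et al.) and contrasts that with choosing an `R` tuple uniformly.

```python
import random
from collections import Counter

# Hypothetical toy relations: R(a) and S(a, b), joined on attribute a.
R = [1, 2]
S = [(1, "x"), (2, "y"), (2, "z"), (2, "w")]


def build_index(s_rows):
    """Group S rows by join key; group sizes act as precomputed statistics."""
    index = {}
    for a, b in s_rows:
        index.setdefault(a, []).append((a, b))
    return index


def sample_join(r_rows, index, rng):
    """Draw one uniform sample of R join S without materializing the join."""
    # Weight each R tuple by how many S tuples it joins with.
    weights = [len(index.get(a, [])) for a in r_rows]
    a = rng.choices(r_rows, weights=weights, k=1)[0]
    # Then pick one of its matching S tuples uniformly.
    return rng.choice(index[a])


def naive_sample_join(r_rows, index, rng):
    """Biased variant: pick the R tuple uniformly, ignoring join sizes."""
    a = rng.choice(r_rows)
    return rng.choice(index[a])


def frequencies(sampler, draws=40000, seed=0):
    """Empirical frequency of each join tuple under the given sampler."""
    rng = random.Random(seed)
    index = build_index(S)
    counts = Counter(sampler(R, index, rng) for _ in range(draws))
    return {row: n / draws for row, n in counts.items()}
```

The full join here has four tuples, so a uniform sample should return each with probability 1/4. The naive sampler instead returns `(1, "x")` about half the time, because key 1 has a single match while key 2 has three.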