
Proceedings of the 2018 International Conference on Management of Data: Latest Articles

Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193537
Fotis Psallidas, Eugene Wu
Data lineage is a fundamental type of information that describes the relationships between input and output data items in a workflow. As such, a large number of data-intensive applications with logic over the input-output relationships can be expressed declaratively in lineage terms. Unfortunately, many applications resort to hand-tuned implementations, either because lineage systems are not fast enough to meet their requirements or because their developers are unaware of lineage capabilities. Recently, we introduced a set of implementation design principles and associated techniques to optimize lineage-enabled database engines and realized them in our prototype database engine, Smoke. In this demonstration, we showcase lineage as the building block across a variety of data-intensive applications, including tooltips and details on demand, crossfilter, and data profiling. In addition, we show how Smoke outperforms alternative lineage systems to meet or improve on existing hand-tuned implementations of these applications.
Citations: 4
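The lineage idea in the Smoke abstract above can be illustrated with a minimal sketch. It is not Smoke's implementation: the function `group_sum_with_lineage`, the sample rows, and the dictionary-based indexes are all hypothetical names chosen for illustration. The sketch records, during a group-by aggregation, which input rows contributed to each output (backward lineage) and which output each input row fed into (forward lineage), which is the kind of information a "details on demand" interaction needs.

```python
from collections import defaultdict


def group_sum_with_lineage(rows, key, value):
    """Group-by SUM that also records lineage between inputs and outputs.

    Hypothetical helper for illustration only; not an API of Smoke.
    """
    totals = defaultdict(float)
    backward = defaultdict(list)  # output group -> ids of contributing input rows
    forward = {}                  # input row id -> output group it contributed to
    for rid, row in enumerate(rows):
        group = row[key]
        totals[group] += row[value]
        backward[group].append(rid)
        forward[rid] = group
    return dict(totals), dict(backward), forward


# Illustrative input rows (made up for this sketch).
rows = [
    {"city": "NYC", "sales": 10.0},
    {"city": "SF", "sales": 5.0},
    {"city": "NYC", "sales": 7.0},
]

totals, backward, forward = group_sum_with_lineage(rows, "city", "sales")

# Details on demand: fetch the input rows behind the "NYC" aggregate.
details = [rows[rid] for rid in backward["NYC"]]
```

Capturing the indexes while the aggregate is computed avoids re-running a filter query over the input for every interaction; that trade-off, rather than the specific data structures, is what the sketch is meant to show.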
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3199671
Stratos Idreos, Konstantinos Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.
Citations: 79
Splaying Log-Structured Merge-Trees
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183723
Thomas Lively, Luca Schroeder, Carlos Mendizábal
Modern persistent key-value stores typically use a log-structured merge-tree (LSM-tree) design, which allows for high write throughput. We observe, however, that the LSM-tree performs suboptimally during read-intensive workload windows with non-uniform key access distributions. To address this shortcoming, we propose and analyze a simple decision scheme that can be added to any LSM-based key-value store to dramatically reduce the number of disk I/Os for these classes of workloads. The key insight is that copying a frequently accessed key to the top of an LSM-tree ("splaying") allows cheaper reads on that key in the near future.
Citations: 3
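The splaying insight in the abstract above can be sketched with a toy model. Everything here is an assumption made for illustration: the class `ToyLSM`, its in-memory dictionaries standing in for on-disk levels, and the probe counter standing in for disk I/Os. In particular, this sketch splays on every deep hit, whereas the abstract describes a decision scheme for when to splay; that scheme is not reproduced here.

```python
class ToyLSM:
    """Toy multi-level store; a hypothetical stand-in for an LSM-tree."""

    def __init__(self, num_levels=3):
        # levels[0] plays the role of the top of the tree; higher indexes
        # stand in for deeper (more expensive) levels.
        self.levels = [dict() for _ in range(num_levels)]
        self.probes = 0  # number of levels inspected, a proxy for read cost

    def put(self, key, value):
        self.levels[0][key] = value

    def push_down(self, level):
        """Merge one level into the next deeper one, newer values winning."""
        newer, older = self.levels[level], self.levels[level + 1]
        older.update(newer)
        newer.clear()

    def get(self, key, splay=True):
        # Search top-down, so the first hit is always the newest version.
        for depth, level in enumerate(self.levels):
            self.probes += 1
            if key in level:
                value = level[key]
                if splay and depth > 0:
                    # "Splay": copy the key back to the top so that the
                    # next read of it stops at the first level.
                    self.levels[0][key] = value
                return value
        return None


lsm = ToyLSM()
lsm.put("hot", 1)
lsm.push_down(0)  # key now lives at level 1
lsm.push_down(1)  # key now lives at level 2

first = lsm.get("hot")
probes_first = lsm.probes
second = lsm.get("hot")
probes_second = lsm.probes - probes_first
```

Because the lookup proceeds from the top level down, the value that gets copied is the most recent one, so the copy does not shadow a newer version of the key.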
Session details: Industry 3: DB Systems in the Cloud and Open Source
Pub Date : 2018-05-27 DOI: 10.1145/3258015
Mohammad Sadoghi
Citations: 0
Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196903
Min Xie, R. C. Wong, J. Li, Cheng Long, Ashwin Lall
Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries have been proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function from users, but it offers no control over the number of tuples returned. Recently, the k-regret query was proposed and received attention from the community because it does not require any utility function from users and its output size is controllable, thus avoiding the deficiencies of both top-k queries and skyline queries. Specifically, it returns k tuples that minimize a criterion called the maximum regret ratio. In this paper, we present a lower bound on the maximum regret ratio for the k-regret query. In addition, we propose a novel algorithm, called SPHERE, whose upper bound on the maximum regret ratio is asymptotically optimal and restriction-free for any dimensionality, the best-known result in the literature. We conducted extensive experiments to show that SPHERE performs better than the state-of-the-art methods for the k-regret query.
Citations: 39
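The criterion named in the abstract above, the maximum regret ratio, can be made concrete with a short sketch. This is not the SPHERE algorithm; it only evaluates the criterion for a given subset. It assumes linear utility functions with non-negative weights and non-negative tuple values, and it approximates the maximum over all utility functions by a finite, caller-supplied list of weight vectors, so the result is a lower estimate of the true maximum. The function names and the sample points are hypothetical.

```python
def regret_ratio(data, subset, weights):
    """Regret ratio of `subset` w.r.t. `data` for one linear utility function.

    Defined as (best utility in data - best utility in subset) / best utility in data.
    Assumes the best utility in `data` is positive.
    """
    def utility(point):
        return sum(w * x for w, x in zip(weights, point))

    best_all = max(utility(p) for p in data)
    best_sub = max(utility(p) for p in subset)
    return (best_all - best_sub) / best_all


def max_regret_ratio(data, subset, utilities):
    """Largest regret ratio over a finite list of utility weight vectors."""
    return max(regret_ratio(data, subset, w) for w in utilities)


# Illustrative two-dimensional tuples and utility vectors (made up for this sketch).
data = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
subset = [(0.6, 0.6)]
utilities = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

mrr = max_regret_ratio(data, subset, utilities)
```

A k-regret query would search for the size-k subset that minimizes this value; the sketch leaves that search out and shows only what is being minimized.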
DITA: Distributed In-Memory Trajectory Analytics
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183743
Zeyuan Shang, Guoliang Li, Z. Bao
Trajectory analytics can benefit many real-world applications, e.g., frequent-trajectory-based navigation systems, road planning, car pooling, and transportation optimization. Existing algorithms focus on optimizing this problem on a single machine. However, the number of trajectories exceeds the storage and processing capability of a single machine, which calls for large-scale trajectory analytics in distributed environments. Distributed trajectory analytics faces the challenges of data-locality-aware partitioning, load balancing, an easy-to-use interface, and the versatility to support various trajectory similarity functions. To address these challenges, we propose DITA, a distributed in-memory trajectory analytics system. We propose an effective partitioning method, with a global index and a local index, to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve performance. Moreover, DITA can support most existing similarity functions to quantify the similarity between trajectories. We integrate our framework seamlessly into Spark SQL and make it support SQL and DataFrame API interfaces. We have conducted extensive experiments on real-world datasets, and experimental results show that DITA significantly outperforms existing distributed trajectory similarity search and join approaches.
Citations: 87
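The filter-verification idea mentioned in the DITA abstract above can be sketched on a single machine. This is an illustration under stated assumptions, not DITA's implementation: it uses dynamic time warping (DTW) as the similarity function, and the filter is a simple endpoint lower bound (any DTW alignment must match the two first points and the two last points). The function names, the threshold, and the sample trajectories are hypothetical, and the distributed partitioning and indexing described in the abstract are omitted entirely.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dtw(t, q):
    """Dynamic time warping distance (sum of point distances along the best path)."""
    m, n = len(t), len(q)
    inf = float("inf")
    dp = [[inf] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            dp[i][j] = dist(t[i - 1], q[j - 1]) + step
    return dp[m][n]


def endpoint_lower_bound(t, q):
    """Cheap lower bound on DTW: first points and last points must be aligned."""
    first = dist(t[0], q[0])
    if len(t) == 1 and len(q) == 1:
        return first  # the path has a single cell, so count it once
    return first + dist(t[-1], q[-1])


def similarity_search(trajectories, query, threshold):
    """Return ids of trajectories within `threshold` of `query`, plus the
    number of candidates that needed the expensive verification step."""
    results, verified = [], 0
    for tid, t in enumerate(trajectories):
        if endpoint_lower_bound(t, query) > threshold:
            continue  # filter: cannot be within the threshold
        verified += 1
        if dtw(t, query) <= threshold:  # verify with the exact distance
            results.append(tid)
    return results, verified


# Illustrative trajectories (made up for this sketch).
query = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    [(10.0, 10.0), (11.0, 10.0), (12.0, 10.0)],
]

results, verified = similarity_search(trajectories, query, threshold=0.5)
```

The filter never discards a true match because the bound never exceeds the exact DTW distance; its only effect is to skip the quadratic-time verification for candidates that are clearly too far away.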
Managing Non-Volatile Memory in Database Systems
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196897
Alexander van Renen, Viktor Leis, A. Kemper, Thomas Neumann, T. Hashida, Kazuichi Oe, Y. Doi, L. Harada, Mitsuru Sato
Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient than main-memory database systems. For this reason, the second approach uses a page-based DRAM cache in front of NVM. This approach, however, does not utilize the byte addressability of NVM and, as a result, accessing an uncached tuple on NVM requires retrieving an entire page. In this work, we evaluate these two approaches and compare them with in-memory databases as well as more traditional buffer managers that use main memory as a cache in front of SSDs. This allows us to determine how much performance gain can be expected from NVM. We also propose a lightweight storage manager that simultaneously supports DRAM, NVM, and flash. Our design utilizes the byte addressability of NVM and uses it as an additional caching layer that improves performance without losing the benefits from the even faster DRAM and the large capacities of SSDs.
Citations: 98
IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193555
Xinbo Zhang, Lei Zou
RDF Question/Answering (Q/A) systems can interpret a user's question N as a SPARQL query Q and return the answer set $Q(D)$ over an RDF repository D to the user. However, due to the complexity of linking natural-language phrases with specific RDF items (e.g., entities and predicates), it remains difficult to understand users' questions precisely; hence $Q(D)$ may not meet users' expectations, offering wrong answers and dismissing some correct answers. In this demo, we design an Interactive Mechanism aiming for PROmotion Via feedback to Q/A systems (IMPROVE-QA), a complete platform that makes existing Q/A systems return more precise answers (denoted as $\mathcal{Q}'(D)$) to users. Based on the user's feedback over $Q(D)$, IMPROVE-QA automatically refines the original query Q into a new query graph $\mathcal{Q}'$ with minimum modifications, where $\mathcal{Q}'(D)$ provides more precise answers. We will also demonstrate how IMPROVE-QA can apply the "lesson" learned from the user in each query to improve the precision of Q/A systems on subsequent natural language questions.
Citations: 7
Session details: Research 11: Data Mining
Pub Date : 2018-05-27 DOI: 10.1145/3258018
L. Lakshmanan
Citations: 0
Session details: Research 15: Databases for Emerging Hardware
Pub Date : 2018-05-27 DOI: 10.1145/3258023
P. Pietzuch
Citations: 0