
Latest publications: Proceedings of the 2016 International Conference on Management of Data

An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2882917
Stefan Schuh, Xiao Chen, J. Dittrich
Relational equi-joins are at the heart of almost every query plan. They have been studied, improved, and reexamined on a regular basis since the existence of the database community. In the past four years several new join algorithms have been proposed and experimentally evaluated. Some of those papers contradict each other in their experimental findings. This makes it surprisingly hard to answer a very simple question: what is the fastest join algorithm in 2015? In this paper we will try to develop an answer. We start with an end-to-end black box comparison of the most important methods. Afterwards, we inspect the internals of these algorithms in a white box comparison. We derive improved variants of state-of-the-art join algorithms by applying optimizations like software write-combine buffers, various hash table implementations, as well as NUMA-awareness in terms of data placement and scheduling. We also inspect various radix partitioning strategies. Eventually, we are in the position to perform a comprehensive comparison of thirteen different join algorithms. We factor in scaling effects in terms of size of the input datasets, the number of threads, different page sizes, and data distributions. Furthermore, we analyze the impact of various joins on an (unchanged) TPC-H query. Finally, we conclude with a list of major lessons learned from our study and a guideline for practitioners implementing massive main-memory joins. As is the case with almost all algorithms in databases, we will learn that there is no single best join algorithm. Each algorithm has its strength and weaknesses and shines in different areas of the parameter space.
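The radix partitioning the authors inspect can be illustrated with a minimal sketch (ours, not the paper's code): both inputs are partitioned on the low-order bits of the join key so that each partition pair is small enough to be cache-resident, and each pair is then joined independently with a per-partition hash table. The fan-out (`RADIX_BITS`), the tuple layout, and the single-pass partitioning are illustrative assumptions.

```python
RADIX_BITS = 4  # assumed fan-out of 16 partitions, for illustration only

def radix_partition(table, key):
    """Scatter rows into 2^RADIX_BITS buckets by the key's low-order bits."""
    parts = [[] for _ in range(1 << RADIX_BITS)]
    for row in table:
        parts[row[key] & ((1 << RADIX_BITS) - 1)].append(row)
    return parts

def radix_hash_join(r, s, key):
    """Join matching partition pairs; each pair gets its own small hash table."""
    out = []
    for pr, ps in zip(radix_partition(r, key), radix_partition(s, key)):
        ht = {}
        for row in pr:                      # build side
            ht.setdefault(row[key], []).append(row)
        for row in ps:                      # probe side
            for match in ht.get(row[key], []):
                out.append((match, row))
    return out

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
print(radix_hash_join(r, s, key=0))  # pairs rows with equal first fields (keys 2 and 3)
```

Real implementations add the refinements the paper studies on top of this skeleton: multiple partitioning passes, software write-combine buffers during the scatter, and NUMA-aware placement of partitions.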
Citations: 91
Emma in Action: Declarative Dataflows for Scalable Data Analysis
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2899396
Alexander B. Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, V. Markl
Parallel dataflow APIs based on second-order functions were originally seen as a flexible alternative to SQL. Over time, however, their complexity increased due to the number of physical aspects that had to be exposed by the underlying engines in order to facilitate efficient execution. To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions. This demonstration highlights the benefits of an alternative design based on deep language embedding. We showcase Emma - a programming language embedded in Scala. Emma promotes parallel collection processing through native constructs like Scala's for-comprehensions - a declarative syntax akin to SQL. In addition, Emma also advocates quasi-quoting the entire data analysis algorithm rather than its individual dataflow expressions. This allows for decomposing the quoted code into (sequential) control flow and (parallel) dataflow fragments, optimizing the dataflows in context, and transparently offloading them to an engine like Spark or Flink. The proposed design promises increased programmer productivity due to avoiding an impedance mismatch, thereby reducing the lag times and cost of data analysis.
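Emma's declarative style builds on Scala for-comprehensions; as a rough analogue in Python (our illustration, not Emma's API), a comprehension states *what* a filtered equi-join computes without prescribing partitioning or data shipping, which is what leaves an engine free to optimize and offload the dataflow:

```python
# Two toy collections standing in for distributed datasets.
orders = [{"id": 1, "cust": "ann", "total": 40},
          {"id": 2, "cust": "bob", "total": 75}]
customers = [{"name": "ann", "city": "Oslo"},
             {"name": "bob", "city": "Bern"}]

# Declarative join + filter, akin to SQL SELECT ... WHERE: no hint of
# how (or on which engine) the join is executed appears in the code.
big_spenders = [(c["city"], o["total"])
                for o in orders
                for c in customers
                if o["cust"] == c["name"] and o["total"] > 50]
print(big_spenders)  # [('Bern', 75)]
```

In Emma the analogous comprehension is quoted as part of the whole algorithm, so the compiler can split control flow from dataflow and hand the latter to Spark or Flink.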
Citations: 10
Main Memory Adaptive Denormalization
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2914835
Zezhou Liu, Stratos Idreos
Joins have traditionally been the most expensive database operator, but they are required to query normalized schemas. In turn, normalized schemas are necessary to minimize update costs and space usage. Joins can be avoided altogether by using a denormalized schema instead of a normalized schema; this improves analytical query processing times at the cost of increased update overhead, loading cost, and storage requirements. In our work, we show that we can achieve the best of both worlds by leveraging partial, incremental, and dynamic denormalized tables to avoid join operators, resulting in fast query performance while retaining the minimized loading, update, and storage costs of a normalized schema. We introduce adaptive denormalization for modern main memory systems. We replace the traditional join operations with efficient scans over the relevant partial universal tables without incurring the prohibitive cost of full denormalization.
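The tradeoff the abstract describes can be made concrete with a toy sketch (ours, not the paper's system): the same query answered once by a join over a normalized schema and once by a scan over a denormalized "universal" table, showing the operator the denormalized form avoids and the redundancy it pays for it.

```python
# Normalized schema: customer attributes live in one place.
customers = {1: "Oslo", 2: "Bern"}                  # cust_id -> city
orders = [(10, 1, 40), (11, 2, 75)]                 # (order_id, cust_id, total)

# Normalized query: requires a join (here, a hash lookup per order row).
via_join = [(customers[cid], total) for (_, cid, total) in orders]

# Denormalized: the city is materialized into every order row up front,
# so the query becomes a plain scan -- at the cost of storing each city
# redundantly and rewriting every copy whenever it changes.
wide = [(oid, cid, customers[cid], total) for (oid, cid, total) in orders]
via_scan = [(city, total) for (_, _, city, total) in wide]

assert via_join == via_scan   # same answer, different operator
print(via_scan)
```

Adaptive denormalization keeps `wide` partial and builds it incrementally, rather than materializing the full universal table.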
Citations: 11
Graph Summarization for Geo-correlated Trends Detection in Social Networks
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2914832
Colin Biafore, Faisal Nawab
Trends detection in social networks is possible via a multitude of models with different characteristics. These models are pre-defined and rigid, which creates the need to expose the social network graph to data scientists to introduce the human element in trends detection. However, inspecting large social network graphs visually is tiresome. We tackle this problem by providing effective graph summarizations aimed at the application of geo-correlated trends detection in social networks.
Citations: 0
Design Tradeoffs of Data Access Methods
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2912569
Manos Athanassoulis, Stratos Idreos
Database researchers and practitioners have been building methods to store, access, and update data for more than five decades. Designing access methods has been a constant effort to adapt to the ever-changing underlying hardware and workload requirements. The recent explosion in data system designs - including, in addition to traditional SQL systems, NoSQL, NewSQL, and other relational and non-relational systems - makes understanding the tradeoffs of designing access methods more important than ever. Access methods are at the core of any new data system. In this tutorial we survey recent developments in access method design and we place them in the design space where each approach focuses primarily on one or a subset of read performance, update performance, and memory utilization. We discuss how to utilize designs and lessons learned from past research. In addition, we discuss new ideas on how to build access methods that have tunable behavior, as well as the landscape of open research problems.
Citations: 27
REACT: Context-Sensitive Recommendations for Data Analysis
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2899392
T. Milo, Amit Somech
Data analysis may be a difficult task, especially for non-expert users, as it requires deep understanding of the investigated domain and the particular context. In this demo we present REACT, a system that hooks to the analysis UI and provides the users with personalized recommendations of analysis actions. By matching the current user session to previous sessions of analysts working with the same or other data sets, REACT is able to identify the potentially best next analysis actions in the given user context. Unlike previous work that mainly focused on individual components of the analysis work, REACT provides a holistic approach that captures a wider range of analysis action types by utilizing novel notions of similarity in terms of the individual actions, the analyzed data and the entire analysis workflow. We demonstrate the functionality of REACT, as well as its effectiveness, through a digital forensics scenario where users are challenged to detect cyber attacks in real-life data obtained from honeypot servers.
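The session-matching idea can be sketched in a few lines (a hypothetical toy, with our own names and scoring, not REACT's similarity notions): compare the current session's action prefix against past sessions, and recommend the action that most often followed a matching prefix.

```python
from collections import Counter

# Toy logs of past analysis sessions as sequences of action names.
past_sessions = [
    ["load", "filter", "group_by", "plot"],
    ["load", "filter", "plot"],
    ["load", "join", "group_by"],
]

def recommend(current):
    """Vote over the next actions taken in past sessions sharing this prefix."""
    votes = Counter()
    for session in past_sessions:
        n = len(current)
        if session[:n] == current and len(session) > n:
            votes[session[n]] += 1
    return votes.most_common(1)[0][0] if votes else None

print(recommend(["load", "filter"]))  # -> 'group_by' (ties broken by first occurrence)
```

REACT's actual matching is far richer, factoring in similarity of the actions themselves, the analyzed data, and the whole workflow rather than exact prefixes.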
Citations: 19
RxSpatial: Reactive Spatial Library for Real-Time Location Tracking and Processing
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2899411
Youying Shi, Abdeltawab M. Hendawi, H. Fattah, Mohamed H. Ali
Current commercial spatial libraries provide strong support for functionality such as intersection, distance, and area computations over various stationary geospatial objects. What is missing is support for moving objects. Performing real-time location tracking and computation for moving objects on the server side of a GIS application is challenging because of the high volume of moving objects to track, the time complexity of analysis and computation, and the real-time requirements. In this demo, we present RxSpatial, a real-time reactive spatial library that consists of (1) a front-end, a programming interface for developers who are familiar with the Reactive framework and the Microsoft Spatial Library, and (2) a back-end for processing spatial operations in a streaming fashion. We then provide demonstration scenarios that show how RxSpatial is employed in real-world applications. The demonstration scenarios include criminal activity tracking, a collaborative vehicle system, performance analysis, and an interactive internal inspection.
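The reactive pattern the library is built on can be sketched minimally (our illustration, not RxSpatial's API): location updates arrive as a stream of events, and subscribers are notified as each fix arrives, here firing an alert when a moving object comes within a radius of a point of interest.

```python
import math

class LocationStream:
    """A bare-bones observable stream of (object id, x, y) location fixes."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def push(self, obj_id, x, y):           # a new GPS fix arrives
        for cb in self.subscribers:
            cb(obj_id, x, y)

alerts = []
POI, RADIUS = (0.0, 0.0), 5.0               # hypothetical point of interest

def near_poi(obj_id, x, y):
    if math.hypot(x - POI[0], y - POI[1]) <= RADIUS:
        alerts.append(obj_id)

stream = LocationStream()
stream.subscribe(near_poi)
stream.push("car-1", 3.0, 4.0)    # distance 5.0 -> within radius
stream.push("car-2", 30.0, 40.0)  # distance 50.0 -> ignored
print(alerts)  # ['car-1']
```

RxSpatial's contribution is doing this at scale with real spatial predicates (intersection, distance, area) over streaming geometries, rather than a point-radius check.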
Citations: 7
QUEPA: QUerying and Exploring a Polystore by Augmentation
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2899393
A. Maccioni, E. Basili, Riccardo Torlone
Polystore systems (or simply polystores) have been recently proposed to support a common scenario in which enterprise data are stored in a variety of database technologies relying on different data models and languages. Polystores provide a loosely coupled integration of data sources and support the direct access, with the local language, to each specific storage engine to exploit its distinctive features. Given the absence of a global schema, new challenges for accessing data arise in these environments. In fact, it is usually hard to know in advance if a query to a specific data store can be satisfied with data stored elsewhere in the polystore. QUEPA addresses these issues by introducing augmented search and augmented exploration in a polystore, two access methods based on the automatic enrichment of the result of a query over a storage system with related data in the rest of the polystore. These features do not impact on the applications running on top of the polystore and are compatible with the most common database systems. QUEPA implements in this way a lightweight mechanism for data integration in the polystore and operates in a plug-and-play mode, thus reducing the need for ad-hoc configurations and for middleware layers involving standard APIs, unified query languages or shared data models. In our demonstration, the audience can experiment with the augmentation construct by using the native query languages of the database systems available in the polystore.
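The augmentation idea can be sketched with two dictionaries standing in for two stores (a hypothetical toy with our own naming, not QUEPA's interface): the result of a lookup in one store is automatically enriched with related records found, via a shared identifier, in another store of the polystore.

```python
# Two toy "stores" of a polystore, keyed by a shared product identifier.
relational = {"p1": {"name": "sensor-A", "price": 120}}     # e.g. an RDBMS
document = {"p1": {"review": "accurate and rugged"}}        # e.g. a document store

def augmented_lookup(pid):
    """Answer from the primary store, enriched with related data elsewhere."""
    result = dict(relational.get(pid, {}))
    # Augmentation step: follow the shared identifier into the other store.
    extra = document.get(pid)
    if extra:
        result.update(extra)
    return result

print(augmented_lookup("p1"))
```

The point of the design is that the query itself stays in the primary store's native language; the enrichment happens transparently, without a global schema or unified query language.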
Citations: 12
Transaction Healing: Scaling Optimistic Concurrency Control on Multicores
Pub Date: 2016-06-26 DOI: 10.1145/2882903.2915202
Yingjun Wu, C. Chan, K. Tan
Today's main-memory databases can support very high transaction rates for OLTP applications. However, when a large number of concurrent transactions contend on the same data records, the system performance can deteriorate significantly. This is especially the case when scaling transaction processing with optimistic concurrency control (OCC) on multicore machines. In this paper, we propose a new concurrency-control mechanism, called transaction healing, that exploits program semantics to scale the conventional OCC towards dozens of cores even under highly contended workloads. Transaction healing captures the dependencies across operations within a transaction prior to its execution. Instead of blindly rejecting a transaction once its validation fails, the proposed mechanism judiciously restores any non-serializable operation and heals inconsistent transaction states as well as query results according to the extracted dependencies. Transaction healing can partially update the membership of read/write sets when processing dependent transactions. Such overhead, however, is largely reduced by carefully avoiding false aborts and rearranging validation orders. We implemented the idea of transaction healing in TheDB, a main-memory database prototype that provides full ACID guarantees with a scalable commit protocol. By evaluating TheDB on a 48-core machine with two widely-used benchmarks, we confirm that transaction healing can scale near-linearly, yielding significantly higher transaction rates than the state-of-the-art OCC implementations.
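The contrast the paper draws can be shown with a toy sketch (ours, not TheDB's protocol): conventional OCC validation aborts a transaction whose read set was invalidated by a concurrent committer, whereas "healing" repairs only the stale read-set entries and re-derives the result that depended on them.

```python
db = {"x": 1, "y": 2}  # committed database state

def occ_would_abort(read_set):
    """Conventional OCC validation: any stale read forces an abort."""
    return any(db[k] != v for k, v in read_set.items())

def heal(read_set, compute):
    """Healing: refresh only the invalid entries, then re-derive the result."""
    stale = {k for k, v in read_set.items() if db[k] != v}
    for k in stale:
        read_set[k] = db[k]       # repair just the stale reads
    return compute(read_set)      # re-run only the dependent computation

# A transaction read x and y; a concurrent commit then updated x.
read_set = {"x": 1, "y": 2}
db["x"] = 10
assert occ_would_abort(read_set)   # conventional OCC: reject and retry
result = heal(read_set, lambda rs: rs["x"] + rs["y"])
print(result)  # 12
```

The real mechanism uses statically extracted operation dependencies to know exactly which operations to restore, so most of the transaction's work is preserved instead of being redone.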
引用次数: 43
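The healing idea in the abstract can be illustrated with a minimal single-threaded sketch: track the versions a transaction read, and on a failed validation re-execute only the operations that touched stale records instead of aborting outright. Everything below (`Record`, `execute`, `validate_and_heal`, the per-key dependency model) is an invented toy, not TheDB's actual API or dependency analysis:

```python
class Record:
    """A versioned cell in a toy key-value store."""
    def __init__(self, value):
        self.value = value
        self.version = 0

def execute(store, ops):
    """Optimistic execution phase. `ops` is a list of (key, fn) pairs where
    fn maps the value read to a new value, or to None for a pure read."""
    read_versions, writes = {}, {}
    for key, fn in ops:
        rec = store[key]
        read_versions.setdefault(key, rec.version)   # remember what we observed
        new = fn(writes.get(key, rec.value))         # read our own prior writes
        if new is not None:
            writes[key] = new
    return read_versions, writes

def validate_and_heal(store, ops, read_versions, writes):
    """Validation phase. Instead of aborting when a read turned stale,
    re-execute only the operations that read a stale key; the key-based
    dependency here stands in for the statically extracted operation
    dependencies described in the paper."""
    stale = {k for k, v in read_versions.items() if store[k].version != v}
    for key, fn in ops:
        if key in stale:                             # heal, don't abort
            new = fn(store[key].value)
            if new is not None:
                writes[key] = new
    for k, v in writes.items():                      # commit the healed writes
        store[k].value = v
        store[k].version += 1
    return writes
```

For example, a transaction that increments `x` reads version 0 and buffers 11; if a concurrent committer installs `x = 100` before validation, healing re-runs the increment against the new value and commits 101 rather than restarting the whole transaction.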
QFix: Demonstrating Error Diagnosis in Query Histories
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899388
Xiaolan Wang, A. Meliou, Eugene Wu
An increasing number of applications in all aspects of society rely on data. Despite the long line of research in data cleaning and repairs, data correctness has been an elusive goal. Errors in the data can be extremely disruptive, and are detrimental to the effectiveness and proper function of data-driven applications. Even when data is cleaned, new errors can be introduced by applications and users who interact with the data. Subsequent valid updates can obscure these errors and propagate them through the dataset causing more discrepancies. Any discovered errors tend to be corrected superficially, on a case-by-case basis, further obscuring the true underlying cause, and making detection of the remaining errors harder. In this demo proposal, we outline the design of QFix, a query-centric framework that derives explanations and repairs for discrepancies in relational data based on potential errors in the queries that operated on the data. This is a marked departure from traditional data-centric techniques that directly fix the data. We then describe how users will use QFix in a demonstration scenario. Participants will be able to select from a number of transactional benchmarks, introduce errors into the queries that are executed, and compare the fixes to the queries proposed by QFix as well as existing alternative algorithms such as decision trees.
Citations: 6
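The diagnosis task QFix demonstrates can be sketched with a toy replay-and-search loop (QFix itself encodes the problem as a mixed-integer linear program; the update encoding, `replay`, and `diagnose` below are invented simplifications): given a log of updates and a complaint about the final table state, find the one query constant whose change reproduces the expected state.

```python
import copy

def replay(rows, log):
    """Apply a log of toy updates to a table. Each log entry
    (pred_attr, pred_const, set_attr, delta) models
    UPDATE T SET set_attr = set_attr + delta WHERE pred_attr < pred_const."""
    rows = copy.deepcopy(rows)
    for pred_attr, pred_const, set_attr, delta in log:
        for row in rows:
            if row[pred_attr] < pred_const:
                row[set_attr] += delta
    return rows

def diagnose(initial, log, expected, candidates):
    """Brute-force single-fault diagnosis: try replacing each query's
    predicate constant with each candidate value and replay; return the
    first (query_index, fixed_constant) reproducing the expected state."""
    for i, (pred_attr, _, set_attr, delta) in enumerate(log):
        for c in candidates:
            patched = list(log)
            patched[i] = (pred_attr, c, set_attr, delta)
            if replay(initial, patched) == expected:
                return i, c
    return None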
期刊
Proceedings of the 2016 International Conference on Management of Data
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1