Proceedings of the 2016 International Conference on Management of Data: Latest Articles

Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration LIVE
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899390
A. Ungethüm, T. Kissinger, Willi-Wolfram Mentzel, Eric Mier, Dirk Habich, Wolfgang Lehner
Energy awareness of database systems has emerged as a critical research topic, since energy consumption is becoming a major limiter for their scalability. Recent energy-related hardware developments trend towards offering more and more configuration opportunities for the software to control its own energy consumption. Existing research has so far mainly focused on leveraging this configuration spectrum to find the most energy-efficient configuration for specific operators or entire queries. In this demo, we introduce the concept of energy elasticity and propose the energy-control loop as an implementation of this concept. Energy elasticity refers to the ability of software to behave in an energy-proportional and energy-efficient way at the same time while maintaining a certain quality of service. Thus, our system does not draw the least energy possible but the least energy necessary to still perform reasonably. We demonstrate our overall approach using a rich interactive GUI to give attendees the opportunity to learn more about our concept.
Citations: 8
Rheem: Enabling Multi-Platform Task Execution
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899414
D. Agrawal, M. Ba, Laure Berti-Équille, S. Chawla, A. Elmagarmid, Hossam M. Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, M. Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, N. Tang, Mohammed J. Zaki
Many emerging applications, from domains such as healthcare and oil & gas, require several data processing systems for complex analytics. This demo paper showcases Rheem, a framework that provides multi-platform task execution for such applications. It features a three-layer data processing abstraction and a new query optimization approach for multi-platform settings. We will demonstrate the strengths of Rheem by using real-world scenarios from three different applications, namely, machine learning, data cleaning, and data fusion.
Citations: 43
Exploring Visualization of Data Transforms
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914837
Larry Xu
In the context of data exploration, users often interact with relational database systems in an interactive query session to form useful insights. Each query a user executes can potentially transform a resultset in complex ways. We explore some of the challenges in understanding these transformations, and how these challenges can be solved through more informative visual representations of data transforms. We present the concept of "tweening" of resultsets as a method of incrementally visualizing data transformations, and explore approaches towards generating these resultset tweens. Through a series of user studies, we evaluate tweening as an effective method of understanding the changes that result from data transformations.
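The abstract does not say how resultset tweens are generated, so the following is only a minimal sketch of one plausible reading: split the change between two resultsets into a "remove", an "update", and an "add" stage, each of which could be rendered as one animation frame. The function name `tween_steps`, the three-stage order, and the assumption that rows carry a stable key column are all illustrative choices, not the author's method.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def tween_steps(before: List[Row], after: List[Row], key: str) -> List[Tuple[str, List[Row]]]:
    """Return labelled intermediate resultsets between `before` and `after`.

    Hypothetical helper: rows are matched on the column named `key`.
    """
    after_by_key = {row[key]: row for row in after}
    before_keys = {row[key] for row in before}

    steps: List[Tuple[str, List[Row]]] = []
    # Stage 1: drop rows that do not survive the transformation.
    kept = [row for row in before if row[key] in after_by_key]
    steps.append(("remove", kept))
    # Stage 2: apply value changes to the surviving rows.
    updated = [after_by_key[row[key]] for row in kept]
    steps.append(("update", updated))
    # Stage 3: append rows that exist only in the new resultset.
    added = updated + [row for row in after if row[key] not in before_keys]
    steps.append(("add", added))
    return steps
```

Each stage changes only one kind of thing, which is the point of showing a transformation incrementally; the row order of the final stage follows `before` first and may differ from `after`.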
Citations: 0
ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882905
Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, I. Pandis
Large main memories and massively parallel processors have triggered not only a resurgence of high-performance transaction processing systems optimized for large main-memory and massively parallel processors, but also an increasing demand for processing heterogeneous workloads that include read-mostly transactions. Many modern transaction processing systems adopt a lightweight optimistic concurrency control (OCC) scheme to leverage its low overhead in low contention workloads. However, we observe that the lightweight OCC is not suitable for heterogeneous workloads, causing significant starvation of read-mostly transactions and overall performance degradation. In this paper, we present ERMIA, a memory-optimized database system built from scratch to cater to the need of handling heterogeneous workloads. ERMIA adopts snapshot isolation concurrency control to coordinate heterogeneous transactions and provides serializability when desired. Its physical layer supports the concurrency control schemes in a scalable way. Experimental results show that ERMIA delivers comparable or superior performance and near-linear scalability in a variety of workloads, compared to a recent lightweight OCC-based system. At the same time, ERMIA maintains high throughput on read-mostly transactions when the performance of the OCC-based system drops by orders of magnitude.
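To make the snapshot isolation idea concrete, here is a textbook-style sketch, not ERMIA's implementation: each transaction reads the newest version committed at or before its start timestamp, and a commit fails if another transaction has since committed a write to the same key (first committer wins). The class `SnapshotStore` and its methods are hypothetical names, and the sketch is single-threaded.

```python
class SnapshotStore:
    """Toy multi-version store with snapshot isolation (illustrative only)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # timestamp of the most recent commit

    def begin(self):
        # A transaction sees everything committed up to its start timestamp.
        return {"start": self.clock, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:  # read your own uncommitted write
            return txn["writes"][key]
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= txn["start"]:
                return value
        return None

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # First committer wins: abort if a newer version appeared after start.
        for key in txn["writes"]:
            chain = self.versions.get(key, [])
            if chain and chain[-1][0] > txn["start"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True
```

Because readers never wait for or get invalidated by writers in this scheme, a long read-mostly transaction keeps its snapshot even while short updates commit around it, which is the behavior the abstract contrasts with lightweight OCC.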
Citations: 112
CLAMS: Bringing Quality to Data Lakes
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899391
Mina H. Farid, Alexandra Roatis, I. Ilyas, H. Hoffmann, Xu Chu
With the increasing incentive of enterprises to ingest as much data as they can in what is commonly referred to as "data lakes", and with the recent development of multiple technologies to support this "load-first" paradigm, the new environment presents serious data management challenges. Among them, the assessment of data quality and cleaning large volumes of heterogeneous data sources become essential tasks in unveiling the value of big data. The coveted use of unstructured and semi-structured data in large volumes makes current data cleaning tools (primarily designed for relational data) not directly adoptable. We present CLAMS, a system to discover and enforce expressive integrity constraints from large amounts of lake data with very limited schema information (e.g., represented as RDF triples). This demonstration shows how CLAMS is able to discover the constraints and the schemas they are defined on simultaneously. CLAMS also introduces a scale-out solution to efficiently detect errors in the raw data. CLAMS interacts with human experts both to validate the discovered constraints and to suggest data repairs. CLAMS has been deployed in a real large-scale enterprise data lake and was evaluated on a real data set of 1.2 billion triples. It has been able to spot multiple obscure data inconsistencies and errors early in the data processing stack, providing huge value to the enterprise.
Citations: 72
Functional Dependencies for Graphs
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915232
W. Fan, Yinghui Wu, Jingbo Xu
We propose a class of functional dependencies for graphs, referred to as GFDs. GFDs capture both attribute-value dependencies and topological structures of entities, and subsume conditional functional dependencies (CFDs) as a special case. We show that the satisfiability and implication problems for GFDs are coNP-complete and NP-complete, respectively, no worse than their CFD counterparts. We also show that the validation problem for GFDs is coNP-complete. Despite the intractability, we develop parallel scalable algorithms for catching violations of GFDs in large-scale graphs. Using real-life and synthetic data, we experimentally verify that GFDs provide an effective approach to detecting inconsistencies in knowledge and social graphs.
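As a rough illustration of what catching a GFD violation involves, the sketch below checks a heavily simplified dependency: the topological part is a single labelled edge, and the attribute part is a pair of predicates X and Y over the two matched nodes. A match violates the dependency when X holds but Y does not. Real GFD patterns are general graph patterns, and the paper's algorithms are parallel; the function `gfd_violations`, the data layout, and the brute-force scan are assumptions made for this example only.

```python
from typing import Callable, Dict, List, Tuple

Node = Dict[str, object]        # attribute name -> value, including "label"
Edge = Tuple[str, str, str]     # (source id, edge label, target id)
Literal = Callable[[Node, Node], bool]


def gfd_violations(
    nodes: Dict[str, Node],
    edges: List[Edge],
    pattern: Tuple[str, str, str],   # (source label, edge label, target label)
    lhs: Literal,                    # X: premise over the matched nodes
    rhs: Literal,                    # Y: consequence over the matched nodes
) -> List[Tuple[str, str]]:
    """Return the matched node pairs where X holds but Y fails."""
    src_label, edge_label, dst_label = pattern
    violations = []
    for src, label, dst in edges:
        x, y = nodes[src], nodes[dst]
        is_match = (
            label == edge_label
            and x["label"] == src_label
            and y["label"] == dst_label
        )
        if is_match and lhs(x, y) and not rhs(x, y):
            violations.append((src, dst))
    return violations
```

For instance, with the pattern `("city", "located_in", "country")`, an always-true X, and Y requiring the city's `country_code` to equal the country's `code`, every returned pair is a city whose stored code disagrees with the country it is linked to.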
Citations: 101
Making the Case for Query-by-Voice with EchoQuery
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899394
Gabriel Lyons, Vinh Q. Tran, Carsten Binnig, U. Çetintemel, Tim Kraska
Recent advances in automatic speech recognition and natural language processing have led to a new generation of robust voice-based interfaces. Yet, there is very little work on using voice-based interfaces to query database systems. In fact, one might even wonder who in her right mind would want to query a database system using voice commands! With this demonstration, we make the case for querying database systems using a voice-based interface, a new querying and interaction paradigm we call Query-by-Voice (QbV). We will demonstrate the practicality and utility of QbV for relational DBMSs using a proof-of-concept system called EchoQuery. To achieve a smooth and intuitive interaction, the query interface of EchoQuery is inspired by casual human-to-human conversations. Our demo will show that voice-based interfaces present an intuitive means of querying and consuming data in a database. It will also highlight the unique advantages of QbV over the more traditional approaches, text-based or visual interfaces, for applications where context switching is too expensive, too risky or even not possible at all.
Citations: 35
The CloudMdsQL Multistore System
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899400
B. Kolev, Carlyna Bondiombouy, P. Valduriez, R. Jiménez-Peris, Raquel Pau, José Pereira
The blooming of different cloud data management infrastructures has turned multistore systems into a major topic in today's cloud landscape. In this demonstration, we present a Cloud Multidatastore Query Language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store's native query interface. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized. Within our demonstration, we focus on two use cases each involving four diverse data stores (graph, document, relational, and key-value) with their corresponding CloudMdsQL queries. The query execution flows are visualized by an embedded real-time monitoring subsystem. The users can also try out different ad-hoc queries, not necessarily in the context of the use cases.
Citations: 47
ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899409
S. Krishnan, M. Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu
Databases can be corrupted with various errors such as missing, incorrect, or inconsistent values. Increasingly, modern data analysis pipelines involve Machine Learning, and the effects of dirty data can be difficult to debug. Dirty data is often sparse, and naive sampling solutions are not suited for high-dimensional models. We propose ActiveClean, a progressive framework for training Machine Learning models with data cleaning. Our framework updates a model iteratively as the analyst cleans small batches of data, and includes numerous optimizations such as importance weighting and dirty data detection. We designed a visual interface to wrap around this framework and demonstrate ActiveClean for a video classification problem and a topic modeling problem.
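The iterative update with importance weighting can be sketched generically as follows. This is a standard importance-sampling estimator for a mean gradient, shown for a linear model with squared loss; it is not taken from the ActiveClean code, and the names `weighted_gradient` and `sgd_step`, the loss, and the sampling-with-replacement assumption are all illustrative. Records drawn with probability `probs[i]` are reweighted by `1 / (n * probs[i])` so that non-uniform sampling (for example, favoring records likely to be dirty) still gives an unbiased estimate.

```python
from typing import List, Sequence


def weighted_gradient(
    theta: Sequence[float],
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    sample: Sequence[int],
    probs: Sequence[float],
) -> List[float]:
    """Importance-weighted estimate of the mean squared-loss gradient.

    `sample` holds indices of freshly cleaned records, drawn with
    replacement according to `probs` (hypothetical sampling scheme).
    """
    n = len(xs)
    grad = [0.0] * len(theta)
    for i in sample:
        prediction = sum(t * v for t, v in zip(theta, xs[i]))
        # Reweight so the expectation equals the gradient averaged over all n records.
        weight = 1.0 / (n * probs[i] * len(sample))
        for j, v in enumerate(xs[i]):
            grad[j] += weight * 2.0 * (prediction - ys[i]) * v
    return grad


def sgd_step(theta: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """One gradient-descent update of the model parameters."""
    return [t - lr * g for t, g in zip(theta, grad)]
```

In a cleaning loop, the analyst would clean the sampled batch, the model would take one `sgd_step` on the resulting estimate, and the next batch would be sampled; with uniform `probs` and every record sampled once, the estimate reduces to the ordinary average gradient.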
Citations: 51
T-Part: Partitioning of Transactions for Forward-Pushing in Deterministic Database Systems
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915227
Shan-Hung Wu, Tsai-Yu Feng, Meng-Kai Liao, Shao-Kan Pi, Yu-Shan Lin
Deterministic database systems have been shown to yield high throughput on a cluster of commodity machines while ensuring the strong consistency between replicas, provided that the data can be well-partitioned on these machines. However, data partitioning can be suboptimal for many reasons in real-world applications. In this paper, we present T-Part, a transaction execution engine that partitions transactions in a deterministic database system to deal with the unforeseeable workloads or workloads whose data are hard to partition. By modeling the dependency between transactions as a T-graph and continuously partitioning that graph, T-Part allows each transaction to know which later transactions on other machines will read its writes so that it can push forward the writes to those later transactions immediately after committing. This forward-pushing reduces the chance that the later transactions stall due to the unavailability of remote data. We implement a prototype for T-Part. Extensive experiments are conducted and the results demonstrate the effectiveness of T-Part.
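The forward-pushing idea can be sketched as below, assuming transactions arrive in a predetermined total order with known read and write sets, as is typical for deterministic systems. For each key a transaction reads, the sketch finds the most recent earlier writer and, if that writer runs on a different machine, records that the writer should push the value to the reader. The function `forward_push_plan` and the dictionary layout are hypothetical, and the graph partitioning step that assigns transactions to machines is taken as given rather than implemented.

```python
from typing import Dict, List, Tuple


def forward_push_plan(txns: List[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each transaction id to the (key, reader id) pairs it should push to.

    Each transaction is a dict with "id", "machine", "reads", and "writes";
    the list order is the predetermined execution order.
    """
    last_writer: Dict[str, dict] = {}
    plan: Dict[str, List[Tuple[str, str]]] = {t["id"]: [] for t in txns}
    for t in txns:
        for key in t["reads"]:
            writer = last_writer.get(key)
            # Only remote readers need a push; local readers see the write directly.
            if writer is not None and writer["machine"] != t["machine"]:
                plan[writer["id"]].append((key, t["id"]))
        for key in t["writes"]:
            last_writer[key] = t
    return plan
```

With the plan computed ahead of execution, a writer can send its values as soon as it commits instead of waiting for a remote reader to request them, which is where the reduction in stalls described above would come from.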
Citations: 11
Journal
Proceedings of the 2016 International Conference on Management of Data