
Proceedings of the 2016 International Conference on Management of Data: Latest Articles

Big Graph Analytics Systems
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2912566
D. Yan, Yingyi Bu, Yuanyuan Tian, A. Deshpande, James Cheng
In recent years we have witnessed a surging interest in developing Big Graph processing systems. To date, tens of Big Graph systems have been proposed. This tutorial provides a timely and comprehensive review of existing Big Graph systems, and summarizes their pros and cons from various perspectives. We start from the existing vertex-centric systems, in which a programmer thinks intuitively like a vertex when developing parallel graph algorithms. We then introduce systems that adopt other computation paradigms and execution settings. The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on. We also highlight future research opportunities on Big Graph analytics.
Citations: 27
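The "think like a vertex" model mentioned in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the function name `vertex_centric_min_label` and the choice of minimum-label propagation (which finds connected components) are illustrative and are not taken from the tutorial or from any particular system.

```python
from collections import defaultdict


def vertex_centric_min_label(edges):
    """Toy simulation of a vertex-centric computation (hypothetical helper).

    Every vertex starts with its own id as its label. In each superstep a
    vertex looks only at the messages it received, adopts the smallest
    label seen, and notifies its neighbours if its label changed. The run
    stops when no messages are in flight. On an undirected graph the final
    labels identify the connected components.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    label = {v: v for v in neighbours}

    # Superstep 0: every vertex announces its own label to its neighbours.
    inbox = defaultdict(list)
    for v in neighbours:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():  # the per-vertex "compute" step
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return label


if __name__ == "__main__":
    print(vertex_centric_min_label([(1, 2), (2, 3), (4, 5)]))
```

A real system would partition the vertices across workers and exchange the messages over the network; the loop body here stands in for the user-defined per-vertex function.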
Towards a Hybrid Design for Fast Query Processing in DB2 with BLU Acceleration Using Graphical Processing Units: A Technology Demonstration
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2903735
S. Meraji, Berni Schiefer, Lan Pham, Lee Chu, Peter Kokosielis, Adam J. Storm, Wayne Young, Chang Ge, Geoffrey Ng, Kajan Kanagaratnam
In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2's column store technology). Moreover, we show the benefits and problems of using hardware accelerators (more specifically GPUs) in a real commercial Relational Database Management System (RDBMS). We investigate the effect of off-loading specific database operations to a GPU, and show how doing so results in a significant performance improvement. We then demonstrate that for some queries, using just the CPU to perform the entire operation is more beneficial. While we use some of Nvidia's fast kernels for operations like sort, we have also developed our own high-performance kernels for operations such as group by and aggregation. Finally, we show how we use a dynamic design that can make use of optimizer metadata to intelligently choose a GPU kernel to run. For the first time in the literature, we use benchmarks representative of customer environments to gauge the performance of our prototype, the results of which show that we can get a speed increase upwards of 2x, using a realistic set of queries.
Citations: 9
Vectorizing an In Situ Query Engine
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914829
Panagiotis Sioulas, A. Ailamaki
Database systems serve a wide range of use cases efficiently, but require data to be loaded and adapted to the system's execution engine. This pre-processing step is a bottleneck to the analysis of the increasingly large and heterogeneous datasets. Therefore, numerous research efforts advocate for querying each dataset in situ, i.e., without pre-loading it in a DBMS. On the other hand, performing analysis over raw data entails numerous overheads because of the potentially inefficient data representations. In this paper, we investigate the effect of vector processing on raw data querying. We enhance the operators of a query engine to use SIMD operations. Specifically, we examine the effect of SIMD on two different cases: the scan operators that perform the CPU-intensive task of input parsing, and the part of the query pipeline that performs a selection and computes an aggregate. We show that a vectorized approach has a lot of potential to improve performance, which nevertheless comes with trade-offs.
Citations: 1
Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915222
Huanchen Zhang, D. Andersen, Andrew Pavlo, M. Kaminsky, Lin Ma, Rui Shen
Using indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a major source of storage overhead of in-memory databases. To reduce this overhead, we propose using a two-stage index: The first stage ingests all incoming entries and is kept small for fast read and write operations. The index periodically migrates entries from the first stage to the second, which uses a more compact, read-optimized data structure. Our first contribution is hybrid index, a dual-stage index architecture that achieves both space efficiency and high performance. Our second contribution is Dual-Stage Transformation (DST), a set of guidelines for converting any order-preserving index structure into a hybrid index. Our third contribution is applying DST to four popular order-preserving index structures and evaluating them in both standalone microbenchmarks and a full in-memory DBMS using several transaction processing workloads. Our results show that hybrid indexes provide comparable throughput to the original ones while reducing the memory overhead by up to 70%.
Citations: 94
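The two-stage idea in the hybrid index abstract above (a small stage that absorbs writes, periodically migrated into a compact read-optimized stage) can be sketched roughly as follows. Everything here is an assumption made for illustration: the class name `HybridIndex`, the `merge_threshold` parameter, and the concrete structures (a plain dict for the first stage, sorted parallel lists for the second) are not the data structures or migration policy of the paper. The dict is also not order-preserving, which the paper's index structures are.

```python
import bisect


class HybridIndex:
    """Toy dual-stage index (illustrative names, not the paper's design)."""

    def __init__(self, merge_threshold=4):
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.dynamic = {}        # stage 1: absorbs all incoming entries
        self.static_keys = []    # stage 2: sorted keys ...
        self.static_values = []  # ... with values stored in parallel

    def insert(self, key, value):
        self.dynamic[key] = value
        # Migrate once stage 1 grows past the threshold, keeping it small.
        if len(self.dynamic) >= self.merge_threshold:
            self._merge()

    def get(self, key, default=None):
        # Stage 1 holds the most recent entries, so it is consulted first.
        if key in self.dynamic:
            return self.dynamic[key]
        i = bisect.bisect_left(self.static_keys, key)
        if i < len(self.static_keys) and self.static_keys[i] == key:
            return self.static_values[i]
        return default

    def _merge(self):
        # Rebuild stage 2 from both stages; newer values win on duplicates.
        merged = dict(zip(self.static_keys, self.static_values))
        merged.update(self.dynamic)
        self.static_keys = sorted(merged)
        self.static_values = [merged[k] for k in self.static_keys]
        self.dynamic = {}


if __name__ == "__main__":
    idx = HybridIndex(merge_threshold=3)
    for k, v in [("a", 1), ("b", 2), ("c", 3), ("b", 20)]:
        idx.insert(k, v)
    print(idx.get("a"), idx.get("b"), idx.get("z"))
```

Rebuilding the whole second stage on every merge keeps the sketch short; it says nothing about how the paper schedules or amortizes migration.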
SnappyData: A Hybrid Transactional Analytical Store Built On Spark
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899408
Jags Ramnarayan, Barzan Mozafari, S. Wale, Sudhir Menon, Neeraj Kumar, Hemant Bhanawat, Soubhik Chakraborty, Yogesh S. Mahajan, Rishitesh Mishra, Kishor Bachhav
In recent years, our customers have expressed frustration with the traditional approach of using a combination of disparate products to handle their streaming, transactional and analytical needs. The common practice of stitching heterogeneous environments in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., a couple of seconds), when faced with large data volumes or high velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC), to communicate their accuracy requirements with SnappyData in an intuitive fashion.
Citations: 46
ReproZip: Computational Reproducibility With Ease
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899401
F. Chirigati, Rémi Rampin, D. Shasha, J. Freire
We present ReproZip, the recommended packaging tool for the SIGMOD Reproducibility Review. ReproZip was designed to simplify the process of making an existing computational experiment reproducible across platforms, even when the experiment was put together without reproducibility in mind. The tool creates a self-contained package for an experiment by automatically tracking and identifying all its required dependencies. The researcher can share the package with others, who can then use ReproZip to unpack the experiment, reproduce the findings on their favorite operating system, as well as modify the original experiment for reuse in new research, all with little effort. The demo will consist of examples of non-trivial experiments, showing how these can be packed in a Linux machine and reproduced on different machines and operating systems. Demo visitors will also be able to pack and reproduce their own experiments.
Citations: 106
Microblogs Data Management Systems: Querying, Analysis, and Visualization
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2912570
M. Mokbel, A. Magdy
Microblogs data, e.g., tweets, reviews, news comments, and social media comments, has gained considerable attention in recent years due to its popularity and rich content. Nowadays, microblogs applications span a wide spectrum of interests, including analyzing events and user activities and critical applications like discovering health issues and rescue services. Consequently, major research efforts are spent to manage, analyze, and visualize microblogs data to support different applications. In this tutorial, we give a 1.5-hour overview of microblogs data management, analysis, visualization, and systems. The tutorial gives a comprehensive review of research on core data management components to support microblogs queries at scale. This includes system-level issues and on-going work on supporting microblogs data through the rising wave of big data systems. In addition, the tutorial reviews research on microblogs data analysis and visualization. Through its different parts, the tutorial highlights the challenges and opportunities in microblogs data research.
Citations: 8
Query Planning for Evaluating SPARQL Property Paths
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882944
N. Yakovets, P. Godfrey, Jarek Gryz
The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases. Such queries are difficult to optimize and evaluate efficiently, however. We have embarked on a project, Waveguide, to build a cost-based optimizer for SPARQL queries with property paths. Waveguide builds a query plan, which we call a waveplan (WP), that guides the query evaluation. There are numerous choices in the construction of a plan, and a number of optimization methods, so the space of plans for a query can be quite large. Execution costs of plans for the same query can vary by orders of magnitude. A WP's costs can be estimated, which opens the way to cost-based optimization. We demonstrate that the plan space of Waveguide properly subsumes existing techniques and that the new plans it adds are relevant.
Citations: 43
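To make concrete what a property path query asks for, the sketch below evaluates a one-or-more path (`predicate+` in SPARQL 1.1 syntax) from a fixed start node by breadth-first search over an in-memory list of triples. This is a naive fixed-strategy baseline written for illustration only. It is not Waveguide's plan structure or optimizer, and the function name `eval_plus_path` and the tuple-based triple representation are assumptions.

```python
from collections import deque


def eval_plus_path(triples, start, predicate):
    """Return all nodes reachable from `start` via one or more `predicate` edges.

    `triples` is an iterable of (subject, predicate, object) tuples; this
    representation is a simplification chosen for the example.
    """
    # Index only the edges carrying the requested predicate.
    successors = {}
    for s, p, o in triples:
        if p == predicate:
            successors.setdefault(s, []).append(o)

    reached = set()
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in successors.get(node, []):
            if nxt not in reached:  # each node is expanded at most once
                reached.add(nxt)
                frontier.append(nxt)
    return reached


if __name__ == "__main__":
    data = [
        ("a", "knows", "b"),
        ("b", "knows", "c"),
        ("c", "knows", "a"),
        ("a", "likes", "d"),
    ]
    print(sorted(eval_plus_path(data, "a", "knows")))
```

The start node appears in the result only when a cycle leads back to it, which matches the one-or-more semantics. A cost-based planner could, for example, choose between expanding from the subject side or the object side; this sketch always expands forward from the start node.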
Quegel: A General-Purpose System for Querying Big Graphs
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899398
Qizhen Zhang, D. Yan, James Cheng
Inspired by Google's Pregel, many distributed graph processing systems have been developed recently to process big graphs. These systems expose a vertex-centric programming interface to users, where a programmer thinks like a vertex when designing parallel graph algorithms. However, existing systems are designed for tasks where most vertices in a graph participate in the computation, and they are not suitable for processing light-workload graph queries which only access a small portion of vertices. This is because their programming model can seriously under-utilize the resources in a cluster for processing graph queries. In this demonstration, we introduce a general-purpose system for querying big graphs, called Quegel, which treats queries as first-class citizens in the design of its computing model. Quegel adopts a novel superstep-sharing execution model to overcome the weaknesses of existing systems. We demonstrate it is user-friendly to write parallel graph-querying programs with Quegel's interface; and we also show that Quegel is able to achieve real-time response time in various applications, including the two applications that we plan to demonstrate: point-to-point shortest-path queries and XML keyword search.
Citations: 12
Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads
Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915231
Joy Arulraj, Andrew Pavlo, Prashanth Menon
Data-intensive applications seek to obtain trill insights in real-time by analyzing a combination of historical data sets alongside recently collected data. This means that to support such hybrid workloads, database management systems (DBMSs) need to handle both fast ACID transactions and complex analytical queries on the same database. But the current trend is to use specialized systems that are optimized for only one of these workloads, and thus require an organization to maintain separate copies of the database. This adds additional cost to deploying a database application in terms of both storage and administration overhead. To overcome this barrier, we present a hybrid DBMS architecture that efficiently supports varied workloads on the same database. Our approach differs from previous methods in that we use a single execution engine that is oblivious to the storage layout of data without sacrificing the performance benefits of the specialized systems. This obviates the need to maintain separate copies of the database in multiple independent systems. We also present a technique to continuously evolve the database's physical storage layout by analyzing the queries' access patterns and choosing the optimal layout for different segments of data within the same table. To evaluate this work, we implemented our architecture in an in-memory DBMS. Our results show that our approach delivers up to 3x higher throughput compared to static storage layouts across different workloads. We also demonstrate that our continuous adaptation mechanism allows the DBMS to achieve a near-optimal layout for an arbitrary workload without requiring any manual tuning.
Citations: 130