
Proceedings of the 2016 International Conference on Management of Data: Latest Publications

Scaling Multicore Databases via Constrained Parallel Execution
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882934
Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, Jinyang Li
Multicore in-memory databases often rely on traditional concurrency control schemes such as two-phase-locking (2PL) or optimistic concurrency control (OCC). Unfortunately, when the workload exhibits a non-trivial amount of contention, both 2PL and OCC sacrifice much parallel execution opportunity. In this paper, we describe a new concurrency control scheme, interleaving constrained concurrency control (IC3), which provides serializability while allowing for parallel execution of certain conflicting transactions. IC3 combines the static analysis of the transaction workload with runtime techniques that track and enforce dependencies among concurrent transactions. The use of static analysis simplifies IC3's runtime design, allowing it to scale to many cores. Evaluations on a 64-core machine using the TPC-C benchmark show that IC3 outperforms traditional concurrency control schemes under contention. It achieves a throughput of 434K transactions/sec on the TPC-C benchmark configured with only one warehouse. It also scales better than several recent concurrency control schemes that also target contended workloads.
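To give a concrete feel for the interleaving constraint described above, the following minimal Python sketch simulates transactions split into pieces, records runtime dependencies as pieces touch shared keys, and blocks a transaction from advancing past a piece until every transaction it depends on has advanced further. It illustrates the general idea only, not the authors' implementation; the piece-level rule, data layout, and names are assumptions.

```python
# Illustrative sketch (not the authors' IC3 implementation) of
# interleaving-constrained execution: pieces of conflicting transactions are
# ordered piece-by-piece, while non-conflicting pieces may interleave freely.
from collections import defaultdict

class Txn:
    def __init__(self, tid, pieces):
        self.tid = tid
        self.pieces = pieces      # ordered pieces, each a (key, op) step
        self.done = 0             # number of pieces already executed

def run_constrained(txns):
    last_writer = {}              # key -> transaction that last wrote it (runtime tracking)
    deps = defaultdict(set)       # tid -> transactions it must trail piece-by-piece
    schedule = []
    progress = True
    while progress:
        progress = False
        for t in txns:
            if t.done == len(t.pieces):
                continue
            # interleaving constraint: t may run its next piece only after every
            # transaction it depends on has advanced strictly further
            if any(d.done <= t.done for d in deps[t.tid]):
                continue
            key, op = t.pieces[t.done]
            if key in last_writer and last_writer[key] is not t:
                deps[t.tid].add(last_writer[key])   # dependency discovered at runtime
            if op == "write":
                last_writer[key] = t
            schedule.append((t.tid, t.done, key, op))
            t.done += 1
            progress = True
    return schedule

# Two transactions that conflict on key "x" but not on their second pieces.
t1 = Txn("T1", [("x", "write"), ("a", "write")])
t2 = Txn("T2", [("x", "write"), ("b", "write")])
print(run_constrained([t1, t2]))
```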
Citations: 60
DBSherlock: A Performance Diagnostic Tool for Transactional Databases
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915218
Dong Young Yoon, Ning Niu, Barzan Mozafari
Running an online transaction processing (OLTP) system is one of the most daunting tasks required of database administrators (DBAs). As businesses rely on OLTP databases to support their mission-critical and real-time applications, poor database performance directly impacts their revenue and user experience. As a result, DBAs constantly monitor, diagnose, and rectify any performance decays. Unfortunately, the manual process of debugging and diagnosing OLTP performance problems is extremely tedious and non-trivial. Rather than being caused by a single slow query, performance problems in OLTP databases are often due to a large number of concurrent and competing transactions adding up to compounded, non-linear effects that are difficult to isolate. Sudden changes in request volume, transactional patterns, network traffic, or data distribution can cause previously abundant resources to become scarce, and the performance to plummet. This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database. By analyzing hundreds of statistics and configurations collected over the lifetime of the system, our algorithm quickly identifies a small set of potential causes and presents them to the DBA. The root-cause established by the DBA is reincorporated into our algorithm as a new causal model to improve future diagnoses. Our experiments show that this algorithm is substantially more accurate than the state-of-the-art algorithm in finding correct explanations.
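As a loose illustration of the kind of analysis such a diagnostic tool performs, the sketch below ranks monitored metrics by how strongly a DBA-marked anomalous window separates from normal behavior. It is far simpler than DBSherlock's actual predicate generation and causal models; the scoring rule and metric names are assumptions.

```python
# Hedged toy sketch: given per-metric time series and a DBA-marked anomalous
# window, rank metrics by how far the abnormal region sits from the normal one.
import statistics

def rank_suspect_metrics(metrics, abnormal_range):
    lo, hi = abnormal_range
    scores = {}
    for name, series in metrics.items():
        abnormal = series[lo:hi]
        normal = series[:lo] + series[hi:]
        if not normal or not abnormal:
            continue
        spread = statistics.pstdev(series) or 1.0
        # separation of the two regions, measured in units of the overall spread
        scores[name] = abs(statistics.mean(abnormal) - statistics.mean(normal)) / spread
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

metrics = {
    "cpu_user":   [20, 22, 21, 23, 90, 92, 91, 24, 22],
    "disk_reads": [5, 6, 5, 7, 6, 5, 6, 7, 5],
}
print(rank_suspect_metrics(metrics, (4, 7)))   # cpu_user should rank first
```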
Citations: 65
Automated Demand-driven Resource Scaling in Relational Database-as-a-Service
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2903733
Sudipto Das, Feng Li, Vivek R. Narasayya, A. König
Relational Database-as-a-Service (DaaS) platforms today support the abstraction of a resource container that guarantees a fixed amount of resources. Tenants are responsible for selecting a container size suitable for their workloads, which they can change to leverage the cloud's elasticity. However, automating this task is daunting for most tenants since estimating resource demands for arbitrary SQL workloads in an RDBMS is complex and challenging. In addition, workloads and resource requirements can vary significantly within minutes to hours, and container sizes vary by orders of magnitude both in the amount of resources as well as monetary cost. We present a solution to enable a DaaS to auto-scale container sizes on behalf of its tenants. Approaches to auto-scale stateless services, such as web servers, that rely on historical resource utilization as the primary signal, often perform poorly for stateful database servers which are significantly more complex. Our solution derives a set of robust signals from database engine telemetry and combines them to significantly improve accuracy of demand estimation for database workloads resulting in more accurate scaling decisions. Our solution raises the abstraction by allowing tenants to reason about monetary budget and query latency rather than resources. We prototyped our approach in Microsoft Azure SQL Database and ran extensive experiments using workloads with realistic time-varying resource demand patterns obtained from production traces. Compared to an approach that uses only resource utilization to estimate demand, our approach results in 1.5x to 3x lower monetary costs while achieving comparable query latencies.
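A hedged sketch of the overall decision loop the abstract describes: estimate demand from telemetry signals that go beyond plain utilization, then pick the cheapest container that covers the estimate within the tenant's budget. The container sizes, weights, and thresholds below are invented for illustration and are not the paper's model.

```python
# Illustrative sketch (not Microsoft's actual policy) of demand-driven
# container selection for a DaaS tenant.

CONTAINERS = [            # (name, capacity in abstract demand units, $/hour); hypothetical sizes
    ("S1", 100, 0.10),
    ("S2", 200, 0.25),
    ("S3", 400, 0.60),
]

def estimate_demand(cpu_used, waits, throttling_events):
    # Utilization alone under-estimates demand when the workload is throttled,
    # so widen the estimate using wait/throttle signals (the paper's key idea,
    # realized here with made-up weights).
    return cpu_used * (1.0 + 0.5 * waits + 1.0 * throttling_events)

def pick_container(demand, hourly_budget):
    feasible = [c for c in CONTAINERS if c[2] <= hourly_budget]
    for name, capacity, price in feasible:          # CONTAINERS is sorted by size
        if capacity >= demand:
            return name
    return feasible[-1][0] if feasible else None    # best we can afford

demand = estimate_demand(cpu_used=150, waits=0.2, throttling_events=0.1)
print(demand, pick_container(demand, hourly_budget=0.30))
```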
Citations: 54
Optimization of Nested Queries using the NF2 Algebra
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915241
Jürgen Hölsch, Michael Grossniklaus, M. Scholl
A key promise of SQL is that the optimizer will find the most efficient execution plan, regardless of how the query is formulated. In general, query optimizers of modern database systems are able to keep this promise, with the notable exception of nested queries. While several optimization techniques for nested queries have been proposed, their adoption in practice has been limited. In this paper, we argue that the NF2 (non-first normal form) algebra, which was originally designed to process nested tables, is a better approach to nested query optimization as it fulfills two key requirements. First, the NF2 algebra can represent all types of nested queries as well as both existing and novel optimization techniques based on its equivalences. Second, performance benefits can be achieved with little changes to existing transformation-based query optimizers as the NF2 algebra is an extension of the relational algebra.
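To make the NF2 setting concrete, the toy Python sketch below implements the two operators that distinguish the NF2 algebra from flat relational algebra, nest and unnest, and checks the basic round-trip equivalence on a small table. The representation (lists of dicts) and the attribute names are illustrative assumptions; the paper's rewrite rules themselves are not reproduced.

```python
# Toy nest/unnest operators over relations represented as lists of dicts.
from itertools import groupby
from operator import itemgetter

def nest(rows, group_keys, nested_attr):
    keyf = itemgetter(*group_keys)
    out = []
    for key, grp in groupby(sorted(rows, key=keyf), key=keyf):
        key = key if isinstance(key, tuple) else (key,)
        # relation-valued attribute holding the non-grouping columns
        nested = [{k: r[k] for k in r if k not in group_keys} for r in grp]
        out.append(dict(zip(group_keys, key), **{nested_attr: nested}))
    return out

def unnest(rows, nested_attr):
    return [
        {**{k: v for k, v in r.items() if k != nested_attr}, **inner}
        for r in rows for inner in r[nested_attr]
    ]

orders = [
    {"cust": "alice", "item": "cpu"},
    {"cust": "alice", "item": "ram"},
    {"cust": "bob",   "item": "ssd"},
]
nested = nest(orders, ["cust"], "items")
# unnest(nest(R)) = R is one of the equivalences the algebra builds on
assert sorted(unnest(nested, "items"), key=str) == sorted(orders, key=str)
print(nested)
```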
Citations: 9
Wander Join: Online Aggregation for Joins
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899413
Feifei Li, Bin Wu, K. Yi, Zhuoyue Zhao
Joins are expensive, and online aggregation over joins was proposed to mitigate the cost, which offers a nice and flexible tradeoff between query efficiency and accuracy in a continuous, online fashion. However, the state-of-the-art approach, in both internal and external memory, is based on ripple join, which is still very expensive and may also need very restrictive assumptions (e.g., tuples in a table are stored in random order). We introduce a new approach, wander join, to the online aggregation problem by performing random walks over the underlying join graph. We have also implemented and tested wander join in the latest PostgreSQL.
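The following minimal sketch, written against a toy two-table key join between R and S, shows the random-walk idea: sample a path through the join graph, weight its contribution by the inverse of the path's sampling probability, and average over many walks to obtain a running estimate of a SUM over the join. It is an illustration under simplifying assumptions (uniform sampling at each step, a single join edge), not the paper's full estimator or its PostgreSQL implementation.

```python
# Hedged sketch of wander join's random-walk estimator for SUM over R JOIN S.
import random
from collections import defaultdict

R = [("a", 1), ("a", 2), ("b", 3)]            # (join_key, value)
S = [("a", 10), ("a", 20), ("b", 30), ("b", 40)]

s_index = defaultdict(list)                    # join-graph edges: key -> matching S values
for key, val in S:
    s_index[key].append(val)

def wander_join_sum(num_walks=20000):
    total = 0.0
    for _ in range(num_walks):
        r_key, r_val = random.choice(R)        # step 1: uniform tuple from R
        matches = s_index[r_key]
        if not matches:
            continue                           # failed walk contributes 0
        s_val = random.choice(matches)         # step 2: uniform matching S tuple
        prob = (1 / len(R)) * (1 / len(matches))
        total += (r_val + s_val) / prob        # Horvitz-Thompson weighting
    return total / num_walks

exact = sum(rv + sv for rk, rv in R for sk, sv in S if rk == sk)
print(exact, round(wander_join_sum(), 1))      # estimate converges to the exact sum
```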
Citations: 9
LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882930
Yuxin Zheng, Qi Guo, A. Tung, Sai Wu
Due to the "curse of dimensionality" problem, it is very expensive to process the nearest neighbor (NN) query in high-dimensional spaces; and hence, approximate approaches, such as Locality-Sensitive Hashing (LSH), are widely used for their theoretical guarantees and empirical performance. Current LSH-based approaches target at the L1 and L2 spaces, while as shown in previous work, the fractional distance metrics (Lp metrics with 0 < p < 1) can provide more insightful results than the usual L1 and L2 metrics for data mining and multimedia applications. However, none of the existing work can support multiple fractional distance metrics using one index. In this paper, we propose LazyLSH that answers approximate nearest neighbor queries for multiple Lp metrics with theoretical guarantees. Different from previous LSH approaches which need to build one dedicated index for every query space, LazyLSH uses a single base index to support the computations in multiple Lp spaces, significantly reducing the maintenance overhead. Extensive experiments show that LazyLSH provides more accurate results for approximate kNN search under fractional distance metrics.
Citations: 63
Research Contribution as a Measure of Influence
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914834
Lais M. A. Rocha, Mirella M. Moro
We propose the 3c-index that measures the influence degree of researchers by evaluating the links they establish between communities. We evaluate its performance against well known metrics. The results show 3c-index outperforms them in most cases and can be employed as a complementary metric to assess researchers' productivity.
Citations: 4
FERARI: A Prototype for Complex Event Processing over Streaming Multi-cloud Platforms
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899395
Ioannis Flouris, Vasiliki Manikaki, Nikos Giatrakos, Antonios Deligiannakis, M. Garofalakis, M. Mock, Sebastian Bothe, Inna Skarbovsky, Fabiana Fournier, Marko Stajcer, Tomislav Krizan, Jonathan Yom-Tov, Taji Curin
In this demo, we present FERARI, a prototype that enables real-time Complex Event Processing (CEP) for large volume event data streams over distributed topologies. Our prototype constitutes, to our knowledge, the first complete, multi-cloud based end-to-end CEP solution incorporating: a) a user-friendly, web-based query authoring tool, (b) a powerful CEP engine implemented on top of a streaming cloud platform, (c) a CEP optimizer that chooses the best query execution plan with respect to low latency and/or reduced inter-cloud communication burden, and (d) a query analytics dashboard encompassing graph and map visualization tools to provide a holistic picture with respect to the detected complex events to final stakeholders. As a proof-of-concept, we apply FERARI to enable mobile fraud detection over real, properly anonymized, telecommunication data from T-Hrvatski Telekom network in Croatia.
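As a toy stand-in for the kind of pattern a CEP engine evaluates in such a fraud-detection setting (not FERARI's engine or query language), the sketch below flags a caller that places more than a threshold number of calls inside a sliding time window; the rule and thresholds are invented.

```python
# Toy sliding-window pattern: alert when a caller exceeds THRESHOLD calls
# within WINDOW_SECONDS (a simplistic stand-in for a telecom fraud rule).
from collections import defaultdict, deque

WINDOW_SECONDS, THRESHOLD = 60, 3

def detect(call_events):                       # events: (timestamp_sec, caller_id)
    recent = defaultdict(deque)
    alerts = []
    for ts, caller in sorted(call_events):
        q = recent[caller]
        q.append(ts)
        while q and ts - q[0] > WINDOW_SECONDS:
            q.popleft()                        # expire events outside the window
        if len(q) > THRESHOLD:
            alerts.append((ts, caller))
    return alerts

events = [(0, "A"), (10, "A"), (20, "A"), (25, "A"), (30, "B"), (200, "A")]
print(detect(events))                          # [(25, 'A')]
```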
Citations: 23
Constance: An Intelligent Data Lake System
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899389
Rihan Hai, Sandra Geisler, C. Quix
As the challenge of our time, Big Data still has many research hassles, especially the variety of data. The high diversity of data sources often results in information silos, a collection of non-integrated data management systems with heterogeneous schemas, query languages, and APIs. Data Lake systems have been proposed as a solution to this problem, by providing a schema-less repository for raw data with a common access interface. However, just dumping all data into a data lake without any metadata management, would only lead to a 'data swamp'. To avoid this, we propose Constance, a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources. Constance discovers, extracts, and summarizes the structural metadata from the data sources, and annotates data and metadata with semantic information to avoid ambiguities. With embedded query rewriting engines supporting structured data and semi-structured data, Constance provides users a unified interface for query processing and data exploration. During the demo, we will walk through each functional component of Constance. Constance will be applied to two real-life use cases in order to show attendees the importance and usefulness of our generic and extensible data lake system.
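A small, hedged illustration of what structural-metadata extraction over raw, heterogeneous input can look like (not Constance's actual extractor): summarize, per field, which types are observed across ingested JSON records so that schema drift and ambiguities become visible.

```python
# Toy structural-metadata summary for newline-delimited JSON records.
import json
from collections import defaultdict

def summarize_structure(raw_records):
    summary = defaultdict(set)                 # field -> set of observed types
    for line in raw_records:
        record = json.loads(line)
        for field, value in record.items():
            summary[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in summary.items()}

source = [
    '{"id": 1, "name": "pump", "temp": 21.5}',
    '{"id": 2, "name": "valve"}',
    '{"id": "3", "name": "fan", "temp": 19}',
]
print(summarize_structure(source))
# {'id': ['int', 'str'], 'name': ['str'], 'temp': ['float', 'int']}
```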
Citations: 199
Towards a Hybrid Design for Fast Query Processing in DB2 with BLU Acceleration Using Graphical Processing Units: A Technology Demonstration
Pub Date : 2016-06-26 DOI: 10.1145/2882903.2903735
S. Meraji, Berni Schiefer, Lan Pham, Lee Chu, Peter Kokosielis, Adam J. Storm, Wayne Young, Chang Ge, Geoffrey Ng, Kajan Kanagaratnam
In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2's column store technology). Moreover, we show the benefits and problems of using hardware accelerators (more specifically GPUs) in a real commercial Relational Database Management System(RDBMS).We investigate the effect of off-loading specific database operations to a GPU, and show how doing so results in a significant performance improvement. We then demonstrate that for some queries, using just CPU to perform the entire operation is more beneficial. While we use some of Nvidia's fast kernels for operations like sort, we have also developed our own high performance kernels for operations such as group by and aggregation. Finally, we show how we use a dynamic design that can make use of optimizer metadata to intelligently choose a GPU kernel to run. For the first time in the literature, we use benchmarks representative of customer environments to gauge the performance of our prototype, the results of which show that we can get a speed increase upwards of 2x, using a realistic set of queries.
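The sketch below illustrates the shape of such a dynamic dispatcher: use optimizer metadata (operation type, estimated rows, row width) to decide whether offloading an operation to the GPU is likely to pay off. The thresholds and rules are entirely hypothetical and are not DB2's logic.

```python
# Hypothetical dispatcher: small inputs stay on the CPU because transfer
# overhead would dominate; large sorts/group-bys go to the GPU if they fit.
GPU_MEMORY_BYTES = 8 * 2**30
MIN_ROWS_FOR_GPU = 1_000_000          # below this, PCIe transfer cost dominates (made-up)
GPU_FRIENDLY_OPS = {"sort", "group_by", "aggregate"}

def choose_device(op, estimated_rows, row_width_bytes):
    estimated_bytes = estimated_rows * row_width_bytes
    if op not in GPU_FRIENDLY_OPS:
        return "cpu"
    if estimated_rows < MIN_ROWS_FOR_GPU:
        return "cpu"
    if estimated_bytes > GPU_MEMORY_BYTES:
        return "cpu"                  # would not fit in device memory
    return "gpu"

print(choose_device("group_by", estimated_rows=50_000_000, row_width_bytes=16))  # gpu
print(choose_device("sort", estimated_rows=10_000, row_width_bytes=16))          # cpu
```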
Citations: 9