Execution and optimization of continuous queries with Cyclops

Proceedings. ACM-SIGMOD International Conference on Management of Data. Pub Date: 2013-06-22. DOI: 10.1145/2463676.2465248

Harold Lim, S. Babu


Citations: 13

Abstract

As the data collected by enterprises grows in scale, performing data analytics on large datasets is increasingly common. Batch processing systems that can handle petabyte-scale data, such as Hadoop, have flourished and gained traction in industry. Because the results of batch analytics are used to continuously improve the user-facing experience, there is growing interest in pushing processing latency down. This trend has fueled a resurgence in the development and use of execution engines that can process continuous queries.

An important class of continuous queries is windowed aggregation queries. Such queries arise in a wide range of applications, such as generating personalized content and results. Today, considerable manual effort goes into finding the most suitable execution engine for these queries and into tuning query performance on these engines. Given the diverse requirements that arise in practice, an ecosystem composed of multiple execution engines may be needed to run the overall query workload efficiently.

Cyclops is a continuous query processing platform that manages and orchestrates windowed aggregation queries in an ecosystem composed of multiple continuous query execution engines. Cyclops employs a cost-based approach for picking the most suitable engine and plan for executing a given query. This demonstration first presents an interactive visualization of the rich execution plan space of windowed aggregation queries, which allows users to analyze and understand the differences among plans. The next part of the demonstration drills down into the design of Cyclops. For a given query, we show the cost spectrum of query execution plans, as estimated by Cyclops, across three different execution engines: Esper, Storm, and Hadoop.
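To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the Cyclops implementation. It assumes a windowed aggregation query is described by a window length and a slide (both in integer time units), and that each candidate (engine, plan) pair carries a single estimated cost. The names `sliding_window_sums`, `Candidate`, and `pick_cheapest`, the engine and plan labels, and all cost values are hypothetical placeholders introduced for illustration; the abstract does not describe the actual cost model or plan space.

```python
from dataclasses import dataclass


def sliding_window_sums(events, window, slide):
    """Sum event values over windows of length `window`, emitted every `slide` units.

    `events` is a list of (timestamp, value) pairs with positive integer
    timestamps. The window ending at time t covers timestamps in (t - window, t].
    """
    if not events:
        return []
    last_ts = max(ts for ts, _ in events)
    # Round the last timestamp up to the next slide boundary so that
    # every event falls into at least one emitted window.
    end = -(-last_ts // slide) * slide
    results = []
    for t in range(slide, end + 1, slide):
        total = sum(v for ts, v in events if t - window < ts <= t)
        results.append((t, total))
    return results


@dataclass(frozen=True)
class Candidate:
    """One way to run a query: an engine, a plan, and an estimated cost."""
    engine: str            # hypothetical label, e.g. one of several engines
    plan: str              # hypothetical plan name
    estimated_cost: float  # placeholder value, not a measured result


def pick_cheapest(candidates):
    """Cost-based choice: return the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda c: c.estimated_cost)


# Example: a window of 4 time units that slides every 2 time units.
events = [(1, 10), (2, 20), (4, 5), (7, 1)]
print(sliding_window_sums(events, window=4, slide=2))
# prints [(2, 30), (4, 35), (6, 5), (8, 1)]

# Invented costs for three generic engines, for illustration only.
candidates = [
    Candidate("engine_a", "incremental", 12.0),
    Candidate("engine_b", "partitioned", 8.5),
    Candidate("engine_c", "recompute_per_window", 40.0),
]
print(pick_cheapest(candidates).engine)
# prints engine_b
```

In this sketch the choice reduces to taking a minimum over an enumerated list. In a system such as the one the abstract describes, the candidates would span the plan space of each available engine, and the costs would come from an estimator rather than from constants.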
