Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems: latest papers, page 6

CASQD: continuous detection of activity-based subgraph pattern queries on dynamic graphs

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933316

J. Mondal, A. Deshpande

The ability to detect and analyze interesting subgraph patterns on large, dynamic graph-structured data in near-real time is crucial for many applications; examples include anomaly detection in phone call networks, advertisement targeting in social networks, malware detection in file download graphs, and many more. Such patterns often need to reason about how the nodes are connected to each other (i.e., the structural component) as well as how the nodes behave in the network (i.e., the activity component). An example of such an activity-driven subgraph pattern is a clique of users in a social network (the structural predicate) who have each posted more than 10 messages in the last 2 hours (the activity-based predicate). In this paper, we present Casqd, a system for continuous detection and analysis of such active subgraph pattern queries over large dynamic graphs. Some of the key challenges in executing such queries include handling a wide variety of user-specified activities of interest, low selectivities of activity-based predicates and the resultant exponential search space, and high ingestion rates. A key abstraction in Casqd is a notion called graph-view, which acts as an independence layer between the query language and the underlying physical representation of the graph and the active attributes. This abstraction is aimed at simplifying the query language while empowering the query optimizer. Considering the balance between expressibility (i.e., patterns that cover many real-world use cases) and optimizability of such patterns, we primarily focus on efficient continuous detection of active regular structures (specifically, active cliques, active stars, and active bi-cliques). We develop a series of optimization techniques, including model-based neighborhood explorations, lazy evaluation of the activity predicates, neighborhood-based search space pruning, and others, for efficient query evaluation.
We perform a thorough comparative study of the execution strategies under various settings, and show that our system is capable of achieving event processing throughputs over 800k/s using a single, powerful machine.
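
The example pattern in this abstract, a clique whose members have each posted more than 10 messages in the last 2 hours, can be illustrated with a minimal sketch. This is not Casqd's algorithm or API: the function names (`is_active`, `is_active_clique`), the adjacency and post-timestamp dictionaries, and the toy data are all assumptions made for illustration, and the check is a brute-force test of a single candidate node set.

```python
from itertools import combinations

WINDOW_SECONDS = 2 * 60 * 60   # "last 2 hours" from the abstract's example
MIN_POSTS = 10                 # "more than 10 messages"


def is_active(post_times, now, window=WINDOW_SECONDS, min_posts=MIN_POSTS):
    """Activity predicate: strictly more than min_posts posts inside the window."""
    recent = [t for t in post_times if now - window < t <= now]
    return len(recent) > min_posts


def is_active_clique(nodes, adjacency, posts, now):
    """Structural predicate (every pair connected) plus the activity predicate."""
    # Check the structural part first; the activity predicate is only
    # evaluated if every pair of nodes is connected.
    for u, v in combinations(nodes, 2):
        if v not in adjacency.get(u, set()):
            return False
    return all(is_active(posts.get(n, []), now) for n in nodes)


# Hypothetical toy data: a triangle a-b-c plus a pendant node d.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
now = 10_000
posts = {
    "a": [now - i for i in range(11)],   # 11 recent posts
    "b": [now - i for i in range(12)],   # 12 recent posts
    "c": [now - i for i in range(11)],   # 11 recent posts
    "d": [now - i for i in range(3)],    # only 3 recent posts
}

print(is_active_clique(["a", "b", "c"], adjacency, posts, now))
print(is_active_clique(["c", "d"], adjacency, posts, now))
```

In this toy data the triangle a-b-c satisfies both predicates, while the pair c-d is connected but fails the activity predicate because d has too few recent posts.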



Citations: 10

Stateful complex event detection on event streams using parallelization of event stream aggregations and detection tasks

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933518

Saeed Fathollahzadeh, Kia Teymourian, M. Sharifi

Detection of stateful complex event patterns using parallel programming features is a challenging task because of the statefulness of event detection operators. Parallelization of event detection tasks needs to be implemented in a way that keeps track of state changes caused by newly arriving events. In this paper, we describe our implementation of a customized complex event detection engine using Open Multi-Processing (OpenMP), a shared-memory programming model. In our system, event detection is implemented using Deterministic Finite Automata (DFAs). We implemented a data stream aggregator that merges 4 given event streams into a sequence of C++ objects in a buffer, which is used as the source event stream for event detection in the next processing step. We describe implementation details and 3 architectural variations for stream aggregation and parallelization of event processing. We conducted performance experiments with each of the variations and report some of our experimental results. A comparison of our performance results shows that, for event processing on a single machine with multiple cores and limited memory, using multiple threads with a shared buffer gives better stream processing performance than an implementation with multiple processes and shared memory.
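
The two steps named in this abstract, merging several event streams into one sequence and running a DFA over it, can be sketched as follows. This is a sequential Python illustration rather than the authors' OpenMP/C++ engine; the event types, the transition table, and the function names `merge_streams` and `detect` are hypothetical, and each input stream is assumed to be sorted by timestamp already.

```python
import heapq

# Hypothetical pattern: a LOGIN event followed by two FAIL events.
TRANSITIONS = {
    ("start", "LOGIN"): "logged_in",
    ("logged_in", "FAIL"): "one_fail",
    ("one_fail", "FAIL"): "match",
}


def merge_streams(*streams):
    """Merge timestamp-sorted streams of (timestamp, event_type) into one list."""
    return list(heapq.merge(*streams))


def detect(events, transitions=TRANSITIONS, start="start", accept="match"):
    """Run the DFA over the events; return timestamps where the pattern completes."""
    state = start
    matches = []
    for timestamp, event_type in events:
        # Assumption: events without a listed transition leave the state unchanged.
        state = transitions.get((state, event_type), state)
        if state == accept:
            matches.append(timestamp)
            state = start  # reset to look for the next occurrence
    return matches


# Four hypothetical input streams, each sorted by timestamp.
s1 = [(1, "LOGIN"), (7, "LOGIN")]
s2 = [(2, "FAIL"), (8, "FAIL")]
s3 = [(3, "PING"), (9, "FAIL")]
s4 = [(4, "FAIL")]

merged = merge_streams(s1, s2, s3, s4)
print(detect(merged))
```

The single `state` variable is what makes the detector stateful: splitting `merged` across workers without carrying that state along could miss matches that span a split point, which is the difficulty the abstract refers to.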



Citations: 2

The DEBS 2016 grand challenge

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933519

Vincenzo Gulisano, Zbigniew Jerzak, Spyros Voulgaris, H. Ziekow

The DEBS Grand Challenge is a series of challenges that address problems in event stream processing. The focus of the Grand Challenge in 2016 is on processing data streams that originate from social networks. Hence, the data represents an evolving graph structure. With this challenge we take up the general scenario and data source from the 2014 SIGMOD contest. However, in contrast to the SIGMOD contest, the DEBS Grand Challenge explicitly focuses on continuous processing of streaming data and thus on dynamic changes in graphs. This paper describes the specifics of the data streams and continuous queries that define the DEBS Grand Challenge 2016.


Citations: 26

Infusing trust in indoor tracking: poster

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933538

Ryan Rybarczyk, R. Raje, M. Tuceryan

An indoor tracking system is inherently an asynchronous and distributed system that contains various types of events (e.g., detection, selection, and fusion). One of the key challenges in indoor tracking is the efficient selection and arrangement of sensor devices in the environment. Selecting the "right" subset of these sensors for tracking an object as it traverses an indoor environment is a necessary precondition for accurate indoor tracking. With the recent proliferation of mobile devices, specifically those with many onboard sensors, this challenge has increased in both complexity and scale. One can no longer assume that the sensor infrastructure is static; rather, indoor tracking systems must consider and properly plan for the presence of a wide variety of sensors, both static and mobile. In such a dynamic setup, sensors need to be selected using an opportunistic approach. This opportunistic tracking allows for a new dimension of indoor tracking that was previously often infeasible or impractical due to the logistic or financial constraints of most entities. In this paper, we propose a selection technique that uses trust, as manifested by a quality-of-service (QoS) feature, accuracy, in a sensor selection function. We first outline how sensors are classified in a dynamic manner, and then how accuracy can be discerned from this classification in order to identify the trust of a tracking sensor and use this information to improve the sensor selection process. We conclude this paper with a discussion of the results of this implementation on a prototype indoor tracking system in an effort to demonstrate the overall effectiveness of this selection technique.



Citations: 1

Routing and scheduling of spatio-temporal tasks for optimizing airborne sensor system utilization

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933301

San Yeung, S. Madria, M. Linderman, James R. Milligan

Airborne image sensing systems are mounted on piloted or remotely piloted aerial vehicles to collect imagery data. Often the equipped image sensors are underutilized. The objective is to increase sensor system utilization by enabling dynamic multitasking, so that ground operators can access an aerial vehicle and transmit sensor task requests to it. However, this may cause the aerial vehicle to deviate from its original route. In this paper, we investigate this new problem of generating a new route to follow, as long as the assigned target points and original waypoints are not affected. Our goal is to find an optimal route on the fly between the given original waypoints such that it satisfies the maximum number of sensor task requests from ground users with the minimum sum of deviations, subject to a maximum deviation from the original route, without violating the original mission and flight maneuvering constraints. With the given constraints, finding an optimal route is an NP-hard problem. Therefore, we propose two heuristic-based methods: the FPCA approach, which utilizes the idea of footprint diameter, and the SWCA approach, which tackles this problem via task clustering. The performance of these algorithms is compared through experiments using data from real flight trajectories. Our results show that SWCA outperforms FPCA in most settings.



Citations: 2

In-memory indexation of event streams

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933511

Ahmad Hasan, A. Paschke

Evaluating continuous queries in real time while receiving event notifications requires a balanced exploitation of available resources. In this paper, we present our solution for aggregating events from social streams in memory. With the help of a compact representation of the data and an efficient tree data structure, we were able to minimize the cost of the updates required when an event enters or leaves the current window, which led to low and stable latencies and high throughput.


Citations: 0

Optimal operator placement for distributed stream processing applications

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933312

V. Cardellini, V. Grassi, F. L. Presti, Matteo Nardelli

Data Stream Processing (DSP) applications are widely used to extract information in a timely manner from distributed data sources, such as sensing devices, monitoring stations, and social networks. To successfully handle this ever-increasing amount of data, recent work investigates the possibility of exploiting decentralized computational resources (e.g., Fog computing) to define the application placement. Several placement policies have been proposed in the literature, but they are based on different assumptions and optimization goals and, as such, are not completely comparable to each other. In this paper we study the placement problem for distributed DSP applications. Our contributions are twofold. We provide a general formulation of the optimal DSP placement (ODP for short) as an Integer Linear Programming problem, which explicitly takes into account the heterogeneity of computing and networking resources and which encompasses, as special cases, the different solutions proposed in the literature. We present an ODP-based scheduler for the Apache Storm DSP framework. This allows us to compare some well-known centralized and decentralized placement solutions. We also extensively analyze the ODP scalability with respect to various parameter settings.



Citations: 151

GraphCEP: real-time data analytics using parallel complex event and graph processing

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933509

R. Mayer, C. Mayer, M. Tariq, K. Rothermel

In recent years, the proliferation of highly dynamic graph-structured data streams fueled the demand for real-time data analytics. For instance, detecting recent trends in social networks enables new applications in areas such as disaster detection, business analytics or health-care. Parallel Complex Event Processing has evolved as the paradigm of choice to analyze data streams in a timely manner, where the incoming data streams are split and processed independently by parallel operator instances. However, the degree of parallelism is limited by the feasibility of splitting the data streams into independent parts such that correctness of event processing is still ensured. In this paper, we overcome this limitation for graph-structured data by further parallelizing individual operator instances using modern graph processing systems. These systems partition the graph data and execute graph algorithms in a highly parallel fashion, for instance using cloud resources. To this end, we propose a novel graph-based Complex Event Processing system GraphCEP and evaluate its performance in the setting of two case studies from the DEBS Grand Challenge 2016.



Citations: 32

Quality-driven disorder handling for concurrent windowed stream queries with shared operators

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933307

Yuanzhen Ji, A. Nica, Zbigniew Jerzak, Gregor Hackenbroich, C. Fetzer

Handling timestamp-disorder among stream tuples is a basic requirement for data stream processing, and involves an inevitable tradeoff between the latency and the quality of stream query results. To meet the tradeoff requirements of diverse streaming applications, the approach of buffer-based, quality-driven disorder handling (QDDH) was proposed recently, which aims to minimize sizes of stream-sorting buffers, thus the result latency, while honoring user-specified result-quality requirements. Previous work on QDDH focuses only on individual stream queries. However, streaming systems often run multiple queries concurrently, and may exploit sharing opportunities across the concurrent queries. Under such shared query execution, stream-sorting buffers can be shared across queries as well, which can potentially reduce the overall memory cost incurred by the sorting buffers. In this paper, focusing on windowed stream queries, we propose a solution for doing QDDH for concurrent queries, across which common source and stream-filtering operators are shared. Experimental results show that our solution can determine the optimal way of sharing sorting buffers across the concurrent queries, such that the goal of quality-driven result-latency minimization is achieved for each query at a minimum memory cost.
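
The latency/quality tradeoff behind buffer-based disorder handling can be illustrated with a minimal fixed-slack sorting buffer. This is not the QDDH approach itself, which sizes buffers according to user-specified result-quality requirements; the class `SlackSortBuffer`, its `slack` parameter, and the helper `run` are hypothetical names chosen for this sketch.

```python
import heapq


class SlackSortBuffer:
    """Reorders tuples by timestamp, holding each back for `slack` time units."""

    def __init__(self, slack):
        self.slack = slack
        self.heap = []          # min-heap ordered by timestamp
        self.max_seen = None    # largest timestamp observed so far
        self.counter = 0        # tie-breaker so payloads are never compared

    def push(self, timestamp, payload):
        """Insert one tuple; return the tuples that are now released, in order."""
        heapq.heappush(self.heap, (timestamp, self.counter, payload))
        self.counter += 1
        if self.max_seen is None or timestamp > self.max_seen:
            self.max_seen = timestamp
        released = []
        # Release everything at least `slack` older than the newest timestamp.
        while self.heap and self.heap[0][0] <= self.max_seen - self.slack:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released

    def flush(self):
        """Release all remaining tuples at end of stream."""
        released = []
        while self.heap:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released


def run(arrivals, slack):
    """Feed timestamps in arrival order; return them in emission order."""
    buf = SlackSortBuffer(slack)
    out = []
    for ts in arrivals:
        out.extend(t for t, _ in buf.push(ts, None))
    out.extend(t for t, _ in buf.flush())
    return out


arrivals = [1, 3, 2, 6, 4, 9]   # hypothetical out-of-order arrival sequence
print(run(arrivals, slack=2))   # larger buffer: more latency, disorder repaired
print(run(arrivals, slack=0))   # no buffering: lowest latency, disorder remains
```

With a slack of 2 the sample sequence is emitted in timestamp order, while a slack of 0 passes the disorder through unchanged. Choosing the smallest slack that still meets a quality target is the kind of decision the abstract describes.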



Citations: 11

Bandwidth-efficient content-based routing on software-defined networks

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

Pub Date : 2016-06-13 DOI: 10.1145/2933267.2933310

Sukanya Bhowmik, M. Tariq, J. Grunert, K. Rothermel

With the vision of the Internet of Things gaining popularity at a global level, efficient publish/subscribe middleware for communication within and across datacenters is extremely desirable. In this respect, Software-defined Networking (SDN), which enables publish/subscribe middleware to perform line-rate filtering of events directly on hardware, can prove to be very useful. While deploying content filters directly on the switches of a software-defined network allows optimized paths, high throughput rates, and low end-to-end latency, it suffers from inherent limitations with respect to the number of bits available on hardware switches to represent these filters. Such a limitation affects the expressiveness of filters, resulting in unnecessary traffic in the network. In this paper, we explore various techniques to represent content filters expressively while being limited by hardware. We implement and evaluate techniques that i) use workload, in terms of events and subscriptions, to represent content, and ii) efficiently select attributes to reduce redundancy in content. Moreover, these techniques complement each other and can be combined to further enhance performance. Our detailed performance evaluations show the potential of these techniques in reducing unnecessary traffic when subjected to different workloads.
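The effect of a limited bit budget on filter expressiveness can be illustrated with a small sketch. It assumes a single integer attribute encoded as a fixed-width binary string and a subscription represented as a binary prefix; this encoding, the function names, and the sample values are assumptions made for illustration and are not taken from the paper.

```python
def to_bits(value, width):
    """Fixed-width binary encoding of a non-negative integer."""
    return format(value, f"0{width}b")


def prefix_filter(lo, hi, width):
    """Longest common binary prefix of lo and hi.
    The prefix covers the whole range [lo, hi], and possibly more."""
    a, b = to_bits(lo, width), to_bits(hi, width)
    n = 0
    while n < width and a[n] == b[n]:
        n += 1
    return a[:n]


def matches(prefix, value, width):
    """An event matches when its encoding starts with the filter prefix."""
    return to_bits(value, width).startswith(prefix)


def false_positives(lo, hi, width, max_bits, events):
    """Events forwarded by a filter truncated to max_bits although they
    lie outside the subscribed range [lo, hi]."""
    prefix = prefix_filter(lo, hi, width)[:max_bits]
    return [e for e in events if matches(prefix, e, width) and not lo <= e <= hi]


# Hypothetical subscription on an 8-bit attribute: values 64..79.
events = [10, 65, 70, 90, 120, 200]
full = prefix_filter(64, 79, 8)                   # exact 4-bit prefix
fp_4 = false_positives(64, 79, 8, 4, events)      # enough bits available
fp_2 = false_positives(64, 79, 8, 2, events)      # only 2 bits available
print(full, fp_4, fp_2)
```

With four bits the prefix describes the subscription exactly and no unwanted event is forwarded; truncating the filter to two bits widens it to the range 64..127, so events 90 and 120 are delivered unnecessarily. This is the kind of unnecessary traffic that a more expressive use of the available bits aims to reduce.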



Citations: 9