A Scalable Framework for Accelerating Situation Prediction over Spatio-temporal Event Streams
A. Kammoun, Tanguy Raynaud, Syed Gillani, K. Singh, J. Fayolle, F. Laforest
Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems (DEBS '18), June 25, 2018. DOI: https://doi.org/10.1145/3210284.3220508

This paper presents a generic solution to the spatiotemporal prediction problem posed by the DEBS Grand Challenge 2018. Our solution employs an efficient multi-dimensional index to store the training and historical data. As new batches of events arrive, we query the indexing structure to determine the closest points of interest. Based on these points, we select the ones with the highest overall score and predict the destination and arrival time of the vessel in question. Our solution does not rely on existing machine learning techniques and provides a novel view of the prediction problem in a streaming setting: the prediction is based not just on recent data, but on all useful historical data.
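The abstract's pipeline (spatial index, nearest-neighbor lookup, score-based destination pick) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the grid-bucket index, the inverse-distance scoring, and all names are assumptions.

```python
from collections import defaultdict
from math import hypot

class GridIndex:
    """Toy 2-D grid index over historical points; each stored point
    carries the destination label of the trip it belongs to."""
    def __init__(self, cell=0.5):
        self.cell = cell
        self.buckets = defaultdict(list)  # (ix, iy) -> [(x, y, dest)]

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, x, y, dest):
        self.buckets[self._key(x, y)].append((x, y, dest))

    def nearby(self, x, y):
        # Yield all points in the query cell and its 8 neighbors.
        ix, iy = self._key(x, y)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                yield from self.buckets.get((ix + dx, iy + dy), [])

def predict_destination(index, x, y):
    """Score candidate destinations by inverse distance of nearby
    historical points; return the highest-scoring one."""
    scores = defaultdict(float)
    for px, py, dest in index.nearby(x, y):
        scores[dest] += 1.0 / (1.0 + hypot(px - x, py - y))
    return max(scores, key=scores.get) if scores else None
```

A query near dense historical traffic bound for port A would thus return A, since its accumulated score dominates.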
Retro-λ
Dominik Meißner, Benjamin Erb, Frank Kargl, Matthias Tichy
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3210285

State changes over time are an inherent characteristic of stateful applications. So far, there have been almost no attempts to make the past history of an application programmatically accessible or even modifiable, primarily due to the complexity of temporal changes and a difficult alignment with prevalent programming primitives and persistence strategies. Retroactive computing, however, enables powerful capabilities, including computations and predictions on alternate application timelines, post-hoc bug fixes, and retroactive state exploration. We propose an event-driven programming model, oriented towards serverless computing, that applies retroaction to the event sourcing paradigm. Our model is deliberately restrictive, which keeps the complexity of retroactive operations in check. We introduce retro-λ, a runtime platform that implements the model and provides retroactive capabilities to its applications. While retro-λ shows only negligible performance overhead compared to similar solutions when running regular applications, it enables its users to execute retroactive computations on application histories as part of its programming model.
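The core idea of applying retroaction to event sourcing can be illustrated in a few lines: state is never stored directly, only derived by replaying an event log, so amending history and replaying yields an alternate timeline. This is a minimal sketch of the paradigm, not retro-λ's actual API; all names are invented.

```python
# Minimal event-sourcing sketch with one retroactive operation.

def replay(events, reducer, initial):
    """Derive current state by folding the reducer over the time-ordered log."""
    state = initial
    for event in sorted(events, key=lambda e: e["ts"]):
        state = reducer(state, event)
    return state

def inject_retroactively(events, new_event):
    """Retroactive write: amend history; the next replay yields the
    alternate timeline implied by the inserted event."""
    return events + [new_event]

def balance_reducer(state, event):
    return state + event["amount"]

log = [{"ts": 1, "amount": 100}, {"ts": 3, "amount": -30}]
assert replay(log, balance_reducer, 0) == 70

# Post-hoc bug fix: a deposit at ts=2 was missed; inject it and re-derive.
log = inject_retroactively(log, {"ts": 2, "amount": 50})
assert replay(log, balance_reducer, 0) == 120
```

The model's restrictiveness mentioned in the abstract corresponds to keeping reducers pure, so that replay over an amended log is always well defined.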
Buffering Strategies for Large-Scale Data-Acquisition Systems
Alejandro Santos
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3219500

Data acquisition systems for particle physics experiments produce vast amounts of data, and storing all of it is often infeasible because the storage requirements would be enormous. For this reason, an on-line filtering system selects the relevant pieces of information according to the goals of the experiment before sending them to permanent storage. While data is being analyzed, it is temporarily held in a large high-speed buffering system. Data production follows a cycle, with long periods of many hours during which the experiment produces no data, and the input rate fluctuates even while data is being produced. This offers the possibility of over-provisioning the buffering system and trading processing power for storage space, using the buffer as storage for periods of many days. In this work, a model was created to study the behavior of aspects of the ATLAS data acquisition system, specifically the buffering system for the on-line filter.
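The trade-off the abstract describes (bursty input absorbed by an over-provisioned buffer with a fixed filter drain rate) can be seen in a back-of-the-envelope simulation. This is a hedged sketch with invented parameters, not the paper's model of ATLAS.

```python
# Toy buffer-occupancy simulation: bursts of input, constant drain by
# the on-line filter, finite capacity. All numbers are illustrative.

def simulate_buffer(input_volumes, drain_rate, capacity):
    """input_volumes: data arriving per time step.
    Returns (occupancy trace, total volume dropped for lack of space)."""
    occupancy, trace, dropped = 0.0, [], 0.0
    for arriving in input_volumes:
        occupancy = occupancy + arriving - drain_rate
        if occupancy > capacity:
            dropped += occupancy - capacity  # overflow is lost
            occupancy = capacity
        occupancy = max(occupancy, 0.0)      # buffer cannot go negative
        trace.append(occupancy)
    return trace, dropped
```

Running a burst of two steps at volume 10 followed by two idle steps, with drain rate 5 and capacity 8, shows the buffer filling during the burst and draining during the idle period; a larger capacity would have avoided the drop entirely, which is the over-provisioning argument.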
Distributed and Dynamic Clustering For News Events
Vinay Setty
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3219774

News is now consumed primarily online, which has resulted in a large volume of news from varied outlets. Consequently, news aggregators, which process millions of articles each day, have become popular for clustering, ranking and personalizing news. In addition, since news articles stream in constantly, a scalable event-based system is needed to facilitate news mining in an online fashion. To address these challenges, we propose a distributed framework that processes news articles and clusters them to support a range of news mining tasks. The core of our system is a novel, scalable distributed clustering algorithm based on Locality Sensitive Hashing that is robust to outliers and noise. We also propose an online version of the clustering algorithm to maintain the news event clusters dynamically. We implement the proposed solution on Apache Spark. Using a collection of over 8 million news articles, we show that our approach outperforms widely used clustering techniques such as K-Means in both run time and clustering quality.
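The LSH ingredient can be sketched with MinHash banding: articles whose signatures agree on at least one band fall into the same bucket and become cluster candidates, without pairwise comparison of all documents. This is a generic MinHash/LSH illustration, not the paper's algorithm; parameters and names are assumptions.

```python
import random
from collections import defaultdict

# MinHash-LSH candidate generation over token sets. Note: Python's str
# hash is randomized per process, so signatures are only stable within
# a single run -- fine for an in-memory sketch.
random.seed(7)
N_HASH, BANDS = 16, 4
ROWS = N_HASH // BANDS
MASKS = [random.getrandbits(64) for _ in range(N_HASH)]

def minhash(tokens):
    """One min value per hash function (XOR-mask variants of hash())."""
    return tuple(min(hash(t) ^ m for t in tokens) for m in MASKS)

def lsh_buckets(docs):
    """docs: {doc_id: token set}. Returns candidate groups: doc ids that
    collide on at least one signature band."""
    buckets = defaultdict(set)
    for doc_id, tokens in docs.items():
        sig = minhash(tokens)
        for b in range(BANDS):
            band = sig[b * ROWS:(b + 1) * ROWS]
            buckets[(b, band)].add(doc_id)
    return [ids for ids in buckets.values() if len(ids) > 1]
```

Banding is what makes this robust to noise: two articles need only one of the four bands to match, so small wording differences rarely separate near-duplicate reports of the same event.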
Scalable Maritime Traffic Map Inference and Real-time Prediction of Vessels' Future Locations on Apache Spark
Rim Moussa
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3220506

In this paper, we propose scalable algorithms to, first, infer a map of vessels' trajectories and, second, predict the future locations of a vessel at sea. Our system is based on Apache Spark, a fast and scalable engine for large-scale data processing. The training dataset is event-based: each event gives the GPS position of a vessel at a timestamp. We propose and implement a workflow that computes trip patterns, with the GPS locations of each trip summarized using geohashing, an efficient encoding of a geographic location into a short string of letters and digits. To answer prediction queries efficiently, we propose (i) a geohash positional index that maps each geohash to a list of pairs (trip-pattern identifier, offset of the geohash within the pattern's geohash sequence), (ii) a departure-port index that maps each departure port to a list of trip-pattern identifiers, and (iii) a pairwise geohash-sequence alignment that scores the similarity of two geohash sequences using the queen-move spatial neighborhood.
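Geohashing, the summarization step the abstract relies on, is a standard algorithm: alternate bisecting longitude and latitude, emit one bit per decision, and pack every five bits into a base-32 character. A minimal encoder (standard algorithm, not code from the paper):

```python
# Standard geohash alphabet (base32 without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    """Encode a lat/lon pair as a geohash string of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bits, bit_count, even = [], 0, 0, True
    while len(code) < precision:
        if even:  # even-numbered bits bisect longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:     # odd-numbered bits bisect latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # five bits -> one base32 character
            code.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)
```

The prefix property is what makes the paper's positional index work: nearby locations share geohash prefixes, so truncating a geohash coarsens the cell.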
BeaConvey
Chen Chen, Y. Tock, Sarunas Girdzijauskas
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3210287

Distributed pub/sub systems must make principal design choices with regard to overlay topologies and routing protocols. It is challenging to tackle both aspects together, and most existing work considers only one. We argue that both problems must be addressed simultaneously, since only the right combination of the two can deliver efficient internet-scale pub/sub. The traditional design space spans from structured, data-oblivious overlays employing greedy routing strategies all the way to unstructured, data-driven overlays using naive broadcast-based routing. The two ends of the spectrum come at unacceptable prices: the former often exerts considerable overhead on each node for forwarding irrelevant messages, while the latter is difficult to scale due to prohibitive latencies stemming from unbounded node degrees and network diameters. To achieve the best of both worlds, we propose BeaConvey, a distributed pub/sub system for federated environments. First, we define the small-world and interest-close overlay (SWICO), which embraces both small-world properties and pub/sub semantics. To construct a SWICO, we devise a greedy heuristic that assigns small-world identifiers and fingers in a centralized manner. Second, we develop a family of peer-to-peer pub/sub routing protocols that leverage such SWICOs. Empirical evaluation shows that BeaConvey achieves substantial improvements in routing overhead and propagation delay; for instance, its routing overhead is only 20% to 40% of the state of the art. This improvement is consistent across a variety of pub/sub workloads, and BeaConvey obtains such adaptability by optimizing both overlay and routing, which complement each other in different situations. Under one Facebook workload with a skewed distribution, 78% of the improvement is attributed to a better overlay; under another, non-skewed workload, more advanced routing contributes 95% of the cost reduction.
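The small-world ingredient that SWICOs build on can be illustrated with greedy forwarding on a ring overlay with exponentially spaced fingers: each hop at least halves the remaining distance, bounding path length logarithmically. This is a generic Chord-style illustration, not BeaConvey's protocol; the ring size and finger rule are assumptions.

```python
# Greedy routing on a 64-node identifier ring with long-range fingers.
RING = 64

def fingers(node):
    """Pointers at exponentially increasing clockwise distances."""
    return [(node + 2 ** i) % RING for i in range(6)]

def clockwise_dist(a, b):
    return (b - a) % RING

def greedy_route(src, dst):
    """Hop to the finger that makes the most progress toward dst
    without overshooting it; the distance-1 finger guarantees progress."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(
            fingers(here),
            key=lambda f: clockwise_dist(f, dst)
            if clockwise_dist(here, f) <= clockwise_dist(here, dst)
            else RING,  # penalize fingers that overshoot
        )
        path.append(nxt)
    return path
```

A SWICO additionally biases which fingers a node gets so that they point at nodes with similar subscriptions, which is what cuts the forwarding of irrelevant messages.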
Ubiquitous Artificial Intelligence and Dynamic Data Streams
A. Bifet, J. Read
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3214345

Artificial Intelligence is leading to ubiquitous sources of Big Data arriving at high velocity and in real time. To deal with this effectively, we need to adapt to changes in the distribution of the data being produced, using a minimum amount of time and memory. In this paper, we detail modern applications in this context, discuss state-of-the-art methodologies for mining data streams in real time, and survey the open source tools available for machine learning and data mining in this challenging setting.
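The constraint the abstract names (adapting to distribution change under bounded time and memory) is often met with sliding-window estimators. A deliberately tiny illustration of the principle, not any specific method from the paper:

```python
from collections import deque

class WindowedMean:
    """Bounded-memory stream statistic: a fixed-size window forgets old
    data, so the estimate tracks the current distribution after drift."""
    def __init__(self, size=100):
        self.window = deque(maxlen=size)  # old items evicted automatically

    def update(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)
```

After a drift from values near 1 to values near 9, the estimate converges to 9 within one window length, whereas a running mean over the full history would lag indefinitely.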
Bayesian Estimation of Vessel Destination and Arrival Times
Hyungkun Jung, Kang-Woo Lee, Joong-Hyun Choi, Eun-Sun Cho
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3220501

Predicting the destination port and arrival time of a vessel is challenging, even with a tremendous amount of trace data available. Our goal for this challenge is to build a solution that accurately predicts the destination port and arrival time of a given vessel using Bayesian inference and heuristics.
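The Bayesian idea can be sketched as a posterior over destination ports given the cells a vessel has traversed, built from historical trip counts. This is a hypothetical count-based model for illustration (uniform prior, Laplace smoothing, naive per-cell independence), not the authors' exact formulation.

```python
from collections import Counter, defaultdict

class DestinationModel:
    """P(port | observed route cells) from historical trip counts."""
    def __init__(self):
        self.counts = defaultdict(Counter)  # cell -> Counter of ports
        self.ports = set()

    def train(self, cells, port):
        """Record one historical trip: its traversed cells and destination."""
        self.ports.add(port)
        for cell in cells:
            self.counts[cell][port] += 1

    def posterior(self, observed_cells):
        """Uniform prior, Laplace-smoothed per-cell likelihoods, normalized."""
        post = {p: 1.0 for p in self.ports}
        for cell in observed_cells:
            c = self.counts[cell]
            total = sum(c.values()) + len(self.ports)  # +1 smoothing mass
            for p in post:
                post[p] *= (c[p] + 1) / total
        z = sum(post.values())
        return {p: v / z for p, v in post.items()}
```

The smoothing keeps cells never seen on a historical route to some port from zeroing out that port, which matters with sparse trace data.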
Optimization Strategies for Integration Pattern Compositions
Daniel Ritter, Norman May, F. Forsberg, S. Rinderle-Ma
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3210295

Enterprise Application Integration is the centerpiece of current on-premise, cloud and device integration scenarios. We describe optimization strategies that help reduce model complexity and improve process execution using design-time techniques. To achieve this, we formalize compositions of Enterprise Integration Patterns based on their characteristics and propose a realization of the optimization strategies using graph rewriting. The framework is successfully evaluated on a real-world catalog of pattern compositions containing over 900 integration scenarios.
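A flavor of such a design-time rewrite, in miniature: collapsing chains of adjacent filter steps in a pattern composition into one node, reducing model complexity without changing behavior. The rule and representation are invented for illustration and are not from the paper's formalization.

```python
def fuse_adjacent(nodes, fusable=("filter",)):
    """One toy rewrite rule over a linear pattern composition.
    nodes: ordered pipeline of (kind, label) steps. Consecutive steps of a
    fusable kind are merged into a single step."""
    out = []
    for kind, label in nodes:
        if out and kind in fusable and out[-1][0] == kind:
            # Merge into the preceding step of the same kind.
            out[-1] = (kind, out[-1][1] + "+" + label)
        else:
            out.append((kind, label))
    return out
```

On a real pattern graph this becomes a graph-rewriting rule with a match condition (two adjacent filter nodes) and a replacement (one fused node), applied until a fixpoint.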
Vessel Trajectory Prediction using Sequence-to-Sequence Models over Spatial Grid
Duc-Duy Nguyen, Chan Le Van, M. Ali
DEBS '18, June 25, 2018. DOI: https://doi.org/10.1145/3210284.3219775

In this paper, we propose a neural-network-based system to predict vessels' trajectories, including destination port and estimated arrival time. The system is designed to address the DEBS Grand Challenge 2018, which provides a set of data streams containing time-ordered vessel information and coordinates. Our goal is to design a system that accurately predicts the future trajectory, destination port and arrival time of a vessel. Our solution is based on a sequence-to-sequence model over a spatial grid: we divide the sea area into a grid and represent a vessel's recent trajectory as a sequence of cell codes, from which we extract its movement tendency. The extracted movement tendency allows us to predict future movements up to the destination. We built our solution on a distributed architecture and applied load-balancing techniques to achieve maximum performance and scalability. We also designed an interactive user interface that shows the real-time trajectories of vessels together with their predicted destinations and arrival times.
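The preprocessing step the abstract describes, turning a raw GPS track into the token sequence a sequence-to-sequence model consumes, can be sketched directly. The cell size and the duplicate-dropping choice are illustrative assumptions, not values from the paper.

```python
def to_cell_codes(track, cell_deg=0.1):
    """Map a GPS track [(lat, lon), ...] onto spatial-grid cell codes,
    dropping consecutive duplicates so the sequence records cell
    transitions -- the movement tendency -- rather than dwell time."""
    codes = []
    for lat, lon in track:
        code = (int(lat // cell_deg), int(lon // cell_deg))
        if not codes or codes[-1] != code:
            codes.append(code)
    return codes
```

Each cell code then plays the role of a vocabulary token: an encoder reads the recent code sequence and a decoder emits the predicted future codes up to the destination cell.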