Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611567
Anna Povzner, Prince Mahajan, Jason Gustafson, Jun Rao, Ismael Juma, Feng Min, Shriram Sridharan, Nikhil Bhatia, Gopi Attaluri, Adithya Chandra, Stanislav Kozlovski, Rajini Sivaram, Lucas Bradstreet, Bob Barrett, Dhruvil Shah, David Jacot, David Arthur, Ron Dagostino, Colin McCabe, Manikumar Reddy Obili, Kowshik Prakasam, Jose Garcia Sancio, Vikas Singh, Alok Nikhil, Kamal Gupta
Event streaming is an increasingly critical infrastructure service used in many industries, and there is growing demand for cloud-native solutions. Confluent Cloud provides a massive-scale event streaming platform built on top of Apache Kafka, with tens of thousands of clusters running in 70+ regions across AWS, Google Cloud, and Azure. This paper introduces Kora, the cloud-native platform for Apache Kafka at the core of Confluent Cloud. We describe the design that enables Kora to meet its cloud-native goals, such as reliability, elasticity, and cost efficiency. We discuss Kora's abstractions, which allow users to think in terms of their workload requirements rather than the underlying infrastructure, and we discuss how Kora is designed to provide consistent, predictable performance across cloud environments with diverse capabilities.
Kora: A Cloud-Native Event Streaming Platform for Kafka. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611567
Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611598
Mahdi Ghorbani, Amir Shaikhha
Machine learning over relational data has been used in several applications. The traditional approach of joining relations first and then training a model on the joined table is time-consuming and requires a significant amount of memory. Recent research has focused on in-database machine learning (in-DB ML) to address this issue; these methods train the models over relations without joining, resulting in a more efficient process. However, such systems have ad-hoc user interfaces and specific data formats, making them challenging to use. To address this problem, this paper presents OpenDBML, a framework for democratizing in-DB ML. OpenDBML offers a Python interface for multiple in-DB ML systems, a set of commonly used datasets, and the ability to add new datasets and in-DB ML systems via both Python and web interfaces. The paper also presents comprehensive demonstration scenarios to illustrate how to use OpenDBML effectively.
Demonstration of OpenDBML, a Framework for Democratizing In-Database Machine Learning. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611598
Time series forecasting, which predicts future events from a sequence of observations over time, has received increasing attention in past decades. The diverse range of time series forecasting models makes it challenging to select the most suitable model for a given dataset. As such, the Alibaba Cloud database monitoring system must address the issue of selecting an optimal forecasting model for a single time series. While several model selection frameworks, including AutoAI-TS, have been developed to select one model for a whole dataset, their effectiveness may be limited, as they may not adapt well to all types of time series, resulting in reduced prediction accuracy. Alternatively, frameworks such as AutoForecast, which train on individual data points, may offer better adaptability but are limited by the longer training time required. In this paper, we introduce SimpleTS, a versatile framework for time series forecasting that exhibits high efficiency and accuracy across all types of time series data. When performing an online prediction task, SimpleTS first classifies the input time series into one type, and then efficiently selects the most suitable prediction model for this type. To optimize performance, SimpleTS (i) clusters models with similar performance to improve the efficiency of classification, and (ii) uses soft labeling and weighted representation learning to achieve higher classification accuracy for different time series types. Extensive experiments on 3 private datasets and 52 public datasets show that SimpleTS outperforms the state-of-the-art toolkits in terms of both training time and prediction accuracy.
Yuanyuan Yao, Dimeng Li, Hailiang Jie, Hailiang Jie, Tianyi Li, Jie Chen, Jiaqi Wang, Feifei Li, Yunjun Gao. SimpleTS: An Efficient and Universal Model Selection Framework for Time Series Forecasting. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611561
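The classify-then-select flow described in the SimpleTS abstract above can be illustrated with a minimal sketch. This is not the SimpleTS implementation: the two series types, the autocorrelation rule that stands in for the learned classifier, the two baseline models, and all function names are assumptions chosen only to show the shape of the idea.

```python
from statistics import mean


def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag (0.0 if constant)."""
    m = mean(series)
    num = sum((series[i] - m) * (series[i - lag] - m)
              for i in range(lag, len(series)))
    den = sum((x - m) ** 2 for x in series)
    return num / den if den else 0.0


def classify(series, period):
    """Toy stand-in for a learned classifier (assumption, not SimpleTS):
    label the series by how strongly it repeats every `period` steps."""
    return "seasonal" if autocorr(series, period) > 0.5 else "non_seasonal"


def seasonal_naive(series, period, horizon):
    """Repeat the last observed season."""
    start = len(series) - period
    return [series[start + (h % period)] for h in range(horizon)]


def moving_average(series, period, horizon):
    """Forecast the mean of the last `period` observations."""
    return [mean(series[-period:])] * horizon


# Hypothetical mapping from series type to the model selected for that type.
MODEL_FOR_TYPE = {"seasonal": seasonal_naive, "non_seasonal": moving_average}


def forecast(series, period, horizon):
    """Step 1: classify the series. Step 2: run the model chosen for its type."""
    series_type = classify(series, period)
    model = MODEL_FOR_TYPE[series_type]
    return series_type, model(series, period, horizon)
```

Calling `forecast([1, 5, 9, 5] * 6, period=4, horizon=4)` routes the series to the seasonal model, while a flat series falls through to the moving average. The lookup table keeps model selection at prediction time to a single dictionary access once the type is known, which is the efficiency argument the abstract makes for classifying first.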
Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611603
Kristin Fritsch, Stefanie Scherzinger
With quantum computers now available as cloud services, there is a global quest for applications where a quantum advantage can be shown. Naturally, data management is a candidate domain. Workable solutions require the design of hybrid quantum algorithms, where a quantum processing unit (QPU) and classical computing (via CPUs) cooperate towards solving a problem. This demo illustrates such an end-to-end solution targeting NP-hard variants of database schema matching. Our demo is intended to be educational (and hopefully inspiring), allowing participants to explore the critical design decisions, such as the handover between phases of QPU- and CPU-based computation. It will also allow participants to experience hands-on, through playful interaction, how easily problem sizes exceed the limitations of today's QPUs.
Solving Hard Variants of Database Schema Matching on Quantum Computers. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611603
Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611576
Zhengtong Yan, Valter Uotila, Jiaheng Lu
Join Order Selection (JOS) is a fundamental challenge in query optimization, as it significantly affects query performance. However, finding an optimal join order is an NP-hard problem due to the exponentially large search space. Despite the decades-long effort, traditional methods still suffer from limitations. Deep Reinforcement Learning (DRL) approaches have recently gained growing interest and shown superior performance over traditional methods. These DRL-based methods could leverage prior experience through the trial-and-error strategy to automatically explore the optimal join order. This tutorial will focus on recent DRL-based approaches for join order selection by providing a comprehensive overview of the various approaches. We will start by briefly introducing the core concepts of join ordering and the traditional methods for JOS. Next, we will provide some preliminary knowledge about DRL and then delve into DRL-based join order selection approaches by offering detailed information on those methods, analyzing their relationships, and summarizing their weaknesses and strengths. To help the audience gain a deeper understanding of DRL approaches for JOS, we will present two open-source demonstrations and compare their differences. Finally, we will identify research challenges and open problems to provide insights into future research directions. This tutorial will provide valuable guidance for developing more practical DRL approaches for JOS.
Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611576
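To make the join-ordering problem in the abstract above concrete, the following is a minimal sketch of the kind of traditional method the tutorial starts from: exhaustive dynamic programming over subsets of relations. It is not taken from the tutorial. The cost model (sum of intermediate result sizes), the independence assumption behind the size estimate, and the function name `best_join_order` are all assumptions for illustration. The nested loops over subsets and splits show why the search space grows exponentially with the number of relations.

```python
from itertools import combinations


def best_join_order(cardinalities, selectivity):
    """Find the cheapest (possibly bushy) join plan by DP over subsets.

    cardinalities: dict mapping relation name -> row count
    selectivity:   dict mapping frozenset({a, b}) -> join selectivity;
                   missing pairs are treated as cross products (1.0)
    Returns (cost, plan), where plan is a nested tuple of relation names and
    cost is the sum of intermediate result sizes (a toy cost model).
    """
    names = sorted(cardinalities)

    def result_size(subset):
        # Estimated output size, assuming independent join predicates.
        size = 1.0
        for name in subset:
            size *= cardinalities[name]
        for a, b in combinations(sorted(subset), 2):
            size *= selectivity.get(frozenset((a, b)), 1.0)
        return size

    # best[subset] = (cost, plan); scanning a base relation costs nothing here.
    best = {frozenset([n]): (0.0, n) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            out = result_size(subset)
            candidates = []
            # Try every way to split the subset into two non-empty parts.
            for r in range(1, k):
                for left_names in combinations(combo, r):
                    left = frozenset(left_names)
                    right = subset - left
                    left_cost, left_plan = best[left]
                    right_cost, right_plan = best[right]
                    candidates.append(
                        (left_cost + right_cost + out, (left_plan, right_plan))
                    )
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(names)]
```

With three relations the table has only seven entries, but the number of subsets doubles with each added relation. That blow-up is the limitation that motivates learned approaches such as the DRL-based methods the tutorial surveys.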
Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611592
Xiang Wang, Xin Wang, Zhaozhuo Li, Dong Han
Visual querying is a vital technique for comprehending and analyzing knowledge graphs, and it lowers the barrier to querying knowledge graphs for non-professional users. Nevertheless, the visual query techniques for knowledge graphs and ontologies that have emerged in recent years cannot bridge the gap between the global information provided by the knowledge graph schema and the underlying data of the knowledge graph. Thus, they cannot fully exploit the global information to guide users in querying knowledge graphs. This demonstration showcases KGNav, a Knowledge Graph Navigational visual query system. KGNav (1) redefines the minimal unit of operation to abstract the conceptual hierarchy of the domain, i.e., the Knowledge Graph Schema, from the original knowledge graph in an offline, semi-automatic way through the equivalence relations between these units; it also (2) provides a series of operators and an interactive GUI to capture user query intentions, guiding users to explore the Knowledge Graph Schema to achieve in-depth analysis of knowledge graphs. Through several fundamental scenarios, we will demonstrate the capability of KGNav in reducing tedious queries, enabling users to swiftly grasp the structure of the knowledge graph, and performing queries.
KGNav: A Knowledge Graph Navigational Visual Query System. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611592
The development of Earth Observation technology has led to the production of massive raster data, which must be managed and analyzed efficiently. Existing solutions employ separate dedicated systems for raster data management and for processing, incurring problems such as data redundancy, difficult updates, and expensive data transfer and transformation. To cope with these limitations, this demonstration presents Ganos Aero, a cloud-native system for big raster data management and processing. Ganos Aero proposes a unified raster data model for both data management and processing, which stores a single copy of the raster data without performing an expensive tiling procedure, and thus achieves significant improvement in storage and updating efficiency. To enable efficient query and batch task processing, Ganos Aero implements an on-the-fly tile production mechanism and optimizes its performance using cloud features, including decoupling compute from storage and pushing costly operations closer to the storage layer. Since its deployment in Alibaba Cloud in 2022, Ganos Aero has played a critical role in many real applications, including modern agriculture and environmental monitoring and protection, among others.
Fei Xiao, Jiong Xie, Zhida Chen, Feifei Li, Zhen Chen, Jianwei Liu, Yinpei Liu. Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611597
In recent years, at ByteDance, we have seen more and more business scenarios that require real-time data serving alongside complex ad hoc analysis over large amounts of freshly imported data. The serving workload requires performing complex queries over massive numbers of newly added data items with minimal delay. These systems are often used in mission-critical scenarios, and traditional OLAP systems cannot handle such use cases. To work around the problem, ByteDance products often have to use multiple systems together in production, forcing the same data to be ETLed into multiple systems, causing data consistency problems, wasting resources, and increasing learning and maintenance costs. To solve the above problem, we built a single Hybrid Serving and Analytical Processing (HSAP) system to handle both workload types. HSAP is still in its early stage, and very few such systems are on the market yet. This paper demonstrates how to build Krypton, a competitive cloud-native HSAP system that provides both excellent elasticity and query performance by utilizing many previously known query processing techniques, a hierarchical cache with persistent memory, and a native columnar storage format. Krypton supports high data freshness, high data ingestion rates, and strong data consistency. We also discuss lessons and best practices we learned in developing and operating Krypton in production.
Jianjun Chen, Rui Shi, Heng Chen, Li Zhang, Ruidong Li, Wei Ding, Liya Fan, Hao Wang, Mu Xiong, Yuxiang Chen, Benchao Dong, Kuankuan Guo, Yuanjin Lin, Xiao Liu, Haiyang Shi, Peipei Wang, Zikang Wang, Yemeng Yang, Junda Zhao, Dongyan Zhou, Zhikai Zuo, Yuming Liang. Krypton: Real-Time Serving and Analytical SQL Engine at ByteDance. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611545
Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611618
Xintong Song, Yusen Zhu, Jianfei Wu, Bai Liu, Hongkang Wei
Anomaly detection has been extensively implemented in industry. In practice, an application may have numerous scenarios where anomalies need to be monitored. However, the complete process of anomaly detection, including data acquisition, data processing, model training, and model deployment, takes considerable time. Moreover, some simple scenarios do not require complex anomaly detection models, so building one for them wastes resources. To solve these problems, we build an anomaly detection pipeline (ADOps) that modularizes each step. For simple anomaly detection scenarios, no programming is required, and new anomaly detection tasks can be created by simply modifying the configuration file. In addition, the pipeline can improve the development efficiency of complex anomaly detection models. We show how users create anomaly detection tasks on the anomaly detection pipeline and how engineers use it to develop anomaly detection models.
ADOps: An Anomaly Detection Pipeline in Structured Logs. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611618
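The idea in the ADOps abstract above of creating a detection task by editing a configuration file, with no new code, might look roughly like the sketch below. The configuration keys (`task`, `field`, `detector`, `threshold`), the z-score detector, and the function names are hypothetical; the abstract does not describe ADOps's actual configuration schema or detectors.

```python
import json
from statistics import mean, pstdev

# Hypothetical task configuration; in a real pipeline this would be a file.
CONFIG = json.loads("""
{
  "task": "slow_requests",
  "field": "latency_ms",
  "detector": "zscore",
  "threshold": 2.0
}
""")


def detect_zscore(values, threshold):
    """Return indices of values more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Registry of available detectors; a new task only names one of these.
DETECTORS = {"zscore": detect_zscore}


def run_task(config, records):
    """Apply the configured detector to one field of structured log records."""
    values = [record[config["field"]] for record in records]
    detector = DETECTORS[config["detector"]]
    return [records[i] for i in detector(values, config["threshold"])]
```

Monitoring a different log field then only requires changing `field` and `threshold` in the configuration. Engineers working on more complex models would register an additional entry in `DETECTORS` rather than rebuilding the surrounding steps.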
Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611619
Jonas Spenger, Chengyang Huang, Philipp Haller, Paris Carbone
Serverless applications spanning the cloud and edge require flexible programming frameworks for expressing compositions across the different levels of deployment. Another critical aspect for stateful applications is failure resilience beyond the scope of a single dataflow graph, which is the current standard in data streaming systems. This paper presents Portals, an interactive, stateful dataflow composition framework with strong end-to-end guarantees. Portals enables event-driven, resilient applications that span dataflow graphs and serverless deployments. The demonstration exhibits three scenarios in our multi-dataflow streaming-based system: dynamically composing a stateful serverless application; an interactive cloud and edge serverless application; and a Portals browser playground.
Portals: A Showcase of Multi-Dataflow Stateful Serverless. Proceedings of the VLDB Endowment, 2023-08-01. https://doi.org/10.14778/3611540.3611619