Proceedings of the VLDB Endowment: Latest Articles

Cornet: Learning Spreadsheet Formatting Rules by Example

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611620

Mukul Singh, José Cambronero Sanchez, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure". Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce Cornet, a system that automatically learns such conditional formatting rules from user examples. Cornet takes inspiration from inductive program synthesis and combines symbolic rule enumeration, based on semi-supervised clustering and iterative decision tree learning, with a neural ranker to produce accurate conditional formatting rules. In this demonstration, we show Cornet in action as a simple add-in to Microsoft Excel. After the user provides one or two formatted cells as examples, Cornet generates formatting rule suggestions for the user to apply to the spreadsheet.
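To make the idea of learning a formatting rule from examples concrete, here is a toy sketch in Python. It is not Cornet's method: it replaces the clustering, decision tree learning, and neural ranker described in the abstract with a brute-force enumeration of threshold predicates and a "format as few cells as possible" ranking. The function names (`candidate_rules`, `learn_rules`) and the ranking heuristic are assumptions made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[str, Callable[[float], bool]]


def candidate_rules(values: Sequence[float]) -> List[Rule]:
    """Enumerate threshold predicates using 0 and the observed values as constants."""
    rules: List[Rule] = []
    for c in sorted({0.0, *values}):
        # Bind c as a default argument so each lambda keeps its own constant.
        rules.append((f"value < {c:g}", lambda v, c=c: v < c))
        rules.append((f"value > {c:g}", lambda v, c=c: v > c))
    return rules


def learn_rules(values: Sequence[float],
                formatted_idx: Sequence[int],
                unformatted_idx: Sequence[int] = ()) -> List[str]:
    """Return rules consistent with the examples, smallest coverage first.

    A rule is consistent if it holds for every formatted example cell and
    for none of the cells the user explicitly left unformatted.
    """
    consistent = []
    for description, predicate in candidate_rules(values):
        covers_examples = all(predicate(values[i]) for i in formatted_idx)
        hits_negative = any(predicate(values[i]) for i in unformatted_idx)
        if covers_examples and not hits_negative:
            coverage = sum(1 for v in values if predicate(v))
            consistent.append((coverage, description))
    consistent.sort()
    return [description for _, description in consistent]


column = [12, -3, 7, -8, 0, 5]
# The user colored the cells holding -3 and -8 (indices 1 and 3).
suggestions = learn_rules(column, formatted_idx=[1, 3])
print(suggestions[0])
```

With two formatted negative cells as examples, the top-ranked suggestion in this toy setup is the "negative entries" rule; a real system has to search a far richer rule language and rank candidates much more carefully.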

Citations: 0

QO-Insight: Inspecting Steered Query Optimizers

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611586

Christoph Anneser, Mario Petruccelli, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus, Alfons Kemper

Steered query optimizers address the planning mistakes of traditional query optimizers by providing them with hints on a per-query basis, thereby guiding them in the right direction. This paper introduces QO-Insight, a visual tool designed for exploring query execution traces of such steered query optimizers. Although steered query optimizers are typically perceived as black boxes, QO-Insight empowers database administrators and experts to gain qualitative insights and enhance their performance through visual inspection and analysis.

Citations: 1

KGNav: A Knowledge Graph Navigational Visual Query System

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611592

Xiang Wang, Xin Wang, Zhaozhuo Li, Dong Han

Visual querying is a vital technique for comprehending and analyzing knowledge graphs, and it effectively lowers the barrier to querying knowledge graphs for non-professional users. Nevertheless, the visual query techniques for knowledge graphs and ontologies that have emerged in recent years cannot bridge the gap between the global information provided by the knowledge graph schema and the underlying data of the knowledge graph. Thus they cannot fully exploit the global information to guide users in querying knowledge graphs. This demonstration showcases KGNav, a Knowledge Graph Navigational visual query system. KGNav (1) redefines the minimal unit of operation to abstract the conceptual hierarchy of the domain, i.e., the Knowledge Graph Schema, from the original knowledge graph in an offline, semi-automatic way through the equivalence relations between these units; it also (2) provides a series of operators and an interactive GUI to capture user query intentions, guiding users to explore the Knowledge Graph Schema and achieve in-depth analysis of knowledge graphs. We will demonstrate, through several fundamental scenarios, the capability of KGNav to reduce tedious queries, enable users to swiftly grasp the structure of the knowledge graph, and perform queries.

Citations: 0

Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611597

Fei Xiao, Jiong Xie, Zhida Chen, Feifei Li, Zhen Chen, Jianwei Liu, Yinpei Liu

The development of Earth Observation technology has led to the production of massive amounts of raster data, and it is vital to manage and analyze this data. Existing solutions employ separate dedicated systems for raster data management and for processing, incurring problems such as data redundancy, difficulty in updating, and expensive data transfer and transformation. To cope with these limitations, this demonstration presents Ganos Aero, a cloud-native system for big raster data management and processing. Ganos Aero proposes a unified raster data model for both data management and processing, which stores a single copy of the raster data without performing an expensive tiling procedure, and thus achieves significant improvement in storage and updating efficiency. To enable efficient query and batch task processing, Ganos Aero implements an on-the-fly tile production mechanism and optimizes its performance using cloud features, including decoupling compute from storage and pushing costly operations closer to the storage layer. Since its deployment on Alibaba Cloud in 2022, Ganos Aero has played a critical role in many real applications, including modern agriculture and environmental monitoring and protection.

Citations: 0

Demonstration of OpenDBML, a Framework for Democratizing In-Database Machine Learning

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611598

Mahdi Ghorbani, Amir Shaikhha

Machine learning over relational data has been used in several applications. The traditional approach of joining relations first and then training a model on the joined table is time-consuming and requires a significant amount of memory. Recent research has focused on in-database machine learning (in-DB ML) to address this issue; these methods train the models over relations without joining, resulting in a more efficient process. However, such systems have ad-hoc user interfaces and specific data formats, making them challenging to use. To address this problem, this paper presents OpenDBML, a framework for democratizing in-DB ML. OpenDBML offers a Python interface for multiple in-DB ML systems, a set of commonly used datasets, and the ability to add new datasets and in-DB ML systems via both Python and web interfaces. The paper also presents comprehensive demonstration scenarios to illustrate how to use OpenDBML effectively.
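The claim that models can be trained over relations without joining them can be illustrated with a small Python sketch. It shows one general idea from this line of work: an aggregate that a learner needs (here the sum of products x*y, a building block of least-squares statistics) can be computed from per-key partial sums instead of from the materialized join. This is not OpenDBML's interface or the algorithm of any specific system it wraps; the relation layout and function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Tuple

Row = Tuple[int, float]  # (join key, feature value)


def sum_xy_materialized(R: Iterable[Row], S: Iterable[Row]) -> float:
    """Baseline: build every joined pair, then aggregate x * y."""
    R, S = list(R), list(S)
    return sum(x * y for kr, x in R for ks, y in S if kr == ks)


def sum_xy_factorized(R: Iterable[Row], S: Iterable[Row]) -> float:
    """Same aggregate without the join: combine per-key partial sums.

    For each key k, the sum over joined pairs of x * y equals
    (sum of x in R with key k) * (sum of y in S with key k).
    """
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for k, x in R:
        sum_x[k] += x
    for k, y in S:
        sum_y[k] += y
    return sum(sum_x[k] * sum_y[k] for k in sum_x.keys() & sum_y.keys())


R = [(1, 2.0), (1, 3.0), (2, 4.0)]
S = [(1, 10.0), (2, 1.0), (2, 2.0), (3, 5.0)]
print(sum_xy_materialized(R, S), sum_xy_factorized(R, S))
```

The factorized version makes one pass over each relation and never builds the joined table, which is where the time and memory savings mentioned in the abstract come from.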

Citations: 0

SimpleTS: An Efficient and Universal Model Selection Framework for Time Series Forecasting

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611561

Yuanyuan Yao, Dimeng Li, Hailiang Jie, Tianyi Li, Jie Chen, Jiaqi Wang, Feifei Li, Yunjun Gao

Time series forecasting, which predicts events through a sequence of time, has received increasing attention in past decades. The diverse range of time series forecasting models makes it challenging to select the most suitable model for a given dataset. As such, the Alibaba Cloud database monitoring system must address the issue of selecting an optimal forecasting model for a single time series. While several model selection frameworks, including AutoAI-TS, have been developed to make predictions for a dataset, their effectiveness may be limited because they may not adapt well to all types of time series, resulting in reduced prediction accuracy. Alternatively, models such as AutoForecast, which train on individual data points, may offer better adaptability but are limited by the longer training time required. In this paper, we introduce SimpleTS, a versatile framework for time series forecasting that exhibits high efficiency and accuracy across all types of time series data. When performing an online prediction task, SimpleTS first classifies the input time series into one type and then efficiently selects the most suitable prediction model for this type. To optimize performance, SimpleTS (i) clusters models with similar performance to improve the efficiency of classification; and (ii) uses soft labeling and weighted representation learning to achieve higher classification accuracy for different time series types. Extensive experiments on 3 private datasets and 52 public datasets show that SimpleTS outperforms the state-of-the-art toolkits in terms of both training time and prediction accuracy.
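The two-step flow described above (classify the series into a type, then pick the forecaster associated with that type) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the three types, the autocorrelation thresholds, and the three simple forecasters stand in for SimpleTS's learned classifier and its clustered model pool, which the abstract does not detail.

```python
from statistics import mean
from typing import List, Sequence


def autocorr(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    m = mean(series)
    denominator = sum((v - m) ** 2 for v in series)
    if denominator == 0:
        return 0.0
    numerator = sum((series[i] - m) * (series[i - lag] - m)
                    for i in range(lag, len(series)))
    return numerator / denominator


def classify(series: Sequence[float], period: int) -> str:
    """Hypothetical type classifier based on two autocorrelation checks."""
    if autocorr(series, period) > 0.5:
        return "seasonal"
    if autocorr(series, 1) > 0.5:
        return "trend"
    return "flat"


def forecast(series: Sequence[float], horizon: int, period: int) -> List[float]:
    """Select a forecaster by series type, then predict `horizon` steps."""
    kind = classify(series, period)
    n = len(series)
    if kind == "seasonal":
        # Seasonal naive: repeat the last observed period.
        return [series[n - period + (h % period)] for h in range(horizon)]
    if kind == "trend":
        # Drift: extend the line through the first and last observations.
        slope = (series[-1] - series[0]) / (n - 1)
        return [series[-1] + slope * (h + 1) for h in range(horizon)]
    return [mean(series)] * horizon


print(forecast([1, 5, 1, 5, 1, 5, 1, 5], horizon=3, period=2))
print(forecast([1, 2, 3, 4, 5, 6, 7, 8], horizon=3, period=2))
```

The point of the structure is that the expensive work (deciding which model suits which type) happens ahead of time, so the online step reduces to a cheap classification followed by a lookup.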

Citations: 0

Solving Hard Variants of Database Schema Matching on Quantum Computers

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611603

Kristin Fritsch, Stefanie Scherzinger

With quantum computers now available as cloud services, there is a global quest for applications where a quantum advantage can be shown. Naturally, data management is a candidate domain. Workable solutions require the design of hybrid quantum algorithms, where a quantum computing unit (a QPU) and classical computing (via CPUs) cooperate towards solving a problem. This demo illustrates such an end-to-end solution targeting NP-hard variants of database schema matching. Our demo is intended to be educational (and hopefully inspiring), allowing participants to explore the critical design decisions, such as the handover between phases of QPU- and CPU-based computation. It will also allow participants to experience hands-on - through playful interaction - how easily problem sizes exceed the limitations of today's QPUs.

Citations: 0

ADOps: An Anomaly Detection Pipeline in Structured Logs

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611618

Xintong Song, Yusen Zhu, Jianfei Wu, Bai Liu, Hongkang Wei

Anomaly detection has been extensively deployed in industry. In practice, an application may have numerous scenarios in which anomalies need to be monitored. However, the complete anomaly detection process, including data acquisition, data processing, model training, and model deployment, takes a long time. In particular, some simple scenarios do not require building complex anomaly detection models, so running the full process for them wastes resources. To solve these problems, we build an anomaly detection pipeline (ADOps) that modularizes each step. For simple anomaly detection scenarios, no programming is required, and new anomaly detection tasks can be created by simply modifying the configuration file. In addition, ADOps can improve the development efficiency of complex anomaly detection models. We show how users create anomaly detection tasks on the pipeline and how engineers use it to develop anomaly detection models.
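A configuration-driven pipeline of this kind might look roughly like the following Python sketch, in which a JSON config names the log field to monitor, the detector to run, and its parameters. The config keys (`field`, `detector`, `params`), the detector registry, and the two detectors are hypothetical; the abstract does not describe ADOps's actual configuration format or modules.

```python
import json
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscore_detector(values: Sequence[float], threshold: float) -> List[int]:
    """Flag indices whose distance from the mean exceeds threshold std devs."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) / s > threshold]


def range_detector(values: Sequence[float], low: float, high: float) -> List[int]:
    """Flag indices whose value falls outside the closed range [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical registry: adding a detector means adding one entry here.
DETECTORS = {"zscore": zscore_detector, "range": range_detector}


def run_task(config_text: str, records: Sequence[Dict[str, float]]) -> List[int]:
    """Run the detection task described by a JSON config over structured logs."""
    config = json.loads(config_text)
    values = [record[config["field"]] for record in records]
    detector = DETECTORS[config["detector"]]
    return detector(values, **config.get("params", {}))


logs = [{"latency_ms": v} for v in [100, 102, 98, 101, 950, 99]]
range_task = '{"field": "latency_ms", "detector": "range", "params": {"low": 0, "high": 500}}'
zscore_task = '{"field": "latency_ms", "detector": "zscore", "params": {"threshold": 2}}'
print(run_task(range_task, logs), run_task(zscore_task, logs))
```

Switching from the range check to the z-score check changes only the config string, not the code, which is the property the abstract highlights for simple scenarios.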

Citations: 0

Portals: A Showcase of Multi-Dataflow Stateful Serverless

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611619

Jonas Spenger, Chengyang Huang, Philipp Haller, Paris Carbone

Serverless applications spanning the cloud and edge require flexible programming frameworks for expressing compositions across the different levels of deployment. Another critical aspect for applications with state is failure resilience beyond the scope of a single dataflow graph that is the current standard in data streaming systems. This paper presents Portals, an interactive, stateful dataflow composition framework with strong end-to-end guarantees. Portals enables event-driven, resilient applications that span across dataflow graphs and serverless deployments. The demonstration exhibits three scenarios in our multi-dataflow streaming-based system: dynamically composing a stateful serverless application; an interactive cloud and edge serverless application; and a Portals browser playground.

Citations: 1

Kora: A Cloud-Native Event Streaming Platform for Kafka

Zone 3: Computer Science | Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the VLDB Endowment

Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611567

Anna Povzner, Prince Mahajan, Jason Gustafson, Jun Rao, Ismael Juma, Feng Min, Shriram Sridharan, Nikhil Bhatia, Gopi Attaluri, Adithya Chandra, Stanislav Kozlovski, Rajini Sivaram, Lucas Bradstreet, Bob Barrett, Dhruvil Shah, David Jacot, David Arthur, Ron Dagostino, Colin McCabe, Manikumar Reddy Obili, Kowshik Prakasam, Jose Garcia Sancio, Vikas Singh, Alok Nikhil, Kamal Gupta

Event streaming is an increasingly critical infrastructure service used in many industries, and there is growing demand for cloud-native solutions. Confluent Cloud provides a massive-scale event streaming platform built on top of Apache Kafka, with tens of thousands of clusters running in 70+ regions across AWS, Google Cloud, and Azure. This paper introduces Kora, the cloud-native platform for Apache Kafka at the core of Confluent Cloud. We describe Kora's design, which enables it to meet its cloud-native goals, such as reliability, elasticity, and cost efficiency. We discuss Kora's abstractions, which allow users to think in terms of their workload requirements and not the underlying infrastructure, and we discuss how Kora is designed to provide consistent, predictable performance across cloud environments with diverse capabilities.

Citations: 0
