Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625062
A. König, Yi Shan, Karan Newatia, Luke Marshall, Vivek R. Narasayya
In Database-as-a-Service (DBaaS) clusters, resource management is a complex optimization problem that assigns tenants to nodes, subject to various constraints and objectives. Tenants share resources within a node, however, their resource demands can change over time and exhibit high variance. As tenants may accumulate large state, moving them to a different node becomes disruptive, making intelligent placement decisions crucial to avoid service disruption. Placement decisions need to account for dynamic changes in tenant resource demands, different causes of service disruption, and various placement constraints, giving rise to a complex search space. In this paper, we show how to bring combinatorial solvers to bear on this problem, formulating the objective of minimizing service disruption as an optimization problem amenable to fast solutions. We implemented our approach in the Service Fabric cluster manager codebase. Experiments show significant reductions in constraint violations and tenant moves, compared to the previous state-of-the-art, including the unmodified Service Fabric cluster manager, as well as recent research on DBaaS tenant placement.
Published as "Solver-In-The-Loop Cluster Resource Management for Database-as-a-Service," Proc. VLDB Endow., pp. 4254-4267.
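To make the optimization concrete: the paper formulates placement as minimizing disruption (tenant moves) subject to node capacity constraints. Below is a minimal brute-force sketch of that objective on a toy instance; all names and the exhaustive search are illustrative only — the paper uses a combinatorial solver, not enumeration.

```python
from itertools import product

def place_tenants(demands, capacity, current, n_nodes):
    """Toy exhaustive search: find an assignment of tenants to nodes that
    respects per-node capacity while moving as few tenants as possible
    from their current placement (the disruption-minimizing objective)."""
    best, best_moves = None, None
    for assign in product(range(n_nodes), repeat=len(demands)):
        # Capacity constraint: total demand on each node must fit.
        load = [0] * n_nodes
        for tenant, node in enumerate(assign):
            load[node] += demands[tenant]
        if any(l > capacity for l in load):
            continue
        # Objective: number of tenants moved away from their current node.
        moves = sum(1 for t, node in enumerate(assign) if node != current[t])
        if best_moves is None or moves < best_moves:
            best, best_moves = assign, moves
    return best, best_moves

# Two nodes of capacity 10; tenant demands grew, so node 0 is now overloaded.
demands = [6, 3, 5]
current = [0, 0, 0]          # all tenants currently on node 0 (load 14 > 10)
plan, moves = place_tenants(demands, capacity=10, current=current, n_nodes=2)
```

A real solver replaces the enumeration with an integer-programming search over the same constraints and objective, which is what makes the approach scale beyond toy instances.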
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625068
Kefei Wang, Feng Chen
In-memory key-value cache systems, such as Memcached and Redis, are essential in today's data centers. A key mission of such cache systems is to identify the most valuable data for caching. To achieve this, current system designs track each key-value item's accesses and attempt to estimate its temporal locality accurately, all in pursuit of the highest cache hit ratio. However, as cache capacity quickly increases, the overhead of managing metadata for a massive number of small key-value items rises to an unbearable level. Put simply, the current fine-grained, heavyweight approach cannot continue to scale. In this paper, we perform an experimental study of the scalability challenge in the current key-value cache system design and quantitatively analyze the inherent issues in the metadata operations for cache management. We further propose a key-value cache management scheme, called Catalyst, based on a highly efficient metadata structure that allows us to make effective caching decisions in a scalable way. By offloading non-essential metadata operations to the GPU, we can dedicate the limited CPU and memory resources to the main service operations for improved throughput and latency. We have developed a prototype based on Memcached. Our experimental results show that our scheme significantly enhances scalability and improves cache system performance by a factor of up to 4.3.
Published as "Catalyst: Optimizing Cache Management for Large In-memory Key-value Systems," Proc. VLDB Endow., pp. 4339-4352.
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625056
Monica Chiosa, Thomas B. Preußer, Michaela Blott, Gustavo Alonso
A widely used approach to characterize input data in both databases and ML is computing the correlation between attributes. The operation is supported by all major database engines and ML platforms. However, it becomes expensive as the number of attributes involved grows. To address the issue, in this paper we introduce AMNES, a stream analytics system that offloads the correlation operator into an FPGA-based network interface card. AMNES processes data at network line rate, and the design can be combined with smart storage or SmartNICs to implement near-data or in-network data processing. The AMNES design goes beyond matrix multiplication, offering a customized solution for correlation computation that bypasses the CPU. Our experiments show that AMNES can sustain streams arriving at 100 Gbps over an RDMA network, while requiring only ten milliseconds to compute the correlation coefficients among 64 streams, an order of magnitude better than competing CPU or GPU designs.
Published as "AMNES: Accelerating the computation of data correlation using FPGAs," Proc. VLDB Endow., pp. 4174-4187.
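Pairwise correlation can be computed in a single pass by keeping five running sums per stream pair, which is the kind of accumulator-only arithmetic that maps naturally onto an FPGA pipeline. The sketch below shows that one-pass formulation in plain Python; it illustrates the math, not the AMNES hardware design itself.

```python
import math

class StreamingCorrelation:
    """One-pass Pearson correlation between two streams.
    Only running sums are kept, so each arriving pair costs O(1) work
    and no data needs to be buffered."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.syy = self.sxy = 0

    def push(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def corr(self):
        # Pearson r from the accumulated sums.
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt((self.n * self.sxx - self.sx ** 2) *
                        (self.n * self.syy - self.sy ** 2))
        return num / den

s = StreamingCorrelation()
for x, y in [(1, 2), (2, 4), (3, 6)]:   # perfectly correlated streams
    s.push(x, y)
```

Because only the sums are stateful, computing all pairwise coefficients among 64 streams needs just 64×64 accumulator sets updated at line rate, which is why the operator is a good offload candidate.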
Pub Date: 2023-09-01 | DOI: 10.14778/3617838.3617841
Yiming Lin, S. Mehrotra
This paper develops a query-time missing value imputation framework, called ZIP, that modifies relational operators to be imputation aware in order to minimize the joint cost of imputation and query processing. The modified operators use a cost-based decision function to determine whether to invoke imputation or to defer to downstream operators to resolve missing values. The modified query processing logic ensures that results with deferred imputations are identical to those produced if all missing values were imputed first. ZIP includes a novel outer-join-based approach to preserve missing values during execution, and a Bloom-filter-based index to reduce space and runtime overhead. Extensive experiments on both real and synthetic data sets demonstrate a 10 to 25 times improvement when augmenting the state-of-the-art technology, ImputeDB, with ZIP-based deferred imputation. ZIP also outperforms the offline approach by up to 19607 times on a real data set.
Published as "ZIP: Lazy Imputation during Query Processing," Proc. VLDB Endow., pp. 28-40.
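The payoff of deferring imputation is that rows eliminated by a selection never pay the imputation cost for columns the predicate does not touch. The toy sketch below (hypothetical schema and statistics-based imputation; not ZIP's actual operators or cost model) evaluates a selection lazily while producing the same result an impute-everything-first plan would:

```python
def query(rows, stats, pred):
    """Evaluate a selection on 'age' with lazy imputation: a row's missing
    'age' is imputed only when the predicate needs it, and remaining
    columns are imputed only for rows that survive the selection.
    Results match eager impute-first; fewer imputations run."""
    imputations = 0
    out = []
    for r in rows:
        age = r["age"]
        if age is None:                   # predicate needs this value now
            age = stats["age"]
            imputations += 1
        if not pred(age):
            continue                      # filtered out: other nulls never imputed
        filled = dict(r, age=age)
        for k, v in list(filled.items()):
            if v is None:                 # impute the survivors' other columns
                filled[k] = stats[k]
                imputations += 1
        out.append(filled)
    return out, imputations

rows = [
    {"age": None, "city": None},   # age imputed for the predicate, survives
    {"age": 40, "city": None},     # survives, city imputed
    {"age": 20, "city": None},     # filtered out: its missing city stays untouched
]
stats = {"age": 35, "city": "unknown"}
result, n_imputed = query(rows, stats, pred=lambda a: a > 30)
```

Here lazy evaluation performs 3 imputations where an eager plan would perform 4; on selective queries over wide tables the gap is what drives ZIP's reported speedups.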
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625058
Pankaj Arora, Surajit Chaudhuri, Sudipto Das, Junfeng Dong, Cyril George, Ajay Kalhan, A. König, Willis Lang, Changsong Li, Feng Li, Jiaqi Liu, Lukas M. Maas, Akshay Mata, Ishai Menache, Justin Moeller, Vivek R. Narasayya, Matthaios Olma, Morgan Oslake, Elnaz Rezai, Yi Shan, Manoj Syamala, Shize Xu, Vasileios Zois
Oversubscription is an essential cost management strategy for cloud database providers, and its importance is magnified by the emerging paradigm of serverless databases. In contrast to general purpose techniques used for oversubscription in hypervisors, operating systems and cluster managers, we develop techniques that leverage our understanding of how DBMSs use resources and how resource allocations impact database performance. Our techniques are designed to flexibly redistribute resources across database tenants at the node and cluster levels with low overhead. We have implemented our techniques in a commercial cloud database service: Azure SQL Database. Experiments using microbenchmarks, industry-standard benchmarks and real-world resource usage traces show that using our approach, it is possible to tightly control the impact on database performance even with a relatively high degree of oversubscription.
Published as "Flexible Resource Allocation for Relational Database-as-a-Service," Proc. VLDB Endow., pp. 4202-4215.
The social network host has knowledge of the network structure and user characteristics and can earn a profit by providing merchants with viral marketing campaigns. We investigate the problem of host profit maximization by leveraging performance incentives and user flexibility. To incentivize the host's performance, we propose setting a desired influence threshold that would allow the host to receive full payment, with the possibility of a small bonus for exceeding the threshold. Unlike existing works that assume a user's choice is frozen once they are activated, we introduce the Dynamic State Switching model to capture "comparative shopping" behavior from an economic perspective, in which users have the flexibility to change their minds about which product to adopt based on the accumulated influence and propaganda strength of each product. In addition, the incentivized cost of a user serving as an influence source is treated as a negative term in the host's profit. The host profit maximization problem is NP-hard, submodular, and non-monotone. To address this challenge, we propose an efficient greedy algorithm and devise a scalable version with an approximation guarantee to select the seed sets. As a side contribution, we develop two seed allocation algorithms to balance the distribution of adoptions among merchants with a small sacrifice in profit. Through extensive experiments on four real-world social networks, we demonstrate that our methods are effective and scalable.
Published as "Host Profit Maximization: Leveraging Performance Incentives and User Flexibility," Proc. VLDB Endow., pp. 51-64. DOI: 10.14778/3617838.3617843.
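The shape of the greedy approach can be seen on a toy profit function: influence as the size of the union of reached users, minus the incentive cost of the chosen seeds (submodular and, because of the cost term, non-monotone). The sketch below is a plain greedy stand-in, not the paper's scalable algorithm, and the reach sets and costs are invented for illustration:

```python
def greedy_seeds(candidates, reach, cost, k):
    """Greedy seed selection for profit = |union of reached users| - seed costs.
    Stops early when no candidate adds positive marginal profit, which can
    happen because the cost term makes the objective non-monotone."""
    seeds, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for c in candidates:
            if c in seeds:
                continue
            # Marginal profit of adding candidate c.
            gain = len(covered | reach[c]) - len(covered) - cost[c]
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        seeds.append(best)
        covered |= reach[best]
    return seeds, len(covered) - sum(cost[s] for s in seeds)

# Hypothetical influence sets: seed "a" reaches users 1-3, etc.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}}
cost = {"a": 1, "b": 1, "c": 1}
seeds, profit = greedy_seeds(list(reach), reach, cost, k=3)
```

Note that "b" is rejected even though the budget allows it: its one new user exactly cancels its cost, so adding it would not raise profit.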
Pub Date: 2023-09-01 | DOI: 10.14778/3617838.3617840
Fangyuan Zhang, Mengxu Jiang, Sibo Wang
Given a weighted set S of n elements, weighted set sampling (WSS) samples an element of S such that each element a_i is sampled with probability proportional to its weight w(a_i). The classic alias method pre-processes an index in O(n) time with O(n) space and answers a WSS query in O(1) time. Yet, the alias method does not support dynamic updates. By minor modifications of existing dynamic WSS schemes, it is possible to achieve expected O(1) update time and to draw t independent samples in expected O(t) time with linear space, which is theoretically optimal. But such a method is impractical and even slower than a binary-search-tree-based solution. How to support both efficient sampling and updates in practice remains challenging. Motivated by this, we design BUS, an efficient scheme that handles an update in O(1) amortized time and draws t independent samples in O(log n + t) time with linear space. A natural extension of WSS is weighted independent range sampling (WIRS), where each element of S is a data point from R. Given an arbitrary range Q = [l, r] at query time, WIRS aims to do weighted set sampling on the set S_Q of data points falling into Q. We show that by integrating the theoretically optimal dynamic WSS scheme mentioned above, it can handle an update in O(log n) time and draw t independent samples for WIRS in O(log n + t) time, matching the state-of-the-art static algorithm. Again, such a solution integrating the optimal dynamic WSS scheme is still impractical for WIRS queries. We further propose WIRS-BUS, which integrates BUS to handle WIRS queries, handling each update in O(log n) time and drawing t independent samples in O(log^2 n + t) time with linear space.
Published as "Efficient Dynamic Weighted Set Sampling and Its Extension," Proc. VLDB Endow., pp. 15-27.
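The static baseline the paper starts from, the alias method, is compact enough to sketch in full: O(n) preprocessing builds a table in which every slot either keeps its own element or "aliases" to an overweight one, and sampling is two random draws. This is Vose's classic construction, shown here for reference; BUS's contribution is adding efficient updates on top of this kind of structure.

```python
import random

def build_alias(weights):
    """Vose's alias method: O(n) preprocessing for O(1) weighted sampling.
    Each slot i holds a probability prob[i] of returning i itself,
    otherwise its alias[i] is returned."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]        # scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                               # top up slot s from slot l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def sample(prob, alias, rng=random):
    i = rng.randrange(len(prob))                   # pick a slot uniformly
    return i if rng.random() < prob[i] else alias[i]

prob, alias = build_alias([1, 3])                  # element 1 is 3x as likely
rng = random.Random(0)
draws = [sample(prob, alias, rng) for _ in range(20000)]
```

The catch motivating the paper: changing one weight can invalidate many slots, so the table offers no cheap update path, which is exactly the gap BUS fills.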
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625060
Rui Liu, Kwanghyun Park, Fotis Psallidas, Xiaoyong Zhu, Jinghui Mo, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos, Yuanyuan Tian, Jesús Camacho-Rodríguez
Data pipelines (i.e., converting raw data to features) are critical for machine learning (ML) models, yet their development and management is time-consuming. Feature stores have recently emerged as a new "DBMS-for-ML" with the premise of enabling data scientists and engineers to define and manage their data pipelines. While current feature stores fulfill their promise from a functionality perspective, they are resource-hungry, with ample opportunities for implementing database-style optimizations to enhance their performance. In this paper, we propose a novel set of optimizations specifically targeted at the point-in-time join, which is a critical operation in data pipelines. We implement these optimizations on top of Feathr, a widely used feature store, and evaluate them on use cases from both the TPCx-AI benchmark and real-world online retail scenarios. Our thorough experimental analysis shows that our optimizations can accelerate data pipelines by up to 3x over state-of-the-art baselines.
Published as "Optimizing Data Pipelines for Machine Learning in Feature Stores," Proc. VLDB Endow., pp. 4230-4239.
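For readers unfamiliar with the operation being optimized: a point-in-time join attaches, to each label event, the latest feature value whose timestamp does not exceed the label's timestamp, preventing leakage of future information into training data. A minimal pure-Python sketch (illustrative only, not Feathr's optimized implementation):

```python
from bisect import bisect_right

def point_in_time_join(labels, features):
    """For each (timestamp, label) event, attach the value of the most
    recent feature row with feature_ts <= label_ts, or None if no
    feature existed yet at that point in time."""
    features = sorted(features)                 # (ts, value), ascending by ts
    ts_index = [ts for ts, _ in features]
    out = []
    for ts, label in labels:
        i = bisect_right(ts_index, ts) - 1      # rightmost feature ts <= label ts
        out.append((ts, label, features[i][1] if i >= 0 else None))
    return out

labels = [(5, "buy"), (2, "view")]
features = [(1, 10.0), (4, 12.5), (6, 9.0)]
result = point_in_time_join(labels, features)
```

The asymmetry of the predicate (latest-not-after, rather than equality) is what makes this join expensive at scale and a worthwhile target for database-style optimization.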
Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training in existing machine learning models; (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies are designed specifically for the computer vision domain and ignore graph structure, yielding unsatisfactory performance and slow convergence. Meanwhile, the complex local model architectures in FGL Model studies lack scalability for handling large-scale subgraphs and have deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that optimizes through topology-aware local smoothing confidence and mixed neighbor features. In our experiments, we deploy FedGTA on 12 multi-scale real-world datasets with Louvain and Metis partitioning, allowing us to evaluate the performance and robustness of FedGTA across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiments include ogbn-papers100M, the most representative large-scale graph dataset, verifying the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
FedGTA: Topology-aware Averaging for Federated Graph Learning
Xunkai Li, Zhengyu Wu, Wentao Zhang, Yinlin Zhu, Ronghua Li, Guoren Wang
Pub Date : 2023-09-01 DOI: 10.14778/3617838.3617842
Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training within existing machine learning models; and (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies were designed for the computer vision domain and ignore graph structure, yielding unsatisfactory performance and slow convergence. Meanwhile, the complex local model architectures in FGL Model studies lack the scalability to handle large-scale subgraphs and face deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that aggregates models using topology-aware local smoothing confidence and mixed neighbor features. In our experiments, we deploy FedGTA on 12 multi-scale real-world datasets with Louvain and Metis splits, allowing us to evaluate its performance and robustness across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiments include ogbn-papers100M, the most representative large-scale graph dataset, so that we can verify the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
Proc. VLDB Endow., pages 41-50.
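The FedGTA abstract describes weighting each client's model by a topology-aware local smoothing confidence during aggregation. A minimal sketch of that idea, assuming a toy entropy-based confidence score and flat parameter vectors (the paper's actual confidence computation and model layout differ):

```python
import numpy as np

def smoothing_confidence(soft_labels):
    """Toy stand-in for a topology-aware confidence score: a client whose
    propagated soft labels are more peaked (lower entropy) gets a higher
    score. Illustrative only; not FedGTA's actual formula."""
    p = np.clip(soft_labels, 1e-12, 1.0)
    mean_entropy = -(p * np.log(p)).sum(axis=1).mean()
    return float(np.exp(-mean_entropy))

def topology_aware_average(client_params, confidences):
    """FedAvg-style aggregation where each client's contribution is scaled
    by its normalized confidence instead of its sample count."""
    c = np.asarray(confidences, dtype=float)
    alpha = c / c.sum()  # normalize confidences into aggregation weights
    return sum(a * np.asarray(w) for a, w in zip(alpha, client_params))
```

With two clients holding parameters `[1, 2]` and `[3, 4]` and confidences 1 and 3, the weights become 0.25 and 0.75, so the server-side model is `[2.5, 3.5]` rather than the plain mean.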
Pub Date : 2023-09-01 DOI: 10.14778/3617838.3617839
Bolong Zheng, Yongyong Gao, J. Wan, Lingsen Yan, Long Hu, Bo Liu, Yunjun Gao, Xiaofang Zhou, Christian S. Jensen
Growing demands for the efficient processing of extreme-scale time series workloads call for more capable time series database management systems (TSDBMSs). Specifically, to maintain the consistency and durability of transaction processing, systems employ write-ahead logging (WAL), whereby transactions are committed only after the related log entries are flushed to disk. Under massive I/O, however, this flushing becomes a throughput bottleneck. Recent advances in byte-addressable Non-Volatile Memory (NVM) provide opportunities to improve logging performance by persisting logs to NVM instead. Existing studies typically track complex transaction dependencies and use NVM barrier instructions to ensure log ordering. In contrast, few studies consider the heavy-tailed characteristics of time series workloads, where most transactions are independent of each other. We propose DecLog, a decentralized NVM-based logging system that enables concurrent logging of TSDBMS transactions. Specifically, we propose data-driven log sequence numbering and relaxed ordering strategies to track transaction dependencies and resolve serialization issues. We also propose a parallel logging method that persists logs to NVM after they are compressed and aligned. An experimental study on the YCSB-TS benchmark offers insight into the performance properties of DecLog, showing that it improves throughput by up to 4.6× while offering lower recovery time than the open-source TSDBMS Beringei.
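The data-driven log sequence numbering described above lets transactions that touch disjoint data be logged without a single global ordering point. A minimal sketch of that idea under assumed simplifications (per-key counters standing in for data-driven sequence numbers; this is hypothetical and not DecLog's actual implementation):

```python
import threading
from collections import defaultdict

class DecentralizedLSN:
    """Assigns each transaction a per-key sequence-number vector derived from
    the data items it writes. Transactions touching disjoint keys take
    disjoint locks, so they can be numbered and logged concurrently."""

    def __init__(self):
        self._counters = defaultdict(int)            # next sequence per key
        self._locks = defaultdict(threading.Lock)    # one lock per key

    def assign(self, keys):
        lsn = {}
        for k in sorted(keys):  # fixed lock order avoids deadlock
            with self._locks[k]:
                self._counters[k] += 1
                lsn[k] = self._counters[k]
        return lsn
```

Two transactions writing `{"x"}` and `{"z"}` never contend, which mirrors the abstract's observation that most time series transactions are independent; only transactions sharing a key are ordered relative to each other.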
DecLog: Decentralized Logging in Non-Volatile Memory for Time Series Database Systems
Proc. VLDB Endow., pages 1-14. DOI: 10.14778/3617838.3617839