Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625062
A. König, Yi Shan, Karan Newatia, Luke Marshall, Vivek R. Narasayya
In Database-as-a-Service (DBaaS) clusters, resource management is a complex optimization problem that assigns tenants to nodes, subject to various constraints and objectives. Tenants share resources within a node, however, their resource demands can change over time and exhibit high variance. As tenants may accumulate large state, moving them to a different node becomes disruptive, making intelligent placement decisions crucial to avoid service disruption. Placement decisions need to account for dynamic changes in tenant resource demands, different causes of service disruption, and various placement constraints, giving rise to a complex search space. In this paper, we show how to bring combinatorial solvers to bear on this problem, formulating the objective of minimizing service disruption as an optimization problem amenable to fast solutions. We implemented our approach in the Service Fabric cluster manager codebase. Experiments show significant reductions in constraint violations and tenant moves, compared to the previous state-of-the-art, including the unmodified Service Fabric cluster manager, as well as recent research on DBaaS tenant placement.
Published as "Solver-In-The-Loop Cluster Resource Management for Database-as-a-Service," Proc. VLDB Endow., pp. 4254-4267.
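To make the optimization concrete: the paper formulates placement as minimizing disruption (tenant moves) subject to node capacity constraints. Below is a minimal brute-force sketch of that objective on a toy instance; all names and the exhaustive search are illustrative only — the paper uses a combinatorial solver, not enumeration.

```python
from itertools import product

def place_tenants(demands, capacity, current, n_nodes):
    """Toy exhaustive search: find an assignment of tenants to nodes that
    respects per-node capacity while moving as few tenants as possible
    from their current placement (the disruption-minimizing objective)."""
    best, best_moves = None, None
    for assign in product(range(n_nodes), repeat=len(demands)):
        # Capacity constraint: total demand on each node must fit.
        load = [0] * n_nodes
        for tenant, node in enumerate(assign):
            load[node] += demands[tenant]
        if any(l > capacity for l in load):
            continue
        # Objective: number of tenants moved away from their current node.
        moves = sum(1 for t, node in enumerate(assign) if node != current[t])
        if best_moves is None or moves < best_moves:
            best, best_moves = assign, moves
    return best, best_moves

# Two nodes of capacity 10; tenant demands grew, so node 0 is now overloaded.
demands = [6, 3, 5]
current = [0, 0, 0]          # all tenants currently on node 0 (load 14 > 10)
plan, moves = place_tenants(demands, capacity=10, current=current, n_nodes=2)
```

A real solver replaces the enumeration with an integer-programming search over the same constraints and objective, which is what makes the approach scale beyond toy instances.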
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625068
Kefei Wang, Feng Chen
In-memory key-value cache systems, such as Memcached and Redis, are essential in today's data centers. A key mission of such cache systems is to identify the most valuable data for caching. To achieve this, current system designs track each key-value item's accesses and attempt to estimate its temporal locality accurately, all in pursuit of the highest cache hit ratio. However, as cache capacity quickly increases, the overhead of managing metadata for a massive number of small key-value items rises to an unbearable level. Put simply, the current fine-grained, heavyweight approach cannot continue to scale. In this paper, we perform an experimental study of the scalability challenge in the current key-value cache system design and quantitatively analyze the inherent issues in the metadata operations for cache management. We further propose a key-value cache management scheme, called Catalyst, based on a highly efficient metadata structure that allows us to make effective caching decisions in a scalable way. By offloading non-essential metadata operations to the GPU, we can dedicate the limited CPU and memory resources to the main service operations for improved throughput and latency. We have developed a prototype based on Memcached. Our experimental results show that our scheme significantly enhances scalability and improves cache system performance by a factor of up to 4.3.
Published as "Catalyst: Optimizing Cache Management for Large In-memory Key-value Systems," Proc. VLDB Endow., pp. 4339-4352.
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625056
Monica Chiosa, Thomas B. Preußer, Michaela Blott, Gustavo Alonso
A widely used approach to characterize input data in both databases and ML is computing the correlation between attributes. The operation is supported by all major database engines and ML platforms. However, it becomes expensive as the number of attributes involved grows. To address the issue, in this paper we introduce AMNES, a stream analytics system that offloads the correlation operator into an FPGA-based network interface card. AMNES processes data at network line rate, and the design can be combined with smart storage or SmartNICs to implement near-data or in-network data processing. The AMNES design goes beyond matrix multiplication, offering a customized solution for correlation computation that bypasses the CPU. Our experiments show that AMNES can sustain streams arriving at 100 Gbps over an RDMA network, while requiring only ten milliseconds to compute the correlation coefficients among 64 streams, an order of magnitude better than competing CPU or GPU designs.
Published as "AMNES: Accelerating the computation of data correlation using FPGAs," Proc. VLDB Endow., pp. 4174-4187.
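Pairwise correlation can be computed in a single pass by keeping five running sums per stream pair, which is the kind of accumulator-only arithmetic that maps naturally onto an FPGA pipeline. The sketch below shows that one-pass formulation in plain Python; it illustrates the math, not the AMNES hardware design itself.

```python
import math

class StreamingCorrelation:
    """One-pass Pearson correlation between two streams.
    Only running sums are kept, so each arriving pair costs O(1) work
    and no data needs to be buffered."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.syy = self.sxy = 0

    def push(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def corr(self):
        # Pearson r from the accumulated sums.
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt((self.n * self.sxx - self.sx ** 2) *
                        (self.n * self.syy - self.sy ** 2))
        return num / den

s = StreamingCorrelation()
for x, y in [(1, 2), (2, 4), (3, 6)]:   # perfectly correlated streams
    s.push(x, y)
```

Because only the sums are stateful, computing all pairwise coefficients among 64 streams needs just 64×64 accumulator sets updated at line rate, which is why the operator is a good offload candidate.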
Pub Date: 2023-09-01 | DOI: 10.14778/3617838.3617841
Yiming Lin, S. Mehrotra
This paper develops a query-time missing value imputation framework, called ZIP, that modifies relational operators to be imputation aware in order to minimize the joint cost of imputation and query processing. The modified operators use a cost-based decision function to determine whether to invoke imputation or to defer to downstream operators to resolve missing values. The modified query processing logic ensures that results with deferred imputations are identical to those produced if all missing values were imputed first. ZIP includes a novel outer-join-based approach to preserve missing values during execution, and a Bloom-filter-based index to reduce space and runtime overhead. Extensive experiments on both real and synthetic data sets demonstrate a 10 to 25 times improvement when augmenting the state-of-the-art technology, ImputeDB, with ZIP-based deferred imputation. ZIP also outperforms the offline approach by up to 19607 times on a real data set.
Published as "ZIP: Lazy Imputation during Query Processing," Proc. VLDB Endow., pp. 28-40.
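The payoff of deferring imputation is that rows eliminated by a selection never pay the imputation cost for columns the predicate does not touch. The toy sketch below (hypothetical schema and statistics-based imputation; not ZIP's actual operators or cost model) evaluates a selection lazily while producing the same result an impute-everything-first plan would:

```python
def query(rows, stats, pred):
    """Evaluate a selection on 'age' with lazy imputation: a row's missing
    'age' is imputed only when the predicate needs it, and remaining
    columns are imputed only for rows that survive the selection.
    Results match eager impute-first; fewer imputations run."""
    imputations = 0
    out = []
    for r in rows:
        age = r["age"]
        if age is None:                   # predicate needs this value now
            age = stats["age"]
            imputations += 1
        if not pred(age):
            continue                      # filtered out: other nulls never imputed
        filled = dict(r, age=age)
        for k, v in list(filled.items()):
            if v is None:                 # impute the survivors' other columns
                filled[k] = stats[k]
                imputations += 1
        out.append(filled)
    return out, imputations

rows = [
    {"age": None, "city": None},   # age imputed for the predicate, survives
    {"age": 40, "city": None},     # survives, city imputed
    {"age": 20, "city": None},     # filtered out: its missing city stays untouched
]
stats = {"age": 35, "city": "unknown"}
result, n_imputed = query(rows, stats, pred=lambda a: a > 30)
```

Here lazy evaluation performs 3 imputations where an eager plan would perform 4; on selective queries over wide tables the gap is what drives ZIP's reported speedups.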
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625058
Pankaj Arora, Surajit Chaudhuri, Sudipto Das, Junfeng Dong, Cyril George, Ajay Kalhan, A. König, Willis Lang, Changsong Li, Feng Li, Jiaqi Liu, Lukas M. Maas, Akshay Mata, Ishai Menache, Justin Moeller, Vivek R. Narasayya, Matthaios Olma, Morgan Oslake, Elnaz Rezai, Yi Shan, Manoj Syamala, Shize Xu, Vasileios Zois
Oversubscription is an essential cost management strategy for cloud database providers, and its importance is magnified by the emerging paradigm of serverless databases. In contrast to general purpose techniques used for oversubscription in hypervisors, operating systems and cluster managers, we develop techniques that leverage our understanding of how DBMSs use resources and how resource allocations impact database performance. Our techniques are designed to flexibly redistribute resources across database tenants at the node and cluster levels with low overhead. We have implemented our techniques in a commercial cloud database service: Azure SQL Database. Experiments using microbenchmarks, industry-standard benchmarks and real-world resource usage traces show that using our approach, it is possible to tightly control the impact on database performance even with a relatively high degree of oversubscription.
Published as "Flexible Resource Allocation for Relational Database-as-a-Service," Proc. VLDB Endow., pp. 4202-4215.
The social network host has knowledge of the network structure and user characteristics and can earn a profit by providing merchants with viral marketing campaigns. We investigate the problem of host profit maximization by leveraging performance incentives and user flexibility. To incentivize the host's performance, we propose setting a desired influence threshold that would allow the host to receive full payment, with the possibility of a small bonus for exceeding the threshold. Unlike existing works that assume a user's choice is frozen once they are activated, we introduce the Dynamic State Switching model to capture "comparative shopping" behavior from an economic perspective, in which users have the flexibility to change their minds about which product to adopt based on the accumulated influence and propaganda strength of each product. In addition, the incentivized cost of a user serving as an influence source is treated as a negative term in the host's profit. The host profit maximization problem is NP-hard, submodular, and non-monotone. To address this challenge, we propose an efficient greedy algorithm and devise a scalable version with an approximation guarantee to select the seed sets. As a side contribution, we develop two seed allocation algorithms to balance the distribution of adoptions among merchants with a small sacrifice in profit. Through extensive experiments on four real-world social networks, we demonstrate that our methods are effective and scalable.
Published as "Host Profit Maximization: Leveraging Performance Incentives and User Flexibility," Proc. VLDB Endow., pp. 51-64. DOI: 10.14778/3617838.3617843.
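The shape of the greedy approach can be seen on a toy profit function: influence as the size of the union of reached users, minus the incentive cost of the chosen seeds (submodular and, because of the cost term, non-monotone). The sketch below is a plain greedy stand-in, not the paper's scalable algorithm, and the reach sets and costs are invented for illustration:

```python
def greedy_seeds(candidates, reach, cost, k):
    """Greedy seed selection for profit = |union of reached users| - seed costs.
    Stops early when no candidate adds positive marginal profit, which can
    happen because the cost term makes the objective non-monotone."""
    seeds, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for c in candidates:
            if c in seeds:
                continue
            # Marginal profit of adding candidate c.
            gain = len(covered | reach[c]) - len(covered) - cost[c]
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        seeds.append(best)
        covered |= reach[best]
    return seeds, len(covered) - sum(cost[s] for s in seeds)

# Hypothetical influence sets: seed "a" reaches users 1-3, etc.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}}
cost = {"a": 1, "b": 1, "c": 1}
seeds, profit = greedy_seeds(list(reach), reach, cost, k=3)
```

Note that "b" is rejected even though the budget allows it: its one new user exactly cancels its cost, so adding it would not raise profit.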
Pub Date: 2023-09-01 | DOI: 10.14778/3617838.3617840
Fangyuan Zhang, Mengxu Jiang, Sibo Wang
Given a weighted set S of n elements, weighted set sampling (WSS) samples an element of S such that each element a_i is sampled with probability proportional to its weight w(a_i). The classic alias method pre-processes an index in O(n) time with O(n) space and answers a WSS query in O(1) time. Yet, the alias method does not support dynamic updates. By minor modifications of existing dynamic WSS schemes, it is possible to achieve expected O(1) update time and to draw t independent samples in expected O(t) time with linear space, which is theoretically optimal. But such a method is impractical and even slower than a binary-search-tree-based solution. How to support both efficient sampling and updates in practice remains challenging. Motivated by this, we design BUS, an efficient scheme that handles an update in O(1) amortized time and draws t independent samples in O(log n + t) time with linear space. A natural extension of WSS is weighted independent range sampling (WIRS), where each element of S is a data point from R. Given an arbitrary range Q = [l, r] at query time, WIRS aims to do weighted set sampling on the set S_Q of data points falling into Q. We show that by integrating the theoretically optimal dynamic WSS scheme mentioned above, it can handle an update in O(log n) time and draw t independent samples for WIRS in O(log n + t) time, matching the state-of-the-art static algorithm. Again, such a solution integrating the optimal dynamic WSS scheme is still impractical for WIRS queries. We further propose WIRS-BUS, which integrates BUS to handle WIRS queries, handling each update in O(log n) time and drawing t independent samples in O(log^2 n + t) time with linear space.
Published as "Efficient Dynamic Weighted Set Sampling and Its Extension," Proc. VLDB Endow., pp. 15-27.
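The static baseline the paper starts from, the alias method, is compact enough to sketch in full: O(n) preprocessing builds a table in which every slot either keeps its own element or "aliases" to an overweight one, and sampling is two random draws. This is Vose's classic construction, shown here for reference; BUS's contribution is adding efficient updates on top of this kind of structure.

```python
import random

def build_alias(weights):
    """Vose's alias method: O(n) preprocessing for O(1) weighted sampling.
    Each slot i holds a probability prob[i] of returning i itself,
    otherwise its alias[i] is returned."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]        # scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                               # top up slot s from slot l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def sample(prob, alias, rng=random):
    i = rng.randrange(len(prob))                   # pick a slot uniformly
    return i if rng.random() < prob[i] else alias[i]

prob, alias = build_alias([1, 3])                  # element 1 is 3x as likely
rng = random.Random(0)
draws = [sample(prob, alias, rng) for _ in range(20000)]
```

The catch motivating the paper: changing one weight can invalidate many slots, so the table offers no cheap update path, which is exactly the gap BUS fills.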
Pub Date: 2023-09-01 | DOI: 10.14778/3625054.3625060
Rui Liu, Kwanghyun Park, Fotis Psallidas, Xiaoyong Zhu, Jinghui Mo, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos, Yuanyuan Tian, Jesús Camacho-Rodríguez
Data pipelines (i.e., converting raw data to features) are critical for machine learning (ML) models, yet their development and management is time-consuming. Feature stores have recently emerged as a new "DBMS-for-ML" with the premise of enabling data scientists and engineers to define and manage their data pipelines. While current feature stores fulfill their promise from a functionality perspective, they are resource-hungry, with ample opportunities for implementing database-style optimizations to enhance their performance. In this paper, we propose a novel set of optimizations specifically targeted at the point-in-time join, which is a critical operation in data pipelines. We implement these optimizations on top of Feathr, a widely used feature store, and evaluate them on use cases from both the TPCx-AI benchmark and real-world online retail scenarios. Our thorough experimental analysis shows that our optimizations can accelerate data pipelines by up to 3x over state-of-the-art baselines.
Published as "Optimizing Data Pipelines for Machine Learning in Feature Stores," Proc. VLDB Endow., pp. 4230-4239.
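For readers unfamiliar with the operation being optimized: a point-in-time join attaches, to each label event, the latest feature value whose timestamp does not exceed the label's timestamp, preventing leakage of future information into training data. A minimal pure-Python sketch (illustrative only, not Feathr's optimized implementation):

```python
from bisect import bisect_right

def point_in_time_join(labels, features):
    """For each (timestamp, label) event, attach the value of the most
    recent feature row with feature_ts <= label_ts, or None if no
    feature existed yet at that point in time."""
    features = sorted(features)                 # (ts, value), ascending by ts
    ts_index = [ts for ts, _ in features]
    out = []
    for ts, label in labels:
        i = bisect_right(ts_index, ts) - 1      # rightmost feature ts <= label ts
        out.append((ts, label, features[i][1] if i >= 0 else None))
    return out

labels = [(5, "buy"), (2, "view")]
features = [(1, 10.0), (4, 12.5), (6, 9.0)]
result = point_in_time_join(labels, features)
```

The asymmetry of the predicate (latest-not-after, rather than equality) is what makes this join expensive at scale and a worthwhile target for database-style optimization.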
Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training in existing machine learning models; (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies are designed specifically for the computer vision domain and ignore graph structure, yielding unsatisfactory performance and slow convergence. Meanwhile, the complex local model architectures in FGL Model studies lack scalability for handling large-scale subgraphs and have deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that optimizes through topology-aware local smoothing confidence and mixed neighbor features. In our experiments, we deploy FedGTA on 12 multi-scale real-world datasets with Louvain and Metis partitioning, allowing us to evaluate the performance and robustness of FedGTA across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiments include ogbn-papers100M, the most representative large-scale graph dataset, verifying the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
FedGTA: Topology-aware Averaging for Federated Graph Learning
Xunkai Li, Zhengyu Wu, Wentao Zhang, Yinlin Zhu, Ronghua Li, Guoren Wang
Pub Date : 2023-09-01 DOI: 10.14778/3617838.3617842
Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training within existing machine learning models; and (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies were designed for the computer vision domain and ignore graph structure, yielding unsatisfactory performance and slow convergence. Meanwhile, the complex local model architectures in FGL Model studies lack the scalability to handle large-scale subgraphs and face deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that aggregates models using topology-aware local smoothing confidence and mixed neighbor features. In our experiments, we deploy FedGTA on 12 multi-scale real-world datasets with Louvain and Metis splits, allowing us to evaluate its performance and robustness across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiments include ogbn-papers100M, the most representative large-scale graph dataset, so that we can verify the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
Proc. VLDB Endow., pages 41-50.
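The FedGTA abstract describes weighting each client's model by a topology-aware local smoothing confidence during aggregation. A minimal sketch of that idea, assuming a toy entropy-based confidence score and flat parameter vectors (the paper's actual confidence computation and model layout differ):

```python
import numpy as np

def smoothing_confidence(soft_labels):
    """Toy stand-in for a topology-aware confidence score: a client whose
    propagated soft labels are more peaked (lower entropy) gets a higher
    score. Illustrative only; not FedGTA's actual formula."""
    p = np.clip(soft_labels, 1e-12, 1.0)
    mean_entropy = -(p * np.log(p)).sum(axis=1).mean()
    return float(np.exp(-mean_entropy))

def topology_aware_average(client_params, confidences):
    """FedAvg-style aggregation where each client's contribution is scaled
    by its normalized confidence instead of its sample count."""
    c = np.asarray(confidences, dtype=float)
    alpha = c / c.sum()  # normalize confidences into aggregation weights
    return sum(a * np.asarray(w) for a, w in zip(alpha, client_params))
```

With two clients holding parameters `[1, 2]` and `[3, 4]` and confidences 1 and 3, the weights become 0.25 and 0.75, so the server-side model is `[2.5, 3.5]` rather than the plain mean.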
Pub Date : 2023-09-01 DOI: 10.14778/3617838.3617839
Bolong Zheng, Yongyong Gao, J. Wan, Lingsen Yan, Long Hu, Bo Liu, Yunjun Gao, Xiaofang Zhou, Christian S. Jensen
Growing demands for the efficient processing of extreme-scale time series workloads call for more capable time series database management systems (TSDBMSs). Specifically, to maintain the consistency and durability of transaction processing, systems employ write-ahead logging (WAL), whereby transactions are committed only after the related log entries are flushed to disk. Under massive I/O, however, this flushing becomes a throughput bottleneck. Recent advances in byte-addressable Non-Volatile Memory (NVM) provide opportunities to improve logging performance by persisting logs to NVM instead. Existing studies typically track complex transaction dependencies and use NVM barrier instructions to ensure log ordering. In contrast, few studies consider the heavy-tailed characteristics of time series workloads, where most transactions are independent of each other. We propose DecLog, a decentralized NVM-based logging system that enables concurrent logging of TSDBMS transactions. Specifically, we propose data-driven log sequence numbering and relaxed ordering strategies to track transaction dependencies and resolve serialization issues. We also propose a parallel logging method that persists logs to NVM after they are compressed and aligned. An experimental study on the YCSB-TS benchmark offers insight into the performance properties of DecLog, showing that it improves throughput by up to 4.6× while offering lower recovery time than the open-source TSDBMS Beringei.
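The data-driven log sequence numbering described above lets transactions that touch disjoint data be logged without a single global ordering point. A minimal sketch of that idea under assumed simplifications (per-key counters standing in for data-driven sequence numbers; this is hypothetical and not DecLog's actual implementation):

```python
import threading
from collections import defaultdict

class DecentralizedLSN:
    """Assigns each transaction a per-key sequence-number vector derived from
    the data items it writes. Transactions touching disjoint keys take
    disjoint locks, so they can be numbered and logged concurrently."""

    def __init__(self):
        self._counters = defaultdict(int)            # next sequence per key
        self._locks = defaultdict(threading.Lock)    # one lock per key

    def assign(self, keys):
        lsn = {}
        for k in sorted(keys):  # fixed lock order avoids deadlock
            with self._locks[k]:
                self._counters[k] += 1
                lsn[k] = self._counters[k]
        return lsn
```

Two transactions writing `{"x"}` and `{"z"}` never contend, which mirrors the abstract's observation that most time series transactions are independent; only transactions sharing a key are ordered relative to each other.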
DecLog: Decentralized Logging in Non-Volatile Memory for Time Series Database Systems
Proc. VLDB Endow., pages 1-14. DOI: 10.14778/3617838.3617839