Proc. VLDB Endow. Latest Articles, Page 8

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Proc. VLDB Endow.

Pub Date: 2023-03-06, DOI: 10.48550/arXiv.2303.02868

Xiaonan Nie, Yi Liu, Fangcheng Fu, J. Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui

Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially Transformer models. Many products and services at Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have adopted pre-trained models to benefit from their capabilities. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transformer models. Angel-PTM can efficiently train extremely large models by exploiting hierarchical memory. Its key designs are fine-grained memory management via the Page abstraction and a unified scheduling method that coordinates computation, data movement, and communication. Furthermore, Angel-PTM supports extreme model scaling with SSD storage and implements a lock-free updating mechanism to address SSD I/O bottlenecks. Experimental results demonstrate that Angel-PTM outperforms existing systems by up to 114.8% in maximum model scale and by up to 88.9% in training throughput. Additionally, experiments on GPT3-175B and T5-MoE-1.2T models using hundreds of GPUs verify its strong scalability.

Pages: 3781-3794

Citations: 3

Marigold: Efficient k-means Clustering in High Dimensions

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.14778/3587136.3587147

Kasper Overgaard Mortensen, Fatemeh Zardbani, M. A. Haque, S. Agustsson, D. Mottin, Philip Hofmann, Panagiotis Karras

How can we cluster high-dimensional data efficiently and scalably? The k-means algorithm clusters data by iteratively reducing intra-cluster Euclidean distances until convergence. While it finds applications ranging from recommendation engines to image segmentation, its application to high-dimensional data is hindered by the need to repeatedly compute Euclidean distances between points and centroids. In this paper, we propose Marigold (k-means for high-dimensional data), a scalable algorithm for k-means clustering in high dimensions. Marigold prunes distance calculations by means of (i) a tight distance-bounding scheme; (ii) a stepwise calculation over a multiresolution transform; and (iii) exploiting the triangle inequality. To our knowledge, such an arsenal of pruning techniques has not hitherto been applied to k-means. Our work is motivated by time-critical Angle-Resolved Photoemission Spectroscopy (ARPES) experiments, where it is vital to detect clusters among high-dimensional spectra in real time. In a thorough experimental study with real-world data sets, we demonstrate that Marigold efficiently clusters high-dimensional data, achieving approximately one order of magnitude improvement over prior art.
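Of the three pruning techniques listed above, the triangle-inequality test is the easiest to illustrate on its own. The sketch below is not Marigold's implementation; it is a minimal Python version of the classic Elkan-style lemma, assuming points and centroids are plain sequences of floats, and the helper names (`euclidean`, `assign_with_pruning`) are hypothetical. If the distance between the current best centroid and another centroid is at least twice the distance from the point to the best centroid, the other centroid cannot be closer, so its distance is never computed. Marigold's distance-bounding scheme and multiresolution transform are not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(
    points: List[Vector], centroids: List[Vector]
) -> Tuple[List[int], int]:
    """One k-means assignment step (hypothetical helper).

    Returns (labels, skipped), where skipped counts point-centroid distances
    that the triangle-inequality test proved unnecessary.
    """
    k = len(centroids)
    # Centroid-to-centroid distances: computed once, reused for every point.
    cc = [[euclidean(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels: List[int] = []
    skipped = 0
    for p in points:
        best, best_d = 0, euclidean(p, centroids[0])
        for j in range(1, k):
            # Triangle inequality: d(p, c_j) >= d(c_best, c_j) - d(p, c_best).
            # If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best),
            # so c_j cannot be strictly closer and its distance is never computed.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = euclidean(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The pruning is exact: it only skips centroids that provably cannot win, so the labels agree with a brute-force assignment (ties go to the lower index). The trade-off is the O(k²) centroid-distance table per iteration, which pays off when the number of points is much larger than k.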

Pages: 1740-1748

Citations: 2

GriDB: Scaling Blockchain Database via Sharding and Off-Chain Cross-Shard Mechanism

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.14778/3587136.3587143

Zicong Hong, Song Guo, Enyuan Zhou, Wuhui Chen, Huawei Huang, Albert Y. Zomaya

Blockchain databases have attracted widespread attention but suffer from poor scalability because the underlying blockchains do not scale. While blockchain sharding is necessary for a scalable blockchain database, it poses a new challenge, namely on-chain cross-shard database services. Each cross-shard database service (e.g., cross-shard queries or inter-shard load balancing) involves massive cross-shard data exchanges, while existing cross-shard mechanisms must process each cross-shard data exchange via the consensus of all nodes in the related shards (i.e., on-chain) to withstand the Byzantine environment of a blockchain, which eliminates the benefits of sharding. To tackle this challenge, this paper presents GriDB, the first scalable blockchain database, built on a novel off-chain cross-shard mechanism for efficient cross-shard database services. Borrowing the idea of off-chain payments, GriDB delegates massive cross-shard data exchanges to a few nodes, each randomly picked from a different shard. Given the Byzantine environment, the untrusted delegates cooperate to generate succinct proofs for cross-shard data exchanges, while consensus is responsible only for low-cost proof verification. However, unlike payments, verification of database services has more requirements (e.g., completeness, correctness, freshness, and availability); thus, we introduce several new authenticated data structures (ADS). In particular, we use consensus to extend the threat model and reduce the complexity of traditional accumulator-based ADS for verifiable cross-shard queries with a rich set of relational operators. Moreover, we study the necessity of inter-shard load balancing for a scalable blockchain database and design an off-chain, live approach that provides both efficiency and availability during balancing. An evaluation of our prototype demonstrates GriDB's scalability on workloads with queries and updates.

Pages: 1685-1698

Citations: 2

LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.14778/3587136.3587150

Tianyi Chen, Jun Gao, Hedui Chen, Yaofeng Tu

Query optimization based on deep reinforcement learning (DRL) has recently become a hot research topic. Despite promising progress, DRL optimizers still face great challenges in robustly producing efficient plans, due to the vast search space spanning both join order and operator selection, and to the highly variable execution latency used as the feedback signal. In this paper, we propose LOGER, a learned optimizer for generating efficient and robust plans, aiming to produce both efficient join orders and efficient operators. LOGER first uses a Graph Transformer to capture relationships between tables and predicates. Then, the search space is reorganized: LOGER learns to restrict specific operators instead of directly selecting one for each join, while relying on the DBMS's built-in optimizer to select physical operators under those restrictions. This strategy exploits expert knowledge to improve the robustness of plan generation while offering sufficient plan-search flexibility. Furthermore, LOGER introduces ε-beam search, which keeps multiple search paths that preserve promising plans while performing guided exploration. Finally, LOGER introduces a loss function with reward weighting, which further enhances performance robustness by reducing the fluctuation caused by poor operators, and a log transformation that compresses the range of rewards. We conduct experiments on the Join Order Benchmark (JOB), TPC-DS, and Stack Overflow, and demonstrate that LOGER outperforms existing learned query optimizers, achieving a 2.07× speedup on JOB compared with PostgreSQL.
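The abstract does not spell out how ε-beam search works, so the following Python sketch shows one plausible reading, under stated assumptions: a beam search over left-deep join orders in which each beam slot is normally filled with the best remaining candidate, but with probability ε a random candidate is taken instead. The `score` callable stands in for LOGER's learned value model and, like `epsilon_beam_search` and its parameters, is hypothetical. LOGER's operator restrictions and reward weighting are not modeled.

```python
import random
from typing import Callable, List, Sequence, Tuple

Plan = Tuple[str, ...]  # a (partial) left-deep join order, e.g. ("A", "B")


def epsilon_beam_search(
    tables: Sequence[str],
    score: Callable[[Plan], float],  # stand-in for a learned value model; lower is better
    beam_width: int = 3,
    epsilon: float = 0.1,
    seed: int = 0,
) -> List[Plan]:
    """Hypothetical epsilon-beam search over join orders; returns complete plans, best first."""
    rng = random.Random(seed)
    beam: List[Plan] = [()]

    def key(plan: Plan) -> Tuple[float, Plan]:
        # Tie-break on the plan itself so results are deterministic.
        return (score(plan), plan)

    for _ in range(len(tables)):
        # Expand every partial plan by one table it has not joined yet.
        candidates = sorted(
            {plan + (t,) for plan in beam for t in tables if t not in plan}, key=key
        )
        next_beam: List[Plan] = []
        while candidates and len(next_beam) < beam_width:
            # Exploit the best remaining candidate, or explore a random one w.p. epsilon.
            idx = rng.randrange(len(candidates)) if rng.random() < epsilon else 0
            next_beam.append(candidates.pop(idx))
        beam = next_beam
    return sorted(beam, key=key)
```

For instance, scoring a plan as the position-weighted sum of per-table costs {A: 3, B: 1, C: 2} makes ("A", "C", "B") optimal, yet with ε = 0 and a width-3 beam the search returns ("A", "B", "C"), because the prefix ("A", "C") is pruned at the second step. A nonzero ε gives such prefixes a chance to survive, which is the kind of guided exploration the abstract describes.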

Pages: 1777-1789

Citations: 3

Distributed Graph Embedding with Information-Oriented Random Walks

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.48550/arXiv.2303.15702

Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, Yu Cao

Graph embedding maps graph nodes to low-dimensional vectors and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, for example for link prediction on Twitter, whose graph has over one billion edges. Most existing graph embedding methods fall short of high data scalability. In this paper, we present DistGER, a general-purpose, distributed, information-centric, random-walk-based graph embedding framework that scales to billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model used to generate node embeddings by optimizing access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that, compared with state-of-the-art distributed graph embedding frameworks including KnightKing, DistDGL, and PyTorch-BigGraph, DistGER achieves 2.33× to 129× acceleration, a 45% reduction in cross-machine communication, and a >10% effectiveness improvement in downstream tasks.

Pages: 1643-1656

Citations: 1

Elf: Erasing-based Lossless Floating-Point Compression

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.14778/3587136.3587149

Ruiyuan Li, Zheng Li, Yi Wu, Chao Chen, Yu Zheng

Floating-point time series data are being generated in prohibitively large volumes at an unprecedentedly high rate. Efficient, compact, and lossless compression of time series data is therefore of great importance in a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes Elf, an Erasing-based Lossless Floating-point compression algorithm. The main idea of Elf is to erase the last few bits (i.e., set them to zero) of floating-point values, so that the XORed values are expected to contain many trailing zeros. The erasing-based method poses three challenges. First, how can the erased bits be determined quickly? Second, how can the original data be recovered losslessly from the erased values? Third, how can the erased data be encoded compactly? Through rigorous mathematical analysis, Elf directly determines the erased bits and restores the original values without losing any precision. To further improve the compression ratio, we propose a novel encoding strategy for XORed values with many trailing zeros. Elf works in a streaming fashion: it takes only O(N) time (where N is the length of a time series) and O(1) space, and it achieves a notable compression ratio with a theoretical guarantee. Extensive experiments on 22 datasets show that Elf performs strongly compared with 9 advanced competitors.
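To make the erase-then-XOR idea concrete, here is a simplified Python sketch built on several assumptions that are not Elf's published procedure: each value's decimal precision α is taken from Python's shortest `repr`, the number of erasable mantissa bits is found by brute-force search (the paper derives it analytically), and restoration is done by rounding back to α decimal places. The helper names are hypothetical, and Elf's trailing-zero encoding strategy is omitted.

```python
import math
import struct
from decimal import Decimal
from typing import Tuple


def to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    """Inverse of to_bits."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def decimal_places(x: float) -> int:
    """Assumed precision alpha: digits after the point in Python's shortest repr."""
    exponent = Decimal(repr(x)).as_tuple().exponent
    return -exponent if exponent < 0 else 0


def erase(x: float) -> Tuple[float, int]:
    """Zero as many low mantissa bits as possible while rounding still recovers x.

    Brute-force search from 52 erased bits downward; the paper instead
    derives the erasable bit count analytically.
    """
    if x == 0.0 or not math.isfinite(x):
        return x, 0
    alpha = decimal_places(x)
    bits = to_bits(x)
    for e in range(52, 0, -1):
        candidate = from_bits(bits & ~((1 << e) - 1))  # clear the low e bits
        if round(candidate, alpha) == x:
            return candidate, alpha
    return x, alpha  # nothing erasable


def restore(erased: float, alpha: int) -> float:
    """Recover the original value by rounding to alpha decimal places."""
    return round(erased, alpha)


def trailing_zeros(v: int) -> int:
    """Number of trailing zero bits in a 64-bit word (64 for zero)."""
    return 64 if v == 0 else (v & -v).bit_length() - 1
```

For consecutive readings such as 3.14 and 3.15, the XOR of the raw bit patterns ends in only a few zero bits, whereas the XOR of the erased values ends in dozens, which is what an XOR-based encoder can exploit; `restore` still returns exactly 3.14 and 3.15.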

Pages: 1763-1776

Citations: 3

When Database Meets New Storage Devices: Understanding and Exposing Performance Mismatches via Configurations

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.14778/3587136.3587145

Haochen He, Erci Xu, Shanshan Li, Zhouyang Jia, Si Zheng, Yue Yu, Jun Ma, Xiangke Liao

NVMe SSDs hugely boost I/O speed, with GB/s-level throughput and microsecond-level latency. Unfortunately, DBMS users often find that their high-performance storage devices deliver less-than-expected or even worse performance than their traditional counterparts. While many works propose new DBMS designs to fully exploit NVMe SSDs, few systematically study the symptoms, root causes, and possible detection methods of such performance mismatches in existing databases. In this paper, we start with an empirical study in which we systematically expose and analyze performance mismatches in six popular databases via controlled configuration tuning. We find that all six databases can suffer from performance mismatches. Moreover, we conclude that the root causes can be categorized as the databases' unawareness of new storage devices' characteristics in I/O size, I/O parallelism, and I/O sequentiality. We reported 17 mismatches to developers, and 15 have been confirmed. Additionally, we find that testing all configuration knobs is inefficient. Therefore, we propose a fast performance-mismatch detection framework, and our evaluation shows that it achieves a two-orders-of-magnitude speedup over the baseline without sacrificing effectiveness.

Pages: 1712-1725

Citations: 0

SODA: A Set of Fast Oblivious Algorithms in Distributed Secure Data Analytics

Proc. VLDB Endow.

Pub Date: 2023-03-01, DOI: 10.14778/3587136.3587142

Xiang Li, Nuozhou Sun, Yunqian Luo, M. Gao

Cloud systems are now a prevalent platform for hosting large-scale big-data analytics applications such as machine learning and relational databases. However, data privacy remains a critical concern for public cloud systems. Existing trusted hardware can provide an isolated execution domain on an untrusted platform, but it still suffers from access-pattern-based side channels at various levels, including memory, disks, and networking. Oblivious algorithms can address these vulnerabilities by hiding the program's data access patterns. Unfortunately, current oblivious algorithms for data analytics are limited to single-machine execution, support only simple operations, and/or suffer from significant performance overheads due to expensive global sorting and excessive data padding. In this work, we propose SODA, a set of efficient oblivious algorithms for distributed data analytics operators, including filter, aggregate, and binary equi-join. To improve performance, SODA completely avoids the expensive oblivious global sort primitive and minimizes data padding overheads. SODA uses low-cost (pseudo-)random communication instead of an expensive global sort to ensure uniform data traffic in oblivious filter and aggregate. It also adopts a novel two-level bin-packing approach in oblivious join to alleviate skew in both input redistribution and the join product, thus minimizing the necessary data padding. Compared with the state-of-the-art system, SODA not only extends functionality but also improves performance, achieving 1.1× to 14.6× speedups on complex multi-operator data analytics workloads.

Pages: 1671-1684

Citations: 0

SPG: Structure-Private Graph Database via SqueezePIR

Proc. VLDB Endow.

Pub Date : 2023-03-01 DOI: 10.14778/3587136.3587138

Ling Liang, Jilan Lin, Zheng Qu, Ishtiyaque Ahmad, Fengbin Tu, Trinabh Gupta, Yufei Ding, Yuan Xie

Much of the relational data in our daily life is represented as graphs, making graph applications an important workload. Because graph datasets are large, moving graph data to the cloud has become a popular option. To keep confidential and private graphs secure from an untrusted cloud server, many cryptographic techniques are leveraged to hide the content of the data. However, protecting only the data content is not enough for a graph database, because the structural information of the graph can be revealed through the database access trace. In this work, we study the graph neural network (GNN), an important graph workload for mining information from a graph database. We find that the server is able to infer which node is being processed during the edge-retrieval phase and can also learn its neighbor indices during the GNN's aggregation phase, which leaks information about the graph structure. We present SPG, a structure-private graph database with SqueezePIR. SPG is built on top of Private Information Retrieval (PIR), which securely hides which nodes/neighbors are accessed. In addition, we propose SqueezePIR, a compression technique to overcome the computation overhead of PIR. In our evaluation, SqueezePIR achieves an average 11.85× speedup with less than 2% accuracy loss compared to the state-of-the-art FastPIR protocol.
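
SPG relies on PIR to hide which rows (nodes or neighbor lists) a client fetches. As a much simpler stand-in, and explicitly not SPG, SqueezePIR, or FastPIR, here is a sketch of the classic two-server XOR-based PIR scheme of Chor et al., assuming two non-colluding servers that each hold a full copy of a table of equal-width records. The function names and the toy adjacency table are hypothetical. Each server sees only a uniformly random subset of row positions, so neither learns on its own which row was requested.

```python
import secrets
from typing import List, Set, Tuple


def make_queries(n: int, index: int) -> Tuple[Set[int], Set[int]]:
    """Client: a uniformly random subset S of range(n), and S with `index` toggled."""
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {index}  # symmetric difference flips membership of `index` only
    return s1, s2


def answer(db: List[bytes], query: Set[int]) -> bytes:
    """Server: XOR of all records whose positions are in the query set."""
    width = len(db[0])  # assumes a non-empty table of equal-width records
    acc = bytearray(width)
    for j in query:
        for k in range(width):
            acc[k] ^= db[j][k]
    return bytes(acc)


def reconstruct(a1: bytes, a2: bytes) -> bytes:
    """Client: the two query sets differ only at `index`, so the XOR of answers is that record."""
    return bytes(x ^ y for x, y in zip(a1, a2))


if __name__ == "__main__":
    # Hypothetical toy graph: row i holds node i's neighbor list, padded to 8 bytes.
    adjacency = [b"1,2", b"0,3", b"0", b"1"]
    db = [row.ljust(8, b" ") for row in adjacency]
    q1, q2 = make_queries(len(db), 1)  # q1 goes to server 1, q2 to server 2
    row = reconstruct(answer(db, q1), answer(db, q2))
    print(row.rstrip())  # b'0,3'
```

The price of this simplicity is query size and server work linear in the table size, plus the non-collusion assumption. Single-server computational PIR avoids the second server at a much higher compute cost, which is the overhead the abstract says SqueezePIR is designed to overcome.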

Pages: 1615-1628

Citations: 2

SUFF: Accelerating Subgraph Matching with Historical Data

Proc. VLDB Endow.

Pub Date : 2023-03-01 DOI: 10.14778/3587136.3587144

Xun Jian, Zhiyuan Li, Lei Chen

Subgraph matching is a fundamental problem in graph theory with wide applications in areas such as sociology, chemistry, and social networks. Because the problem is NP-hard, the basic approach is a brute-force search over the whole search space. Some pruning strategies have been proposed to reduce the search space; however, they are either space-inefficient or rely on the assumption that the graph has specific properties. In this paper, we propose SUFF, a general and powerful structure filtering framework that can accelerate most existing approaches with slight modifications. Specifically, it builds a set of filters from the matching results of past queries and uses them to prune the search space for future queries. By fully exploiting the relationship between the matches of two queries, it ensures that such pruning is sound. Furthermore, several optimizations are proposed to reduce the computation and space costs of building, storing, and using filters. Extensive experiments on multiple real-world data sets with representative existing approaches show that SUFF can achieve up to 15× speedup with small overheads.
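
The abstract does not describe how SUFF's filters are built, so the following is only a hypothetical sketch of one sound pruning rule in the same spirit. Assumptions: unlabeled undirected graphs, non-induced matching (injective and edge-preserving), a past query that embeds into the new query through a known mapping `embed`, and a stored list of all matches of the past query on the same, unchanged data graph. If `m` matches the new query, then `u -> m[embed[u]]` matches the past query, so the new-query vertex `embed[u]` can only map to data vertices that `u` mapped to in some past match. `match_all` and `filter_from_history` are invented names, and the backtracking matcher is deliberately naive.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets
Match = Dict[int, int]       # query vertex -> data vertex


def match_all(query: Graph, data: Graph,
              candidates: Optional[Dict[int, Set[int]]] = None) -> List[Match]:
    """Enumerate all injective, edge-preserving matches by backtracking.

    `candidates` optionally restricts which data vertices a query vertex may map to.
    """
    order = sorted(query)  # naive matching order
    results: List[Match] = []

    def extend(m: Match, used: Set[int]) -> None:
        if len(m) == len(order):
            results.append(dict(m))
            return
        u = order[len(m)]
        pool = candidates.get(u, data.keys()) if candidates else data.keys()
        for v in pool:
            if v in used or len(data[v]) < len(query[u]):
                continue  # injectivity check and degree filter
            if all(m[w] in data[v] for w in query[u] if w in m):
                m[u] = v
                used.add(v)
                extend(m, used)
                del m[u]
                used.remove(v)

    extend({}, set())
    return results


def filter_from_history(past_matches: List[Match],
                        embed: Dict[int, int]) -> Dict[int, Set[int]]:
    """Candidate sets for new-query vertices, derived from a past query's complete matches."""
    return {u_new: {pm[u] for pm in past_matches} for u, u_new in embed.items()}


if __name__ == "__main__":
    data = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}                 # past query
    past = match_all(triangle, data)
    new_query = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}     # triangle plus a pendant
    cands = filter_from_history(past, {0: 0, 1: 1, 2: 2})
    print(len(match_all(new_query, data, cands)))  # 2
```

The rule is sound only while the data graph is unchanged and the stored past matches are complete; it also assumes both queries use the same matching semantics.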

Citations: 1