Proc. VLDB Endow.: Latest Articles
Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598606
Zixuan Chen, P. Manolios, Mirek Riedewald
This work considers why-not questions in the context of top-k queries and score-based ranking functions. Following the popular linear scalarization approach for multi-objective optimization, we study rankings based on the weighted sum of multiple scores. A given weight choice may be controversial or perceived as unfair to certain individuals or organizations, triggering the question why some entity of interest has not yet shown up in the top-k. We introduce various notions of such why-not-yet queries and formally define them as satisfiability or optimization problems, whose goal is to propose alternative ranking functions that address the placement of the entities of interest. While some why-not-yet problems have linear constraints, others require quantifiers, disjunction, and negation. We propose several optimizations, ranging from a monotonic-core construction that approximates the complex constraints with a conjunction of linear ones, to various techniques that let the user control the tradeoff between running time and approximation quality. Experiments with real and synthetic data demonstrate the practicality and scalability of our technique, showing its superiority compared to the state of the art (SOA).
Proc. VLDB Endow., pp. 2377-2390
Citations: 0
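The weighted-sum ranking described in the Why Not Yet abstract above can be illustrated with a minimal Python sketch. It is not the authors' method: it swaps their satisfiability and optimization formulation for a brute-force grid search over two integer weights. The entity names, scores, and helper names (`rank_of`, `nearest_weights_in_top_k`) are hypothetical.

```python
from typing import Dict, Optional, Tuple

Scores = Tuple[int, int]   # two scores per entity (made-up data)
Weights = Tuple[int, int]  # integer weights in tenths, summing to 10


def weighted_sum(scores: Scores, weights: Weights) -> int:
    """Linear scalarization: combine several scores into one."""
    return sum(s * w for s, w in zip(scores, weights))


def rank_of(entities: Dict[str, Scores], weights: Weights, target: str) -> int:
    """1-based rank of `target`; ties are resolved in the target's favor."""
    target_score = weighted_sum(entities[target], weights)
    better = sum(
        1
        for name, scores in entities.items()
        if name != target and weighted_sum(scores, weights) > target_score
    )
    return better + 1


def nearest_weights_in_top_k(
    entities: Dict[str, Scores],
    original: Weights,
    target: str,
    k: int,
    total: int = 10,
) -> Optional[Weights]:
    """Grid search (an assumption, not the paper's solver) for the weight
    vector closest to `original` in L1 distance that puts `target` in the top-k."""
    best: Optional[Weights] = None
    best_dist: Optional[int] = None
    for w1 in range(total + 1):
        candidate = (w1, total - w1)
        if rank_of(entities, candidate, target) > k:
            continue
        dist = abs(candidate[0] - original[0]) + abs(candidate[1] - original[1])
        if best_dist is None or dist < best_dist:
            best, best_dist = candidate, dist
    return best


entities = {"a": (9, 2), "b": (5, 7), "c": (3, 9)}
original = (8, 2)
print(rank_of(entities, original, "c"))
print(nearest_weights_in_top_k(entities, original, "c", k=2))
```

Integer weights keep the arithmetic exact. A grid search only scales to a handful of scores, which is one reason a constraint-based formulation like the one in the abstract is attractive.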
LEON: A New Framework for ML-Aided Query Optimization
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598597
Xu Chen, Haitian Chen, Zibo Liang, Shuncheng Liu, Jinghong Wang, Kai Zeng, Han Su, Kai Zheng
Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of learned query optimizers based on reinforcement learning (RL). However, they suffer from fundamental limitations due to the data-driven nature of ML. Motivated by the ML characteristics and database maturity, we propose LEON, a framework for ML-aidEd query OptimizatioN. LEON improves the expert query optimizer to self-adjust to the particular deployment by leveraging ML and the fundamental knowledge in the expert query optimizer. To train the ML model, a pairwise ranking objective is proposed, which is substantially different from the previous regression objective. To help the optimizer escape local minima and avoid failure, a ranking and uncertainty-based exploration strategy is proposed, which discovers valuable plans to aid the optimizer. Furthermore, ML model-guided pruning is proposed to increase planning efficiency without hurting performance too much. Extensive experiments offer evidence that the proposed framework can outperform the state-of-the-art methods in terms of end-to-end latency performance, training efficiency, and stability.
Proc. VLDB Endow., pp. 2261-2273
Citations: 1
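The LEON abstract above contrasts a pairwise ranking objective with a regression objective. The abstract does not give the exact loss, so the sketch below uses a generic pairwise logistic loss as an assumption: the model is only asked to score the better plan of each pair higher, not to predict its latency. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def pairwise_logistic_loss(score_better: float, score_worse: float) -> float:
    """log(1 + exp(-(score_better - score_worse))).

    Small when the plan known to be better receives the higher score,
    large when the model orders the pair the wrong way round.
    """
    margin = score_better - score_worse
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def ranking_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean loss over (score_of_better_plan, score_of_worse_plan) pairs."""
    return sum(pairwise_logistic_loss(b, w) for b, w in pairs) / len(pairs)


# One correctly ordered pair and one incorrectly ordered pair.
print(ranking_loss([(1.0, 0.0), (0.0, 1.0)]))
```

A loss of this shape depends only on the relative order of two plans, so a model can be accurate for plan selection even when its absolute cost estimates are off.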
BICE: Exploring Compact Search Space by Using Bipartite Matching and Cell-Wide Verification
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598591
Yunyoung Choi, Kunsoo Park, Hyunjoon Kim
Subgraph matching is the problem of searching for all embeddings of a query graph in a data graph, and subgraph query processing (also known as subgraph search) is the problem of finding all the data graphs that contain a query graph as a subgraph. Extensive research has been done to develop practical solutions for both problems. However, the existing solutions still show limited query processing performance because of many unnecessary computations during search. In this paper, we focus on exploring as compact a search space as possible by using three techniques: (1) pruning by bipartite matching, (2) pruning by failing sets with bipartite matching, and (3) cell-wide verification. We propose a new algorithm BICE, which combines these three techniques. We conduct extensive experiments on real-world datasets as well as synthetic datasets to evaluate the effectiveness of the techniques. Experiments show that our approach outperforms the fastest existing subgraph search algorithm by up to two orders of magnitude in terms of elapsed time to process a query. Our approach also outperforms state-of-the-art subgraph matching algorithms by up to two orders of magnitude.
Proc. VLDB Endow., pp. 2186-2198
Citations: 0
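To give a feel for "pruning by bipartite matching" in the BICE abstract above, here is a minimal sketch of a generic neighborhood filter. It is an assumption about the general idea, not BICE's exact rule: a data vertex `v` can only be the image of query vertex `u` if the neighbors of `u` can be matched to distinct neighbors of `v`, each within its own candidate set. The graph encoding and function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected graph


def has_saturating_matching(
    left: Iterable[Hashable], adj: Dict[Hashable, Set[Hashable]]
) -> bool:
    """True if every left node can be matched to a distinct right node
    (augmenting-path bipartite matching)."""
    match_of_right: Dict[Hashable, Hashable] = {}

    def try_assign(u: Hashable, seen: Set[Hashable]) -> bool:
        for v in adj.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can move elsewhere.
            if v not in match_of_right or try_assign(match_of_right[v], seen):
                match_of_right[v] = u
                return True
        return False

    return all(try_assign(u, set()) for u in left)


def can_prune(
    u: Hashable,
    v: Hashable,
    query: Graph,
    data: Graph,
    candidates: Dict[Hashable, Set[Hashable]],
) -> bool:
    """True if v can safely be removed from the candidates of u."""
    adj = {qn: candidates[qn] & data[v] for qn in query[u]}
    return not has_saturating_matching(query[u], adj)


query = {"u": {"u1", "u2"}, "u1": {"u"}, "u2": {"u"}}
data = {"v": {"v1"}, "v1": {"v"}}
candidates = {"u": {"v"}, "u1": {"v1"}, "u2": {"v1"}}
print(can_prune("u", "v", query, data, candidates))
```

In the example, `u` has two neighbors but `v` has only one, so no injective assignment exists and `v` is pruned before any backtracking search starts.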
Text Indexing for Long Patterns: Anchors are All you Need
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598586
Lorraine A. K. Ayad, G. Loukides, S. Pissis
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form but also to simultaneously enable fast pattern matching queries. This is the classic text indexing problem. The four absolute measures anyone should pay attention to when designing or implementing a text index are: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, however, most (if not all) widely-used indexes (e.g., suffix tree, suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step in this direction by showing that text indexing with locally consistent anchors (lc-anchors) offers remarkably good performance in all four measures, when we have at hand a lower bound l on the length of the queried patterns, which is arguably quite a reasonable assumption in practical applications. Specifically, we improve on the construction of the index proposed by Loukides and Pissis, which is based on bidirectional string anchors (bd-anchors), a new type of lc-anchors, by: (i) designing an average-case linear-time algorithm to compute bd-anchors; and (ii) developing a semi-external-memory implementation to construct the index in small space using near-optimal work. We then present an extensive experimental evaluation, based on the four measures, using real benchmark datasets. The results show that, for long patterns, the index constructed using our improved algorithms compares favorably to all classic indexes: (compressed) suffix tree; (compressed) suffix array; and the FM-index.
Proc. VLDB Endow., pp. 2117-2131
Citations: 1
Maximal D-truss Search in Dynamic Directed Graphs
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598592
Anxin Tian, Alexander Zhou, Yue Wang, Lei Chen
Community search (CS) aims at personalized subgraph discovery, which is the key to understanding the organisation of many real-world networks. CS in undirected networks has attracted significant attention from researchers, including many solutions for various cohesive subgraph structures and for different levels of dynamism with edge insertions and deletions, while these problems are much less studied for directed graphs. In this paper, we propose incremental solutions of CS based on the D-truss in dynamic directed graphs, where the D-truss is a cohesive subgraph structure defined based on two types of triangles in directed graphs. We first analyze the theoretical boundedness of D-truss given edge insertions and deletions, then we present basic single-update algorithms. To improve the efficiency, we propose an order-based D-Index, associated batch-update algorithms and a fully-dynamic query algorithm. Our extensive experiments on real-world graphs show that our proposed solution achieves a significant speedup compared to the SOTA solution; the scalability over updates is also verified.
Proc. VLDB Endow., pp. 2199-2211
Citations: 1
Pando: Enhanced Data Skipping with Logical Data Partitioning
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598601
Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden
With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando, which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.
Proc. VLDB Endow., pp. 2316-2329
Citations: 0
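The Pando abstract above notes that block skipping depends on what metadata is stored and how well the physical layout aligns with the queries. The sketch below illustrates only that baseline idea with per-block min/max metadata (often called zone maps). It does not model Pando's correlation-aware logical partitioning, and the `Block` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    rows: List[int]
    lo: int  # minimum value stored in the block
    hi: int  # maximum value stored in the block


def make_blocks(values: List[int], block_size: int) -> List[Block]:
    """Cut `values` into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return matching rows and the number of blocks actually scanned."""
    result: List[int] = []
    scanned = 0
    for block in blocks:
        if block.hi < low or block.lo > high:
            continue  # metadata proves that no row in this block can match
        scanned += 1
        result.extend(v for v in block.rows if low <= v <= high)
    return result, scanned


values = [5, 1, 9, 3, 7, 2, 8, 4]
unsorted_layout = make_blocks(values, block_size=4)
sorted_layout = make_blocks(sorted(values), block_size=4)
print(range_scan(unsorted_layout, 1, 2))
print(range_scan(sorted_layout, 1, 2))
```

Both layouts return the same rows, but only the sorted layout lets the scan skip a block, which is the effect that layout-tuning methods try to maximize.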
WiscSort: External Sorting For Byte-Addressable Storage
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598585
Vinay Banakar, Kan Wu, Yuvraj Patel, K. Keeton, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
We present WiscSort, a new approach to high-performance concurrent sorting for existing and future byte-addressable storage (BAS) devices. WiscSort carefully reduces writes, exploits random reads by splitting keys and values during sorting, and performs interference-aware scheduling with thread pool sizing to avoid I/O bandwidth degradation. We introduce the BRAID model, which encompasses the unique characteristics of BAS devices. Many state-of-the-art sorting systems do not comply with the BRAID model and deliver sub-optimal performance, whereas WiscSort demonstrates the effectiveness of complying with BRAID. We show that WiscSort is 2-7x faster than competing approaches on a standard sort benchmark. We evaluate the effectiveness of key-value separation on different key-value sizes and compare our concurrency optimizations with various other concurrency models. Finally, we emulate generic BAS devices and show how our techniques perform well with various combinations of hardware properties.
Proc. VLDB Endow., pp. 2103-2116
Citations: 1
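The key-value separation mentioned in the WiscSort abstract above can be sketched in a few lines. This is an in-memory illustration under assumptions, not WiscSort's implementation: a Python list stands in for the byte-addressable device, and the function name is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


def sort_with_key_value_separation(records: List[Record]) -> List[Record]:
    """Sort records by key while moving each value only once."""
    # Pass 1: read only the keys and remember where each record lives.
    key_index = [(key, pos) for pos, (key, _value) in enumerate(records)]

    # Sort the small (key, position) pairs; the large values stay in place.
    key_index.sort()

    # Pass 2: gather records in key order through random reads by position.
    return [records[pos] for _key, pos in key_index]


records = [(b"k3", b"value-c"), (b"k1", b"value-a"), (b"k2", b"value-b")]
print(sort_with_key_value_separation(records))
```

The design trades sequential value movement for random reads, which pays off only on devices where random reads are cheap, as the abstract argues for BAS devices.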
Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598603
Umut Çalikyilmaz, Sven Groppe, Jinghua Groppe, Tobias Winker, S. Prestel, Farida Shagieva, Daanish Arya, F. Preis, L. Gruenwald
The capabilities of quantum computers, such as the number of supported qubits and maximum circuit depth, have grown exponentially in recent years. Commercially relevant applications that take advantage of quantum computing are expected to be available soon. In this paper, we shed light on the possibilities of accelerating database tasks using quantum computing with examples of optimizing queries and transaction schedules and present some open challenges for future studies in the field.
Proc. VLDB Endow., pp. 2344-2353
Citations: 5
SEIDEN: Revisiting Query Processing in Video Database Systems
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598599
J. Bang, Gaurav Tarlok Kakkar, Pramod Chunduri, Subrata Mitra, Joy Arulraj
State-of-the-art video database management systems (VDBMSs) often use lightweight proxy models to accelerate object retrieval and aggregate queries. The key assumption underlying these systems is that the proxy model is an order of magnitude faster than the heavyweight oracle model. However, recent advances in computer vision have invalidated this assumption. Inference time of recently proposed oracle models is on par with or even lower than that of the proxy models used in state-of-the-art (SoTA) VDBMSs. This paper presents Seiden, a VDBMS that leverages this radical shift in the runtime gap between the oracle and proxy models. Instead of relying on a proxy model, Seiden directly applies the oracle model over a subset of frames to build a query-agnostic index, and samples additional frames to answer the query using an exploration-exploitation scheme during query processing. By leveraging the temporal continuity of the video and the output of the oracle model on the sampled frames, Seiden delivers faster query processing and better query accuracy than SoTA VDBMSs. Our empirical evaluation shows that Seiden is on average 6.6x faster than SoTA VDBMSs across diverse queries and datasets.
Proc. VLDB Endow., pp. 2289-2301
Citations: 1
VeriBench: Analyzing the Performance of Database Systems with Verifiability
Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598588
Cong Yue, Meihui Zhang, Changhao Zhu, Gang Chen, Dumitrel Loghin, B. Ooi
Database systems have paid increasing attention to data security in recent years. Immutable systems such as blockchains, verifiable databases, and ledger databases are equipped with various verifiability mechanisms to protect data. Such systems often adopt different threat models and techniques, and therefore have different performance implications compared to traditional database systems. So far, there is no uniform benchmarking tool for evaluating the performance of these systems, especially at the level of verification functions. In this paper, we first survey the design space of verifiability-enabled database systems along five dimensions: threat model, authenticated data structure (ADS), query processing, verification, and auditing. Based on this survey, we design and implement VeriBench, a benchmark framework for verifiability-enabled database systems. VeriBench enables a fair comparison of systems designed with different underlying technologies that share the client-side verification scheme, and focuses on design space exploration to provide a deeper understanding of different system design choices. VeriBench incorporates micro- and macro-benchmarks to provide a comprehensive evaluation. Further, VeriBench is designed to enable easy extension for benchmarking new systems and workloads. We run VeriBench to conduct a comprehensive analysis of state-of-the-art systems comprising blockchains, ledger databases, and log transparency technologies. The results expose the weaknesses and strengths of each underlying design choice, and the insights should serve as guidance for future development.
Pages: 2145-2157
Citations: 0