
Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management: Latest Publications

Research challenges in deep reinforcement learning-based join query optimization
R. Guo, Khuzaima S. Daudjee
The order in which relations are joined and the physical join operators used are two aspects of query plans that have a significant impact on the execution latency of join queries. However, the set of valid query plans grows exponentially with the number of relations to be joined. Hence, it becomes computationally expensive to enumerate all such plans for a complex join query. Recently, several deep reinforcement learning (DRL)-based approaches have proposed using neural networks to construct a query plan. They demonstrate that efficient query plans can be found without exhaustively enumerating the search space. We integrated our implementation of a DRL-based solution for optimizing join order and operators into the PostgreSQL query optimizer. In practice, we found limitations in the quality of the chosen query plans that are not addressed in existing approaches. In this paper, we highlight some of these limitations and propose future research challenges along with potential solutions.
{"title":"Research challenges in deep reinforcement learning-based join query optimization","authors":"R. Guo, Khuzaima S. Daudjee","doi":"10.1145/3401071.3401657","DOIUrl":"https://doi.org/10.1145/3401071.3401657","url":null,"abstract":"The order in which relations are joined and the physical join operators used are two aspects of query plans which have a significant impact on the execution latency of join queries. However, the set of valid query plans grows exponentially with the number of relations to be joined. Hence, it becomes computationally expensive to enumerate all such plans for a complex join query. Recently, several deep reinforcement learning (DRL) based approaches propose using neural networks to construct a query plan. They demonstrate that efficient query plans can be found without exhaustively enumerating the search space. We integrated our implementation of a DRL-based solution to optimize join order and operators into the PostgreSQL query optimizer. In practice, we found limitations in the quality of the query plans chosen which are not addressed in existing approaches. In this paper we highlight some of these limitations and propose future research challenges along with potential solutions.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116571805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
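
The abstract above frames join ordering as a sequential decision problem. As a rough illustration of that framing (not the authors' system), the sketch below builds left-deep join orders with a tabular, myopic value update standing in for the neural policies used in DRL-based optimizers; the relations, cardinalities, selectivities, and toy cost model are all invented for the example.

    # Minimal sketch: left-deep join ordering as episodic reinforcement learning.
    # Tabular values replace the neural policy; all numbers are hypothetical.
    import random
    from collections import defaultdict

    CARD = {"A": 1000, "B": 5000, "C": 200, "D": 800}          # base-relation cardinalities
    SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.002,       # pairwise join selectivities
           frozenset("CD"): 0.05, frozenset("AD"): 0.001}

    def join_size(left_size, left_rels, rel):
        """Estimated output size of joining the current intermediate with `rel`
        (simplification: use only the most selective known join predicate)."""
        sel = min((SEL[frozenset((r, rel))] for r in left_rels
                   if frozenset((r, rel)) in SEL), default=1.0)
        return left_size * CARD[rel] * sel

    Q = defaultdict(float)            # Q[(frozenset(joined), next_rel)] -> learned value
    alpha, epsilon = 0.1, 0.2

    def run_episode():
        rels = list(CARD)
        # the first relation is fixed for brevity; a real agent would choose it too
        state, size, cost = frozenset([rels[0]]), CARD[rels[0]], 0.0
        while len(state) < len(rels):
            choices = [r for r in rels if r not in state]
            if random.random() < epsilon:
                action = random.choice(choices)
            else:
                action = max(choices, key=lambda r: Q[(state, r)])
            size = join_size(size, state, action)
            cost += size
            # myopic value update: reward is the negative intermediate-result size
            # (full DRL optimizers bootstrap over the cost of the remaining plan)
            Q[(state, action)] += alpha * (-size - Q[(state, action)])
            state = state | {action}
        return cost

    for _ in range(2000):
        run_episode()
    print("cost estimate of a learned (epsilon-greedy) order:", run_episode())
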
Bandit join: preliminary results
Vahid Ghadakchi, Mian Xie, Arash Termehchy
Join is arguably the most costly and frequently used operation in relational query processing. Join algorithms usually spend the majority of their time scanning and attempting to join the parts of the base relations that do not satisfy the join condition and do not generate any results. This causes slow response times, particularly in interactive and exploratory environments where users expect real-time performance. In this paper, we outline our vision for using online learning and adaptation to execute joins efficiently. In our approach, scan operators that precede a join learn which parts of the relations are more likely to join during query execution and produce more results faster by doing fewer I/O accesses. Our empirical studies using standard benchmarks indicate that this approach outperforms similar methods considerably.
{"title":"Bandit join: preliminary results","authors":"Vahid Ghadakchi, Mian Xie, Arash Termehchy","doi":"10.1145/3401071.3401655","DOIUrl":"https://doi.org/10.1145/3401071.3401655","url":null,"abstract":"Join is arguably the most costly and frequently used operation in relational query processing. Join algorithms usually spend the majority of their time on scanning and attempting to join the parts of the base relations that do not satisfy the join condition and do not generate any results. This causes slow response time, particularly, in interactive and exploratory environments where users would like real-time performance. In this paper, we outline our vision on using online learning and adaptation to execute joins efficiently. In our approach, scan operators that precede a join, learn which parts of the relations are more likely to join during the query execution and produce more results faster by doing fewer I/O accesses. Our empirical studies using standard benchmarks indicate that this approach outperforms similar methods considerably.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122966570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
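
To make the bandit framing above concrete, here is a hedged sketch that treats partitions of the outer relation as arms of a UCB bandit and rewards partitions whose tuples actually find join partners. The partitioning scheme, reward definition, and synthetic data are assumptions for illustration, not the paper's implementation.

    # Sketch: scan the outer relation partition by partition, preferring
    # partitions with a high observed join-match rate (UCB over match rate).
    import math
    import random
    from collections import defaultdict

    random.seed(0)
    # toy data: outer tuples carry a join key; partition 2 is the "productive" one
    outer = {0: [random.randint(0, 1000) for _ in range(500)],
             1: [random.randint(0, 1000) for _ in range(500)],
             2: [random.randint(0, 50) for _ in range(500)]}
    inner_keys = set(range(0, 50))            # inner side, indexed by join key

    pulls = defaultdict(int)                  # tuples scanned per partition
    matches = defaultdict(int)                # join matches per partition

    def ucb_score(p, t):
        """Upper-confidence-bound score on the per-tuple match rate of partition p."""
        if pulls[p] == 0:
            return float("inf")               # force one initial scan of every partition
        mean = matches[p] / pulls[p]
        return mean + math.sqrt(2 * math.log(t + 1) / pulls[p])

    results = 0
    for t in range(300):                      # each round scans a small batch from one partition
        p = max(outer, key=lambda q: ucb_score(q, t))
        batch = random.sample(outer[p], 10)
        hit = sum(1 for k in batch if k in inner_keys)
        pulls[p] += 10
        matches[p] += hit
        results += hit

    print({p: round(matches[p] / max(pulls[p], 1), 3) for p in outer}, "results:", results)
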
Best of both worlds: combining traditional and machine learning models for cardinality estimation
Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
Cardinality estimation is a high-profile technique in database management systems with a serious impact on query performance. Thus, many traditional approaches, such as histogram-based or sampling-based methods, have been developed over the last decades. With the advance of Machine Learning (ML) into the database world, cardinality estimation profits from several methods that improve its quality, as shown in recent papers. However, neither an ML model nor a traditional approach meets all requirements for cardinality estimation, so a one-size-fits-all approach is difficult to imagine. For that reason, we advocate a better interlacing of ML models and traditional approaches for cardinality estimation and thoroughly consider their potential, advantages, and disadvantages in this paper. We start by proposing a classification of different estimation techniques and their usability for cardinality estimation. Then, we motivate a novel hybrid approach as the core proof of concept of this paper, which uses the best of both worlds: ML models and the proven histogram approach. For this, we show in which cases it is beneficial to use ML models and when we can trust the traditional estimators. We evaluate our hybrid approach on two real-world data sets and conclude what can be done to improve the coexistence of traditional and ML approaches in DBMSs. With all our proposals, we use ML to improve DBMSs without abandoning years of valuable research in cardinality estimation.
{"title":"Best of both worlds: combining traditional and machine learning models for cardinality estimation","authors":"Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner","doi":"10.1145/3401071.3401658","DOIUrl":"https://doi.org/10.1145/3401071.3401658","url":null,"abstract":"Cardinality estimation is a high-profile technique in database management systems with a serious impact on query performance. Thus, a lot of traditional approaches such as histograms-based or sampling-based methods have been developed over the last decades. With the advance of Machine Learning (ML) into the database world, cardinality estimation profits from several methods improving its quality as shown in different recent papers. However, neither an ML model nor a traditional approach meets all requirements for cardinality estimation, so that a one size fits all approach is difficult to imagine. For that reason, we advocate a better interlacing of ML models and traditional approaches for cardinality estimation and thoroughly consider their potential, advantages, and disadvantages in this paper. We start by proposing a classification of different estimation techniques and their usability for cardinality estimation. Then, we motivate a novel hybrid approach as the core proof of concept of this paper which uses the best of both worlds: ML models and the proven histogram approach. For this, we show in which cases it is beneficial to use ML models or when we can trust the traditional estimators. We evaluate our hybrid approach on two real-world data sets and conclude what can be done to improve the coexistence of traditional and ML approaches in DBMS. With all our proposals, we use ML to improve DBMS without abandoning years of valuable research in cardinality estimation.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131830698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
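
As a toy illustration of the hybrid idea (not the authors' classifier or models), the sketch below answers single-range predicates from an equi-width histogram and routes correlated conjunctions to a small learned regressor. The routing rule, features, columns, and data are invented for the example.

    # Sketch: route simple predicates to a histogram, correlated conjunctions
    # to a learned model; both estimators and the router are stand-ins.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 100_000
    age = rng.integers(18, 80, n)
    income = age * 1_000 + rng.normal(0, 5_000, n)      # correlated with age

    # traditional estimator: equi-width histogram on a single column
    hist, edges = np.histogram(age, bins=20)

    def histogram_estimate(lo, hi):
        """Estimated row count for lo <= age < hi from buckets fully inside the range."""
        mask = (edges[:-1] >= lo) & (edges[1:] <= hi)
        return hist[mask].sum()

    # learned estimator: regress log-cardinality of conjunctive predicates
    train_X, train_y = [], []
    for _ in range(500):
        a_lo = rng.integers(18, 70); a_hi = a_lo + rng.integers(1, 10)
        i_hi = rng.integers(20_000, 90_000)
        card = np.sum((age >= a_lo) & (age < a_hi) & (income < i_hi))
        train_X.append([a_lo, a_hi, i_hi]); train_y.append(np.log1p(card))
    model = LinearRegression().fit(train_X, train_y)

    def estimate(predicates):
        """Router: histogram for a single range predicate, ML model otherwise."""
        if len(predicates) == 1:
            (lo, hi), = predicates
            return histogram_estimate(lo, hi)
        (a_lo, a_hi), (_, i_hi) = predicates
        return np.expm1(model.predict([[a_lo, a_hi, i_hi]])[0])

    print("single predicate:", estimate([(30, 40)]))
    print("conjunction     :", estimate([(30, 40), (0, 45_000)]))
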
PartLy
A. S. Abdelhamid, Walid G. Aref
Data partitioning plays a critical role in data stream processing. Current data partitioning techniques use simple, static heuristics that do not incorporate feedback about the quality of the partitioning decision (i.e., a fire-and-forget strategy). Hence, the data partitioner often repeatedly makes the same decision. In this paper, we argue that reinforcement learning techniques can be applied to address this problem. The use of artificial neural networks can facilitate learning of efficient partitioning policies. We identify the challenges that emerge when applying machine learning techniques to the data partitioning problem for distributed data stream processing. Furthermore, we introduce PartLy, a proof-of-concept data partitioner, and present preliminary results that indicate PartLy's potential to match the performance of state-of-the-art techniques in terms of partitioning quality, while minimizing storage and processing overheads.
{"title":"PartLy","authors":"A. S. Abdelhamid, Walid G. Aref","doi":"10.1145/3401071.3401660","DOIUrl":"https://doi.org/10.1145/3401071.3401660","url":null,"abstract":"Data partitioning plays a critical role in data stream processing. Current data partitioning techniques use simple, static heuristics that do not incorporate feedback about the quality of the partitioning decision (i.e., fire and forget strategy). Hence, the data partitioner often repeatedly chooses the same decision. In this paper, we argue that reinforcement learning techniques can be applied to address this problem. The use of artificial neural networks can facilitate learning of efficient partitioning policies. We identify the challenges that emerge when applying machine learning techniques to the data partitioning problem for distributed data stream processing. Furthermore, we introduce PartLy, a proof-of-concept data partitioner, and present preliminary results that indicate PartLy's potential to match the performance of state-of-the-art techniques in terms of partitioning quality, while minimizing storage and processing overheads.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128777163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
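
A speculative sketch of the feedback loop described above, with an epsilon-greedy learner in place of PartLy's actual neural networks: each incoming key is routed to a worker, and the choice is reinforced when it keeps worker loads balanced. Worker count, stream skew, and the reward definition are hypothetical.

    # Sketch: feedback-driven stream partitioning instead of fire-and-forget hashing.
    import random
    from collections import defaultdict

    random.seed(1)
    WORKERS = 4
    loads = [0] * WORKERS
    q = defaultdict(lambda: [0.0] * WORKERS)   # per-key preference for each worker
    epsilon, alpha = 0.1, 0.2

    def route(key):
        """Pick a worker for `key`, mostly following learned preferences."""
        prefs = q[key]
        if random.random() < epsilon:
            return random.randrange(WORKERS)
        return max(range(WORKERS), key=lambda w: prefs[w])

    # skewed synthetic stream: key 0 is hot
    stream = [0 if random.random() < 0.5 else random.randint(1, 100) for _ in range(20_000)]

    for key in stream:
        w = route(key)
        loads[w] += 1
        # reward: negative deviation of the chosen worker's load from the average,
        # so routing decisions that worsen imbalance are discouraged
        reward = -(loads[w] - sum(loads) / WORKERS)
        q[key][w] += alpha * (reward - q[key][w])

    print("final worker loads:", loads)
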
Automated tuning of query degree of parallelism via machine learning
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
Determining the degree of parallelism (DOP) for query execution is of great importance to both performance and resource provisioning. However, recent work that applies machine learning (ML) to query optimization and query performance prediction in relational database management systems (RDBMSs) has ignored the effect of intra-query parallelism. In this work, we argue that determining the optimal or near-optimal DOP for query execution is a fundamental and challenging task that benefits both query performance and cost-benefit tradeoffs. We then present promising preliminary results on how ML techniques can be applied to automate DOP tuning. We conclude with a list of challenges we encountered, as well as future directions for our work.
{"title":"Automated tuning of query degree of parallelism via machine learning","authors":"Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi","doi":"10.1145/3401071.3401656","DOIUrl":"https://doi.org/10.1145/3401071.3401656","url":null,"abstract":"Determining the degree of parallelism (DOP) for query execution is of great importance to both performance and resource provisioning. However, recent work that applies machine learning (ML) to query optimization and query performance prediction in relational database management systems (RDBMSs) has ignored the effect of intra-query parallelism. In this work, we argue that determining the optimal or near-optimal DOP for query execution is a fundamental and challenging task that benefits both query performance and cost-benefit tradeoffs. We then present promising preliminary results on how ML techniques can be applied to automate DOP tuning. We conclude with a list of challenges we encountered, as well as future directions for our work.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125271733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
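
The sketch below illustrates one plausible reading of ML-based DOP tuning, not the authors' models: a regressor learned from (plan features, DOP, latency) samples is queried for each candidate DOP, optionally with a resource penalty to expose the cost-benefit trade-off. The features, latency model, and candidate DOPs are invented.

    # Sketch: pick the degree of parallelism whose predicted latency (plus an
    # optional resource penalty) is lowest, using a model trained on past runs.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    CANDIDATE_DOPS = [1, 2, 4, 8, 16, 32]

    def synthetic_latency(rows, dop):
        """Toy ground truth: parallelism helps large scans but adds fixed overhead."""
        return rows / (dop ** 0.8) + 50 * dop + rng.normal(0, 10)

    # training data: (estimated rows, dop) -> observed latency
    X, y = [], []
    for _ in range(2_000):
        rows = float(rng.integers(1_000, 1_000_000))
        dop = int(rng.choice(CANDIDATE_DOPS))
        X.append([rows, dop]); y.append(synthetic_latency(rows, dop))
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    def choose_dop(rows, resource_weight=0.0):
        """Pick the DOP minimizing predicted latency plus a crude resource penalty."""
        preds = {d: model.predict([[rows, d]])[0] for d in CANDIDATE_DOPS}
        return min(preds, key=lambda d: preds[d] + resource_weight * d)

    print("small query :", choose_dop(5_000))
    print("large query :", choose_dop(900_000))
    print("large, resource-aware:", choose_dop(900_000, resource_weight=40))
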
RadixSpline: a single-pass learned index
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, A. Kemper, Tim Kraska, Thomas Neumann
Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and are slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two parameters.
{"title":"RadixSpline: a single-pass learned index","authors":"Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, A. Kemper, Tim Kraska, Thomas Neumann","doi":"10.1145/3401071.3401659","DOIUrl":"https://doi.org/10.1145/3401071.3401659","url":null,"abstract":"Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and are slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two parameters.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131459467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 112
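
To give a feel for the two-knob design (a spline error bound and a radix-table size), here is a simplified single-pass sketch in the spirit of RadixSpline, not the authors' implementation: a greedy corridor picks spline knots, a radix table over key prefixes narrows the knot search, and lookups binary-search inside the enclosing spline segment rather than the tighter error window the paper uses.

    # Sketch: single pass over sorted integer keys builds spline knots plus a
    # radix table; lookups interpolate coarsely and finish with a bounded search.
    import bisect
    import random

    class ToyRadixSpline:
        """Two knobs only: max_error (spline granularity) and radix_bits."""

        def __init__(self, keys, max_error=64, radix_bits=12):
            self.keys = keys
            # pass over the data, part 1: greedy spline knots (a simplified
            # corridor, coarser than the paper's greedy spline construction)
            self.sx, self.sy = [keys[0]], [0]
            lo, hi = float("-inf"), float("inf")
            for i in range(1, len(keys)):
                dx = keys[i] - self.sx[-1]
                if dx == 0:
                    continue
                c_lo = (i - max_error - self.sy[-1]) / dx
                c_hi = (i + max_error - self.sy[-1]) / dx
                if max(lo, c_lo) > min(hi, c_hi):        # corridor collapsed: new knot
                    self.sx.append(keys[i - 1]); self.sy.append(i - 1)
                    lo, hi = float("-inf"), float("inf")
                else:
                    lo, hi = max(lo, c_lo), min(hi, c_hi)
            if self.sx[-1] != keys[-1]:
                self.sx.append(keys[-1]); self.sy.append(len(keys) - 1)
            # part 2: radix table mapping a key prefix to the first spline knot
            # whose prefix is at least that large
            self.shift = max(keys[-1].bit_length() - radix_bits, 0)
            self.radix = [len(self.sx)] * ((keys[-1] >> self.shift) + 2)
            for j in range(len(self.sx) - 1, -1, -1):
                self.radix[self.sx[j] >> self.shift] = j
            for p in range(len(self.radix) - 2, -1, -1):
                self.radix[p] = min(self.radix[p], self.radix[p + 1])

        def lookup(self, key):
            """Position of `key` in the sorted array, or -1 if absent."""
            p = key >> self.shift
            lo_k = max(self.radix[p] - 1, 0)             # knot range for this prefix
            j = bisect.bisect_right(self.sx, key, lo_k, self.radix[p + 1]) - 1
            pos_lo = self.sy[j]
            pos_hi = self.sy[min(j + 1, len(self.sy) - 1)]
            # the real structure searches only +/- max_error around the linear
            # estimate; searching the whole enclosing segment keeps this sketch simple
            i = bisect.bisect_left(self.keys, key, pos_lo, pos_hi + 1)
            return i if i < len(self.keys) and self.keys[i] == key else -1

    random.seed(42)
    data = sorted(random.sample(range(1 << 30), 100_000))
    rs = ToyRadixSpline(data)
    assert rs.lookup(data[12_345]) == 12_345
    print("keys:", len(data), "spline knots:", len(rs.sx), "radix entries:", len(rs.radix))
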
Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
{"title":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","authors":"","doi":"10.1145/3401071","DOIUrl":"https://doi.org/10.1145/3401071","url":null,"abstract":"","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124782096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1