
Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management: Latest Publications

Research challenges in deep reinforcement learning-based join query optimization
R. Guo, Khuzaima S. Daudjee
The order in which relations are joined and the physical join operators used are two aspects of query plans that have a significant impact on the execution latency of join queries. However, the set of valid query plans grows exponentially with the number of relations to be joined. Hence, it becomes computationally expensive to enumerate all such plans for a complex join query. Recently, several deep reinforcement learning (DRL)-based approaches have proposed using neural networks to construct a query plan. They demonstrate that efficient query plans can be found without exhaustively enumerating the search space. We integrated our implementation of a DRL-based solution for optimizing join order and operators into the PostgreSQL query optimizer. In practice, we found limitations in the quality of the chosen query plans that are not addressed in existing approaches. In this paper, we highlight some of these limitations and propose future research challenges along with potential solutions.
{"title":"Research challenges in deep reinforcement learning-based join query optimization","authors":"R. Guo, Khuzaima S. Daudjee","doi":"10.1145/3401071.3401657","DOIUrl":"https://doi.org/10.1145/3401071.3401657","url":null,"abstract":"The order in which relations are joined and the physical join operators used are two aspects of query plans which have a significant impact on the execution latency of join queries. However, the set of valid query plans grows exponentially with the number of relations to be joined. Hence, it becomes computationally expensive to enumerate all such plans for a complex join query. Recently, several deep reinforcement learning (DRL) based approaches propose using neural networks to construct a query plan. They demonstrate that efficient query plans can be found without exhaustively enumerating the search space. We integrated our implementation of a DRL-based solution to optimize join order and operators into the PostgreSQL query optimizer. In practice, we found limitations in the quality of the query plans chosen which are not addressed in existing approaches. In this paper we highlight some of these limitations and propose future research challenges along with potential solutions.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116571805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
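
The abstract above frames join ordering as a sequential decision problem. As a rough illustration of that framing (not the authors' system), the sketch below builds left-deep join orders with a tabular, myopic value update standing in for the neural policies used in DRL-based optimizers; the relations, cardinalities, selectivities, and toy cost model are all invented for the example.

    # Minimal sketch: left-deep join ordering as episodic reinforcement learning.
    # Tabular values replace the neural policy; all numbers are hypothetical.
    import random
    from collections import defaultdict

    CARD = {"A": 1000, "B": 5000, "C": 200, "D": 800}          # base-relation cardinalities
    SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.002,       # pairwise join selectivities
           frozenset("CD"): 0.05, frozenset("AD"): 0.001}

    def join_size(left_size, left_rels, rel):
        """Estimated output size of joining the current intermediate with `rel`
        (simplification: use only the most selective known join predicate)."""
        sel = min((SEL[frozenset((r, rel))] for r in left_rels
                   if frozenset((r, rel)) in SEL), default=1.0)
        return left_size * CARD[rel] * sel

    Q = defaultdict(float)            # Q[(frozenset(joined), next_rel)] -> learned value
    alpha, epsilon = 0.1, 0.2

    def run_episode():
        rels = list(CARD)
        # the first relation is fixed for brevity; a real agent would choose it too
        state, size, cost = frozenset([rels[0]]), CARD[rels[0]], 0.0
        while len(state) < len(rels):
            choices = [r for r in rels if r not in state]
            if random.random() < epsilon:
                action = random.choice(choices)
            else:
                action = max(choices, key=lambda r: Q[(state, r)])
            size = join_size(size, state, action)
            cost += size
            # myopic value update: reward is the negative intermediate-result size
            # (full DRL optimizers bootstrap over the cost of the remaining plan)
            Q[(state, action)] += alpha * (-size - Q[(state, action)])
            state = state | {action}
        return cost

    for _ in range(2000):
        run_episode()
    print("cost estimate of a learned (epsilon-greedy) order:", run_episode())
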
Bandit join: preliminary results
Vahid Ghadakchi, Mian Xie, Arash Termehchy
Join is arguably the most costly and frequently used operation in relational query processing. Join algorithms usually spend the majority of their time scanning and attempting to join the parts of the base relations that do not satisfy the join condition and do not generate any results. This causes slow response times, particularly in interactive and exploratory environments where users expect real-time performance. In this paper, we outline our vision for using online learning and adaptation to execute joins efficiently. In our approach, scan operators that precede a join learn which parts of the relations are more likely to join during query execution and produce more results faster by doing fewer I/O accesses. Our empirical studies using standard benchmarks indicate that this approach outperforms similar methods considerably.
{"title":"Bandit join: preliminary results","authors":"Vahid Ghadakchi, Mian Xie, Arash Termehchy","doi":"10.1145/3401071.3401655","DOIUrl":"https://doi.org/10.1145/3401071.3401655","url":null,"abstract":"Join is arguably the most costly and frequently used operation in relational query processing. Join algorithms usually spend the majority of their time on scanning and attempting to join the parts of the base relations that do not satisfy the join condition and do not generate any results. This causes slow response time, particularly, in interactive and exploratory environments where users would like real-time performance. In this paper, we outline our vision on using online learning and adaptation to execute joins efficiently. In our approach, scan operators that precede a join, learn which parts of the relations are more likely to join during the query execution and produce more results faster by doing fewer I/O accesses. Our empirical studies using standard benchmarks indicate that this approach outperforms similar methods considerably.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122966570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
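
To make the bandit framing above concrete, here is a hedged sketch that treats partitions of the outer relation as arms of a UCB bandit and rewards partitions whose tuples actually find join partners. The partitioning scheme, reward definition, and synthetic data are assumptions for illustration, not the paper's implementation.

    # Sketch: scan the outer relation partition by partition, preferring
    # partitions with a high observed join-match rate (UCB over match rate).
    import math
    import random
    from collections import defaultdict

    random.seed(0)
    # toy data: outer tuples carry a join key; partition 2 is the "productive" one
    outer = {0: [random.randint(0, 1000) for _ in range(500)],
             1: [random.randint(0, 1000) for _ in range(500)],
             2: [random.randint(0, 50) for _ in range(500)]}
    inner_keys = set(range(0, 50))            # inner side, indexed by join key

    pulls = defaultdict(int)                  # tuples scanned per partition
    matches = defaultdict(int)                # join matches per partition

    def ucb_score(p, t):
        """Upper-confidence-bound score on the per-tuple match rate of partition p."""
        if pulls[p] == 0:
            return float("inf")               # force one initial scan of every partition
        mean = matches[p] / pulls[p]
        return mean + math.sqrt(2 * math.log(t + 1) / pulls[p])

    results = 0
    for t in range(300):                      # each round scans a small batch from one partition
        p = max(outer, key=lambda q: ucb_score(q, t))
        batch = random.sample(outer[p], 10)
        hit = sum(1 for k in batch if k in inner_keys)
        pulls[p] += 10
        matches[p] += hit
        results += hit

    print({p: round(matches[p] / max(pulls[p], 1), 3) for p in outer}, "results:", results)
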
Best of both worlds: combining traditional and machine learning models for cardinality estimation
Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
Cardinality estimation is a high-profile technique in database management systems with a serious impact on query performance. Thus, many traditional approaches, such as histogram-based or sampling-based methods, have been developed over the last decades. With the advance of Machine Learning (ML) into the database world, cardinality estimation profits from several methods that improve its quality, as shown in recent papers. However, neither an ML model nor a traditional approach meets all requirements for cardinality estimation, so a one-size-fits-all approach is difficult to imagine. For that reason, we advocate a better interlacing of ML models and traditional approaches for cardinality estimation and thoroughly consider their potential, advantages, and disadvantages in this paper. We start by proposing a classification of different estimation techniques and their usability for cardinality estimation. Then, we motivate a novel hybrid approach as the core proof of concept of this paper, which uses the best of both worlds: ML models and the proven histogram approach. For this, we show in which cases it is beneficial to use ML models and when we can trust the traditional estimators. We evaluate our hybrid approach on two real-world data sets and conclude what can be done to improve the coexistence of traditional and ML approaches in DBMSs. With all our proposals, we use ML to improve DBMSs without abandoning years of valuable research in cardinality estimation.
{"title":"Best of both worlds: combining traditional and machine learning models for cardinality estimation","authors":"Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner","doi":"10.1145/3401071.3401658","DOIUrl":"https://doi.org/10.1145/3401071.3401658","url":null,"abstract":"Cardinality estimation is a high-profile technique in database management systems with a serious impact on query performance. Thus, a lot of traditional approaches such as histograms-based or sampling-based methods have been developed over the last decades. With the advance of Machine Learning (ML) into the database world, cardinality estimation profits from several methods improving its quality as shown in different recent papers. However, neither an ML model nor a traditional approach meets all requirements for cardinality estimation, so that a one size fits all approach is difficult to imagine. For that reason, we advocate a better interlacing of ML models and traditional approaches for cardinality estimation and thoroughly consider their potential, advantages, and disadvantages in this paper. We start by proposing a classification of different estimation techniques and their usability for cardinality estimation. Then, we motivate a novel hybrid approach as the core proof of concept of this paper which uses the best of both worlds: ML models and the proven histogram approach. For this, we show in which cases it is beneficial to use ML models or when we can trust the traditional estimators. We evaluate our hybrid approach on two real-world data sets and conclude what can be done to improve the coexistence of traditional and ML approaches in DBMS. With all our proposals, we use ML to improve DBMS without abandoning years of valuable research in cardinality estimation.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131830698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
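
As a toy illustration of the hybrid idea (not the authors' classifier or models), the sketch below answers single-range predicates from an equi-width histogram and routes correlated conjunctions to a small learned regressor. The routing rule, features, columns, and data are invented for the example.

    # Sketch: route simple predicates to a histogram, correlated conjunctions
    # to a learned model; both estimators and the router are stand-ins.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 100_000
    age = rng.integers(18, 80, n)
    income = age * 1_000 + rng.normal(0, 5_000, n)      # correlated with age

    # traditional estimator: equi-width histogram on a single column
    hist, edges = np.histogram(age, bins=20)

    def histogram_estimate(lo, hi):
        """Estimated row count for lo <= age < hi from buckets fully inside the range."""
        mask = (edges[:-1] >= lo) & (edges[1:] <= hi)
        return hist[mask].sum()

    # learned estimator: regress log-cardinality of conjunctive predicates
    train_X, train_y = [], []
    for _ in range(500):
        a_lo = rng.integers(18, 70); a_hi = a_lo + rng.integers(1, 10)
        i_hi = rng.integers(20_000, 90_000)
        card = np.sum((age >= a_lo) & (age < a_hi) & (income < i_hi))
        train_X.append([a_lo, a_hi, i_hi]); train_y.append(np.log1p(card))
    model = LinearRegression().fit(train_X, train_y)

    def estimate(predicates):
        """Router: histogram for a single range predicate, ML model otherwise."""
        if len(predicates) == 1:
            (lo, hi), = predicates
            return histogram_estimate(lo, hi)
        (a_lo, a_hi), (_, i_hi) = predicates
        return np.expm1(model.predict([[a_lo, a_hi, i_hi]])[0])

    print("single predicate:", estimate([(30, 40)]))
    print("conjunction     :", estimate([(30, 40), (0, 45_000)]))
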
PartLy
A. S. Abdelhamid, Walid G. Aref
Data partitioning plays a critical role in data stream processing. Current data partitioning techniques use simple, static heuristics that do not incorporate feedback about the quality of the partitioning decision (i.e., a fire-and-forget strategy). Hence, the data partitioner often repeatedly makes the same decision. In this paper, we argue that reinforcement learning techniques can be applied to address this problem. The use of artificial neural networks can facilitate learning of efficient partitioning policies. We identify the challenges that emerge when applying machine learning techniques to the data partitioning problem for distributed data stream processing. Furthermore, we introduce PartLy, a proof-of-concept data partitioner, and present preliminary results that indicate PartLy's potential to match the performance of state-of-the-art techniques in terms of partitioning quality, while minimizing storage and processing overheads.
{"title":"PartLy","authors":"A. S. Abdelhamid, Walid G. Aref","doi":"10.1145/3401071.3401660","DOIUrl":"https://doi.org/10.1145/3401071.3401660","url":null,"abstract":"Data partitioning plays a critical role in data stream processing. Current data partitioning techniques use simple, static heuristics that do not incorporate feedback about the quality of the partitioning decision (i.e., fire and forget strategy). Hence, the data partitioner often repeatedly chooses the same decision. In this paper, we argue that reinforcement learning techniques can be applied to address this problem. The use of artificial neural networks can facilitate learning of efficient partitioning policies. We identify the challenges that emerge when applying machine learning techniques to the data partitioning problem for distributed data stream processing. Furthermore, we introduce PartLy, a proof-of-concept data partitioner, and present preliminary results that indicate PartLy's potential to match the performance of state-of-the-art techniques in terms of partitioning quality, while minimizing storage and processing overheads.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128777163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
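
A speculative sketch of the feedback loop described above, with an epsilon-greedy learner in place of PartLy's actual neural networks: each incoming key is routed to a worker, and the choice is reinforced when it keeps worker loads balanced. Worker count, stream skew, and the reward definition are hypothetical.

    # Sketch: feedback-driven stream partitioning instead of fire-and-forget hashing.
    import random
    from collections import defaultdict

    random.seed(1)
    WORKERS = 4
    loads = [0] * WORKERS
    q = defaultdict(lambda: [0.0] * WORKERS)   # per-key preference for each worker
    epsilon, alpha = 0.1, 0.2

    def route(key):
        """Pick a worker for `key`, mostly following learned preferences."""
        prefs = q[key]
        if random.random() < epsilon:
            return random.randrange(WORKERS)
        return max(range(WORKERS), key=lambda w: prefs[w])

    # skewed synthetic stream: key 0 is hot
    stream = [0 if random.random() < 0.5 else random.randint(1, 100) for _ in range(20_000)]

    for key in stream:
        w = route(key)
        loads[w] += 1
        # reward: negative deviation of the chosen worker's load from the average,
        # so routing decisions that worsen imbalance are discouraged
        reward = -(loads[w] - sum(loads) / WORKERS)
        q[key][w] += alpha * (reward - q[key][w])

    print("final worker loads:", loads)
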
Automated tuning of query degree of parallelism via machine learning
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
Determining the degree of parallelism (DOP) for query execution is of great importance to both performance and resource provisioning. However, recent work that applies machine learning (ML) to query optimization and query performance prediction in relational database management systems (RDBMSs) has ignored the effect of intra-query parallelism. In this work, we argue that determining the optimal or near-optimal DOP for query execution is a fundamental and challenging task that benefits both query performance and cost-benefit tradeoffs. We then present promising preliminary results on how ML techniques can be applied to automate DOP tuning. We conclude with a list of challenges we encountered, as well as future directions for our work.
{"title":"Automated tuning of query degree of parallelism via machine learning","authors":"Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi","doi":"10.1145/3401071.3401656","DOIUrl":"https://doi.org/10.1145/3401071.3401656","url":null,"abstract":"Determining the degree of parallelism (DOP) for query execution is of great importance to both performance and resource provisioning. However, recent work that applies machine learning (ML) to query optimization and query performance prediction in relational database management systems (RDBMSs) has ignored the effect of intra-query parallelism. In this work, we argue that determining the optimal or near-optimal DOP for query execution is a fundamental and challenging task that benefits both query performance and cost-benefit tradeoffs. We then present promising preliminary results on how ML techniques can be applied to automate DOP tuning. We conclude with a list of challenges we encountered, as well as future directions for our work.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125271733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
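
The sketch below illustrates one plausible reading of ML-based DOP tuning, not the authors' models: a regressor learned from (plan features, DOP, latency) samples is queried for each candidate DOP, optionally with a resource penalty to expose the cost-benefit trade-off. The features, latency model, and candidate DOPs are invented.

    # Sketch: pick the degree of parallelism whose predicted latency (plus an
    # optional resource penalty) is lowest, using a model trained on past runs.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    CANDIDATE_DOPS = [1, 2, 4, 8, 16, 32]

    def synthetic_latency(rows, dop):
        """Toy ground truth: parallelism helps large scans but adds fixed overhead."""
        return rows / (dop ** 0.8) + 50 * dop + rng.normal(0, 10)

    # training data: (estimated rows, dop) -> observed latency
    X, y = [], []
    for _ in range(2_000):
        rows = float(rng.integers(1_000, 1_000_000))
        dop = int(rng.choice(CANDIDATE_DOPS))
        X.append([rows, dop]); y.append(synthetic_latency(rows, dop))
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    def choose_dop(rows, resource_weight=0.0):
        """Pick the DOP minimizing predicted latency plus a crude resource penalty."""
        preds = {d: model.predict([[rows, d]])[0] for d in CANDIDATE_DOPS}
        return min(preds, key=lambda d: preds[d] + resource_weight * d)

    print("small query :", choose_dop(5_000))
    print("large query :", choose_dop(900_000))
    print("large, resource-aware:", choose_dop(900_000, resource_weight=40))
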
RadixSpline: a single-pass learned index
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, A. Kemper, Tim Kraska, Thomas Neumann
Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and are slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two parameters.
{"title":"RadixSpline: a single-pass learned index","authors":"Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, A. Kemper, Tim Kraska, Thomas Neumann","doi":"10.1145/3401071.3401659","DOIUrl":"https://doi.org/10.1145/3401071.3401659","url":null,"abstract":"Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and are slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two parameters.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131459467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 112
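
To give a feel for the two-knob design (a spline error bound and a radix-table size), here is a simplified single-pass sketch in the spirit of RadixSpline, not the authors' implementation: a greedy corridor picks spline knots, a radix table over key prefixes narrows the knot search, and lookups binary-search inside the enclosing spline segment rather than the tighter error window the paper uses.

    # Sketch: single pass over sorted integer keys builds spline knots plus a
    # radix table; lookups interpolate coarsely and finish with a bounded search.
    import bisect
    import random

    class ToyRadixSpline:
        """Two knobs only: max_error (spline granularity) and radix_bits."""

        def __init__(self, keys, max_error=64, radix_bits=12):
            self.keys = keys
            # pass over the data, part 1: greedy spline knots (a simplified
            # corridor, coarser than the paper's greedy spline construction)
            self.sx, self.sy = [keys[0]], [0]
            lo, hi = float("-inf"), float("inf")
            for i in range(1, len(keys)):
                dx = keys[i] - self.sx[-1]
                if dx == 0:
                    continue
                c_lo = (i - max_error - self.sy[-1]) / dx
                c_hi = (i + max_error - self.sy[-1]) / dx
                if max(lo, c_lo) > min(hi, c_hi):        # corridor collapsed: new knot
                    self.sx.append(keys[i - 1]); self.sy.append(i - 1)
                    lo, hi = float("-inf"), float("inf")
                else:
                    lo, hi = max(lo, c_lo), min(hi, c_hi)
            if self.sx[-1] != keys[-1]:
                self.sx.append(keys[-1]); self.sy.append(len(keys) - 1)
            # part 2: radix table mapping a key prefix to the first spline knot
            # whose prefix is at least that large
            self.shift = max(keys[-1].bit_length() - radix_bits, 0)
            self.radix = [len(self.sx)] * ((keys[-1] >> self.shift) + 2)
            for j in range(len(self.sx) - 1, -1, -1):
                self.radix[self.sx[j] >> self.shift] = j
            for p in range(len(self.radix) - 2, -1, -1):
                self.radix[p] = min(self.radix[p], self.radix[p + 1])

        def lookup(self, key):
            """Position of `key` in the sorted array, or -1 if absent."""
            p = key >> self.shift
            lo_k = max(self.radix[p] - 1, 0)             # knot range for this prefix
            j = bisect.bisect_right(self.sx, key, lo_k, self.radix[p + 1]) - 1
            pos_lo = self.sy[j]
            pos_hi = self.sy[min(j + 1, len(self.sy) - 1)]
            # the real structure searches only +/- max_error around the linear
            # estimate; searching the whole enclosing segment keeps this sketch simple
            i = bisect.bisect_left(self.keys, key, pos_lo, pos_hi + 1)
            return i if i < len(self.keys) and self.keys[i] == key else -1

    random.seed(42)
    data = sorted(random.sample(range(1 << 30), 100_000))
    rs = ToyRadixSpline(data)
    assert rs.lookup(data[12_345]) == 12_345
    print("keys:", len(data), "spline knots:", len(rs.sx), "radix entries:", len(rs.radix))
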
Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
{"title":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","authors":"","doi":"10.1145/3401071","DOIUrl":"https://doi.org/10.1145/3401071","url":null,"abstract":"","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124782096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1