Unsupervised Selectivity Estimation by Integrating Gaussian Mixture Models and an Autoregressive Model
Zizhong Meng, Peizhi Wu, Gao Cong, Rong Zhu, Shuai Ma
Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology, pp. 2:247-2:259. Published 2022-01-01.
DOI: https://doi.org/10.48786/edbt.2022.13
Citation count: 1
Abstract
Selectivity estimation is a fundamental database task that has been studied for decades. A recent trend is to use deep learning methods for selectivity estimation, and deep autoregressive models have been reported to achieve excellent accuracy. However, if the relation has continuous attributes with large domain sizes, the search space of query inference on deep autoregressive models can be very large, resulting in inaccurate estimation and inefficient inference. To address this challenge, we propose a new model that integrates multiple Gaussian mixture models and a deep autoregressive model. On the one hand, the Gaussian mixture models fit the distribution of continuous attributes and reduce their domain sizes. On the other hand, the deep autoregressive model learns the joint data distribution over the reduced-domain attributes. In experiments, we compare with multiple baselines on 4 real-world datasets containing continuous attributes, and the results demonstrate that our model can achieve up to 20 times higher accuracy than the second-best estimators, while using less space and inference time.
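The abstract does not give implementation details, so the following is only a rough single-column sketch of the domain-reduction idea, not the authors' method. It assumes a plain EM fit of a 1-D Gaussian mixture, maps each continuous value to its most likely component id (a domain of size k instead of one entry per distinct value), and estimates a range predicate as a weighted sum of per-component Gaussian masses. All function names (`fit_gmm_1d`, `to_component_id`, `range_selectivity`) and the synthetic data are hypothetical. The deep autoregressive model is omitted: here the mixture weights stand in for the distribution over component ids that such a model would supply when several attributes are involved.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def fit_gmm_1d(values, k, iters=50):
    """Fit a 1-D Gaussian mixture with plain EM (illustrative, not tuned)."""
    srt = sorted(values)
    n = len(srt)
    # Initialise means at evenly spaced quantiles, with a shared spread.
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    spread = (srt[-1] - srt[0]) / (2.0 * k) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            stds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


def to_component_id(x, weights, means, stds):
    """Map a continuous value to its most likely component (the reduced domain)."""
    scores = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


def range_selectivity(lo, hi, weights, means, stds):
    """Estimate P(lo <= x <= hi) as a weighted sum of per-component masses."""
    return sum(
        w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
        for w, m, s in zip(weights, means, stds)
    )


# Synthetic continuous column with two clusters (assumed data, for illustration).
random.seed(0)
column = [random.gauss(10.0, 1.0) for _ in range(500)] + [
    random.gauss(50.0, 2.0) for _ in range(500)
]
weights, means, stds = fit_gmm_1d(column, k=2)
component_ids = [to_component_id(x, weights, means, stds) for x in column]
estimate = range_selectivity(8.0, 12.0, weights, means, stds)
actual = sum(8.0 <= x <= 12.0 for x in column) / len(column)
print(f"estimated={estimate:.3f} actual={actual:.3f}")
```

In this sketch the column's 1000 distinct values collapse to two component ids, which is the kind of domain reduction the abstract refers to. How the paper chooses the number of components and combines the per-attribute mixtures with the autoregressive model is not stated in the abstract.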