Multidimensional Cluster Sampling View on Large Databases for Approximate Query Processing
Tomohiro Inoue, A. Krishna, R. Gopalan
2015 IEEE 19th International Enterprise Distributed Object Computing Conference
Published: 2015-09-21
DOI: 10.1109/EDOC.2015.24 (https://doi.org/10.1109/EDOC.2015.24)
Citations: 2
Abstract
Approximate query processing with relatively small random samples is an effective way to answer many queries on large databases. However, small random samples can miss the relevant records for highly selective queries because their coverage is insufficient. A multidimensional index tree called the k-MDI was proposed as an effective sampling scheme for highly selective decision support queries. It has been shown to deliver fast response times and high accuracy, but its implementation on database tables has not been discussed. This paper proposes the Multidimensional Cluster Sampling View, which is based on the k-MDI. The view can be implemented easily using common database tables and can be manipulated with SQL statements. Furthermore, it can quickly provide reliable approximate answers for any query condition. The response time and approximation accuracy are validated on a large dataset based on the TPC-DS specifications.
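The coverage problem described in the abstract can be illustrated with a minimal sketch. It is not the k-MDI or the proposed view; it only shows plain uniform sampling, the baseline the abstract criticizes. The tables `sales` and `sales_sample`, their columns, and the helper names are all hypothetical, chosen for illustration, and SQLite stands in for whatever database the authors used.

```python
import random
import sqlite3


def build_demo(total_rows=100_000, sample_rows=1_000, seed=0):
    """Create a base table and a uniform random sample of it (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, store INTEGER, amount INTEGER)"
    )
    rows = [(i, i % 1000, i % 50) for i in range(total_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_sample (id INTEGER PRIMARY KEY, store INTEGER, amount INTEGER)"
    )
    # Uniform sample without replacement; the seed only makes the demo repeatable.
    sampled = random.Random(seed).sample(rows, sample_rows)
    conn.executemany("INSERT INTO sales_sample VALUES (?, ?, ?)", sampled)
    conn.commit()
    return conn


def exact_count(conn, predicate):
    """Exact answer from the full table. `predicate` is trusted demo input only."""
    return conn.execute(f"SELECT COUNT(*) FROM sales WHERE {predicate}").fetchone()[0]


def estimated_count(conn, predicate):
    """Approximate answer: count matches in the sample, then scale by N / n."""
    total = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    n = conn.execute("SELECT COUNT(*) FROM sales_sample").fetchone()[0]
    hits = conn.execute(
        f"SELECT COUNT(*) FROM sales_sample WHERE {predicate}"
    ).fetchone()[0]
    return hits * total / n


if __name__ == "__main__":
    conn = build_demo()
    for predicate in ("store < 500", "store = 7"):
        print(predicate, exact_count(conn, predicate), estimated_count(conn, predicate))
```

With these sizes each sampled row stands for 100 base rows. The broad predicate `store < 500` matches about half of the sample, so its estimate is stable. The selective predicate `store = 7` matches only 100 base rows, so the sample is expected to hold about one of them and the estimate can only move in steps of 100, including zero. That gap is what a sampling scheme aimed at highly selective queries has to close.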