Pragmatic estimation of join sizes and attribute correlations
D. Bell, D. H. O. Ling, S. McClean
[1989] Proceedings. Fifth International Conference on Data Engineering, 1989-02-06
DOI: https://doi.org/10.1109/ICDE.1989.47202
Citations: 23
Abstract
A method is presented for modeling attribute value distributions in database relations in order to obtain accurate estimates of intermediate relation sizes during query evaluation. The basic idea is that, instead of keeping a single (average) value to represent the number of occurrences of each attribute value, m (typically ten) parameters are kept, each representing the number of occurrences of attribute values in a piece, or partition, corresponding to a subrange covering 1/m of the original value range. The uniformity assumption, taken as an estimation technique rather than as an assumption, holds for each partition, hence the name piecewise uniform. The distribution method is extended to the modeling of important intrarelational attribute correlations. This and other enhancements to the technique, such as application to the semijoin operation, are suggested. The technique is being used on two multidatabase management systems.
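The piecewise uniform idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the integer value domain, the function names (`bucket_counts`, `bucket_widths`, `estimate_join_size`, `exact_join_size`), and the per-partition formula (tuple counts multiplied and divided by the number of possible values in the partition) are all assumptions made for illustration. Both relations are assumed to be partitioned over the same value range.

```python
from collections import Counter


def bucket_counts(values, lo, hi, m=10):
    """Count tuples in each of m equal-width subranges of the integer range [lo, hi]."""
    span = hi - lo + 1  # number of possible integer values (assumed domain)
    counts = [0] * m
    for v in values:
        counts[(v - lo) * m // span] += 1
    return counts


def bucket_widths(lo, hi, m=10):
    """Number of possible integer values that fall in each of the m subranges."""
    span = hi - lo + 1
    # Bucket i starts at offset ceil(i * span / m); -(-a // b) is integer ceiling.
    starts = [-(-i * span // m) for i in range(m + 1)]
    return [starts[i + 1] - starts[i] for i in range(m)]


def estimate_join_size(counts_r, counts_s, widths):
    """Equijoin size estimate, applying the uniformity assumption per partition.

    Assumed formula: within a partition of w possible values, each value is
    taken to occur count / w times, so the partition contributes
    count_r * count_s / w matching pairs.
    """
    return sum(cr * cs / w
               for cr, cs, w in zip(counts_r, counts_s, widths) if w > 0)


def exact_join_size(r_values, s_values):
    """Exact equijoin size, for comparison with the estimate."""
    freq_r, freq_s = Counter(r_values), Counter(s_values)
    return sum(n * freq_s[v] for v, n in freq_r.items())


# Hypothetical skewed data over the domain 0..99.
LO, HI = 0, 99
R = [0] * 50 + list(range(50, 100))
S = list(range(10)) + [95] * 20

widths_10 = bucket_widths(LO, HI, 10)
piecewise = estimate_join_size(bucket_counts(R, LO, HI, 10),
                               bucket_counts(S, LO, HI, 10), widths_10)

# m = 1 corresponds to keeping a single average for the whole range.
single = estimate_join_size(bucket_counts(R, LO, HI, 1),
                            bucket_counts(S, LO, HI, 1),
                            bucket_widths(LO, HI, 1))

print(piecewise, single, exact_join_size(R, S))
```

On this toy data the ten-partition estimate happens to match the exact join size, while the single-average estimate falls well short; real data will generally show some error in both, with the gap depending on how skewed the values are inside each partition.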