A Comparison of global and local probabilistic approximations in mining data with many missing attribute values

Patrick G. Clark, J. Grzymala-Busse
2013 IEEE International Conference on Granular Computing (GrC), December 2013
DOI: 10.1109/GrC.2013.6740384
Citations: 1

Abstract

We present results of a novel experimental comparison of global and local probabilistic approximations. Global approximations are unions of characteristic sets, while local approximations are constructed from blocks of attribute-value pairs. Two interpretations of missing attribute values are discussed: lost values and “do not care” conditions. Our main objective was to compare global and local probabilistic approximations in terms of the error rate. For our experiments we used six incomplete data sets with many missing attribute values. Global approximations gave the best results for two data sets and local approximations for one data set; for the remaining three data sets the experiments ended in ties. Our next objective was to check the quality of non-standard probabilistic approximations, i.e., probabilistic approximations that are neither lower nor upper approximations. For four data sets the smallest error rate was achieved by non-standard probabilistic approximations; for the remaining two data sets it was achieved by upper approximations. Our final objective was to compare the two interpretations of missing attribute values. For three data sets the better interpretation was lost values, for one data set it was “do not care” conditions, and for the remaining two data sets there was a tie.
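The abstract names the ingredients of a global approximation without defining them. The following is a minimal Python sketch of the global side only, assuming the standard rough-set definitions from this line of work (these are not stated in the abstract). A lost value is written "?" and the case is left out of every block of that attribute. A “do not care” condition is written "*" and the case is put into every block of that attribute. The characteristic set K(x) is the intersection of the blocks of the values that case x actually specifies. The global probabilistic approximation with threshold alpha is the union of K(x) over the cases x in concept X with Pr(X | K(x)) = |X ∩ K(x)| / |K(x)| >= alpha. Under these assumptions, alpha = 1 gives the lower approximation, a very small positive alpha gives the upper approximation, and values in between can give non-standard approximations. Local approximations are not covered. The function names and the toy flu table are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed encodings of the two interpretations of missing values (hypothetical markers).
LOST = "?"          # lost value: case is excluded from every block of the attribute
DO_NOT_CARE = "*"   # "do not care": case is included in every block of the attribute

Case = Dict[str, str]


def blocks(data: List[Case], attributes: List[str]) -> Dict[Tuple[str, str], Set[int]]:
    """Blocks [(a, v)]: indices of cases whose value of a is v (or "*")."""
    result: Dict[Tuple[str, str], Set[int]] = {}
    for a in attributes:
        specified = {row[a] for row in data if row[a] not in (LOST, DO_NOT_CARE)}
        for v in specified:
            result[(a, v)] = {
                i for i, row in enumerate(data) if row[a] in (v, DO_NOT_CARE)
            }
    return result


def characteristic_set(x: int, data: List[Case], attributes: List[str],
                       blk: Dict[Tuple[str, str], Set[int]]) -> Set[int]:
    """K(x): intersect the blocks of x's specified values; "?" and "*" leave it unchanged."""
    k = set(range(len(data)))
    for a in attributes:
        v = data[x][a]
        if v not in (LOST, DO_NOT_CARE):
            k &= blk[(a, v)]
    return k


def global_approximation(concept: Set[int], data: List[Case],
                         attributes: List[str], alpha: float) -> Set[int]:
    """Union of K(x), x in concept, whose conditional probability Pr(X | K(x)) >= alpha."""
    blk = blocks(data, attributes)
    approx: Set[int] = set()
    for x in concept:
        k = characteristic_set(x, data, attributes, blk)  # never empty: x is in K(x)
        if len(k & concept) / len(k) >= alpha:
            approx |= k
    return approx


# Hypothetical toy data set with one lost value and one "do not care" condition.
DATA: List[Case] = [
    {"Temperature": "high",   "Headache": "?",   "Flu": "yes"},  # case 0
    {"Temperature": "high",   "Headache": "yes", "Flu": "yes"},  # case 1
    {"Temperature": "*",      "Headache": "no",  "Flu": "yes"},  # case 2
    {"Temperature": "normal", "Headache": "yes", "Flu": "no"},   # case 3
    {"Temperature": "normal", "Headache": "no",  "Flu": "no"},   # case 4
    {"Temperature": "high",   "Headache": "no",  "Flu": "no"},   # case 5
]
ATTRS = ["Temperature", "Headache"]
FLU_YES = {i for i, row in enumerate(DATA) if row["Flu"] == "yes"}

if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.001):
        print(alpha, sorted(global_approximation(FLU_YES, DATA, ATTRS, alpha)))
```

For the concept Flu = yes, the cases have conditional probabilities 3/4, 1 and 1/3. As a result, alpha = 1 yields {1} (lower), alpha = 0.5 yields {0, 1, 2, 5} (a non-standard approximation), and alpha = 0.001 yields {0, 1, 2, 4, 5} (upper). Cases are represented as integer indices so that blocks and characteristic sets reduce to plain set intersections and unions.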