Min Wang, Binqian Li, Fan Min, Jiaxue Liu, Manlong Wang
Title: Ensemble active imputation for incomplete data
Published in: 2020 IEEE International Conference on Networking, Sensing and Control (ICNSC), 2020-10-30
DOI: https://doi.org/10.1109/ICNSC48988.2020.9238068
Citations: 0
Abstract
Real data is often incomplete, which hinders its usability and learnability. A reasonable machine learning scenario is to obtain some values and labels at a cost upon request. In this paper, we propose a new ensemble active missing imputation (EAMI) algorithm to handle this learning task. First, we design five missing-value imputation methods: mean filling, cubic spline interpolation filling, sample-based collaborative filtering weighted filling, attribute-based collaborative filtering weighted filling, and k-nearest neighbor (KNN) filling. Second, we propose an ensemble imputation model based on a linear weighting of the attribute values predicted by these methods. Third, we propose a three-way decisions model that uses the variance of the predicted values to decide whether a missing value is filled by querying the true label or by using the predicted values. We conduct experiments on University of California Irvine (UCI) datasets. The results of significance tests verify the effectiveness of EAMI and its superiority over KNN missing data imputation algorithms.
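The ensemble and decision steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weight normalisation, the two variance thresholds `low` and `high`, and the intermediate "defer" region are all assumptions made for illustration, since the abstract does not specify how the weights or the decision regions are defined. The five base imputers are represented only by their predicted values for a single missing cell.

```python
import numpy as np


def ensemble_impute(predictions, weights):
    """Linearly weight the base imputers' predictions for one missing cell.

    `predictions` holds one value per base imputer (e.g. mean, spline, KNN).
    `weights` are hypothetical per-imputer weights; they are normalised here.
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # assumption: weights sum to 1
    return float(np.dot(weights, predictions))


def three_way_decision(predictions, low, high):
    """Map the variance of the base predictions to one of three actions.

    Assumed regions (not taken from the paper):
      variance <= low   -> "fill"  (imputers agree, use the ensemble value)
      variance >= high  -> "query" (imputers disagree, pay for the true value)
      otherwise         -> "defer" (postpone the decision)
    """
    variance = float(np.var(predictions))
    if variance <= low:
        return "fill"
    if variance >= high:
        return "query"
    return "defer"


if __name__ == "__main__":
    agreeing = [2.0, 2.1, 1.9, 2.0, 2.0]
    disagreeing = [1.0, 5.0, 3.0, 0.0, 6.0]
    equal_weights = [1, 1, 1, 1, 1]

    print(ensemble_impute(agreeing, equal_weights))
    print(three_way_decision(agreeing, low=0.01, high=1.0))
    print(three_way_decision(disagreeing, low=0.01, high=1.0))
```

Under these assumptions, a cell on which the base imputers agree is filled with the weighted prediction, while a cell on which they disagree strongly triggers a paid query for the true value.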