Lukman Syafie, Fitriyani Umar, Aliyazid Mude, Herdianti Darwis, Herman, Harlinda
Missing Data Handling Using The Naive Bayes Logarithm (NBL) Formula
2018 2nd East Indonesia Conference on Computer and Information Technology (EIConCIT), 2018-11-01
DOI: 10.1109/EIConCIT.2018.8878538 (https://doi.org/10.1109/EIConCIT.2018.8878538)
Abstract
Missing data is a common problem in classification and can reduce classification accuracy. This paper studies three techniques for handling missing data: instance deletion, mean imputation, and median imputation. We build on Naive Bayes, a method that underlies many classification techniques. We propose extending the Naive Bayes formula into the Naive Bayes Logarithm (NBL) formula to guard against a final result in which the prior probability of the classifier evaluates to zero. If that probability is zero, the classification process fails. In this research, we use the Web-Kb dataset, which has also been used to evaluate other classification methods. Using Naive Bayes Logarithm, we study the effect of missing data on classification accuracy under each of the data-handling methods. The results show that documents are classified with an average accuracy of 84.909% when using mean imputation, median imputation, and instance deletion. We conclude that Naive Bayes Logarithm is reliable for document classification.
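The abstract does not give the NBL formula itself, so the following is only a minimal sketch of the general idea, assuming the standard log-space form of Naive Bayes (log prior plus the sum of log likelihoods). It also illustrates the three missing-data strategies named above. The function names `impute`, `product_score`, and `log_score`, and the use of `None` to mark a missing value, are hypothetical choices for this illustration and are not taken from the paper.

```python
import math
from statistics import mean, median


def impute(rows, strategy="mean"):
    """Handle missing values (marked as None) in a list of numeric rows.

    strategy: "delete" drops every row that has a missing value;
              "mean" / "median" fill each gap with that column's statistic.
    Assumes every column has at least one observed value.
    """
    if strategy == "delete":
        return [list(r) for r in rows if None not in r]
    fill = mean if strategy == "mean" else median
    columns = list(zip(*rows))
    fills = [fill([v for v in col if v is not None]) for col in columns]
    return [[fills[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def product_score(prior, likelihoods):
    """Plain Naive Bayes score: prior times the product of likelihoods."""
    return prior * math.prod(likelihoods)


def log_score(log_prior, log_likelihoods):
    """Log-space score (assumed form): log prior plus sum of log likelihoods."""
    return log_prior + sum(log_likelihoods)


if __name__ == "__main__":
    rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
    print(impute(rows, "mean"))
    print(impute(rows, "median"))
    print(impute(rows, "delete"))

    # Many small per-word likelihoods, as in a long document.
    likelihoods = [1e-3] * 400
    print(product_score(0.5, likelihoods))  # underflows in floating point
    print(log_score(math.log(0.5), [math.log(p) for p in likelihoods]))
```

In this sketch the plain product collapses to zero for a long document because of floating-point underflow, while the log-space score stays finite, so class scores can still be compared. Whether the paper's NBL formula addresses exactly this case, or a zero arising elsewhere, is not stated in the abstract.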