A Combination of multiple imputation and principal component analysis to handle missing value with arbitrary pattern
Novita Anindita, H. A. Nugroho, T. B. Adji
2017 7th International Annual Engineering Seminar (InAES), published 2017-08-01
DOI: 10.1109/INAES.2017.8068537 (https://doi.org/10.1109/INAES.2017.8068537)
Hepatitis is a major health problem that can progress to chronic hepatitis and cancer. Computer-based diagnosis is now commonly used in medical examinations, with a disease dataset serving as the reference for making decisions. However, the dataset was incomplete: many of its instances contained missing values, which can bias the results of the analysis. One method of handling missing values is Multiple Imputation. The hepatitis dataset has an arbitrary pattern of missing values, which can be handled by using Markov Chain Monte Carlo (MCMC) and Fully Conditional Specification (FCS) as Multiple Imputation algorithms. This research conducted an experiment comparing combinations of Multiple Imputation algorithms with Principal Component Analysis (PCA) as instance selection. Instance selection was applied to reduce the data by selecting the variables that contribute most to the dataset. The goal was to improve the accuracy of analysis on data with an arbitrary pattern of missing values. The results showed that FCS-PCA performed best, with the highest accuracy (98.80%) and the lowest error rate (0.0116).
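The distinction between an arbitrary and a monotone missingness pattern can be made concrete with a small check. The Python sketch below is illustrative only and is not taken from the paper. It assumes missing entries are encoded as `None`, and the function names `missing_sets` and `missing_pattern` are hypothetical. A pattern is treated as monotone when the columns can be ordered so that each column's set of incomplete rows is contained in the next column's set. Anything else is reported as arbitrary, the case for which the abstract names MCMC and FCS.

```python
from typing import List, Optional, Sequence, Set

Row = Sequence[Optional[float]]  # assumption: None marks a missing value


def missing_sets(rows: Sequence[Row]) -> List[Set[int]]:
    """For each column, collect the indices of the rows where it is missing."""
    n_cols = len(rows[0])
    return [
        {i for i, row in enumerate(rows) if row[j] is None}
        for j in range(n_cols)
    ]


def missing_pattern(rows: Sequence[Row]) -> str:
    """Classify missingness as 'complete', 'monotone', or 'arbitrary'."""
    sets = missing_sets(rows)
    if not any(sets):
        return "complete"
    # Order columns from least to most missing. The pattern is monotone only
    # if each column's missing rows are a subset of the next column's.
    ordered = sorted(sets, key=len)
    nested = all(a <= b for a, b in zip(ordered, ordered[1:]))
    return "monotone" if nested else "arbitrary"


# Toy data (made up for illustration, not the hepatitis dataset).
monotone_rows = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [1.0, None, None],
]
arbitrary_rows = [
    [1.0, None, 3.0],
    [None, 2.0, 3.0],
    [1.0, 2.0, None],
]

print(missing_pattern(monotone_rows))
print(missing_pattern(arbitrary_rows))
```

In this sketch a dataset with a single incomplete column also counts as monotone, since a one-element chain is trivially nested.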