{"title":"The impact of different fold for cross validation of missing values imputation method on hepatitis dataset","authors":"T. Astuti, H. A. Nugroho, T. B. Adji","doi":"10.1109/QIR.2015.7374894","DOIUrl":null,"url":null,"abstract":"Hepatitis is a liver disease caused by hepatitis viruses. Nowadays, hepatitis is a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, therefore early diagnosis is needed. Several research works on development of computer aided systems have been conducted to improve the diagnosis process of hepatitis disease. California Irvine (UCI) machine-learning repository provides hepatitis disease dataset which can be publicly accessed; however, the dataset contains many missing values. The existing of missing values in the dataset may affect the quality of the results analysis. Therefore, it needs to be conducted for handling the missing values. This paper analyses the performance of applying varied number of fold for cross validation of missing values imputation methods. The imputation method is combined with the feature selection method and machine-learning algorithm on the hepatitis dataset. The results that varied fold in k-fold cross validation which applied in the imputation method does not reveal significant advantages.","PeriodicalId":127270,"journal":{"name":"2015 International Conference on Quality in Research (QiR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Quality in Research (QiR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QIR.2015.7374894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Hepatitis is a liver disease caused by hepatitis viruses. Nowadays, hepatitis is a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, therefore early diagnosis is needed. Several research works on development of computer aided systems have been conducted to improve the diagnosis process of hepatitis disease. California Irvine (UCI) machine-learning repository provides hepatitis disease dataset which can be publicly accessed; however, the dataset contains many missing values. The existing of missing values in the dataset may affect the quality of the results analysis. Therefore, it needs to be conducted for handling the missing values. This paper analyses the performance of applying varied number of fold for cross validation of missing values imputation methods. The imputation method is combined with the feature selection method and machine-learning algorithm on the hepatitis dataset. The results that varied fold in k-fold cross validation which applied in the imputation method does not reveal significant advantages.