The impact of different fold for cross validation of missing values imputation method on hepatitis dataset
T. Astuti, H. A. Nugroho, T. B. Adji
2015 International Conference on Quality in Research (QiR), August 2015. DOI: 10.1109/QIR.2015.7374894
Hepatitis is a liver disease caused by hepatitis viruses. It is currently a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, so early diagnosis is needed. Several studies have developed computer-aided systems to improve the diagnosis of hepatitis. The University of California, Irvine (UCI) Machine Learning Repository provides a publicly accessible hepatitis dataset; however, the dataset contains many missing values. Missing values may degrade the quality of the analysis results, so they must be handled before analysis. This paper analyses how the number of folds used in k-fold cross-validation affects the performance of missing-value imputation methods. The imputation methods are combined with a feature selection method and a machine-learning algorithm on the hepatitis dataset. The results show that varying the number of folds in the k-fold cross-validation applied to the imputation methods does not yield a significant advantage.
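The abstract does not spell out the evaluation pipeline, so the following is only a hedged illustration of one common way to combine imputation with k-fold cross-validation: the imputer is fitted on the training folds only and then applied to the held-out fold, and a classifier is scored per fold. Column-mean imputation, the nearest-centroid classifier, the helper names (`k_fold_indices`, `fit_mean_imputer`, `cross_validate`, `toy_dataset`), and the toy data are all assumptions for illustration. The feature-selection step is omitted for brevity, and none of this is claimed to be the authors' exact protocol or data.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k disjoint, nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean_imputer(rows):
    """Per-column means computed from observed (non-None) values of the training rows only."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fallback of 0.0 if a column is entirely missing in this training fold (assumption).
        means.append(mean(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace every None with the corresponding training-fold column mean."""
    return [[m if v is None else v for v, m in zip(r, means)] for r in rows]


def fit_nearest_centroid(X, y):
    """Stand-in classifier (assumption): one mean vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict_nearest_centroid(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def cross_validate(X, y, k, seed=0):
    """Mean accuracy over k folds; the imputer is refit inside each fold to avoid leakage."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        means = fit_mean_imputer([X[i] for i in train_idx])
        X_train = impute([X[i] for i in train_idx], means)
        X_test = impute([X[i] for i in test_idx], means)
        model = fit_nearest_centroid(X_train, [y[i] for i in train_idx])
        correct = sum(
            predict_nearest_centroid(model, x) == y[i] for x, i in zip(X_test, test_idx)
        )
        scores.append(correct / len(test_idx))
    return mean(scores)


def toy_dataset():
    """Hypothetical two-class data with some None entries; NOT the UCI hepatitis data."""
    X, y = [], []
    for i in range(20):
        label = i % 2
        base = 5.0 * label
        row = [base + (i % 3) * 0.1, base - (i % 4) * 0.1]
        if i % 5 == 0:
            row[1] = None  # simulated missing value
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_dataset()
    for k in (2, 5, 10):
        print(f"k={k}: mean accuracy {cross_validate(X, y, k):.3f}")
```

The loop over `k` in the `__main__` block mirrors the kind of comparison the abstract describes: the same imputation-plus-classifier pipeline is scored under different fold counts and the mean per-fold accuracies are compared. Refitting the imputer inside each fold keeps statistics from the held-out rows out of the imputed training data.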