The impact of different fold for cross validation of missing values imputation method on hepatitis dataset

2015 International Conference on Quality in Research (QiR) | Pub Date: 2015-08-01 | DOI: 10.1109/QIR.2015.7374894

T. Astuti, H. A. Nugroho, T. B. Adji


Citations: 7

Abstract

Hepatitis is a liver disease caused by hepatitis viruses. It is now a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, so early diagnosis is needed. Several studies have developed computer-aided systems to improve the diagnosis of hepatitis. The University of California, Irvine (UCI) machine-learning repository provides a publicly accessible hepatitis dataset; however, the dataset contains many missing values. The presence of missing values may affect the quality of the analysis results, so they need to be handled. This paper analyses the effect of varying the number of folds used in cross validation of missing-value imputation methods. The imputation method is combined with a feature selection method and a machine-learning algorithm on the hepatitis dataset. The results show that varying the number of folds in k-fold cross validation applied to the imputation method does not yield a significant advantage.
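
The abstract does not name the imputation method, the feature selection method, or the classifier, so the following is only a minimal sketch of one possible reading of the experiment: run k-fold cross validation for several values of k and impute missing values inside each fold. Everything in it is an assumption made for illustration: mean imputation and a nearest-centroid classifier stand in for the unnamed methods, feature selection is omitted, the data is synthetic rather than the UCI hepatitis dataset, and all function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def column_means(rows):
    """Per-column mean over observed (non-None) values; 0.0 if none observed."""
    result = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        result.append(mean(observed) if observed else 0.0)
    return result


def impute(rows, means):
    """Replace each missing value (None) with the given column mean."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in classifier (assumption): assign each row to the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def cross_validate(X, y, k, seed=0):
    """Mean accuracy over k folds; imputation means come from the training part only."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        means = column_means([X[i] for i in train_idx])
        train_X = impute([X[i] for i in train_idx], means)
        test_X = impute([X[i] for i in test_idx], means)
        scores.append(
            nearest_centroid_accuracy(
                train_X, [y[i] for i in train_idx],
                test_X, [y[i] for i in test_idx],
            )
        )
    return mean(scores)


def make_toy_data(n=120, d=4, missing_rate=0.2, seed=1):
    """Synthetic two-class data with values removed at random (not the hepatitis data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(label * 2.0, 1.0) for _ in range(d)]
        X.append([None if rng.random() < missing_rate else v for v in row])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for k in (3, 5, 10):
        print(f"k={k}: mean accuracy {cross_validate(X, y, k):.3f}")
```

Computing the imputation means from the training folds only keeps information from the held-out fold out of the fitted pipeline; whether the paper handled imputation this way is not stated in the abstract.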


