Title: A novel imputation method for effective prediction of coronary Kidney disease
Authors: S. Arasu, R. Thirumalaiselvi
Published in: 2017 2nd International Conference on Computing and Communications Technologies (ICCCT)
Publication date: 2017-02-01
DOI: 10.1109/ICCCT2.2017.7972256 (https://doi.org/10.1109/ICCCT2.2017.7972256)
Citations: 25
Abstract
Kidney disease has become prevalent around the world. Predicting kidney disease is a highly complex task when handling large datasets. A kidney disease dataset contains patient information such as age, blood pressure levels, albumin, sugar, and red blood cell counts. Some features in the dataset may have missing values, and because those values may be important for predicting kidney disease, their absence decreases prediction accuracy. Several methods have been proposed to fill in these missing values. An existing classification framework used a data preprocessing method in which a data cleaning process fills in the missing values and corrects the erroneous ones. A recalculation process is performed on the chronic kidney disease (CKD) stages, and the recalculated values are filled in for the unknown values. Although this method is efficient, it requires input from a healthcare expert who is familiar with CKD dataset values. To avoid this need and allow a non-expert to carry out the preprocessing, Weighted Average Ensemble Learning Imputation (WAELI) is proposed. In the proposed work, the single-value imputation model uses expectation-maximization (EM) and Random Forest (RF), which predict the missing values effectively in small datasets. For large datasets, the multiple-value imputation model estimates the missing values using RF, Classification and Regression Tree (CART), and C4.5. WAELI is therefore expected to improve the accuracy of kidney disease prediction. A priority assigning algorithm is then introduced to assign a priority to each feature in the dataset, and the higher-priority features are carried over to the classification process. This makes the classification process more efficient and reduces the time it takes.
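
The abstract does not say how WAELI weights its base imputers, so the following is only a minimal sketch of the general idea of weighted-average ensemble imputation, not the authors' implementation. It assumes that each base imputer is weighted by the inverse of its leave-one-out error on the observed values. Three simple imputers (mean, median, nearest neighbour) stand in for EM, RF, CART, and C4.5. All function names and the sample rows are hypothetical.

```python
from statistics import mean, median

# Each base imputer takes (known_rows, row, col) and returns an estimate
# for row[col]. These are simple stand-ins, NOT the EM/RF/CART/C4.5 models
# named in the abstract.

def impute_mean(known, row, col):
    return mean(r[col] for r in known)

def impute_median(known, row, col):
    return median(r[col] for r in known)

def impute_nearest(known, row, col):
    # 1-nearest neighbour on the other features observed in both rows.
    def dist(r):
        return sum((r[j] - row[j]) ** 2
                   for j in range(len(row))
                   if j != col and r[j] is not None and row[j] is not None)
    return min(known, key=dist)[col]

def ensemble_weights(known, col, imputers, eps=1e-9):
    # Assumption: weight = inverse leave-one-out mean absolute error,
    # normalised so that the weights sum to 1.
    errors = []
    for imp in imputers:
        abs_err = []
        for i, r in enumerate(known):
            rest = known[:i] + known[i + 1:]
            abs_err.append(abs(imp(rest, r, col) - r[col]))
        errors.append(mean(abs_err))
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def weighted_average_impute(rows, col, imputers):
    # Fill missing entries (None) in column `col` with the weighted
    # average of the base imputers' estimates; input rows are not mutated.
    known = [r for r in rows if r[col] is not None]
    weights = ensemble_weights(known, col, imputers)
    filled = []
    for r in rows:
        r = list(r)
        if r[col] is None:
            r[col] = sum(w * imp(known, r, col)
                         for w, imp in zip(weights, imputers))
        filled.append(r)
    return filled, weights

# Hypothetical rows: [age, blood pressure, albumin]; None marks a missing value.
rows = [
    [48, 80, 1.0],
    [53, 90, 2.0],
    [62, 80, 3.0],
    [51, 80, None],
    [60, 90, 3.0],
]
filled, weights = weighted_average_impute(
    rows, 2, [impute_mean, impute_median, impute_nearest])
```

Because the weights are normalised, the filled value is a convex combination of the base estimates, so a base imputer with a large validation error contributes little to the final value.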