Improving mining of medical data by outliers prediction
V. Podgorelec, M. Heričko, I. Rozman
18th IEEE Symposium on Computer-Based Medical Systems (CBMS'05), published 2005-06-23
DOI: 10.1109/CBMS.2005.68 (https://doi.org/10.1109/CBMS.2005.68)
Citations: 48
Abstract
This paper presents a new outlier prediction method intended to improve classification performance when mining medical data. The method introduces a class confusion score, a metric based on the classification results of a set of classifiers induced by an evolutionary decision tree induction algorithm. Classification should improve once the identified outliers are removed from the training set. Our proposition is that a classifier trained on the filtered dataset captures a better, more general knowledge model and should therefore also perform better on unseen cases. The proposed method is applied to two cardiovascular datasets, and the results are discussed.
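The filtering idea can be illustrated with a minimal sketch. The abstract does not define the class confusion score, so the score below is an assumed stand-in: the fraction of classifiers in the set that misclassify a training instance. The function names, the `threshold` parameter, and the toy labels are hypothetical, and the predictions are supplied directly rather than produced by evolutionary decision trees.

```python
from typing import Hashable, List, Sequence, Tuple


def confusion_scores(
    labels: Sequence[Hashable],
    predictions: Sequence[Sequence[Hashable]],
) -> List[float]:
    """Assumed stand-in for a class confusion score.

    predictions[k][i] is the class that classifier k assigns to
    training instance i. The score of instance i is the fraction
    of classifiers whose prediction differs from labels[i].
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    n_classifiers = len(predictions)
    scores = []
    for i, true_label in enumerate(labels):
        wrong = sum(1 for preds in predictions if preds[i] != true_label)
        scores.append(wrong / n_classifiers)
    return scores


def filter_outliers(
    labels: Sequence[Hashable],
    predictions: Sequence[Sequence[Hashable]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[int], List[int]]:
    """Split instance indices into (kept, removed) by score."""
    scores = confusion_scores(labels, predictions)
    kept = [i for i, s in enumerate(scores) if s <= threshold]
    removed = [i for i, s in enumerate(scores) if s > threshold]
    return kept, removed


# Toy data: four training instances, three classifiers.
labels = ["sick", "healthy", "sick", "healthy"]
predictions = [
    ["sick", "healthy", "healthy", "healthy"],
    ["sick", "healthy", "healthy", "sick"],
    ["sick", "sick", "healthy", "healthy"],
]
kept, removed = filter_outliers(labels, predictions)
print(kept, removed)  # [0, 1, 3] [2]
```

In this toy run, instance 2 is misclassified by every classifier and is removed, while instances misclassified by only one of the three are kept. A final classifier would then be trained on the kept instances only.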