{"title":"A preliminary attempt to attribute selection using Split-and-Rank tool","authors":"Wieslaw Paja","doi":"10.1109/DT.2016.7557176","DOIUrl":null,"url":null,"abstract":"In this paper, some preliminary attempt to attribute selection using Split-and-Rank tool were presented. This approach devotes to using three ways of splitting of data into subsets investigated separately. These methods apply sequential, random and random with repetitions split of dataset. Additionally, two methods for threshold of selection were defined. The first one was based on using SVM weight to find important feature, and the second one uses random forest importance to reduce the feature space. Implemented methods were applied on three datasets from UCI machine learning repository and results of classification and AUROC were mostly better after selection than using original datasets.","PeriodicalId":281446,"journal":{"name":"2016 International Conference on Information and Digital Technologies (IDT)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on Information and Digital Technologies (IDT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DT.2016.7557176","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
This paper presents a preliminary attempt at attribute selection using the Split-and-Rank tool. The approach splits the data into subsets that are investigated separately, using three splitting strategies: sequential, random, and random with repetitions. Additionally, two methods for setting the selection threshold are defined. The first uses SVM weights to identify important features, and the second uses random forest importance to reduce the feature space. The implemented methods were applied to three datasets from the UCI Machine Learning Repository, and the classification results and AUROC after selection were mostly better than those obtained with the original datasets.
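The three splitting strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the Split-and-Rank implementation: the function names, the `n_subsets`, `subset_size` and `seed` parameters, and the choice to split attribute indices (rather than records) are all assumptions made for illustration. In particular, "random with repetitions" is read here as drawing each subset independently, so the same item may appear in more than one subset. The ranking step (SVM weights or random forest importance) is left out.

```python
import random


def sequential_split(items, n_subsets):
    """Cut the items into consecutive chunks of near-equal size."""
    size, extra = divmod(len(items), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        subsets.append(list(items[start:end]))
        start = end
    return subsets


def random_split(items, n_subsets, seed=0):
    """Shuffle once, then cut into disjoint chunks (no item is repeated)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return sequential_split(shuffled, n_subsets)


def random_split_with_repetitions(items, n_subsets, subset_size, seed=0):
    """Draw every subset independently (assumed reading of 'with repetitions').

    Items are unique within a subset but may recur across subsets.
    """
    rng = random.Random(seed)
    pool = list(items)
    return [rng.sample(pool, subset_size) for _ in range(n_subsets)]


# Hypothetical example: ten attribute indices split into three subsets.
features = list(range(10))
seq = sequential_split(features, 3)   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
rnd = random_split(features, 3)
rep = random_split_with_repetitions(features, 3, subset_size=4)
```

In a full pipeline, each subset would then be ranked on its own and the per-subset rankings combined before a selection threshold is applied; how the tool does this is not specified in the abstract.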