Hybrid sampling for multiclass imbalanced problem: Case study of students' performance prediction
Wanthanee Prachuabsupakij, N. Soonthornphisaj
2014 International Conference on Advanced Computer Science and Information System, 2014-10-01
DOI: 10.1109/ICACSIS.2014.7065824 (https://doi.org/10.1109/ICACSIS.2014.7065824)
Citations: 2
Abstract
This paper proposes CLUSS (CLUstering and SMOTE Sampling), a method that can improve prediction performance on multiclass imbalanced problems, using students' performance data as a case study. First, a clustering approach is used to create new subsets from all majority classes. The resulting subsets consist of groups of majority-class instances that have different characteristics. Second, an oversampling technique is applied to generate new synthetic minority-class instances. CLUSS then constructs a new training set for each subset by combining all minority-class instances with the majority-class instances in that subset. Finally, a decision tree is trained on each training set, and the classes are predicted via majority vote. The experimental results show that CLUSS achieved high performance on both majority and minority classes.
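
The four steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, the oversampling ratio, or the tree settings, so k-means, `n_clusters=3`, the `target` size, and default decision trees are all choices made here for illustration. The helper `smote_like` is a simplified SMOTE-style interpolation written for this sketch, and the names `cluss_fit` and `cluss_predict` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


def smote_like(X_min, n_new, k=3, rng=None):
    """Simplified SMOTE-style oversampling (illustrative, not a library API).

    Each synthetic point lies on the segment between a minority instance
    and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(rng)
    if len(X_min) < 2 or n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbours per point
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def cluss_fit(X, y, majority_classes, n_clusters=3, seed=0):
    """Train one decision tree per majority-class cluster (assumed design)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    maj = np.isin(y, majority_classes)
    X_maj, y_maj, X_min, y_min = X[maj], y[maj], X[~maj], y[~maj]

    # Step 1: cluster all majority-class instances (k-means is an assumption).
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_maj)

    # Step 2: oversample each minority class up to an assumed target size.
    target = int(np.ceil(len(X_maj) / n_clusters))
    parts_X, parts_y = [X_min], [y_min]
    for c in np.unique(y_min):
        Xc = X_min[y_min == c]
        synth = smote_like(Xc, target - len(Xc), rng=seed)
        parts_X.append(synth)
        parts_y.append(np.full(len(synth), c))
    X_min_os, y_min_os = np.vstack(parts_X), np.concatenate(parts_y)

    # Steps 3 and 4: one training set and one tree per majority subset.
    trees = []
    for j in range(n_clusters):
        Xj = np.vstack([X_maj[labels == j], X_min_os])
        yj = np.concatenate([y_maj[labels == j], y_min_os])
        trees.append(DecisionTreeClassifier(random_state=seed).fit(Xj, yj))
    return trees


def cluss_predict(trees, X):
    """Combine the per-tree predictions by majority vote."""
    votes = np.stack([t.predict(np.asarray(X, dtype=float)) for t in trees])
    out = []
    for col in votes.T:                       # one column of votes per sample
        vals, counts = np.unique(col, return_counts=True)
        out.append(vals[np.argmax(counts)])
    return np.array(out)
```

On a dataset with labels `y` where classes 0 and 1 are the majority classes, the sketch would be used as `trees = cluss_fit(X, y, majority_classes=[0, 1])` followed by `cluss_predict(trees, X_new)`. Note that a tree trained on a subset containing only one majority class cannot vote for the others, which is why the final vote across all trees matters.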