Online sequential classification of imbalanced data by combining extreme learning machine and improved SMOTE algorithm
Wentao Mao, Jinwan Wang, Liyun Wang
2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 2015-07-12
DOI: 10.1109/IJCNN.2015.7280620 (https://doi.org/10.1109/IJCNN.2015.7280620)
Citations: 14
Abstract
Data imbalance is becoming increasingly prominent in applications of machine learning and pattern recognition. Many traditional machine learning methods struggle with imbalanced data that are also collected in an online sequential manner. To achieve fast and efficient classification in this setting, a new online sequential extreme learning machine method with a sequential SMOTE strategy is proposed. The key idea is to reduce the randomness in generating virtual minority samples by exploiting the distribution characteristics of the online sequential data. Using the online sequential extreme learning machine (OS-ELM) as the baseline algorithm, the method consists of two stages. In the offline stage, a principal curve is introduced to model each class's distribution, and virtual samples are then generated on that basis by the synthetic minority over-sampling technique (SMOTE). In the online stage, the membership for each class is determined from the projection distance of a sample to the principal curve. Using these memberships, redundant majority samples and unreasonable virtual minority samples are excluded to reduce the imbalance level in the online stage. The proposed method is evaluated on four UCI datasets and a real-world air pollutant forecasting dataset. The experimental results show that the proposed method outperforms the classical ELM, OS-ELM, and SMOTE-based OS-ELM in terms of generalization performance and numerical stability.
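For readers unfamiliar with the over-sampling step, the following is a minimal sketch of standard SMOTE interpolation in plain Python. It is not the improved, principal-curve-guided variant the abstract describes: the membership-based filtering of virtual samples is not reproduced here. The function name `smote`, its parameters `k` and `seed`, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Illustrative sketch of standard SMOTE; names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbors at random.
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with four 2-D samples (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
virtual = smote(minority, n_new=6)
```

Because each virtual point lies on a segment between two existing minority samples, the random choice of neighbor and interpolation position is the source of the randomness that the proposed method aims to reduce.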