Kernel-based SMOTE for SVM classification of imbalanced datasets
Josey Mathew, Ming Luo, C. Pang, H. Chan
IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society, 2015-11-01
https://doi.org/10.1109/IECON.2015.7392251
Datasets with an imbalanced class distribution pose a severe challenge to traditional learning algorithms, which are designed to maximize overall classification accuracy. Preprocessing methods such as the Synthetic Minority Over-sampling Technique (SMOTE) address this problem by generating data points in the input space to balance the training dataset. However, such artificial sampling methods can distort the performance of Support Vector Machine (SVM) classifiers, which operate in a kernel-induced feature space. This paper proposes a kernel-based SMOTE (K-SMOTE) algorithm that generates synthetic minority data points directly in the feature space of the SVM classifier. The new data points are added by augmenting the original Gram matrix based on neighbourhood information in the feature space. The proposed algorithm is statistically shown to improve performance on 51 benchmark datasets. K-SMOTE is further applied to predict the stage of degradation in a semiconductor etching chamber, where it achieves higher accuracy for the imbalanced faulty stages.
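The Gram-matrix augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' exact procedure: it assumes each synthetic point is a SMOTE-style convex combination (1 - delta) * phi(x_a) + delta * phi(x_b) of a minority point and one of its k nearest minority neighbours, with neighbours ranked by the feature-space distance d^2(a, b) = K_aa + K_bb - 2 K_ab. The function name `augment_gram`, its parameters, and the weight matrix `W` are hypothetical names introduced for illustration.

```python
import numpy as np

def augment_gram(K, minority_idx, n_new, k=5, seed=0):
    """Append n_new synthetic minority points to the Gram matrix K.

    Assumption (not taken from the paper): each synthetic point is a convex
    combination of two minority points in feature space, so every inner
    product it needs is a weighted sum of existing entries of K.
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.asarray(minority_idx)
    n, m = K.shape[0], len(minority_idx)
    if m < 2:
        raise ValueError("need at least two minority points")

    # Squared feature-space distances between minority points, from K alone.
    diag = np.diag(K)[minority_idx]
    Kmm = K[np.ix_(minority_idx, minority_idx)]
    d2 = diag[:, None] + diag[None, :] - 2.0 * Kmm
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour

    k = min(k, m - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Row s of W holds the weights of synthetic point s over the originals.
    W = np.zeros((n_new, n))
    for s in range(n_new):
        a = rng.integers(m)                 # random minority seed point
        b = neighbours[a, rng.integers(k)]  # one of its k nearest neighbours
        delta = rng.random()                # interpolation factor in [0, 1)
        W[s, minority_idx[a]] += 1.0 - delta
        W[s, minority_idx[b]] += delta

    KW = K @ W.T                            # original vs. synthetic block
    K_aug = np.vstack([
        np.hstack([K, KW]),
        np.hstack([KW.T, W @ KW]),          # synthetic vs. synthetic block
    ])
    return K_aug, W
```

Under these assumptions, the augmented matrix could be passed to an SVM that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`), with the label vector extended by `n_new` minority labels. Scoring an unseen point against a synthetic support vector would reuse the same weights: its kernel value is the corresponding row of `W` applied to the kernel values against the original training points.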