Kernel-based SMOTE for SVM classification of imbalanced datasets

IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society
Publication date: 2015-11-01
DOI: 10.1109/IECON.2015.7392251

Josey Mathew, Ming Luo, C. Pang, H. Chan

Citations: 67

Abstract

Datasets with an imbalanced class distribution pose a severe challenge to traditional learning algorithms that are designed to improve overall classification accuracy. Preprocessing methods such as the Synthetic Minority Over-sampling Technique (SMOTE) address this problem by generating data points in the input space to balance the training dataset. However, such artificial sampling methods can distort the performance of Support Vector Machine (SVM) classifiers, which operate in a kernel-induced feature space. This paper proposes a kernel-based SMOTE (K-SMOTE) algorithm that directly generates synthetic minority data points in the feature space of the SVM classifier. The new data points are added by augmenting the original Gram matrix based on neighbourhood information in the feature space. The proposed algorithm is statistically shown to improve performance on 51 benchmark datasets. K-SMOTE is further applied to predict the stage of degradation in a semiconductor etching chamber, where it achieves higher accuracy for the imbalanced faulty stages.
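The abstract says only that K-SMOTE adds points by augmenting the Gram matrix using feature-space neighbourhood information; it does not give the update rule. The Python sketch below shows one way such an augmentation can work, and its details are assumptions rather than the authors' published procedure. Each synthetic point is taken to be a random convex combination (1 - delta)·phi(x_i) + delta·phi(x_j) of a minority point and one of its nearest minority neighbours, where neighbours are found from the kernel-induced distance ||phi(a) - phi(b)||^2 = k(a,a) + k(b,b) - 2·k(a,b). Because every new point is a linear combination of mapped training points, its kernel values follow from the original Gram matrix K alone: stacking the combination weights into a matrix W gives the augmented Gram matrix W·K·W^T, so phi is never computed explicitly. The function names, the RBF kernel, the 1:1 balancing default and the parameters `n_neighbors`, `n_new` and `gamma` are hypothetical, illustrative choices.

```python
import numpy as np


def rbf_gram(A, B, gamma=1.0):
    """RBF kernel matrix with entries exp(-gamma * ||a - b||^2) for rows a of A, b of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def feature_space_sq_dist(K):
    """Squared feature-space distances ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij."""
    d = np.diag(K)
    return np.maximum(d[:, None] + d[None, :] - 2.0 * K, 0.0)


def kernel_smote_gram(K, y, minority=1, n_new=None, n_neighbors=5, rng=None):
    """Hypothetical sketch of SMOTE-style oversampling carried out in kernel space.

    Each synthetic point is z = (1 - delta) * phi(x_i) + delta * phi(x_j), where x_i is a
    random minority point, x_j one of its nearest minority neighbours in feature space,
    and delta ~ U(0, 1). Every point (original or synthetic) is stored as a row of W,
    its weights over the original training points, so the augmented Gram matrix is
    W @ K @ W.T. Returns (K_aug, y_aug, W).
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    idx_min = np.flatnonzero(y == minority)
    if idx_min.size < 2:
        raise ValueError("need at least two minority points to interpolate")
    if n_new is None:
        # Assumed default: oversample until both classes have the same size.
        n_new = max(int(np.sum(y != minority)) - idx_min.size, 0)

    D = feature_space_sq_dist(K[np.ix_(idx_min, idx_min)])
    np.fill_diagonal(D, np.inf)                 # a point is not its own neighbour
    m = min(n_neighbors, idx_min.size - 1)
    neigh = np.argsort(D, axis=1)[:, :m]        # neighbour positions within idx_min

    W_new = np.zeros((n_new, n))
    for r in range(n_new):
        a = rng.integers(idx_min.size)          # random minority seed point
        b = neigh[a, rng.integers(m)]           # random feature-space neighbour of it
        delta = rng.random()
        W_new[r, idx_min[a]] += 1.0 - delta
        W_new[r, idx_min[b]] += delta

    W = np.vstack([np.eye(n), W_new])           # identity rows keep the originals
    K_aug = W @ K @ W.T                         # k(z, z') from the original Gram only
    y_aug = np.concatenate([y, np.full(n_new, minority)])
    return K_aug, y_aug, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 0.5, (6, 2))])
    y = np.array([0] * 40 + [1] * 6)
    K = rbf_gram(X, X, gamma=0.5)
    K_aug, y_aug, W = kernel_smote_gram(K, y, minority=1, n_neighbors=3, rng=0)
    print(K_aug.shape, np.bincount(y_aug))  # prints (80, 80) [40 40]
```

Under the same assumptions, new samples are handled by extending the test-versus-training kernel `K_test` (shape n_test × n) to the synthetic points as `K_test @ W.T`; `K_aug`, `y_aug` and that matrix can then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. With a linear kernel the feature space is the input space, so this sketch reduces to ordinary input-space SMOTE, which makes it easy to sanity-check.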

Source journal

IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society

Self-citation rate

0.00%

Publication count