{"title":"基于支持向量的欠采样多类不平衡数据分类方法","authors":"Md. Yasir Arafat, S. Hoque, Shuxiang Xu, D. Farid","doi":"10.1109/SKIMA47702.2019.8982391","DOIUrl":null,"url":null,"abstract":"Multi-class imbalanced data classification in supervised learning is one of the most challenging research issues in machine learning for data mining applications. Although several data sampling methods have been introduced by computational intelligence researchers in the past decades for handling imbalanced data, still learning from imbalanced data is a challenging task and played as a significant focused research interest as well. Traditional machine learning algorithms usually biased to the majority class instances whereas ignored the minority class instances. As a result, ignoring minority class instances may affect the prediction accuracy of classifiers. Generally, under-sampling and over-sampling methods are commonly used in single model classifiers or ensemble learning for dealing with imbalanced data. In this paper, we have introduced an under-sampling method with support vectors for classifying imbalanced data. The proposed approach selects the most informative majority class instances based on the support vectors that help to engender decision boundary. We have tested the performance of the proposed method with single classifiers (C4.5 Decision Tree classifier and naïve Bayes classifier) and ensemble classifiers (Random Forest and AdaBoost) on 13 benchmark imbalanced datasets. 
It is explicitly shown by the experimental result that the proposed method produces high accuracy when classifying both the minority and majority class instances compared to other existing methods.","PeriodicalId":245523,"journal":{"name":"2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)","volume":"458 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification\",\"authors\":\"Md. Yasir Arafat, S. Hoque, Shuxiang Xu, D. Farid\",\"doi\":\"10.1109/SKIMA47702.2019.8982391\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-class imbalanced data classification in supervised learning is one of the most challenging research issues in machine learning for data mining applications. Although several data sampling methods have been introduced by computational intelligence researchers in the past decades for handling imbalanced data, still learning from imbalanced data is a challenging task and played as a significant focused research interest as well. Traditional machine learning algorithms usually biased to the majority class instances whereas ignored the minority class instances. As a result, ignoring minority class instances may affect the prediction accuracy of classifiers. Generally, under-sampling and over-sampling methods are commonly used in single model classifiers or ensemble learning for dealing with imbalanced data. In this paper, we have introduced an under-sampling method with support vectors for classifying imbalanced data. The proposed approach selects the most informative majority class instances based on the support vectors that help to engender decision boundary. 
We have tested the performance of the proposed method with single classifiers (C4.5 Decision Tree classifier and naïve Bayes classifier) and ensemble classifiers (Random Forest and AdaBoost) on 13 benchmark imbalanced datasets. It is explicitly shown by the experimental result that the proposed method produces high accuracy when classifying both the minority and majority class instances compared to other existing methods.\",\"PeriodicalId\":245523,\"journal\":{\"name\":\"2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)\",\"volume\":\"458 2\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SKIMA47702.2019.8982391\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SKIMA47702.2019.8982391","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
Multi-class imbalanced data classification is one of the most challenging research issues in supervised machine learning for data mining applications. Although computational intelligence researchers have introduced several data sampling methods over the past decades for handling imbalanced data, learning from imbalanced data remains a challenging task and an active research focus. Traditional machine learning algorithms are usually biased toward the majority class and ignore the minority class; as a result, the prediction accuracy of classifiers on minority class instances suffers. Under-sampling and over-sampling methods are commonly used, with either single-model classifiers or ensemble learning, to deal with imbalanced data. In this paper, we introduce an under-sampling method based on support vectors for classifying imbalanced data. The proposed approach selects the most informative majority class instances, namely those support vectors that help form the decision boundary. We tested the performance of the proposed method with single classifiers (a C4.5 decision tree classifier and a naïve Bayes classifier) and ensemble classifiers (Random Forest and AdaBoost) on 13 benchmark imbalanced datasets. The experimental results show that, compared to other existing methods, the proposed method achieves high accuracy on both the minority and majority class instances.
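The abstract does not give the algorithm's details, but the general idea of support-vector-based under-sampling can be sketched as follows. This is a minimal illustration, not the authors' exact method: we fit an SVM, treat the majority-class support vectors (the instances nearest the decision boundary) as the most informative majority examples, and discard the rest, keeping all minority instances. The function name and the choice of a linear-kernel SVC are assumptions for illustration.

```python
# Illustrative sketch of support-vector-based under-sampling
# (an assumption about the general technique, not the paper's algorithm).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def undersample_with_support_vectors(X, y, majority_label):
    """Keep all minority instances; keep only the majority instances
    that the fitted SVM selects as support vectors."""
    svm = SVC(kernel="linear", C=1.0).fit(X, y)
    sv_idx = set(svm.support_)  # indices of all support vectors
    keep = [i for i in range(len(y))
            if y[i] != majority_label or i in sv_idx]
    return X[keep], y[keep]

# Imbalanced toy data: roughly 90% class 0 (majority), 10% class 1.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1],
                           random_state=42)
X_res, y_res = undersample_with_support_vectors(X, y, majority_label=0)
print("majority before/after:", (y == 0).sum(), (y_res == 0).sum())
print("minority before/after:", (y == 1).sum(), (y_res == 1).sum())
```

The resampled set is far more balanced: the minority class is untouched while the majority class shrinks to the boundary-defining instances, which can then be fed to any downstream classifier (decision tree, naïve Bayes, or an ensemble).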