SPAW-SMOTE: Space Partitioning Adaptive Weighted Synthetic Minority Oversampling Technique For Imbalanced Data Set Learning
Qiang Zhang, Junjiang He, Tao Li, Xiaolong Lan, Wenbo Fang, Yihong Li
Computer Journal, published 2023-10-05 (journal article)
DOI: 10.1093/comjnl/bxad098 (https://doi.org/10.1093/comjnl/bxad098)
Impact factor: 1.5; JCR: Q4 (Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Data imbalance is common in real-world problems and greatly degrades classifier performance. Most solutions balance the data set by generating new minority-class samples, but they face three problems: selecting an appropriate region in which to generate samples, blurring of the classification boundary, and uneven sample distribution. To address these problems, we propose a novel oversampling algorithm named space partitioning adaptive weighted synthetic minority oversampling technique (SPAW-SMOTE). We first divide the data space into boundary and non-boundary spaces using spatial partitioning techniques. An adaptive weighting algorithm then assigns the number of samples to be generated to each space, which mitigates both the uneven sample distribution and the blurring of the classification boundary. Finally, we develop a new generation algorithm that reduces the probability of producing overlapping samples during synthesis and ensures the diversity of the new samples. Experimental results on 18 real-world data sets show that the average performance (G-mean, F1-measure and Area Under Curve) of SPAW-SMOTE is significantly better than that of other existing oversampling techniques.
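For orientation, the following is a minimal sketch of the classic SMOTE interpolation step that oversamplers of this family build on, together with two of the metrics named in the abstract. It is not the SPAW-SMOTE algorithm: the abstract does not specify the space partitioning, adaptive weighting, or generation rules, so none of them is reproduced here. The function names (`smote_baseline`, `g_mean`, `f1_measure`) and the parameters `k` and `seed` are hypothetical, chosen only for illustration.

```python
import math
import random


def smote_baseline(minority, n_new, k=3, seed=0):
    """Classic SMOTE (baseline, NOT SPAW-SMOTE): create each synthetic
    point on the segment between a minority sample and one of its k
    nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        target = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + lam * (t - b) for b, t in zip(base, target))
        )
    return synthetic


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def f1_measure(tp, fn, fp):
    """Harmonic mean of precision and recall for the minority class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: four minority points on the unit square, five synthetic samples.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_baseline(minority_points, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority points, plain SMOTE can place samples in overlapping or sparse regions without regard to the class boundary, which is the kind of limitation the abstract says SPAW-SMOTE targets.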
About the journal:
The Computer Journal is one of the longest-established journals serving all branches of the academic computer science community. It is currently published in four sections.