DG-SMOTE: A Distance-Angle-Based Genetic Synthetic Minority Over-Sampling Technique for Unbalanced Data Learning
Authors: Wenbin Pei; Yuyang Cui; Bing Xue; Mengjie Zhang; Jiqing Zhang; Yaqing Hou; Guangyu Zou; Qiang Zhang
Journal: IEEE Transactions on Evolutionary Computation, vol. 29, no. 6, pp. 2641-2655
Publication date: 2024-12-11
DOI: 10.1109/TEVC.2024.3515485
URL: https://ieeexplore.ieee.org/document/10793073/
Impact factor: 11.7 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Many real-world applications generate unbalanced data. Learning from such data may lead to biased classifiers that perform poorly on the class of interest. Oversampling methods have been shown to be effective in rebalancing such data, which helps classifiers avoid performance bias. However, many existing oversampling methods rely on a predesigned linear model structure and the neighborhood information of an original instance, which may lead to the generation of noisy instances when the original data contain noise. In this study, we develop a novel oversampling method in which genetic programming is introduced to automatically select good-quality instances and evolve a model structure that combines the selected instances to create a new instance. In the proposed method, each individual represents a generated instance and is evaluated by a fitness function based on the Euclidean distance and the cosine theorem (the law of cosines). In the experiments, we examine how effectively the proposed method helps different types of classifiers handle class imbalance, and compare it with popular sampling methods for unbalanced classification. A comprehensive analysis of the results indicates that the new method successfully addresses the class imbalance issue by generating a group of good-quality instances for the minority class, and that it outperforms the compared sampling methods in almost all cases.
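To make the terms in the abstract concrete, the following is a minimal illustrative sketch. It is not the paper's algorithm: the abstract does not give the fitness function or the genetic programming setup. The sketch shows the kind of predesigned linear structure that classic SMOTE uses (interpolation between a minority instance and a minority neighbor), together with the two geometric quantities the abstract names, Euclidean distance and an angle obtained from the law of cosines. All function names, the toy data, and the choice of which triangle to measure are assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_at(vertex, p, q):
    """Angle in radians at `vertex` of the triangle (vertex, p, q), via the law of cosines."""
    a = euclidean(vertex, p)
    b = euclidean(vertex, q)
    c = euclidean(p, q)
    if a == 0.0 or b == 0.0:
        return 0.0  # degenerate triangle: angle is undefined, treated as 0 in this sketch
    cos_theta = (a * a + b * b - c * c) / (2.0 * a * b)
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against rounding error


def smote_interpolate(x, neighbor, rng):
    """Classic SMOTE step: a random point on the segment from x to a minority neighbor."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


def nearest(point, instances):
    """Instance closest to `point` under Euclidean distance."""
    return min(instances, key=lambda inst: euclidean(point, inst))


# Toy two-dimensional data (hypothetical values, for illustration only).
minority = [[1.0, 1.0], [1.5, 1.2], [1.2, 1.8]]
majority = [[3.0, 3.0], [3.5, 2.5], [2.8, 3.4]]

rng = random.Random(0)
synthetic = smote_interpolate(minority[0], minority[1], rng)

# Quantities a distance-angle-based score could draw on (assumed, not from the paper).
d_min = euclidean(synthetic, nearest(synthetic, minority))
d_maj = euclidean(synthetic, nearest(synthetic, majority))
theta = angle_at(synthetic, nearest(synthetic, minority), nearest(synthetic, majority))

print("synthetic instance:", synthetic)
print("distance to nearest minority instance:", d_min)
print("distance to nearest majority instance:", d_maj)
print("angle between the two directions (radians):", theta)
```

In this sketch the model structure is fixed in advance as a straight-line interpolation between two instances. According to the abstract, the proposed method instead lets genetic programming choose which instances to combine and evolve how they are combined.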
About the journal:
The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.