Jiapeng Yang, Lei Shi, Tielin Lu, Lu Yuan, Nanchang Cheng, Xiaohui Yang, Jia Luo, Mingying Xu
Journal: International Journal of Fuzzy Systems (impact factor 3.6; JCR Q2, Automation & Control Systems). Published: 2024-06-03. DOI: 10.1007/s40815-024-01721-3 (https://doi.org/10.1007/s40815-024-01721-3)
A Positive Sample Enhancement Algorithm with Fuzzy Nearest Neighbor Hybridization for Imbalance Data
Class imbalance is a critical research area in machine learning and deep learning and has received widespread attention from researchers. Typical current methods address it by generating synthetic samples that resemble the minority class from positive samples alone, ignoring the characteristic information of negative samples. As a result, when the positive samples are too few and have highly similar features, the classifier runs into fitting problems. In response, we propose a new positive sample enhancement algorithm (PENH) that addresses class imbalance by simulating the process of chromosome cross-fusion. We use the K-nearest neighbor algorithm to select the fuzzy negative sample set around each positive sample, and apply Mixup (introduced under the title "beyond empirical risk minimization") to randomly hybridize the positive sample with a negative sample from that set. To overcome the sample imbalance, we use a One-class SVM overfitted to the positive samples to select among the newly generated unlabeled samples and obtain a balanced dataset. We conduct multiple experiments on 20 open datasets. The results show that PENH outperforms six baseline methods on multiple evaluation metrics.
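The neighbor-selection and hybridization steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Euclidean distance, the choice of `k`, and drawing the mixing coefficient from a Beta(alpha, alpha) distribution (as in the original Mixup formulation) are all assumptions, since the abstract does not specify them. The final One-class SVM selection step is left out; it would filter the `candidates` list, for example with a model such as scikit-learn's `OneClassSVM` fitted on the positive samples.

```python
import math
import random


def k_nearest_negatives(pos, negatives, k):
    """Return the k negative samples closest to `pos` (Euclidean distance).

    This stands in for the "fuzzy negative sample set" around a positive
    sample; the distance metric is an assumption.
    """
    return sorted(negatives, key=lambda neg: math.dist(pos, neg))[:k]


def mixup(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b, feature by feature."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def generate_candidates(positives, negatives, k=3, n_new=10, alpha=1.0, seed=0):
    """Hybridize random positives with random nearby negatives.

    Returns unlabeled candidate samples; a later selection step
    (not shown here) would decide which ones to keep as positives.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_new):
        pos = rng.choice(positives)
        neg = rng.choice(k_nearest_negatives(pos, negatives, k))
        lam = rng.betavariate(alpha, alpha)  # assumed sampling scheme for lambda
        candidates.append(mixup(pos, neg, lam))
    return candidates


# Toy data (hypothetical): two positives, five negatives in 2-D.
positives = [[1.0, 1.0], [1.2, 0.9]]
negatives = [[0.0, 0.0], [2.0, 2.0], [1.5, 1.0], [5.0, 5.0], [-3.0, 4.0]]
candidates = generate_candidates(positives, negatives, k=3, n_new=6)
```

Because each candidate is a convex combination of a positive sample and one of its nearest negatives, it lies on the line segment between the two, which is what lets the generated samples carry negative-class information near the class boundary.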
About the journal:
The International Journal of Fuzzy Systems (IJFS) is an official journal of the Taiwan Fuzzy Systems Association (TFSA) and is published semi-quarterly. IJFS considers high-quality papers on the theory, design, and application of fuzzy systems, soft computing systems, grey systems, and extension theory systems, ranging from hardware to software. Survey and expository submissions are also welcome.