{"title":"ISODF-ENN:基于改进扩散模型和ENN的不平衡数据混合采样方法","authors":"Zhenzhe Lv, Qicheng Liu","doi":"10.3233/jifs-233886","DOIUrl":null,"url":null,"abstract":"In the era of big data, the complexity of data is increasing. Problems such as data imbalance and class overlap pose challenges to traditional classifiers. Meanwhile, the importance of imbalanced data has become increasingly prominent, it is necessary to find appropriate methods to enhance classification performance of classifiers on such datasets. In response, this paper proposes a mixed sampling method (ISODF-ENN) based on iterative self-organizing (ISODATA) denoising diffusion algorithm and edited nearest neighbors (ENN) data cleaning algorithm. The algorithm first uses iterative self-organizing clustering algorithm to divide minority class into different sub-clusters, then it uses denoising diffusion algorithm to generate new minority class data for each sub-cluster, and finally it uses ENN algorithm to preprocess majority class data to remove the overlap with the minority class data. Each sub-cluster is oversampled according to sampling ratio, so that the oversampled minority class data also conforms to the distribution of original minority class data. Experimental results on keel datasets demonstrate that the proposed method outperforms other methods in terms of F-value and AUC, effectively addressing the issues of class imbalance and class overlap.","PeriodicalId":54795,"journal":{"name":"Journal of Intelligent & Fuzzy Systems","volume":"116 5","pages":"0"},"PeriodicalIF":1.7000,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ISODF-ENN:Imbalanced data mixed sampling method based on improved diffusion model and ENN\",\"authors\":\"Zhenzhe Lv, Qicheng Liu\",\"doi\":\"10.3233/jifs-233886\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the era of big data, the complexity of data is increasing. Problems such as data imbalance and class overlap pose challenges to traditional classifiers. Meanwhile, the importance of imbalanced data has become increasingly prominent, it is necessary to find appropriate methods to enhance classification performance of classifiers on such datasets. In response, this paper proposes a mixed sampling method (ISODF-ENN) based on iterative self-organizing (ISODATA) denoising diffusion algorithm and edited nearest neighbors (ENN) data cleaning algorithm. The algorithm first uses iterative self-organizing clustering algorithm to divide minority class into different sub-clusters, then it uses denoising diffusion algorithm to generate new minority class data for each sub-cluster, and finally it uses ENN algorithm to preprocess majority class data to remove the overlap with the minority class data. Each sub-cluster is oversampled according to sampling ratio, so that the oversampled minority class data also conforms to the distribution of original minority class data. Experimental results on keel datasets demonstrate that the proposed method outperforms other methods in terms of F-value and AUC, effectively addressing the issues of class imbalance and class overlap.\",\"PeriodicalId\":54795,\"journal\":{\"name\":\"Journal of Intelligent & Fuzzy Systems\",\"volume\":\"116 5\",\"pages\":\"0\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Intelligent & Fuzzy Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/jifs-233886\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Fuzzy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jifs-233886","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
ISODF-ENN:Imbalanced data mixed sampling method based on improved diffusion model and ENN
In the era of big data, the complexity of data is increasing. Problems such as data imbalance and class overlap pose challenges to traditional classifiers. Meanwhile, the importance of imbalanced data has become increasingly prominent, it is necessary to find appropriate methods to enhance classification performance of classifiers on such datasets. In response, this paper proposes a mixed sampling method (ISODF-ENN) based on iterative self-organizing (ISODATA) denoising diffusion algorithm and edited nearest neighbors (ENN) data cleaning algorithm. The algorithm first uses iterative self-organizing clustering algorithm to divide minority class into different sub-clusters, then it uses denoising diffusion algorithm to generate new minority class data for each sub-cluster, and finally it uses ENN algorithm to preprocess majority class data to remove the overlap with the minority class data. Each sub-cluster is oversampled according to sampling ratio, so that the oversampled minority class data also conforms to the distribution of original minority class data. Experimental results on keel datasets demonstrate that the proposed method outperforms other methods in terms of F-value and AUC, effectively addressing the issues of class imbalance and class overlap.
期刊介绍:
The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.