Liangliang Tao, Qingya Wang, Fen Yu, Hui Cao, Yage Liang, Huixia Luo, Jinghui Guo
Neurocomputing, Volume 616, Article 128959. Published 2024-11-21. DOI: 10.1016/j.neucom.2024.128959. JCR Q1 (Computer Science, Artificial Intelligence); impact factor 5.5. https://www.sciencedirect.com/science/article/pii/S0925231224017302
Newton cooling theorem-based local overlapping regions cleaning and oversampling techniques for imbalanced datasets
Imbalanced datasets pose significant challenges to machine learning tasks because traditional classifiers tend to favor the majority class. While numerous methods have been proposed to balance data distribution, recent studies have shown that imbalanced classification is also hindered by other data characteristics. Among these factors, the joint effects of class overlap and within-class imbalance are particularly harmful to classification performance. To this end, we propose a novel algorithm called Newton Cooling Theorem-Based Local Overlapping Regions Cleaning and Oversampling (NCLO-SMOTE). This method employs an adaptive semi-supervised clustering algorithm, which divides the minority class into several clusters without requiring a pre-set number of clusters. It quantifies both the overall and local overlapping degrees of the dataset based on the number of clusters and their local information. Additionally, it uses Newton’s Cooling Theorem to clean these overlapping regions and a cluster-weighted oversampling strategy to address within-class imbalance. Comparative experiments were conducted between NCLO-SMOTE and ten state-of-the-art sampling methods on 48 real-world imbalanced datasets. The experimental results demonstrate that our proposed method not only achieves superior performance but also exhibits strong robustness and versatility in handling the joint effects of class overlap and imbalance.
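The abstract does not give the formulas behind NCLO-SMOTE, so the following is only a rough sketch of two generic ingredients it names, not the authors' implementation. It shows the standard form of Newton's law of cooling and a SMOTE-style oversampler that spreads new samples across minority clusters. The inverse-size cluster weights and all function names (`newton_cooling`, `allocate_synthetic`, `oversample`) are assumptions made for illustration; how the paper maps cooling onto overlap cleaning is not stated here and is not modelled.

```python
import math
import random


def newton_cooling(t0, t_env, k, t):
    """Standard Newton's law of cooling: T(t) = T_env + (T0 - T_env) * exp(-k * t)."""
    return t_env + (t0 - t_env) * math.exp(-k * t)


def allocate_synthetic(cluster_sizes, n_new):
    """Split n_new samples across clusters using inverse-size weights.

    Assumption: smaller clusters get more synthetic samples. Sizes must be > 0.
    """
    inverse = [1.0 / size for size in cluster_sizes]
    total = sum(inverse)
    raw = [n_new * w / total for w in inverse]
    counts = [int(r) for r in raw]
    # Give leftover samples to the clusters with the largest fractional parts.
    remainder = n_new - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def interpolate(a, b, rng):
    """SMOTE-style step: a random point on the segment between a and b."""
    gap = rng.random()
    return [x + gap * (y - x) for x, y in zip(a, b)]


def oversample(clusters, n_new, seed=0):
    """Create n_new synthetic minority points, interpolating within each cluster."""
    rng = random.Random(seed)
    counts = allocate_synthetic([len(c) for c in clusters], n_new)
    synthetic = []
    for cluster, count in zip(clusters, counts):
        for _ in range(count):
            if len(cluster) > 1:
                a, b = rng.sample(cluster, 2)
            else:
                a = b = cluster[0]  # a singleton cluster can only be duplicated
            synthetic.append(interpolate(a, b, rng))
    return synthetic


if __name__ == "__main__":
    small = [[0.0, 0.0], [1.0, 1.0]]
    large = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
    print(allocate_synthetic([len(small), len(large)], 6))
    print(len(oversample([small, large], 6)))
```

Interpolating only between points of the same cluster keeps synthetic samples inside that cluster's region, which is the usual motivation for cluster-based variants of SMOTE.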
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.