Pub Date: 2024-08-10 | DOI: 10.1016/j.patrec.2024.08.004
Kaixin Jin, Xiaoling Gu, Zimeng Wang, Zhenzhong Kuang, Zizhao Wu, Min Tan, Jun Yu
High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advances in Neural Radiance Fields (NeRF) have demonstrated remarkable proficiency in novel view synthesis and drawn attention to their potential for facial avatar reconstruction. However, previous methodologies have overlooked the complex motion dynamics present across the head, torso, and intricate facial features. In addition, there is no generalized NeRF-based framework for facial avatar reconstruction that can be driven by either 3DMM coefficients or audio input. To tackle these challenges, we propose a framework that leverages a semantic-aware hyper-space deformable NeRF to reconstruct high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework handles both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy that allocates varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that transforms observation-space coordinates into canonical hyper-space coordinates, allowing the model to learn natural facial deformation and head-torso movements. Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at https://github.com/jematy/SAHS-Deformable-Nerf.
{"title":"Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction","authors":"Kaixin Jin, Xiaoling Gu, Zimeng Wang, Zhenzhong Kuang, Zizhao Wu, Min Tan, Jun Yu","doi":"10.1016/j.patrec.2024.08.004","DOIUrl":"10.1016/j.patrec.2024.08.004","url":null,"abstract":"<div><p>High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advancements in the Neural Radiance Field (NeRF) have demonstrated remarkable proficiency in rendering novel views and garnered attention for its potential in facial avatar reconstruction. However, previous methodologies have overlooked the complex motion dynamics present across the head, torso, and intricate facial features. Additionally, a deficiency exists in a generalized NeRF-based framework for facial avatar reconstruction adaptable to either 3DMM coefficients or audio input. To tackle these challenges, we propose an innovative framework that leverages semantic-aware hyper-space deformable NeRF, facilitating the reconstruction of high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework effectively addresses both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy to allocate varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that enables the transformation of observation space coordinates into canonical hyper-space coordinates, allowing for the learning of natural facial deformation and head-torso movements. Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at <span><span>https://github.com/jematy/SAHS-Deformable-Nerf</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 160-166"},"PeriodicalIF":3.9,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141990393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.002
Bernardete Ribeiro, Francisco Antunes, Dylan Perdigão, Catarina Silva
Spiking Neural Networks (SNNs) are regarded as the next frontier in AI, as they can be implemented on neuromorphic hardware, paving the way for advancements in real-world applications in the field. SNNs provide a biologically inspired solution that is event-driven, energy-efficient and sparse. While they show promising results, there are challenges that need to be addressed. For example, the design-build-evaluate process that integrates the architecture, learning, hyperparameter optimization and inference needs to be tailored to the specific problem. This is particularly important in critical high-stakes industries such as financial services. In this paper, we present SpikeConv, a novel deep Convolutional Spiking Neural Network (CSNN), and investigate this process in the context of a highly imbalanced online bank account opening fraud problem. Our approach is compared with Deep Spiking Neural Networks (DSNNs) and Gradient Boosting Decision Trees (GBDT), showing competitive results.
{"title":"Convolutional Spiking Neural Networks targeting learning and inference in highly imbalanced datasets","authors":"Bernardete Ribeiro, Francisco Antunes, Dylan Perdigão, Catarina Silva","doi":"10.1016/j.patrec.2024.08.002","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.08.002","url":null,"abstract":"Spiking Neural Networks (SNNs) are regarded as the next frontier in AI, as they can be implemented on neuromorphic hardware, paving the way for advancements in real-world applications in the field. SNNs provide a biologically inspired solution that is event-driven, energy-efficient and sparse. While showing promising results, there are challenges that need to be addressed. For example, the design-build-evaluate process for integrating the architecture, learning, hyperparameter optimization and inference need to be tailored to a specific problem. This is particularly important in critical high-stakes industries such as finance services. In this paper, we present SpikeConv, a novel deep Convolutional Spiking Neural Network (CSNN), and investigate this process in the context of a highly imbalanced online bank account opening fraud problem. Our approach is compared with Deep Spiking Neural Networks (DSNNs) and Gradient Boosting Decision Trees (GBDT) showing competitive results.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"86 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.07.023
Fabio Narducci, Piercalo Dondi, David Freire Obregón, Florin Pop
{"title":"Introduction to the special issue on “Computer vision solutions for part-based image analysis and classification (CV_PARTIAL)”","authors":"Fabio Narducci, Piercalo Dondi, David Freire Obregón, Florin Pop","doi":"10.1016/j.patrec.2024.07.023","DOIUrl":"10.1016/j.patrec.2024.07.023","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 150-151"},"PeriodicalIF":3.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141963575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.001
Kuo-Liang Chung, Chia-Chi Hsu, Pei-Hsuan Hsieh
Registering two point clouds is an important and challenging task, and the estimated registration solution can be applied across 3D vision. In this paper, an outlier removal method is first proposed to delete redundant coplane-pair correspondences and construct three feature-consistent coplane-pair correspondence subsets. Next, Rodrigues' formula and a scoring-based method are adopted to solve for the representative registration solution of each correspondence subset. Then, a robust fusion method is proposed to fuse the three representative solutions into the final registration solution. On typical testing datasets, comprehensive experimental results demonstrate that, while maintaining good registration accuracy, our registration algorithm achieves a significant reduction in execution time compared with state-of-the-art methods.
{"title":"Feature-consistent coplane-pair correspondence- and fusion-based point cloud registration","authors":"Kuo-Liang Chung, Chia-Chi Hsu, Pei-Hsuan Hsieh","doi":"10.1016/j.patrec.2024.08.001","DOIUrl":"10.1016/j.patrec.2024.08.001","url":null,"abstract":"<div><p>It is an important and challenging task to register two point clouds, and the estimated registration solution can be applied in 3D vision. In this paper, an outlier removal method is first proposed to delete redundant coplane-pair correspondences for constructing three feature-consistent coplane-pair correspondence subsets. Next, Rodrigues’ formula and a scoring-based method are adopted to solve the representative registration solution of each correspondence subset. Then, a robust fusion method is proposed to fuse the three representative solutions as the final registration solution. Based on typical testing datasets, comprehensive experimental results demonstrated that with good registration accuracy, our registration algorithm achieves significant execution time reduction effect when compared with the state-of-the-art methods.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 143-149"},"PeriodicalIF":3.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141963574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.003
Lijuan Duan, Guangyuan Liu, Qing En, Zhaoying Liu, Zhi Gong, Bian Ma
Zero-shot object detection aims to identify objects from unseen categories not present during training. Existing methods rely on category labels to create pseudo-features for unseen categories, but they are limited in how they explore semantic information and lack robustness. To address these issues, we introduce a novel framework, EKZSD, that enhances zero-shot object detection by incorporating external knowledge and contrastive paradigms. This framework enriches semantic diversity, improving discriminative ability and robustness. Specifically, we introduce a novel external knowledge extraction module that leverages attribute and relationship prompts to enrich semantic information. Moreover, a novel external knowledge contrastive learning module is proposed to enhance the model's discriminative and robust capabilities by exploring pseudo-visual features. Additionally, we use cycle consistency learning to align generated visual features with the original semantic features and adversarial learning to align visual features with semantic features. Trained jointly with the contrastive learning loss, cycle consistency loss, adversarial learning loss, and classification loss, our framework achieves superior performance on the MSCOCO and Ship-43 datasets, as demonstrated in the experimental results.
{"title":"Enhancing zero-shot object detection with external knowledge-guided robust contrast learning","authors":"Lijuan Duan , Guangyuan Liu , Qing En , Zhaoying Liu , Zhi Gong , Bian Ma","doi":"10.1016/j.patrec.2024.08.003","DOIUrl":"10.1016/j.patrec.2024.08.003","url":null,"abstract":"<div><p>Zero-shot object detection aims to identify objects from unseen categories not present during training. Existing methods rely on category labels to create pseudo-features for unseen categories, but they face limitations in exploring semantic information and lack robustness. To address these issues, we introduce a novel framework, EKZSD, enhancing zero-shot object detection by incorporating external knowledge and contrastive paradigms. This framework enriches semantic diversity, enhancing discriminative ability and robustness. Specifically, we introduce a novel external knowledge extraction module that leverages attribute and relationship prompts to enrich semantic information. Moreover, a novel external knowledge contrastive learning module is proposed to enhance the model’s discriminative and robust capabilities by exploring pseudo-visual features. Additionally, we use cycle consistency learning to align generated visual features with original semantic features and adversarial learning to align visual features with semantic features. Collaboratively trained with contrast learning loss, cycle consistency loss, adversarial learning loss, and classification loss, our framework outperforms superior performance on the MSCOCO and Ship-43 datasets, as demonstrated in experimental results.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 152-159"},"PeriodicalIF":3.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141978155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-02 | DOI: 10.1016/j.patrec.2024.07.022
Guilherme F. Roberto, Danilo C. Pereira, Alessandro S. Martins, Thaína A.A. Tosta, Carlos Soares, Alessandra Lumini, Guilherme B. Rozendo, Leandro A. Neves, Marcelo Z. Nascimento
Covid-19 is a severe illness caused by the SARS-CoV-2 virus, initially identified in China in late 2019 and swiftly spreading globally. Since the virus primarily impacts the lungs, analyzing chest X-rays stands as a reliable and widely accessible means of diagnosing the infection. In computer vision, deep learning models such as CNNs have been the main approach adopted for detection of Covid-19 in chest X-ray images. However, we believe that handcrafted features can also provide relevant results, as shown previously in similar image classification challenges. In this study, we propose a method for identifying Covid-19 in chest X-ray images by extracting and classifying local and global percolation-based features. This technique was tested on three datasets: one comprising 2,002 segmented samples categorized into two groups (Covid-19 and Healthy); another with 1,125 non-segmented samples categorized into three groups (Covid-19, Healthy, and Pneumonia); and a third one composed of 4,809 non-segmented images representing three classes (Covid-19, Healthy, and Pneumonia). Then, 48 percolation features were extracted and given as input to six distinct classifiers. Subsequently, the AUC and accuracy metrics were assessed. We used the 10-fold cross-validation approach and evaluated lesion sub-types via binary and multiclass classification using the Hermite polynomial classifier, a novel approach in this domain. The Hermite polynomial classifier exhibited the most promising outcomes compared to five other machine learning algorithms, with the best obtained values for accuracy and AUC being 98.72% and 0.9917, respectively. We also evaluated the influence of noise on the features and on the classification accuracy. These results, based on the integration of percolation features with the Hermite polynomial classifier, hold the potential for enhancing lesion detection and supporting clinicians in their diagnostic endeavors.
{"title":"Exploring percolation features with polynomial algorithms for classifying Covid-19 in chest X-ray images","authors":"Guilherme F. Roberto, Danilo C. Pereira, Alessandro S. Martins, Thaína A.A. Tosta, Carlos Soares, Alessandra Lumini, Guilherme B. Rozendo, Leandro A. Neves, Marcelo Z. Nascimento","doi":"10.1016/j.patrec.2024.07.022","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.07.022","url":null,"abstract":"Covid-19 is a severe illness caused by the Sars-CoV-2 virus, initially identified in China in late 2019 and swiftly spreading globally. Since the virus primarily impacts the lungs, analyzing chest X-rays stands as a reliable and widely accessible means of diagnosing the infection. In computer vision, deep learning models such as CNNs have been the main adopted approach for detection of Covid-19 in chest X-ray images. However, we believe that handcrafted features can also provide relevant results, as shown previously in similar image classification challenges. In this study, we propose a method for identifying Covid-19 in chest X-ray images by extracting and classifying local and global percolation-based features. This technique was tested on three datasets: one comprising 2,002 segmented samples categorized into two groups (Covid-19 and Healthy); another with 1,125 non-segmented samples categorized into three groups (Covid-19, Healthy, and Pneumonia); and a third one composed of 4,809 non-segmented images representing three classes (Covid-19, Healthy, and Pneumonia). Then, 48 percolation features were extracted and give as input into six distinct classifiers. Subsequently, the AUC and accuracy metrics were assessed. We used the 10-fold cross-validation approach and evaluated lesion sub-types via binary and multiclass classification using the Hermite polynomial classifier, a novel approach in this domain. The Hermite polynomial classifier exhibited the most promising outcomes compared to five other machine learning algorithms, wherein the best obtained values for accuracy and AUC were 98.72% and 0.9917, respectively. We also evaluated the influence of noise in the features and in the classification accuracy. These results, based in the integration of percolation features with the Hermite polynomial, hold the potential for enhancing lesion detection and supporting clinicians in their diagnostic endeavors.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"24 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-02 | DOI: 10.1016/j.patrec.2024.07.021
Ke Ni, Jing Chen, Jian Wang, Bo Liu, Ting Lei, Yongtian Wang
Recognition and understanding of facial images or eye images are critical for eye tracking. Recent studies have shown that the simultaneous use of facial and eye images can effectively lower gaze errors. However, these methods typically treat facial and eye images as two unrelated inputs, without taking into account their distinct representational abilities at the feature level. Additionally, head pose learned implicitly from highly coupled facial features makes the trained model less interpretable and prone to the gaze-head overfitting problem. To address these issues, we propose a method that enhances task-relevant features while suppressing other noise by leveraging feature decomposition. We disentangle eye-related features from the facial image via a projection module and further make them distinctive with an attention-based head pose regression task, which enhances the representation of gaze-related features and makes the model less susceptible to task-irrelevant features. After that, the mutually separated eye features and head pose are recombined to achieve more accurate gaze estimation. Experimental results demonstrate that our method achieves state-of-the-art performance, with estimation errors of 3.90° on the MPIIGaze dataset and 5.15° on the EyeDiap dataset.
{"title":"Feature decomposition-based gaze estimation with auxiliary head pose regression","authors":"Ke Ni, Jing Chen, Jian Wang, Bo Liu, Ting Lei, Yongtian Wang","doi":"10.1016/j.patrec.2024.07.021","DOIUrl":"10.1016/j.patrec.2024.07.021","url":null,"abstract":"<div><p>Recognition and understanding of facial images or eye images are critical for eye tracking. Recent studies have shown that the simultaneous use of facial and eye images can effectively lower gaze errors. However, these methods typically consider facial and eye images as two unrelated inputs, without taking into account their distinct representational abilities at the feature level. Additionally, implicitly learned head pose from highly coupled facial features would make the trained model less interpretable and prone to the gaze-head overfitting problem. To address these issues, we propose a method that aims to enhance task-relevant features while suppressing other noises by leveraging feature decomposition. We disentangle eye-related features from the facial image via a projection module and further make them distinctive with an attention-based head pose regression task, which could enhance the representation of gaze-related features and make the model less susceptible to task-irrelevant features. After that, the mutually separated eye features and head pose are recombined to achieve more accurate gaze estimation. Experimental results demonstrate that our method achieves state-of-the-art performance, with an estimation error of 3.90° on the MPIIGaze dataset and 5.15° error on the EyeDiap dataset, respectively.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 137-142"},"PeriodicalIF":3.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141963573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-02 | DOI: 10.1016/j.patrec.2024.07.020
Zhuorong Li, Minghui Wu, Canghong Jin, Daiwei Yu, Hongchuan Yu
Adversarial training is currently one of the most promising ways to achieve adversarial robustness in deep models. However, even the most sophisticated training methods are far from satisfactory, as improvement in robustness requires either heuristic strategies or more annotated data, which might be problematic in real-world applications. To alleviate these issues, we propose an effective training scheme that avoids the prohibitively high cost of additional labeled data by adapting self-training to adversarial training. In particular, we first use the confident prediction for a randomly augmented image as the pseudo-label for self-training. Then we enforce consistency regularization by targeting the adversarially perturbed version of the same image at the pseudo-label, which implicitly suppresses the distortion of representations in latent space. Despite its simplicity, extensive experiments show that our regularization brings significant improvements in the adversarial robustness of a wide range of adversarial training methods and helps models generalize their robustness to larger perturbations or even against unseen adversaries.
{"title":"Adversarial self-training for robustness and generalization","authors":"Zhuorong Li , Minghui Wu , Canghong Jin , Daiwei Yu , Hongchuan Yu","doi":"10.1016/j.patrec.2024.07.020","DOIUrl":"10.1016/j.patrec.2024.07.020","url":null,"abstract":"<div><p><em>Adversarial training</em> is currently one of the most promising ways to achieve adversarial robustness of deep models. However, even the most sophisticated training methods is far from satisfactory, as improvement in robustness requires either heuristic strategies or more annotated data, which might be problematic in real-world applications. To alleviate these issues, we propose an effective training scheme that avoids prohibitively high cost of additional labeled data by adapting self-training scheme to adversarial training. In particular, we first use the confident prediction for a randomly-augmented image as the pseudo-label for self-training. Then we enforce the consistency regularization by targeting the adversarially-perturbed version of the same image at the pseudo-label, which implicitly suppresses the distortion of representation in latent space. Despite its simplicity, extensive experiments show that our regularization could bring significant advancement in adversarial robustness of a wide range of adversarial training methods and helps the model to generalize its robustness to larger perturbations or even against unseen adversaries.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 117-123"},"PeriodicalIF":3.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141945959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-01 | DOI: 10.1016/j.patrec.2024.04.012
The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that exhibits promising performance improvements. The selection of clean samples from among the noisy samples is an important criterion in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight that an effective demarcation of samples leads to better performance. We identify the Global Noise Conundrum in existing models, where the distribution of samples is treated globally. We propose a per-class local distribution of samples and demonstrate the effectiveness of this approach in producing a better clean-noise split. We validate our proposal on several benchmarks, both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness, to extend our analysis and highlight the effectiveness of the proposed method. Source code and instructions to reproduce this paper are available at https://github.com/aldakata/CCLM/
{"title":"Decoding class dynamics in learning with noisy labels","authors":"","doi":"10.1016/j.patrec.2024.04.012","DOIUrl":"10.1016/j.patrec.2024.04.012","url":null,"abstract":"<div><p><span>The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that exhibits promising upbeat performance improvements<span>. The selection of clean samples amongst the noisy samples is an important criterion in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight the aspect that effective demarcation of samples would lead to better performance. We identify the Global Noise Conundrum in the existing models, where the distribution of samples is treated globally. We propose a per-class-based local distribution of samples and demonstrate the effectiveness of this approach in having a better clean-noise split. We validate our proposal on several benchmarks — both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness to extend our analysis and highlight the effectiveness of the proposed method. Source code and instructions to reproduce this paper are available at </span></span><span><span>https://github.com/aldakata/CCLM/</span><svg><path></path></svg></span></p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"184 ","pages":"Pages 239-245"},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140777367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}