Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.10.010
Francesco Giuliari, Gianluca Scarpellini, Stefano Fiorini, Stuart James, Pietro Morerio, Yiming Wang, Alessio Del Bue
Positional reasoning is the process of ordering an unsorted set of parts into a consistent structure. To address this problem, we present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models. Using a diffusion process, we add Gaussian noise to the set elements’ positions, mapping them to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. To evaluate our method, we conduct extensive experiments on three different tasks and seven datasets, comparing our approach against the state-of-the-art methods for visual puzzle solving, sentence ordering, and room arrangement. Our method outperforms long-standing research on puzzle solving by up to +17% compared to the second-best deep learning method, and performs on par with the state-of-the-art methods on sentence ordering and room rearrangement. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. We release our code at https://github.com/IIT-PAVIS/Positional_Diffusion.
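As a rough illustration of the forward noising step described in the abstract (not the authors’ implementation; the noise schedule, dimensionality, and normalization below are assumptions), the following sketch adds Gaussian noise to a set of 2D element positions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear beta schedule; Positional Diffusion's actual schedule may differ.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # \bar{alpha}_t

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise, noise

# Positions of four puzzle pieces on a 2x2 grid, normalized to [-1, 1] (assumed).
x0 = np.array([[-0.5, -0.5], [0.5, -0.5], [-0.5, 0.5], [0.5, 0.5]])
x_t, eps = q_sample(x0, t=500)
print(x_t)  # noised positions; the attention-based GNN learns to reverse this step
```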
{"title":"Positional diffusion: Graph-based diffusion models for set ordering","authors":"Francesco Giuliari , Gianluca Scarpellini , Stefano Fiorini , Stuart James , Pietro Morerio , Yiming Wang , Alessio Del Bue","doi":"10.1016/j.patrec.2024.10.010","DOIUrl":"10.1016/j.patrec.2024.10.010","url":null,"abstract":"<div><div>Positional reasoning is the process of ordering an unsorted set of parts into a consistent structure. To address this problem, we present <em>Positional Diffusion</em>, a plug-and-play graph formulation with Diffusion Probabilistic Models. Using a diffusion process, we add Gaussian noise to the set elements’ position and map them to a random position in a continuous space. <em>Positional Diffusion</em> learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. To evaluate our method, we conduct extensive experiments on three different tasks and seven datasets, comparing our approach against the state-of-the-art methods for visual puzzle-solving, sentence ordering, and room arrangement, demonstrating that our method outperforms long-lasting research on puzzle solving with up to <span><math><mrow><mo>+</mo><mn>17</mn><mtext>%</mtext></mrow></math></span> compared to the second-best deep learning method, and performs on par against the state-of-the-art methods on sentence ordering and room rearrangement. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. We release our code at <span><span>https://github.com/IIT-PAVIS/Positional_Diffusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 272-278"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.015
Mohammad Junayed Hasan, Kazi Rafat, Fuad Rahman, Nabeel Mohammed, Shafin Rahman
Distinguishing between spontaneous and posed smiles from videos poses a significant challenge in the pattern classification literature. Researchers have developed feature-based and deep learning-based solutions for this problem, and deep learning generally outperforms feature-based methods. However, certain aspects of feature-based methods could improve deep learning methods. For example, previous research has shown that Duchenne Marker (or D-Marker) features from the face play a vital role in spontaneous smiles and can be used to improve deep learning performance. In this study, we propose a deep learning solution that leverages D-Marker features to further improve performance. Our multi-task learning framework, named DeepMarkerNet, integrates a transformer network with facial D-Markers for accurate smile classification. Unlike past methods, our approach simultaneously predicts the class of the smile and the associated facial D-Markers using two different feed-forward neural networks, creating a symbiotic relationship that enriches the learning process. The novelty of our approach lies in incorporating supervisory signals from the pre-calculated D-Markers (rather than using them as input, as in previous works) and harmonizing the loss functions through a weighted average. In this way, training benefits from the D-Markers, but inference does not require computing them. We validate our model’s effectiveness on four well-known smile datasets: UvA-NEMO, BBC, MMI Facial Expression, and SPOS, and achieve state-of-the-art results.
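A minimal sketch of the multi-task idea described above: a smile-classification head and a D-Marker regression head share features, and their losses are combined through a weighted average. The head sizes, loss choices, and the 0.7/0.3 weights are assumptions, not the paper’s configuration:

```python
import torch
import torch.nn as nn

# Hypothetical two-head setup: a shared backbone feature feeds a smile classifier
# and a D-Marker regressor; the weights and loss functions below are assumptions.
class TwoHead(nn.Module):
    def __init__(self, feat_dim=256, n_markers=4):
        super().__init__()
        self.smile_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))
        self.marker_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_markers))

    def forward(self, feats):
        return self.smile_head(feats), self.marker_head(feats)

model = TwoHead()
feats = torch.randn(8, 256)                # features from a (not shown) transformer backbone
smile_labels = torch.randint(0, 2, (8,))   # 0 = posed, 1 = spontaneous
marker_targets = torch.randn(8, 4)         # pre-computed D-Marker values (training only)

logits, marker_pred = model(feats)
loss = 0.7 * nn.functional.cross_entropy(logits, smile_labels) \
     + 0.3 * nn.functional.mse_loss(marker_pred, marker_targets)
loss.backward()                            # at inference, only the smile head is used
```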
{"title":"DeepMarkerNet: Leveraging supervision from the Duchenne Marker for spontaneous smile recognition","authors":"Mohammad Junayed Hasan , Kazi Rafat , Fuad Rahman , Nabeel Mohammed , Shafin Rahman","doi":"10.1016/j.patrec.2024.09.015","DOIUrl":"10.1016/j.patrec.2024.09.015","url":null,"abstract":"<div><div>Distinguishing between spontaneous and posed smiles from videos poses a significant challenge in pattern classification literature. Researchers have developed feature-based and deep learning-based solutions for this problem. To this end, deep learning outperforms feature-based methods. However, certain aspects of feature-based methods could improve deep learning methods. For example, previous research has shown that Duchenne Marker (or D-Marker) features from the face play a vital role in spontaneous smiles, which can be useful to improve deep learning performances. In this study, we propose a deep learning solution that leverages D-Marker features to improve performance further. Our multi-task learning framework, named DeepMarkerNet, integrates a transformer network with the utilization of facial D-Markers for accurate smile classification. Unlike past methods, our approach simultaneously predicts the class of the smile and associated facial D-Markers using two different feed-forward neural networks, thus creating a symbiotic relationship that enriches the learning process. The novelty of our approach lies in incorporating supervisory signals from the pre-calculated D-Markers (instead of as input in previous works), harmonizing the loss functions through a weighted average. In this way, our training utilizes the benefits of D-Markers, but the inference does not require computing the D-Marker. We validate our model’s effectiveness on four well-known smile datasets: UvA-NEMO, BBC, MMI facial expression, and SPOS datasets, and achieve state-of-the-art results.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 148-155"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.10.003
Bingbing Zhang, Ying Zhang, Jianxin Zhang, Qiule Sun, Rong Wang, Qiang Zhang
Vision-Language models (VLMs) have shown promising improvements on various visual tasks. Most existing VLMs employ two separate transformer-based encoders, each dedicated to modeling visual and language features independently. Because the visual and language features are unaligned in the feature space, it is challenging for the multi-modal encoder to learn vision-language interactions. In this paper, we propose a Visual-guided Hierarchical Iterative Fusion (VgHIF) method for VLMs in video action recognition, which acquires more discriminative vision and language representations. VgHIF leverages visual features from different levels of the visual encoder to interact with the language representation. The interaction is processed by an attention mechanism that computes the correlation between visual features and the language representation. VgHIF learns grounded video-text representations and supports many different pre-trained VLMs in a flexible and efficient manner at a tiny computational cost. We conducted experiments on Kinetics-400, Mini-Kinetics-200, HMDB51, and UCF101 using the VLMs CLIP, X-CLIP, and ViFi-CLIP, under both fully supervised and few-shot settings. Compared with the baseline multi-modal model without VgHIF, the proposed method improves Top-1 accuracy to varying degrees, and several groups of results are comparable to state-of-the-art performance, which strongly verifies the effectiveness of the proposed method.
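The interaction step can be pictured as cross-attention between language tokens and visual features taken from one level of the visual encoder. The single-head formulation and all dimensions below are assumptions; the actual method applies such fusion hierarchically and iteratively across multiple levels:

```python
import torch
import torch.nn.functional as F

# Sketch of one fusion stage: language tokens attend to visual features from one
# encoder level. Dimensions and the single-head form are assumptions, not VgHIF's design.
def cross_attention(lang, vis, d=64):
    # lang: (L, d) language tokens, vis: (N, d) visual tokens from one encoder level
    q, k, v = lang, vis, vis
    attn = F.softmax(q @ k.t() / d ** 0.5, dim=-1)   # correlation between modalities
    return lang + attn @ v                            # residual fusion

lang = torch.randn(16, 64)        # language representation (assumed token-level)
vis_level3 = torch.randn(49, 64)  # visual features from one (assumed) encoder level
fused = cross_attention(lang, vis_level3)
print(fused.shape)                # torch.Size([16, 64])
```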
{"title":"Visual-guided hierarchical iterative fusion for multi-modal video action recognition","authors":"Bingbing Zhang , Ying Zhang , Jianxin Zhang , Qiule Sun , Rong Wang , Qiang Zhang","doi":"10.1016/j.patrec.2024.10.003","DOIUrl":"10.1016/j.patrec.2024.10.003","url":null,"abstract":"<div><div>Vision-Language models<!--> <!-->(VLMs) have shown promising improvements on various visual tasks. Most existing VLMs employ two separate transformer-based encoders, each dedicated to modeling visual and language features independently. Because the visual features and language features are unaligned in the feature space, it is challenging for the multi-modal encoder to learn vision-language interactions. In this paper, we propose a <strong>V</strong>isual-<strong>g</strong>uided <strong>H</strong>ierarchical <strong>I</strong>terative <strong>F</strong>usion (VgHIF) method for VLMs in video action recognition, which acquires more discriminative vision and language representation. VgHIF leverages visual features from different levels in visual encoder to interact with language representation. The interaction is processed by the attention mechanism to calculate the correlation between visual features and language representation. VgHIF learns grounded video-text representation and supports many different pre-trained VLMs in a flexible and efficient manner with a tiny computational cost. We conducted experiments on the Kinetics-400 Mini Kinetics 200 HMDB51, and UCF101 using VLMs: CLIP, X-CLIP, and ViFi-CLIP. The experiments were conducted under full supervision and few shot settings, and compared with the baseline multi-modal model without VgHIF, the Top-1 accuracy of the proposed method has been improved to varying degrees, and several groups of results have achieved comparable results with state-of-the-art performance, which strongly verified the effectiveness of the proposed method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 213-220"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.016
Alexander Kolpakov, Michael Werman
Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.
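A toy sketch of the quantity being minimized: each subspace is represented by its orthogonal projector P = U Uᵀ, and the mismatch between two subspaces is the Frobenius distance between their projectors. This illustrates only the Grassmannian distance, not the QAP relaxation or the full matching pipeline; the point clouds and subspace dimension are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def projector(X, k=2):
    """Orthogonal projector onto the top-k principal subspace of a centered point cloud."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Xc)
    return U[:, :k] @ U[:, :k].T

# Toy data, not from the paper: an anisotropic cloud, a noisy copy, and an unrelated cloud.
A = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.2])
B = A + 0.01 * rng.standard_normal(A.shape)
C = rng.standard_normal((100, 3))

d_same = np.linalg.norm(projector(A) - projector(B), ord='fro')
d_diff = np.linalg.norm(projector(A) - projector(C), ord='fro')
print(d_same, d_diff)  # the first distance is typically much smaller
```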
{"title":"Robust affine point matching via quadratic assignment on Grassmannians","authors":"Alexander Kolpakov , Michael Werman","doi":"10.1016/j.patrec.2024.09.016","DOIUrl":"10.1016/j.patrec.2024.09.016","url":null,"abstract":"<div><div>Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 265-271"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of depressive symptoms from spoken content has emerged as an efficient Artificial Intelligence (AI) tool for diagnosing this serious mental health condition. Since speech is a highly sensitive form of data, privacy-enhancing measures need to be in place for this technology to be useful. A common approach to enhancing speech privacy is adversarial learning, which conceals a speaker’s specific attributes/identity while maintaining performance on the primary task. Although this technique works well for applications such as speech recognition, it is often ineffective for depression detection due to the interplay between certain speaker attributes and depression detection performance. This paper examines that interplay through a systematic study of how obfuscating specific speaker attributes (age, education) through adversarial learning impacts the performance of a depression detection model. We highlight the relevance of two previously unexplored speaker attributes to depression detection, while considering a multimodal (audio-lexical) setting to highlight the relative vulnerabilities of the modalities under obfuscation. Results on a publicly available, clinically validated depression detection dataset show that attempts to disentangle age/education attributes through adversarial learning result in a large drop in depression detection accuracy, especially for the text modality. This calls for a rethink of how privacy mitigation should be achieved for depression detection, and indeed for any human-centric application.
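The adversarial obfuscation branch is commonly implemented with a gradient reversal layer: an attribute classifier is trained on top of the shared encoder, but its gradient is negated so the encoder learns to hide the attribute. The sketch below is a generic illustration of that pattern; the feature sizes, heads, and the age-group attribute are assumptions, not the paper’s architecture:

```python
import torch
import torch.nn as nn

# Generic adversarial obfuscation via gradient reversal; not the paper's model.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip the adversary's gradient w.r.t. the encoder

encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU())   # e.g. acoustic features -> embedding
depress_head = nn.Linear(128, 2)                         # primary task: depression detection
attr_head = nn.Linear(128, 3)                            # adversary on an attribute (assumed: age group)

x = torch.randn(16, 40)
y_dep = torch.randint(0, 2, (16,))
y_attr = torch.randint(0, 3, (16,))

z = encoder(x)
loss = nn.functional.cross_entropy(depress_head(z), y_dep) \
     + nn.functional.cross_entropy(attr_head(GradReverse.apply(z, 1.0)), y_attr)
loss.backward()   # the reversed gradient pushes the encoder to hide the attribute
```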
{"title":"On the effects of obfuscating speaker attributes in privacy-aware depression detection","authors":"Nujud Aloshban , Anna Esposito , Alessandro Vinciarelli , Tanaya Guha","doi":"10.1016/j.patrec.2024.10.016","DOIUrl":"10.1016/j.patrec.2024.10.016","url":null,"abstract":"<div><div>Detection of depressive symptoms from spoken content has emerged as an efficient Artificial Intelligence (AI) tool for diagnosing this serious mental health condition. Since speech is a highly sensitive form of data, privacy-enhancing measures need to be in place for this technology to be useful. A common approach to enhance speech privacy is by using adversarial learning that involves concealing speaker’s specific attributes/identity while maintaining performance of the primary task. Although this technique works well for applications such as speech recognition, they are often ineffective for depression detection due to the interplay between certain speaker attributes and the performance of depression detection. This paper studies such interplay through a systematic study on how obfuscating specific speaker attributes (age, education) through adversarial learning impact the performance of a depression detection model. We highlight the relevance of two previously unexplored speaker attributes to depression detection, while considering a multimodal (audio-lexical) setting to highlight the relative vulnerabilities of the modalities under obfuscation. Results on a publicly available, clinically validated, depression detection dataset shows that attempts to disentangle age/education attributes through adversarial learning result in a large drop in depression detection accuracy, especially for the text modality. This calls for a revisit to how privacy mitigation should to be achieved for depression detection and any human-centric applications for that matter.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 300-305"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.014
Shihui Zhang, Zhigang Huang, Sheng Zhan, Ping Li, Zhiguo Cui, Feiyu Li
Few-shot counting (FSC) is the task of counting the number of objects in an image that belong to the same category by using a provided exemplar pattern. By replacing the exemplar, we can effectively count anything, even in cases where we have no prior knowledge of that category’s exemplar. However, due to variations within the same category and the impact of inter-class similarity, it is challenging to achieve accurate intra-class similarity matching using conventional similarity comparison methods. To tackle these issues, we propose a novel few-shot counting method called Multi-stage Exemplar Attention Match Network (MEAMNet), which increases matching accuracy, reduces the impact of noise, and enhances similarity feature matching. Specifically, we propose a multi-stage matching strategy to obtain more stable and effective matching results by acquiring similar features in different feature spaces. In addition, we propose a novel feature matching module called Exemplar Attention Match (EAM). With this module, the intra-class similarity representation at each stage is enhanced to achieve better matching of the key features. Experimental results indicate that our method not only significantly surpasses the state-of-the-art (SOTA) methods in most evaluation metrics on the FSC-147 dataset but also achieves comprehensive superiority on the CARPK dataset. This highlights the outstanding accuracy and stability of our matching performance, as well as its exceptional transferability. We will release the code at https://github.com/hzg0505/MEAMNet.
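At its core, exemplar-based counting correlates an exemplar’s feature with every location of the image feature map to produce a similarity (density-like) map. The single-stage sketch below illustrates only that matching step; MEAMNet’s multi-stage aggregation and EAM module are not reproduced, and all shapes are assumptions:

```python
import torch
import torch.nn.functional as F

# Toy single-stage exemplar-to-image matching; shapes and the count surrogate are assumptions.
img_feat = torch.randn(1, 256, 32, 32)   # backbone feature map of the query image
ex_feat = torch.randn(1, 256, 1, 1)      # pooled feature of one exemplar box

sim = F.conv2d(img_feat, ex_feat)        # correlation / matching map, shape (1, 1, 32, 32)
density = F.relu(sim)
print(density.sum().item())              # a (very rough) count surrogate from the matching map
```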
{"title":"Innovative multi-stage matching for counting anything","authors":"Shihui Zhang , Zhigang Huang , Sheng Zhan , Ping Li , Zhiguo Cui , Feiyu Li","doi":"10.1016/j.patrec.2024.09.014","DOIUrl":"10.1016/j.patrec.2024.09.014","url":null,"abstract":"<div><div>Few-shot counting (FSC) is the task of counting the number of objects in an image that belong to the same category, by using a provided exemplar pattern. By replacing the exemplar, we can effectively count anything, even in cases where we have no prior knowledge of that category’s exemplar. However, due to the variations within the same category and the impact of inter-class similarity, it is challenging to achieve accurate intra-class similarity matching using conventional similarity comparison methods. To tackle these issues, we propose a novel few-shot counting method called Multi-stage Exemplar Attention Match Network (MEAMNet), which increases the accuracy of matching, reduces the impact of noise, and enhances similarity feature matching. Specifically, we propose a multi-stage matching strategy to obtain more stable and effective matching results by acquiring similar feature in different feature spaces. In addition, we propose a novel feature matching module called Exemplar Attention Match (EAM). With this module, the intra-class similarity representation in each stage will be enhanced to achieve a better matching of the key feature. Experimental results indicate that our method not only significantly surpasses the state-of-the-art (SOTA) methods in most evaluation metrics on the FSC-147 dataset but also achieves comprehensive superiority on the CARPK dataset. This highlights the outstanding accuracy and stability of our matching performance, as well as its exceptional transferability. We will release the code at <span><span>https://github.com/hzg0505/MEAMNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 141-147"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the significant advances in visual SLAM (VSLAM), it might be assumed that the localization and mapping problem has been solved. Nevertheless, VSLAM algorithms may exhibit poor performance in unstructured environments. This paper addresses the problem of VSLAM in unstructured planetary-like and agricultural environments. A performance study of state-of-the-art algorithms in these environments was conducted to evaluate their robustness. Quantitative and qualitative results of the study are reported, which expose that the impressive performance of most state-of-the-art VSLAM algorithms does not generally carry over to unstructured planetary-like and agricultural environments. Statistical scene analysis was performed on datasets from well-known structured environments as well as planetary-like and agricultural datasets to identify the visual differences between structured and unstructured environments that cause VSLAM algorithms to fail. In addition, strategies to overcome the limitations of VSLAM algorithms in unstructured planetary-like and agricultural environments are suggested to guide future research on VSLAM in these environments.
{"title":"Evaluation of visual SLAM algorithms in unstructured planetary-like and agricultural environments","authors":"Víctor Romero-Bautista, Leopoldo Altamirano-Robles, Raquel Díaz-Hernández, Saúl Zapotecas-Martínez, Nohemí Sanchez-Medel","doi":"10.1016/j.patrec.2024.09.025","DOIUrl":"10.1016/j.patrec.2024.09.025","url":null,"abstract":"<div><div>Given the significant advance in visual SLAM (VSLAM), it might be assumed that the location and mapping problem has been solved. Nevertheless, VSLAM algorithms may exhibit poor performance in unstructured environments. This paper addresses the problem of VSLAM in unstructured planetary-like and agricultural environments. A performance study of state-of-the-art algorithms in these environments was conducted to evaluate their robustness. Quantitative and qualitative results of the study are reported, which exposes that the impressive performance of most state-of-the-art VSLAM algorithms is not generally reflected in unstructured planetary-like and agricultural environments. Statistical scene analysis was performed on datasets from well-known structured environments as well as planetary-like and agricultural datasets to identify visual differences between structured and unstructured environments, which cause VSLAM algorithms to fail. In addition, strategies to overcome the VSLAM algorithm limitations in unstructured planetary-like and agricultural environments are suggested to guide future research on VSLAM in these environments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 106-112"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.10.015
Nadeem Iqbal Kajla, Malik Muhammad Saad Missen, Mickael Coustaty, Hafiz Muhammad Sanaullah Badar, Maruf Pasha, Faiza Belbachir
Deep learning has revolutionized the fields of pattern recognition and machine learning by exhibiting exceptional efficiency in recognizing patterns. Its success can be seen in a wide range of applications, including speech recognition, natural language processing, video processing, and image classification. It has also been successful in recognizing structural patterns, such as graphs. Graph Neural Networks (GNNs) are models that employ message passing between nodes in a graph to capture its dependencies. These networks maintain a state that approximates graph information at greater depth than traditional neural networks. Although training a GNN can be challenging, recent GNN variants, including Graph Convolutional Neural Networks, Gated Graph Neural Networks, and Graph Attention Networks, have shown promising results on various problems. In this work, we present a GNN-based approach for computing graph similarity and demonstrate its application to a classification problem. Our proposed method converts the similarity of two graphs into a score, and experiments on standard benchmark datasets show that the proposed technique is effective and efficient. Results are summarized using a confusion matrix and the mean squared error metric, demonstrating the accuracy of our proposed technique.
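One plausible reading of the approach, guided by the title (a hedged sketch, not the authors’ method): propagate node features over each graph, summarize the resulting embeddings as a histogram, and map the distance between the two histograms to a similarity score. The propagation rule, bin count, and score mapping are all assumptions:

```python
import numpy as np

# Hypothetical histogram-based graph similarity; every design choice here is assumed.
def node_embeddings(adj, feats, hops=2):
    deg = adj.sum(1, keepdims=True) + 1e-9
    h = feats
    for _ in range(hops):
        h = (adj @ h) / deg                      # mean-aggregation message passing
    return h

def histogram_signature(h, bins=8):
    hist, _ = np.histogram(h, bins=bins, range=(-3, 3), density=True)
    return hist

def similarity_score(adj1, f1, adj2, f2):
    s1 = histogram_signature(node_embeddings(adj1, f1))
    s2 = histogram_signature(node_embeddings(adj2, f2))
    return 1.0 / (1.0 + np.linalg.norm(s1 - s2))  # map histogram distance to a score

rng = np.random.default_rng(0)
A = np.triu((rng.random((6, 6)) > 0.5).astype(float), 1); A = A + A.T   # random graph 1
B = np.triu((rng.random((6, 6)) > 0.5).astype(float), 1); B = B + B.T   # random graph 2
print(similarity_score(A, rng.standard_normal((6, 4)), B, rng.standard_normal((6, 4))))
```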
{"title":"A histogram-based approach to calculate graph similarity using graph neural networks","authors":"Nadeem Iqbal Kajla , Malik Muhammad Saad Missen , Mickael Coustaty , Hafiz Muhammad Sanaullah Badar , Maruf Pasha , Faiza Belbachir","doi":"10.1016/j.patrec.2024.10.015","DOIUrl":"10.1016/j.patrec.2024.10.015","url":null,"abstract":"<div><div>Deep learning has revolutionized the field of pattern recognition and machine learning by exhibiting exceptional efficiency in recognizing patterns. The success of deep learning can be seen in a wide range of applications including speech recognition, natural language processing, video processing, and image classification. Moreover, it has also been successful in recognizing structural patterns, such as graphs. Graph Neural Networks (GNNs) are models that employ message passing between nodes in a graph to capture its dependencies. These networks memorize a state that approximates graph information with greater depth compared to traditional neural networks. Although training a GNN can be challenging, recent advances in GNN variants, including Graph Convolutional Neural Networks, Gated Graph Neural Networks, and Graph Attention Networks, have shown promising results in solving various problems. In this work, we present a GNN-based approach for computing graph similarity and demonstrate its application to a classification problem. Our proposed method converts the similarity of two graphs into a score, and experiments on state-of-the-art datasets show that the proposed technique is effective and efficient. Results are summarized using a confusion matrix and mean square error metric, demonstrating the accuracy of our proposed technique.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 286-291"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-light, nighttime hazy, and underwater images captured in harsh environments typically exhibit color deviations and reduced visibility due to light scattering and absorption. In addition, we observe an almost complete loss of information in at least one color channel of these degraded images. To repair the lost information in each channel, we present an image preprocessing strategy called Local Reference Feature Transfer (LRFT), which employs local features to compensate for the color loss automatically. Specifically, we design a dedicated reference image by fusing the detail, salience, and uniform grayscale images of the raw image, which ensures a balanced chromaticity distribution. Subsequently, we employ the local reference feature transfer strategy to migrate the local mean and variance of the reference image to the raw image to obtain a color-corrected image. Extensive evaluation experiments demonstrate that our proposed LRFT method provides good preprocessing performance for the subsequent enhancement of images with different degradation types. The code is publicly available at: https://www.researchgate.net/publication/383528251_2024-LRFT.
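A minimal sketch of the local statistic transfer described above: per channel, the raw image is re-normalized so that its local mean and standard deviation match those of the reference image. The window size and epsilon are assumptions, and construction of the fused reference image itself is omitted:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Local mean/variance transfer sketch; window size and eps are assumed, not from the paper.
def local_transfer(raw, ref, win=15, eps=1e-6):
    out = np.empty_like(raw, dtype=np.float64)
    for c in range(raw.shape[2]):
        r, f = raw[..., c].astype(np.float64), ref[..., c].astype(np.float64)
        mu_r, mu_f = uniform_filter(r, win), uniform_filter(f, win)
        var_r = uniform_filter(r * r, win) - mu_r ** 2
        var_f = uniform_filter(f * f, win) - mu_f ** 2
        out[..., c] = (r - mu_r) / np.sqrt(np.maximum(var_r, eps)) \
                      * np.sqrt(np.maximum(var_f, eps)) + mu_f
    return np.clip(out, 0, 255).astype(np.uint8)

raw = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in for a degraded image
ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in for the fused reference
print(local_transfer(raw, ref).shape)
```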
{"title":"Local Reference Feature Transfer (LRFT): A simple pre-processing step for image enhancement","authors":"Ling Zhou , Weidong Zhang , Yuchao Zheng , Jianping Wang , Wenyi Zhao","doi":"10.1016/j.patrec.2024.10.013","DOIUrl":"10.1016/j.patrec.2024.10.013","url":null,"abstract":"<div><div>Low-light, nighttime haze, and underwater images captured in harsh environments typically exhibit color deviations and reduced visibility due to light scattering and absorption. Additionally, we observe an almost complete loss of information in at least one color channel in these degraded images. To repair the lost information in each channel, we present an image preprocessing strategy called Local Reference Feature Transfer (LRFT), which employs the local feature to compensate for the color loss automatically. Specifically, we design a dedicated reference image by fusing the detail, salience, and uniform grayscale images of the raw image that ensures a balanced chromaticity distribution. Subsequently, we employ the local reference feature transfer strategy to migrate the local mean and variance of the reference image to the raw image to get a color-corrected image. Extensive evaluation experiments demonstrate that our proposed LRFT method has good preprocessing performance for the subsequent enhancement of images of different degradation types. The code is publicly available at: <span><span>https://www.researchgate.net/publication/383528251_2024-LRFT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 330-336"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01. DOI: 10.1016/j.patrec.2024.09.012
Yiliang Zhang, Yang Lu, Hanzi Wang
Existing deep learning methods often require a large amount of high-quality labeled data, yet the presence of noisy labels in real-world training data seriously affects the generalization ability of a model. Sample selection techniques, the current dominant approach to mitigating the effects of noisy labels, use the consistency between sample predictions and observed labels to select clean samples. However, these methods rely heavily on the accuracy of the sample predictions and inevitably suffer when model predictions are unstable. To address these issues, we propose an uncertainty-aware neighborhood sample selection method. Specifically, it calibrates each sample’s prediction with its neighbors’ predictions and reassigns model attention to the selected samples based on sample uncertainty. By alleviating the influence of prediction bias on sample selection, our proposed method achieves excellent performance in extensive experiments. In particular, we achieve an average improvement of 5% in asymmetric noise scenarios.
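A toy sketch of the two ingredients described above: calibrating each sample’s prediction with the mean prediction of its feature-space neighbors, then using the entropy of the calibrated prediction as an uncertainty that re-weights the selected (prediction-consistent) samples. The mixing weight, neighborhood size, and entropy-based weighting are assumptions, not the paper’s exact formulation:

```python
import numpy as np

# Neighbor-calibrated selection sketch; alpha, k, and the weighting rule are assumed.
rng = np.random.default_rng(0)
n, c = 200, 10
feats = rng.standard_normal((n, 32))                  # feature embeddings
probs = rng.dirichlet(np.ones(c), size=n)             # model predictions (softmax outputs)
labels = rng.integers(0, c, size=n)                   # observed (possibly noisy) labels

d = ((feats[:, None] - feats[None]) ** 2).sum(-1)     # pairwise squared distances
nbrs = np.argsort(d, axis=1)[:, 1:6]                  # 5 nearest neighbors (excluding self)

calib = 0.5 * probs + 0.5 * probs[nbrs].mean(axis=1)  # neighbor-calibrated prediction
uncertainty = -(calib * np.log(calib + 1e-12)).sum(1) # entropy as uncertainty
clean = calib.argmax(1) == labels                     # select samples whose prediction matches the label
weights = np.where(clean, 1.0 / (1.0 + uncertainty), 0.0)  # attention for selected samples
print(clean.sum(), weights[:5])
```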
{"title":"Label-noise learning via uncertainty-aware neighborhood sample selection","authors":"Yiliang Zhang, Yang Lu, Hanzi Wang","doi":"10.1016/j.patrec.2024.09.012","DOIUrl":"10.1016/j.patrec.2024.09.012","url":null,"abstract":"<div><div>Existing deep learning methods often require a large amount of high-quality labeled data. Yet, the presence of noisy labels in the real-world training data seriously affects the generalization ability of the model. Sample selection techniques, the current dominant approach to mitigating the effects of noisy labels on models, use the consistency of sample predictions and observed labels to make clean selections. However, these methods rely heavily on the accuracy of the sample predictions and inevitably suffer when the model predictions are unstable. To address these issues, we propose an uncertainty-aware neighborhood sample selection method. Especially, it calibrates for sample prediction by neighbor prediction and reassigns model attention to the selected samples based on sample uncertainty. By alleviating the influence of prediction bias on sample selection and avoiding the occurrence of prediction bias, our proposed method achieves excellent performance in extensive experiments. In particular, we achieved an average of 5% improvement in asymmetric noise scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 191-197"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142433534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}