Pattern Recognition: Latest Articles

FedPnP: Personalized graph-structured federated learning
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-15 DOI: 10.1016/j.patcog.2025.111455
Arash Rasti-Meymandi, Ahmad Sajedi, Konstantinos N. Plataniotis
In Personalized Federated Learning (PFL), current methods often fail to consider the fine-grained relationships between clients and their local datasets, hindering effective information exchange. Here, we propose “FedPnP”, a novel method that harnesses the inherent graph-based connections among clients. Clients linked by a graph tend to yield similar model responses to comparable input data. In the proposed FedPnP we present the graph-based optimization as an inverse problem. We then solve this optimization by employing a Half-Quadratic-Splitting technique (HQS) to divide it into two subproblems. The first ensures local model performance on respective datasets, acting as a data fidelity term, while the second promotes the smoothness of model weights on the graph. Notably, we present a structural proximal term in the first subproblem and demonstrate the integration of any graph denoiser in the second subproblem as a plug & play solution. Experiments on CIFAR10, CIFAR100, FashionMNIST, and SVHN demonstrate FedPnP’s superiority over 10 state-of-the-art algorithms, with accuracy improvements ranging from 0.2% to 3%. Notably, FedPnP excels in handling highly heterogeneous data, a critical challenge in real-world PFL scenarios. Additional evaluations show that FedPnP performs consistently well across various denoisers, with the Heat filter delivering the best results. This bridge between PFL algorithms and inverse problems opens up the potential for cross-pollination of solutions, yielding superior algorithms for PFL tasks. The GitHub code is available at https://github.com/arashrasti96/FedPnP.
Pattern Recognition, vol. 163, Article 111455
Citations: 0
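The two HQS subproblems described in the FedPnP abstract can be illustrated with a toy sketch. This is not the authors' implementation (their repository is linked in the abstract). It assumes each client holds a single scalar weight, that the local loss is the quadratic (w - t)^2, that the paper's structural proximal term can be stood in for by a plain quadratic penalty, and that the heat filter is approximated by explicit Euler diffusion steps. The names `laplacian`, `heat_denoise`, `hqs_toy`, `mu`, and `tau` are hypothetical.

```python
def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def heat_denoise(signal, lap, tau=0.5, steps=50):
    """Approximate exp(-tau * L) applied to `signal` with small Euler steps."""
    x = list(signal)
    dt = tau / steps
    n = len(x)
    for _ in range(steps):
        x = [x[i] - dt * sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def hqs_toy(targets, adj, mu=1.0, rounds=20):
    """Alternate a local (data fidelity) step and a graph denoising step.

    targets[k] is the minimizer of client k's assumed local loss (w - t_k)^2.
    """
    lap = laplacian(adj)
    w = list(targets)
    z = list(targets)
    for _ in range(rounds):
        # Subproblem 1: argmin_w (w - t)^2 + (mu / 2) * (w - z)^2, solved in closed form.
        w = [(2.0 * t + mu * zk) / (2.0 + mu) for t, zk in zip(targets, z)]
        # Subproblem 2: plug in any graph denoiser; a heat filter is used here.
        z = heat_denoise(w, lap)
    return w, z


if __name__ == "__main__":
    path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    weights, smoothed = hqs_toy([0.0, 0.0, 3.0], path_graph)
    print(weights, smoothed)
```

Swapping `heat_denoise` for another graph filter leaves the loop unchanged, which is the plug-and-play property the abstract refers to.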
Knowledge-Driven Compositional Action Recognition
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-14 DOI: 10.1016/j.patcog.2025.111452
Yang Liu, Fang Liu, Licheng Jiao, Qianyue Bao, Shuo Li, Lingling Li, Xu Liu
Human action often involves interaction with objects, so in action recognition, action labels can be defined by compositions of verbs and nouns. It is almost infeasible to collect and annotate enough training data for every possible composition in the real world. Therefore, the main challenge in compositional action recognition is to enable the model to understand “action-objects” compositions that have not been seen during training. We propose a Knowledge-Driven Composition Modulation Model (KCMM), which constructs unseen “action-objects” compositions to improve action recognition generalization. We first design a Grammar Knowledge-Driven Composition (GKC) module, which extracts the labels of verbs and nouns and their corresponding feature representations from compositional actions, and then modulates them under the guidance of grammatical rules to construct new “action-objects” actions. Subsequently, to verify the rationality of the new “action-objects” actions, we design a Common Knowledge-Driven Verification (CKV) module. This module extracts motion commonsense from ConceptNet and infuses it into the compositional labels to improve the comprehensiveness of the verification. It should be noted that GKC does not construct new videos, but directly composes verbs and nouns in the label and feature spaces to obtain new compositional action label-feature pairs. We conduct extensive experiments on the Something-Else and NEU-I datasets, and our method significantly outperforms current state-of-the-art methods in both compositional and few-shot settings. The source code is available at https://github.com/XDLiuyyy/KCMM.
Pattern Recognition, vol. 163, Article 111452
Citations: 0
An efficient and effective pore matching method using ResCNN descriptor and local outliers
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-14 DOI: 10.1016/j.patcog.2025.111446
Feng Liu, Qiuheng Wang, Yanfeng Xiao, Linlin Shen
With the advancement of high-resolution fingerprint scanners, sweat pores have emerged as a robust biometric feature for fingerprint representation and recognition. Numerous pore-matching algorithms have been developed to enhance the accuracy of automatic fingerprint recognition systems (AFRSs). However, existing models often suffer from inefficiencies and poor generalization performance. This article introduces a novel method that balances efficiency and effectiveness. After fingerprints are aligned and pores are annotated, a ResCNN-based pore descriptor is designed to capture both static and dynamic features of sweat pores, with an emphasis on inter-class differences and intra-class similarities. This leads to the generation of robust descriptors that can handle variations such as deformation and pressure changes. Additionally, the AdaLAM algorithm is refined to efficiently remove local outliers, which improves matching accuracy and reduces computational time. To adapt to different scenarios, different strategies are employed for partial and full fingerprint recognition. For partial fingerprints, the method addresses the challenge of small overlapping areas by incorporating distinctive pore matching results using AdaLAM. For full fingerprints, the method trains image descriptors and integrates fingerprint similarity with pore matching to further enhance accuracy. Experiments on the benchmark PolyU-HRF dataset demonstrate that the algorithm achieves an equal error rate (EER) of 1.71% for DBI (partial fingerprints) and 0.02% for DBII (full fingerprints). Compared to current state-of-the-art approaches, the method reduces the False Match Rate 1000 (FMR1000) by 38.88% for partial fingerprints and 100% for full fingerprints, with a speed improvement of approximately 90 times.
Pattern Recognition, vol. 163, Article 111446
Citations: 0
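The pore matching abstract reports its results as an equal error rate (EER). As a reminder of what that number measures, here is a minimal sketch that sweeps a decision threshold over similarity scores and reports the point where the false match rate and the false non-match rate are closest. The function name `equal_error_rate` and the sample scores are made up for illustration; the paper's evaluation protocol on PolyU-HRF is not reproduced here.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher means more similar).

    A comparison is accepted when its score is >= the threshold.
    """
    best_gap = None
    best_eer = None
    for t in sorted(set(genuine) | set(impostor)):
        # False non-match rate: genuine pairs rejected at threshold t.
        fnmr = sum(s < t for s in genuine) / len(genuine)
        # False match rate: impostor pairs accepted at threshold t.
        fmr = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(fmr - fnmr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (fmr + fnmr) / 2.0
    return best_eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With only a handful of scores the two error curves rarely cross exactly, so the sketch averages the two rates at the closest threshold; evaluation toolkits often interpolate instead.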
Transformer-based material recognition via short-time contact sensing
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-14 DOI: 10.1016/j.patcog.2025.111448
Zhenyang Liu, Yitian Shao, Qiliang Li, Jingyong Su
Embodied intelligence needs haptic sensing for spontaneous and accurate material recognition. The haptic sensing module of an intelligent system can acquire material data through either sliding or tapping motions. Sliding movements are commonly adopted for collecting the spatial frequency features of the material but are less time-efficient than tapping. Here, we introduce a haptic sensing framework that can extract material features from short-time tapping signals. To improve material recognition performance, transfer learning is used to carry the knowledge of a model pretrained on large-scale images over to haptic sensing. The waveforms of the tapping signals are encoded as images to be input into a transformer model tailored for image recognition tasks. The encoding employs line graph image-point scaling, effectively accommodating signals that exhibit large variations in magnitude and temporal structure. Using the LMT haptic material database containing sliding and tapping data, our study showcases the efficacy of the proposed framework in material recognition tasks, especially for short-time (≤ 60 ms) sensing via tapping interactions. The findings provide fresh insights into haptic sensing technologies and may help improve the physical interaction capabilities of embodied intelligence, such as medical and rescue robots.
Pattern Recognition, vol. 163, Article 111448
Citations: 0
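The abstract above describes encoding tapping waveforms as line graph images before feeding them to an image transformer. The exact "image-point scaling" is not specified in the abstract, so the following is only one plausible reading: min-max scale the signal to the image height, spread the samples across the width, and connect consecutive samples with straight segments. The function name `waveform_to_image` and the 32 by 32 default size are assumptions.

```python
def waveform_to_image(signal, height=32, width=32):
    """Rasterize a 1-D signal as a binary line plot (rows by columns of 0/1)."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for constant signals
    n = len(signal)
    img = [[0] * width for _ in range(height)]

    def to_pixel(i):
        # Sample index maps to a column, amplitude maps to a row (row 0 is the top).
        x = round(i * (width - 1) / max(n - 1, 1))
        y = round((hi - signal[i]) / span * (height - 1))
        return x, y

    prev = to_pixel(0)
    img[prev[1]][prev[0]] = 1
    for i in range(1, n):
        cur = to_pixel(i)
        # Interpolate between consecutive samples so the curve has no gaps.
        steps = max(abs(cur[0] - prev[0]), abs(cur[1] - prev[1]), 1)
        for s in range(1, steps + 1):
            x = round(prev[0] + (cur[0] - prev[0]) * s / steps)
            y = round(prev[1] + (cur[1] - prev[1]) * s / steps)
            img[y][x] = 1
        prev = cur
    return img


if __name__ == "__main__":
    image = waveform_to_image([0.0, 1.0, 0.0, -1.0, 0.0], height=8, width=16)
    for row in image:
        print("".join("#" if p else "." for p in row))
```

Because the scaling is relative to each signal's own minimum and maximum, weak and strong taps fill the same image area, which is one way to accommodate large variations in magnitude.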
Real-time dual-eye collaborative eyeblink detection with contrastive learning
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-13 DOI: 10.1016/j.patcog.2025.111440
Hanli Zhao, Yu Wang, Wanglong Lu, Zili Yi, Jun Liu, Minglun Gong
Real-time detection of eyeblinks in uncontrolled settings is crucial for applications such as driver fatigue monitoring, face spoofing prevention, and emotion analysis. This task, however, is significantly challenged by variations in facial poses, motion blur, and inconsistent lighting conditions, which frequently lead traditional facial landmark analysis tools to perform poorly, especially in low-light and dynamic environments. These failures often lead to imprecise localization of key regions of interest, undermining the effectiveness of subsequent blink detection. To address these issues, we have developed a novel real-time dual-eye collaborative eyeblink detection method that incorporates contrastive learning. Our approach includes a consistent eye feature embedding technique that minimizes the impact of adverse lighting and extraneous noise during feature extraction. Through contrastive learning, we align feature embeddings of coarsely captured, low-light eye patches with those from finely detailed, well-lit patches. Furthermore, to enhance eyeblink detection and reduce false identifications of eye regions, we exploit the natural synchrony in blink patterns between the left and right eyes. We introduce a dual-eye collaborative spatio-temporal attention mechanism that captures both the inter-eye correlations and the temporal dynamics across sequences. Our collaborative learning approach maximizes the inherent synchrony and cooperation between the two eyes, significantly improving detection accuracy. Extensive experiments on three datasets and their low-light variants demonstrate that our method operates in real-time, adjusts effectively to varying lighting conditions, and performs robustly in untrimmed video scenarios.
Pattern Recognition, vol. 162, Article 111440
Citations: 0
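The eyeblink abstract aligns embeddings of low-light eye patches with those of well-lit patches through contrastive learning, without naming the loss. The sketch below uses a generic InfoNCE-style loss as a stand-in; it is an assumption, not the loss from the paper. `anchors[i]` plays the role of a low-light embedding and `positives[i]` its well-lit counterpart, while the other rows of `positives` act as negatives. The names `info_nce` and `temperature` are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [_dot(ai, pj) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


if __name__ == "__main__":
    low_light = [[1.0, 0.0], [0.0, 1.0]]
    well_lit_aligned = [[1.0, 0.0], [0.0, 1.0]]
    well_lit_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(low_light, well_lit_aligned))
    print(info_nce(low_light, well_lit_swapped))
```

The loss is small when each low-light embedding is closest to its own well-lit partner and large when the pairing is wrong, which is the alignment behaviour the abstract describes.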
Learning from not-all-negative pairwise data and unlabeled data
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-13 DOI: 10.1016/j.patcog.2025.111442
Shuying Huang, Junpeng Li, Changchun Hua, Yana Yang
A weakly supervised approach utilizing data pairs with comparative or similarity/dissimilarity information has gained popularity in various fields due to its cost-effectiveness. However, the challenge of dealing with not-all-negative (i.e., pairwise data that includes at least one positive) or not-all-positive (i.e., pairwise data that includes at least one negative) data pairs has not been specifically addressed by any algorithm. To overcome this bottleneck, this paper explores, as a representative case, a novel weakly supervised framework for learning from pairwise data that includes at least one positive, together with unlabeled data points (PposU). The provided pairwise data ensures that each data pair contains at least one positive data point. Unlabeled data refers to data without label information. First, this paper presents an unbiased risk estimator for PposU data and uses risk correction functions to mitigate the overfitting caused by negative terms. In addition, an estimation error bound is established for the empirical risk minimizer and the optimal convergence rate is obtained. Finally, the detailed experimental process and results are presented to demonstrate the effectiveness of the proposed method.
Pattern Recognition, vol. 163, Article 111442
Citations: 0
Adaptive division and priori reinforcement part learning network for vehicle re-identification
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-13 DOI: 10.1016/j.patcog.2025.111453
Xiaoying Zhou, Xi Li, Houren Zhou, Xiyu Pang, Jiachen Tian, Xiushan Nie, Cheng Wang, Yilong Yin
Vehicle Re-identification (Re-ID) recognizes images belonging to the same vehicle from a large number of vehicle images captured by different cameras. Learning subtle discriminative information in parts is key to meeting the challenge of small interclass difference in vehicle Re-ID. Methods that use additional models and annotations can accurately locate parts to learn part-level features; however, they incur higher computational and labor costs. The rigid division strategy can fully utilize the priori information to learn interpretable part features, but it breaks the semantic continuity of parts and amplifies noise interference. In this paper, we propose an adaptive division part learning module (ADP). It adaptively generates spatially nonoverlapping, diverse part masks based on a multi-head self-attention semantic aggregation process to decouple part learning. It lets each head focus on the semantic aggregation of different parts and does not need to resort to additional annotations or models. In addition, we propose a priori reinforcement parts learning module (PRP). PRP establishes links between one part and all parts obtained by rigid division through a self-attention mechanism. This process emphasizes important detail information within the part from a global viewpoint and suppresses noise interference. Finally, based on the above two modules, we construct an adaptive division and priori reinforcement part learning network (ADPRP-Net) to learn granular features in an adaptive and priori way to deal with the challenge of small interclass difference. Experimental results on the VeRi-776 and VehicleID datasets show that ADPRP-Net achieves excellent vehicle Re-ID performance. On the small test subset of the VehicleID dataset, ADPRP-Net has 3.3% higher Rank-1 accuracy and 1.7% higher Rank-5 accuracy compared to the state-of-the-art (SOTA) transformer-based Re-ID method (DSN). Code is available at https://github.com/zxy1116/ADPRP-Net.
Pattern Recognition, vol. 163, Article 111453
Citations: 0
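The ADPRP-Net abstract reports Rank-1 and Rank-5 accuracy. As a minimal sketch of that metric, the function below counts a query as a hit when a gallery image with the same vehicle identity appears among its k nearest gallery entries. The function name `rank_k_accuracy` and the toy distances are made up; real Re-ID protocols (including those for VeRi-776 and VehicleID) add dataset-specific rules, such as excluding same-camera matches, that are omitted here.

```python
def rank_k_accuracy(dist, query_ids, gallery_ids, k=1):
    """Fraction of queries whose identity appears in the top-k gallery results.

    dist[q][g] is the distance between query q and gallery image g
    (smaller means more similar).
    """
    hits = 0
    for q, row in enumerate(dist):
        # Gallery indices sorted from nearest to farthest.
        order = sorted(range(len(row)), key=row.__getitem__)
        top_ids = [gallery_ids[g] for g in order[:k]]
        if query_ids[q] in top_ids:
            hits += 1
    return hits / len(dist)


if __name__ == "__main__":
    distances = [[0.1, 0.9, 0.5],
                 [0.8, 0.3, 0.2]]
    print(rank_k_accuracy(distances, [7, 8], [7, 8, 9], k=1))
    print(rank_k_accuracy(distances, [7, 8], [7, 8, 9], k=2))
```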
MSAN: Multiscale self-attention network for pansharpening
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-13 DOI: 10.1016/j.patcog.2025.111441
Hangyuan Lu, Yong Yang, Shuying Huang, Rixian Liu, Huimin Guo
Effective extraction of spectral–spatial features from multispectral (MS) and panchromatic (PAN) images is critical for high-quality pansharpening. However, existing deep learning methods often overlook local misalignment and struggle to integrate local and long-range features effectively, resulting in spectral and spatial distortions. To address these challenges, this paper proposes a refined detail injection model that adaptively learns injection coefficients using long-range features. Building upon this model, a multiscale self-attention network (MSAN) is proposed, consisting of a feature extraction branch and a self-attention mechanism branch. In the former branch, a two-stage multiscale convolution network is designed to fully extract detail features with multiple receptive fields. In the latter branch, a streamlined Swin Transformer (SST) is proposed to efficiently generate multiscale self-attention maps by learning the correlation between local and long-range features. To better preserve spectral–spatial information, a revised Swin Transformer block is proposed by incorporating spectral and spatial attention within the block. The obtained self-attention maps from SST serve as the injection coefficients to refine the extracted details, which are then injected into the upsampled MS image to produce the final fused image. Experimental validation demonstrates the superiority of MSAN over traditional and state-of-the-art methods, with competitive efficiency. The code of this work will be released on GitHub once the paper is accepted.
Pattern Recognition, vol. 162, Article 111441.
Citations: 0
Adaptive transformer with Pyramid Fusion for cloth-changing Person Re-Identification
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-12 DOI: 10.1016/j.patcog.2025.111443
Guoqing Zhang, Jieqiong Zhou, Yuhui Zheng, Gaven Martin, Ruili Wang
Recently, Transformer-based methods have made great progress in person re-identification (Re-ID), especially in cloth-changing (CC) scenarios. Most current studies use auxiliary biometric information, such as human pose estimation, to enhance the local perception ability of CC Re-ID models. However, these methods usually struggle to connect local biometric information with global identity semantics during training, so local perception ability is lacking at inference, which limits model performance. In this paper, we propose a Transformer-based Adaptive-Aware Attention and Pyramid Fusion Network (A3PFN) for CC Re-ID, which captures and integrates multi-scale visual information to enhance recognition ability. First, to improve the information utilization efficiency of the model in cloth-changing scenarios, we propose a Multi-Layer Dynamic Concentration module (MLDC) that evaluates the importance of features at each layer in real time and reduces the computational overlap between related layers. Second, we propose a Local Pyramid Aggregation Module (LPAM) that extracts multi-scale features, aiming to maintain global perceptual capability while focusing on key local information. In this module, we also combine the Fast Fourier Transform (FFT) with the self-attention mechanism to identify and analyze pedestrian gait and other structural details in the frequency domain more effectively, and to reduce the computational complexity of processing high-dimensional data in self-attention. Finally, we build a new dataset incorporating diverse atmospheric conditions (for instance, wind and rain) to simulate natural cloth-changing scenarios more realistically. Extensive experiments on multiple cloth-changing datasets confirm the superior performance of A3PFN. The dataset and related code are available at https://github.com/jieqiongz1999/vcclothes-w-r.
Pattern Recognition, vol. 163, Article 111443.
Citations: 0
Self-supervised video object segmentation via pseudo label rectification
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-12 DOI: 10.1016/j.patcog.2025.111428
Pinxue Guo, Wei Zhang, Xiaoqiang Li, Jianping Fan, Wenqiang Zhang
In this paper, we propose a novel self-supervised framework for video object segmentation (VOS) that consists of siamese encoders and bi-decoders. The siamese encoders extract multi-level features and generate a pseudo label for each pixel by cross attention in visual-semantic space. They are learned via the colorization task without any labeled video data. The bi-decoders take in features from different layers of the encoder and output refined segmentation masks. The bi-decoders are trained on the pseudo labels, and the pseudo labels are in turn rectified through mutual learning between the bi-decoders. The variation between the bi-decoders' outputs is minimized so that the gap between the pseudo labels and the ground truth is reduced. Experimental results on the challenging DAVIS-2017 and YouTube-VOS datasets demonstrate the effectiveness of the proposed approach.
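To make the idea of per-pixel pseudo labels from cross attention concrete, the sketch below shows a generic attention-based label propagation step: each query pixel attends over reference-frame pixels by feature similarity and takes the attention-weighted average of their labels. It is an assumption-laden illustration, not the paper's formulation: the dot-product similarity, the `temperature` value, the toy features, and the function names are all hypothetical, and the abstract does not specify how the visual-semantic space or the rectification step is implemented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_labels(query_feats, ref_feats, ref_labels, temperature=0.1):
    """For each query pixel, attend over reference pixels and average their one-hot labels."""
    num_classes = len(ref_labels[0])
    pseudo = []
    for q in query_feats:
        # Dot-product similarity between the query pixel and every reference pixel.
        scores = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref_feats]
        weights = softmax(scores)
        pseudo.append([
            sum(w * lab[k] for w, lab in zip(weights, ref_labels))
            for k in range(num_classes)
        ])
    return pseudo


# Toy example: 3 reference pixels with 2-D features and one-hot labels (background, object).
ref_feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
ref_labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query_feats = [[1.0, 0.0], [0.1, 0.9]]

pseudo = propagate_labels(query_feats, ref_feats, ref_labels)
# Hard pseudo label per query pixel: index of the largest soft score.
hard = [max(range(len(p)), key=p.__getitem__) for p in pseudo]
```

In this toy case the first query pixel resembles the two background reference pixels and the second resembles the object pixel, so the hard pseudo labels come out as background and object respectively.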
Pattern Recognition, vol. 163, Article 111428.
Citations: 0