Quaternion adaptive approximation normalization graph guided implicit low rank for robust matrix completion
Pub Date: 2026-01-30 | DOI: 10.1016/j.patcog.2026.113210
Yu Guo, Yi Liu, Guoqing Chen, Tieyong Zeng, Qiyu Jin, Michael Kwok-Po Ng
Graph structures are effective for capturing low-dimensional manifolds within high-dimensional data spaces and are frequently used as regularization terms to smooth graph signals. A crucial element in this process is the construction of the graph Laplacian. However, normalizing this Laplacian often requires computationally expensive inverse operations. To address this limitation, this paper introduces quaternion graph regularization and proposes the quaternion adaptive approximation normalization graph (QAANG). QAANG offers a computationally efficient solution that requires only a single adaptive scalar for approximate normalization, thereby circumventing the need for inverse operations. To promote the low rank of the graph, we embed the low-rank prior implicitly into the data fidelity term. This approach not only avoids the significant cost of explicitly computing low-rank approximations of quaternion matrices, but also eliminates the need to balance multiple regularization terms and tune their hyperparameters. Experimental results demonstrate that QAANG surpasses current state-of-the-art quaternion methods in both completion performance and robustness.
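To make the normalization trade-off concrete, here is a minimal real-valued sketch contrasting standard symmetric Laplacian normalization with a single-scalar approximation in the spirit of QAANG. The choice of scalar (the mean degree) and the real-valued weights are illustrative assumptions; the paper works with quaternion-valued graphs and defines its own adaptive scalar.

```python
import numpy as np

def laplacian_exact(W):
    """Standard symmetric normalization: L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])

def laplacian_scalar_approx(W):
    """Single-scalar approximation: L ~ I - W / s, with s the mean degree.
    s = mean degree is an illustrative assumption; QAANG's adaptive scalar
    is defined in the full text, not in this abstract."""
    s = W.sum(axis=1).mean()
    return np.eye(len(W)) - W / max(s, 1e-12)

# toy similarity graph from Gaussian affinities
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
W = np.exp(-np.square(X[:, None, :] - X[None, :, :]).sum(-1))
np.fill_diagonal(W, 0.0)

L1, L2 = laplacian_exact(W), laplacian_scalar_approx(W)
print(np.abs(L1 - L2).max())  # gap between exact and scalar-normalized Laplacians
```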
SOFP: Capturing subtle facial dynamics with symmetric optical flow perception for micro-expression recognition
Pub Date: 2026-01-30 | DOI: 10.1016/j.patcog.2026.113199
Kejian Yu, Zhaohui Zhang, Chaochao Hu, Jiehao Luo
Facial micro-expressions (MEs) are fleeting, involuntary facial movements that reveal genuine emotions and play a key role in lie detection and affective computing. However, existing micro-expression recognition (MER) methods often rely on imprecise priors, such as facial landmark localization and action unit annotations, resulting in incomplete or noisy motion representations. To address these limitations, we propose the Symmetric Optical Flow Perception (SOFP) framework, which partitions the optical flow (OF) into four bilaterally symmetric facial regions to capture both local and global motion cues through attention-guided encoders. Furthermore, we introduce the Symmetric Region-Aware Attention Fusion (SRAF) module, which (i) enforces semantic consistency between symmetric facial regions, (ii) models inter-region dependencies via structure-guided self-attention, and (iii) integrates global-to-local information through adaptive cross-attention fusion. Extensive experiments are conducted under both composite database evaluation (CDE) with leave-one-subject-out (LOSO) cross-validation and single dataset evaluation (SDE) settings. On the Composite (Full) dataset, SOFP achieves state-of-the-art performance with a UF1 of 92.84% and a UAR of 92.93%. In SDE, SOFP consistently outperforms existing methods across the SAMM, CASME II, CAS(ME)³, and DFME datasets. These results demonstrate that explicitly modeling facial symmetry and region-level motion information enables robust and accurate MER. The source code is publicly available at https://github.com/Healer-ML/MER.
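As an illustration of the region partition the abstract describes, the sketch below splits an optical-flow field into four quadrants and horizontally mirrors the right-side regions so that bilaterally symmetric regions share a common frame. The quadrant layout, the mirroring, and the sign flip of the horizontal flow component are assumptions for illustration; SOFP's actual partition follows facial geometry.

```python
import numpy as np

def symmetric_regions(flow):
    """Partition an optical-flow field (H, W, 2) into four regions that are
    bilaterally symmetric about the vertical midline."""
    H, W, _ = flow.shape
    h, w = H // 2, W // 2
    regions = {
        "upper_left":  flow[:h, :w],
        "upper_right": flow[:h, w:][:, ::-1],   # mirror to align with the left
        "lower_left":  flow[h:, :w],
        "lower_right": flow[h:, w:][:, ::-1],
    }
    # mirroring flips the sign of the horizontal flow component
    for k in ("upper_right", "lower_right"):
        regions[k] = regions[k] * np.array([-1.0, 1.0])
    return regions

flow = np.random.default_rng(0).normal(size=(128, 128, 2)).astype(np.float32)
for name, r in symmetric_regions(flow).items():
    print(name, r.shape)
```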
Manifold regularized non-negative PCA with robust ℓ2,p-norm enhancement
Pub Date: 2026-01-30 | DOI: 10.1016/j.patcog.2026.113195
Minghua Wan, Taotao Chen, Hai Tan, Mingwei Tang, Guowei Yang
Traditional robust NMF algorithms struggle with the ubiquitous noise in high-dimensional datasets and with data samples embedded in low-dimensional manifolds, limiting their ability to reduce noise while preserving the geometric structure of the data. This paper proposes a novel algorithm, Manifold Regularized Non-negative Principal Component Analysis (ℓ2,p-MRNPCA), which enhances robustness to noise by introducing ℓ2,p-norm constraints while maintaining the intrinsic geometric structure of the data. The algorithm further incorporates a Laplacian graph regularization term to preserve local manifold structure and imposes an independent ℓ2,1-norm penalty on the residual matrix to enhance robustness. Compared to ℓ2,p-PCA, ℓ2,p-MRNPCA demonstrates stronger local learning ability in image data processing, recognizing image details and patterns more effectively. The main contribution of this study is a new method that integrates ℓ2,p regularization, NMF, and manifold learning, enhancing the model's robustness and recognition capabilities. During optimization of the projection matrix, the method effectively reduces the impact of noise and maintains the geometric integrity of the original data, thereby obtaining superior part-based representations. Finally, we design a Lagrangian–KKT multiplicative update framework to solve ℓ2,p-MRNPCA and conduct experiments on three common datasets and the handwritten MNIST dataset, achieving the best performance among the compared methods.
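For reference, the ℓ2,p penalty at the heart of the method is simply the sum of the p-th powers of the row-wise ℓ2 norms; a minimal sketch (with an assumed p = 0.5 and a small stabilizing eps):

```python
import numpy as np

def l2p_norm(E, p=0.5, eps=1e-12):
    """l_{2,p} penalty: sum_i (||e_i||_2)^p over rows e_i of E.
    p in (0, 1] down-weights large (outlier) rows relative to the Frobenius
    norm, which is the source of the robustness the abstract refers to."""
    row_norms = np.sqrt(np.square(E).sum(axis=1) + eps)
    return np.power(row_norms, p).sum()

E = np.vstack([np.random.default_rng(0).normal(size=(9, 5)),
               10.0 * np.ones((1, 5))])          # one gross outlier row
print(l2p_norm(E, p=0.5), np.square(E).sum())    # l2,p grows far slower than Frobenius^2
```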
Semantic change detection of roads and bridges: A fine-grained dataset and multimodal frequency-driven detector
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113191
Qing-Ling Shu, Si-Bao Chen, Xiao Wang, Zhi-Hui You, Wei Lu, Jin Tang, Bin Luo
Accurate detection of road and bridge changes is crucial for urban planning and transportation management, yet presents unique challenges for general change detection (CD). Key difficulties arise from maintaining the continuity of roads and bridges as linear structures and disambiguating visually similar land covers (e.g., road construction vs. bare land). Existing spatial-domain models struggle with these issues, further hindered by the lack of specialized, semantically rich datasets. To fill these gaps, we introduce the Road and Bridge Semantic Change Detection (RB-SCD) dataset. Unlike existing benchmarks that primarily focus on general land cover changes, RB-SCD is the first to systematically target 11 specific semantic change transition types (e.g., water → bridge) anchored to traffic infrastructure. This enables a detailed analysis of traffic infrastructure evolution. Building on this, we propose a novel framework, the Multimodal Frequency-Driven Change Detector (MFDCD). MFDCD integrates multimodal features in the frequency domain through two key components: (1) the Dynamic Frequency Coupler (DFC), which leverages wavelet transform to decompose visual features, enabling it to robustly model the continuity of linear transitions; and (2) the Textual Frequency Filter (TFF), which encodes semantic priors into frequency-domain graphs and applies filter banks to align them with visual features, resolving semantic ambiguities. Experiments demonstrate the state-of-the-art performance of MFDCD on RB-SCD and three public CD datasets. The code will be available at https://github.com/DaGuangDaGuang/RB-SCD.
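A minimal sketch of the kind of wavelet decomposition the DFC builds on: a one-level 2-D DWT that splits each feature channel into one low-frequency band and three high-frequency bands. The single Haar level and per-channel processing are assumptions; the module's exact design is in the paper.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_split(feat, wavelet="haar"):
    """One-level 2-D DWT of a feature map (C, H, W) into a low-frequency
    approximation band and three high-frequency detail bands."""
    lows, highs = [], []
    for c in feat:                       # per-channel transform
        cA, (cH, cV, cD) = pywt.dwt2(c, wavelet)
        lows.append(cA)
        highs.append(np.stack([cH, cV, cD]))
    return np.stack(lows), np.stack(highs)

feat = np.random.default_rng(0).normal(size=(4, 64, 64)).astype(np.float32)
low, high = wavelet_split(feat)
print(low.shape, high.shape)   # (4, 32, 32) (4, 3, 32, 32)
```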
Continual relation extraction with wake-sleep memory consolidation
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113192
Tingting Hang, Ya Guo, Jun Huang, Yirui Wu, Umapada Pal, Shivakumara Palaiahnakote
Continual Relation Extraction (CRE) has achieved significant success due to its ability to adapt to new relations without frequent retraining. However, existing methods still face challenges such as overfitting and representation bias. Inspired by the wake-sleep memory consolidation process of the human brain, this paper proposes a Wake-Sleep Memory Consolidation (WSMC) framework to address these issues systematically. During the wake phase, the model simulates the brain's information-processing mechanism, quickly encoding new relations and storing them in short-term memory. We also introduce the Experience Iterative Learning (EIL) approach, which dynamically adjusts the distribution of relation samples; this corrects the model's representation bias and enhances memory stability through experience replay. During the sleep phase, the model consolidates existing knowledge by replaying long-term memory. Moreover, the framework generates diverse dream data from existing memory sets, increasing the diversity of the training data and improving the model's generalization capability. Experimental results show that WSMC significantly outperforms other CRE baselines on the FewRel and TACRED datasets. Our source code is available at https://github.com/Gyanis9/WSMC.git.
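To make the replay mechanics concrete, here is a minimal sketch of a per-relation exemplar memory with a wake-phase consolidation step and a sleep-phase replay step. The bounded random exemplar selection and uniform replay are assumptions; WSMC's EIL adjusts the sample distribution adaptively, and the full framework additionally generates dream data.

```python
import random
from collections import defaultdict

class RelationMemory:
    """Long-term memory keeping up to k exemplars per relation."""
    def __init__(self, per_relation=5, seed=0):
        self.per_relation = per_relation
        self.store = defaultdict(list)
        self.rng = random.Random(seed)

    def consolidate(self, relation, samples):
        """Wake phase: keep a bounded exemplar set for a newly seen relation."""
        pool = self.store[relation] + list(samples)
        self.rng.shuffle(pool)
        self.store[relation] = pool[: self.per_relation]

    def replay(self, n):
        """Sleep phase: draw a mixed batch of old exemplars for rehearsal."""
        flat = [(r, s) for r, xs in self.store.items() for s in xs]
        return self.rng.sample(flat, min(n, len(flat)))

mem = RelationMemory(per_relation=2)
mem.consolidate("founded_by", ["s1", "s2", "s3"])
mem.consolidate("born_in", ["s4"])
print(mem.replay(3))
```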
A novel incremental Gaussian mixture model based on fuzzy three-way decision for concept drift adaptation
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113181
Wenxin Shen, Zhixuan Deng, Tianrui Li, Keyu Liu, Deyou Xia, Dayong Deng
As a clustering model based on probability distributions, the Gaussian mixture model (GMM) is widely used in data stream learning. Most GMMs rely on historical instances to adapt to concept drift, but distinguishing drift instances so as to reduce drift's negative impact on the GMM remains difficult. In addition, owing to the uncertainty of drift ranges, a GMM may incorrectly adapt to instances on the drift boundary, leaving the distribution of sub-clusters inconsistent with the current data distribution. To address these two issues, this study proposes an incremental Gaussian mixture model based on fuzzy three-way decision (IGMMFTWD). In contrast to existing GMMs for data streams, IGMMFTWD adapts to concept drift based on drift risk and updates drifting sub-clusters locally. First, a fuzzy nearest-neighbour method is proposed to construct a region suited to the current drift range. Subsequently, a novel drift-risk estimation based on three-way decisions is proposed, which reduces the misjudgement costs of drift instances. Finally, the Gaussian mixture model completes incremental adaptation with the local update method. In the experiments, the proposed model is evaluated in terms of classification accuracy and G-mean; the results show that IGMMFTWD outperforms six state-of-the-art methods.
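The three-way decision itself reduces to two thresholds on the estimated drift risk, as in the minimal sketch below; the threshold values here are placeholders, whereas IGMMFTWD derives them from misjudgement costs.

```python
def three_way_decision(risk, alpha=0.7, beta=0.3):
    """Three-way decision on an instance's drift risk:
    - risk >= alpha : treat as drift (adapt the affected sub-clusters)
    - risk <= beta  : treat as non-drift (standard incremental update)
    - otherwise     : boundary region, defer the decision"""
    if risk >= alpha:
        return "drift"
    if risk <= beta:
        return "non-drift"
    return "boundary"

for r in (0.9, 0.5, 0.1):
    print(r, three_way_decision(r))
```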
Tab2Visual: Deep learning for limited tabular data via visual representations and augmentation
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113173
Ahmed Mamdouh, Moumen El-Melegy, Samia Ali, Ron Kikinis
This research addresses the challenge of limited data in tabular classification, a problem particularly prevalent in constrained domains such as healthcare. We propose Tab2Visual, a novel approach that transforms heterogeneous tabular data into visual representations, enabling the application of powerful deep learning models. Tab2Visual effectively addresses data scarcity by incorporating novel image augmentation techniques and facilitating transfer learning. We extensively evaluate the proposed approach on diverse tabular datasets, comparing its performance against a wide range of machine learning algorithms, including classical methods, tree-based ensembles, and state-of-the-art deep learning models specifically designed for tabular data. We also perform an in-depth analysis of the factors influencing Tab2Visual's performance. Our experimental results demonstrate that Tab2Visual outperforms the other methods on classification problems with limited tabular data.
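One simple way to realize a tabular-to-image transform is to render each row as a bar-chart-like grayscale image, one bar per feature; the sketch below uses this encoding as an illustrative assumption, not Tab2Visual's actual visual mapping.

```python
import numpy as np

def row_to_image(row, lo, hi, size=32):
    """Render one tabular row as a grayscale image: one vertical bar per
    feature, bar height = min-max normalized feature value."""
    x = (np.asarray(row, float) - lo) / np.maximum(hi - lo, 1e-12)
    img = np.zeros((size, size), dtype=np.float32)
    bar_w = size // len(row)
    for j, v in enumerate(np.clip(x, 0.0, 1.0)):
        h = int(round(v * size))
        if h:
            img[size - h:, j * bar_w:(j + 1) * bar_w] = 1.0
    return img

X = np.random.default_rng(0).uniform(0, 10, size=(100, 8))
lo, hi = X.min(axis=0), X.max(axis=0)
print(row_to_image(X[0], lo, hi).shape)  # (32, 32), ready for a CNN
```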
INSERTION: From traditional incremental learning to open-world stream learning
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113163
Yanchao Li, Hongwei Dou, Guanxiao Li, Guangwei Gao, Huiyu Zhou
In many real-world applications, data are generated or collected as a stream, and accurate labels for known (seen) classes are hard to obtain. Moreover, unknown (unseen/novel) classes can emerge as the stream evolves. Existing approaches in the literature suffer from three limitations: (1) a gap in intra-class variance arises when seen classes are learned faster than novel classes; (2) feature weighting is imbalanced between the learning procedures for new and old classes; and (3) catastrophic forgetting can occur if the model is updated exclusively with new data, losing knowledge of known classes while integrating information about the current novel classes. This paper investigates learning with unseen-class detection over a non-stationary data stream. In particular, we introduce an uncertainty adaptive margin mechanism from open-world semi-supervised learning to address the bias stemming from discriminative features of seen classes being learned faster than those of novel classes. We also develop an adaptive weighting scheme that dynamically balances the use of seen- and novel-class data by updating their aggregation weights. In addition, we propose a model updating scheme that gradually incorporates the stored memory and novel-class information, thereby reducing the risk of forgetting distinctive attributes of known classes. Finally, we formulate the objective as a bi-level optimization that enables our model to maintain consistent performance under class-distribution shifts, detect unseen classes with minimal supervision, and achieve robust continual learning in open-world streaming scenarios. Our empirical evaluation on real-world datasets highlights its superior performance compared to existing methods.
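As a rough illustration of the uncertainty adaptive margin, the sketch below subtracts an uncertainty-scaled margin from seen-class logits before the softmax, so novel classes are not crowded out while they are still poorly learned. The (1 − uncertainty) schedule and the scalar lam are assumptions in the spirit of adaptive-margin open-world SSL, not the paper's exact mechanism.

```python
import numpy as np

def margin_adjusted_logits(logits, seen_mask, uncertainty, lam=1.0):
    """Subtract an uncertainty-scaled margin from seen-class logits only.
    The margin shrinks as uncertainty rises, easing the penalty once novel
    classes are better learned (illustrative schedule)."""
    margin = lam * (1.0 - uncertainty)
    return logits - margin * seen_mask

logits = np.array([2.0, 1.5, 0.3, 0.2])        # classes 0-1 seen, 2-3 novel
seen = np.array([1.0, 1.0, 0.0, 0.0])
adj = margin_adjusted_logits(logits, seen, uncertainty=0.2)
p = np.exp(adj - adj.max())
print(p / p.sum())                              # novel classes gain probability mass
```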
IRDFusion: Iterative relation-map difference guided feature fusion for multispectral object detection
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113189
Jifeng Shen, Haibo Zhan, Xin Zuo, Heng Fan, Xiaohui Yuan, Jun Li, Wankou Yang
Current multispectral object detection methods often retain extraneous background or noise during feature fusion, limiting perceptual performance. To address this, we propose a feature fusion framework based on a cross-modal feature contrast-and-screening strategy, diverging from conventional approaches. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features while suppressing shared background interference. Our solution centers on two novel modules: the Mutual Feature Refinement Module (MFRM) and the Differential Feature Feedback Module (DFFM). The MFRM enhances intra- and inter-modal feature representations by modeling their relationships, thereby improving cross-modal alignment and discriminative power. Inspired by feedback differential amplifiers, the DFFM dynamically computes inter-modal differential features as guidance signals and feeds them back to the MFRM, enabling adaptive fusion of complementary information while suppressing common-mode noise across modalities. To enable robust feature learning, the MFRM and DFFM are integrated into a unified framework, formally formulated as an Iterative Relation-Map Differential Guided Feature Fusion mechanism, termed IRDFusion. IRDFusion enables high-quality cross-modal fusion by progressively amplifying salient relational signals through iterative feedback while suppressing feature noise, leading to significant performance gains. In extensive experiments on the FLIR, LLVIP, and M³FD datasets, IRDFusion achieves state-of-the-art performance and consistently outperforms existing methods across diverse challenging scenarios, demonstrating its robustness and effectiveness. Code will be available at https://github.com/61s61min/IRDFusion.git.
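The differential-amplifier analogy can be sketched in a few lines: split two modality feature maps into a common-mode and a differential component, and let the difference gate how strongly complementary content is injected. The sigmoid gate and gain value are illustrative assumptions about the DFFM; the real module runs iteratively with learned attention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def differential_fusion(f_rgb, f_ir, gain=2.0):
    """Differential-amplifier-style fusion sketch: the shared (common-mode)
    component passes through un-amplified, while the inter-modal difference
    gates how strongly complementary cues are injected."""
    common = 0.5 * (f_rgb + f_ir)                # common-mode component
    diff = f_rgb - f_ir                          # differential component
    gate = sigmoid(gain * np.abs(diff))          # where the modalities disagree
    return common + gate * diff                  # amplify complementary cues

f_rgb = np.random.default_rng(0).normal(size=(8, 16, 16))
f_ir = np.random.default_rng(1).normal(size=(8, 16, 16))
print(differential_fusion(f_rgb, f_ir).shape)
```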
TranSAC: An unsupervised transferability metric based on task speciality and domain commonality
Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113137
Qianshan Zhan, Xiao-Jun Zeng, Qian Wang
In transfer learning, a fundamental problem is transferability estimation, where a metric measures transfer performance without training. Existing metrics face two issues: 1) they require target-domain labels, and 2) they focus only on task speciality while ignoring equally important domain commonality. To overcome these limitations, we propose TranSAC, a Transferability metric based on task Speciality And domain Commonality, capturing the separation between classes and the similarity between domains. Its main advantages are: 1) unsupervised, 2) fine-tuning free, and 3) applicable to both source-dependent and source-free transfer scenarios. To achieve this, we investigate the upper and lower bounds of transfer performance based on fixed representations extracted from the pre-trained model. The theoretical results reveal that unsupervised transfer performance is characterized by entropy-based quantities, naturally reflecting task speciality and domain commonality. These insights motivate the design of TranSAC, which integrates both factors. Extensive experiments are performed across 12 target datasets with 36 pre-trained models, including supervised CNNs, self-supervised CNNs, and ViTs. The results demonstrate the importance of domain commonality and task speciality, and establish TranSAC as superior to state-of-the-art metrics for pre-trained model ranking, target domain ranking, and source domain ranking.
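A minimal sketch of an entropy-based score in the spirit of TranSAC: combine the sharpness (negative entropy) of target predictions with a centroid-distance proxy for domain similarity. The weighting and the centroid proxy are assumptions; the paper's exact entropy-based quantities follow from its theoretical bounds.

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=1)

def transac_like_score(probs_tgt, feat_src, feat_tgt, w=0.5):
    """Illustrative unsupervised transferability score combining
    - task speciality: confident (low-entropy) target predictions, and
    - domain commonality: closeness of source/target feature centroids."""
    speciality = -entropy(probs_tgt).mean()                  # higher = sharper
    commonality = -np.linalg.norm(feat_src.mean(0) - feat_tgt.mean(0))
    return w * speciality + (1 - w) * commonality

rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 10))
probs = np.exp(logits); probs /= probs.sum(1, keepdims=True)
print(transac_like_score(probs, rng.normal(size=(300, 64)),
                         rng.normal(size=(200, 64))))
```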