Pub Date: 2026-01-01. Epub Date: 2025-11-15. DOI: 10.1016/j.patrec.2025.11.009
Dirko Coetsee, Steve Kroon, Ralf Kistner, Adem Kikaj, McElory Hoffmann, Luc De Raedt
Tabular data is ubiquitous in pattern recognition, yet accurately measuring differences between tables remains challenging. Conventional methods rely on cell substitutions and row/column insertions and deletions, often overestimating the difference when cells are simply repositioned. We propose a distance metric that considers move operations, capturing structural changes more faithfully. Although exact computation is NP-complete, a greedy approach computes an effective approximation in practice. Experimental results on real-world datasets demonstrate that our approach yields a more compact and intuitive measure of table dissimilarity, enhancing applications such as clustering, table extraction evaluation, and version history recovery.
Tadmo: A tabular distance measure with move operations
Pattern Recognition Letters, Volume 199, Pages 212–218
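To make the move-operation idea above concrete, here is a toy greedy approximation in the spirit of the abstract: cells that appear in both tables but at different positions are charged a single move rather than a delete plus an insert. The cost scheme and matching rule are illustrative assumptions, not the paper's actual Tadmo metric.

```python
from collections import Counter

def greedy_table_distance(a, b, move_cost=1, edit_cost=1):
    """Toy move-aware table distance (illustrative only; not the
    paper's exact metric).  Tables are lists of rows of cell values."""
    pos_a = {(i, j): v for i, row in enumerate(a) for j, v in enumerate(row)}
    pos_b = {(i, j): v for i, row in enumerate(b) for j, v in enumerate(row)}

    # Cells that already match in place cost nothing.
    matched = {p for p in pos_a.keys() & pos_b.keys() if pos_a[p] == pos_b[p]}
    rest_a = Counter(v for p, v in pos_a.items() if p not in matched)
    rest_b = Counter(v for p, v in pos_b.items() if p not in matched)

    # Greedily pair equal values across positions as single moves;
    # a move-free edit distance would charge a delete plus an insert.
    moves = sum((rest_a & rest_b).values())
    ins_del = sum((rest_a - rest_b).values()) + sum((rest_b - rest_a).values())
    return moves * move_cost + ins_del * edit_cost
```

Turning a row table `[[1, 2, 3]]` into a column table `[[1], [2], [3]]` costs 2 here (two moves), whereas delete-plus-insert accounting would charge 4, matching the abstract's observation that classical operations overestimate repositioning.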
Pub Date: 2026-01-01. Epub Date: 2025-11-13. DOI: 10.1016/j.patrec.2025.11.025
Shuwen Jin, Junzhu Mao, Zeren Sun, Yazhou Yao
Pruning is widely recognized as a promising approach for reducing the computational and storage demands of deep neural networks, facilitating lightweight model deployment on resource-limited devices. However, most existing pruning techniques assume the availability of accurate training labels, overlooking the prevalence of noisy labels in real-world settings. Deep networks have strong memorization capability, making them prone to overfitting noisy labels and thereby sensitive to the removal of network parameters. As a result, existing methods often encounter limitations when directly applied to the task of pruning models trained with noisy labels. To this end, we propose Discriminative Response Pruning (DRP) to robustly prune models trained with noisy labels. Specifically, DRP begins by identifying clean and noisy samples and reorganizing them into class-specific subsets. Then, it estimates the importance of model parameters by evaluating their responses to each subset, rewarding parameters exhibiting strong responses to clean data and penalizing those overfitting to noisy data. A class-wise reweighted aggregation strategy is then employed to compute the final importance score, which guides the pruning decisions. Extensive experiments across various models and noise conditions are conducted to demonstrate the efficacy and robustness of our method.
Discriminative response pruning for robust and efficient deep networks under label noise
Pattern Recognition Letters, Volume 199, Pages 170–177
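The scoring idea in the abstract (reward strong responses to clean data, penalise responses that fit noisy data, then aggregate class-wise) can be sketched as follows. The trade-off weight `alpha`, the uniform class weights, and the per-filter response statistics are invented placeholders, not the paper's formulation.

```python
import numpy as np

def drp_importance(clean_resp, noisy_resp, alpha=1.0, class_weights=None):
    """Sketch of discriminative-response importance scoring.

    clean_resp, noisy_resp: (num_classes, num_filters) mean absolute
    activations of each filter on the clean / noisy subset of a class.
    """
    score = clean_resp - alpha * noisy_resp          # reward clean, penalise noisy
    if class_weights is None:
        class_weights = np.full(score.shape[0], 1.0 / score.shape[0])
    return class_weights @ score                     # class-wise reweighted aggregation

def prune_mask(importance, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of filters by importance."""
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[::-1][:k]
    mask = np.zeros(len(importance), dtype=bool)
    mask[keep] = True
    return mask
```

A filter that responds strongly to clean samples but weakly to noisy ones receives a high score and survives pruning; a filter that mostly fits the noisy subset is removed first.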
Accurate segmentation of Multiple Sclerosis (MS) lesions remains a critical challenge in medical image analysis due to their small size, irregular shape, and sparse distribution. Despite recent progress in vision foundation models such as SAM and its medical variant MedSAM, these models have not yet been explored in the context of MS lesion segmentation. Moreover, their reliance on manually crafted prompts and their high inference-time computational cost limit their applicability in clinical workflows, especially in resource-constrained environments. In this work, we introduce a novel training-time framework for effective and efficient MS lesion segmentation. Our method leverages SAM solely during training to guide a prompt learner that automatically discovers task-specific embeddings. At inference, SAM is replaced by a lightweight convolutional aggregator that maps the learned embeddings directly into segmentation masks, enabling fully automated, low-cost deployment. We show that our approach significantly outperforms existing specialized methods on the public MSLesSeg dataset, establishing new performance benchmarks in a domain where foundation models had not previously been applied. To assess generalizability, we also evaluate our method on pancreas and prostate segmentation tasks, where it achieves competitive accuracy while requiring an order of magnitude fewer parameters and computational resources than SAM-based pipelines. By eliminating the need for foundation models at inference time, our framework enables efficient segmentation without sacrificing accuracy. This design bridges the gap between large-scale pretraining and real-world clinical deployment, offering a scalable and practical solution for MS lesion segmentation and beyond. Code is available at https://github.com/perceivelab/MS-SAM-LESS.
SAM-guided prompt learning for Multiple Sclerosis lesion segmentation
Federica Proietto Salanitri, Giovanni Bellitto, Salvatore Calcagno, Ulas Bagci, Concetto Spampinato, Manuela Pennisi
Pub Date: 2026-01-01. DOI: 10.1016/j.patrec.2025.11.018
Pattern Recognition Letters, Volume 199, Pages 205–211
Pub Date: 2026-01-01. Epub Date: 2025-11-10. DOI: 10.1016/j.patrec.2025.11.011
Yang Xue, Haosheng Cai, Zhuoming Li, Lianwen Jin
Table Structure Recognition (TSR) can adopt image-to-sequence solutions to predict both logical and physical structure simultaneously. However, while these models excel at identifying the logical structure, they often struggle with accurate cell detection. To address this challenge, we propose a Transformer-based Dynamic cell bounding Box refinement for end-to-end TSR, named DynamicBoxTransformer. Specifically, we incorporate a cell bounding box regression decoder, which takes the output of the HTML sequence decoder as input. The cell regression decoder uses reference bounding box coordinates to create spatial queries that provide explicit guidance to key areas and enhance the accuracy of cell bounding boxes layer by layer. To mitigate error accumulation, we introduce denoising training, particularly focusing on the offset of rows and columns. In addition, we design masks that enable the model to make full use of contextual information. Experimental results show that our DynamicBoxTransformer achieves competitive performance on natural scene table datasets. Compared to previous image-to-sequence approaches, DynamicBoxTransformer demonstrates significant improvements in accurate cell detection.
Transformer-based dynamic cell bounding box refinement for end-to-end Table Structure Recognition
Pattern Recognition Letters, Volume 199, Pages 106–112
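The layer-by-layer refinement described above can be illustrated with a DETR-style residual update, where each decoder layer predicts coordinate offsets that are applied in logit space so boxes stay in the normalised range. The per-layer offsets here are plain arrays standing in for the predictions of the cell regression decoder; this is a sketch of the general refinement pattern, not the paper's exact architecture.

```python
import numpy as np

def _logit(p):
    """Inverse sigmoid, clipped for numerical safety."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def refine_boxes(ref_boxes, offset_layers):
    """Iteratively refine reference boxes, one offset array per layer.

    ref_boxes: (n, 4) reference boxes as normalised (cx, cy, w, h).
    offset_layers: iterable of (n, 4) per-layer coordinate offsets.
    """
    boxes = np.asarray(ref_boxes, dtype=float)
    for deltas in offset_layers:
        # Residual update in logit space keeps coordinates in (0, 1),
        # so each layer can only nudge the previous estimate.
        boxes = 1.0 / (1.0 + np.exp(-(_logit(boxes) + np.asarray(deltas))))
    return boxes
```

Because each layer refines the previous layer's output rather than regressing boxes from scratch, localisation errors can shrink gradually across the decoder stack.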
One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. It serves as a highly valuable pre-processing step for data analysis. While traditional decomposition techniques often rely on mathematical models, recent research suggests that applying the latest deep learning models to this very ill-posed inverse problem represents an exciting, unexplored area with promising potential. This work presents a novel method for the additive decomposition of one-dimensional signals. We leverage the Transformer architecture to decompose signals into their constituent components: piecewise constant, smooth (trend), highly-oscillatory, and noise components. Our model, trained on synthetic data, achieves excellent accuracy in modeling and decomposing input signals from the same distribution, as demonstrated by the experimental results.
Additive decomposition of one-dimensional signals using Transformers
Samuele Salti, Andrea Pinto, Alessandro Lanza, Serena Morigi
Pub Date: 2026-01-01. DOI: 10.1016/j.patrec.2025.11.002
Pattern Recognition Letters, Volume 199, Pages 239–245
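A minimal sketch of the kind of synthetic training data the abstract describes: each example is the sum of a piecewise-constant part, a smooth trend, a highly-oscillatory part, and noise, with the ground-truth components kept as regression targets. The specific distributions and magnitudes below are assumptions for illustration, not the paper's data-generation recipe.

```python
import numpy as np

def synth_signal(n=256, seed=0):
    """Generate one synthetic example: (signal, components), where
    components stacks the piecewise-constant, trend, oscillatory,
    and noise parts that sum to the signal."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)

    # Piecewise-constant part: random jump locations and levels.
    pc = np.zeros(n)
    for jump in np.sort(rng.uniform(0, 1, size=3)):
        pc[t >= jump] += rng.normal(0, 1)

    trend = 2.0 * t ** 2 - t                   # smooth polynomial trend
    osc = 0.5 * np.sin(2 * np.pi * 20 * t)     # highly-oscillatory part
    noise = rng.normal(0, 0.05, size=n)

    components = np.stack([pc, trend, osc, noise])
    return components.sum(axis=0), components
```

A decomposition model is then trained to map the summed signal back to the four component channels, and evaluated on fresh draws from the same distribution.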
Pub Date: 2026-01-01. Epub Date: 2025-11-01. DOI: 10.1016/j.patrec.2025.10.020
Yaya Huang, Litong Liu, Tianzhen Zhang, Sisi Wang, Chee-Ming Ting
Accurate segmentation of brain tumors from multimodal MRI is essential for diagnosis and treatment planning. However, most existing approaches can only process a single data modality, without exploiting the complementary information across different modalities. To overcome this limitation, a novel framework called MFMamba is proposed, integrating modality-aware masked autoencoder pretraining, a gated fusion strategy, and a Mamba-based backbone for efficient long-range modeling. In this design, one modality is fully masked while others are partially masked, forcing the network to reconstruct missing data through cross-modal learning. The gated fusion module then selectively incorporates generative priors into task-specific features, enhancing multimodal representations. Experimental results on the BraTS 2023 dataset show that MFMamba achieves Dice scores of 93.77% for Whole Tumor and 92.69% for Tumor Core, corresponding to 1.6–2.1% improvements over state-of-the-art baselines. The gains are statistically significant (p < 0.05), indicating the framework's ability to deliver more precise tumor boundary delineation. Overall, the results suggest that modality-aware fusion can enhance segmentation quality while maintaining computational efficiency, underscoring its potential for clinical image analysis. The implementation is publicly available at https://github.com/ministerhuang/MFMamba.
Multi-Modal masked autoencoder and parallel Mamba for 3D brain tumor segmentation
Pattern Recognition Letters, Volume 199, Pages 40–46
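The modality-aware masking scheme (one modality fully masked, the others partially masked) can be sketched at the patch level as below. The masking ratio and the patch-level granularity are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def modality_masks(num_modalities, num_patches, full_idx, partial_ratio=0.5, seed=0):
    """Boolean mask of shape (num_modalities, num_patches); True marks
    a patch the autoencoder must reconstruct.  Modality `full_idx`
    is masked entirely, the others only partially."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((num_modalities, num_patches), dtype=bool)
    for m in range(num_modalities):
        if m == full_idx:
            mask[m, :] = True                      # reconstruct this modality entirely
        else:
            k = int(num_patches * partial_ratio)   # hide a random subset of patches
            mask[m, rng.choice(num_patches, size=k, replace=False)] = True
    return mask
```

Because the fully masked modality must be reconstructed from the visible patches of the other modalities, the pretraining objective forces the encoder to learn cross-modal dependencies rather than per-modality shortcuts.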
Brain studies require the use of several complementary imaging modalities. When a modality is unavailable, Artificial Intelligence (AI) has recently provided ways to estimate it. Radiologists modulate the use of the available modalities depending on the task they have to perform. We aim to artificially retrace the radiological process through a multibranch neural network architecture, the StarNet. The goal is to explain how and where different imaging modalities, whether actually acquired or artificially reconstructed, are used in different radiological tasks by reading inside the structure of the network. To do that, StarNet includes several satellite networks, one per source modality, connected at each layer by a central unit. This design enables us to assess the contribution of each imaging modality, identify where the contribution occurs, and quantify the variations when certain modalities are substituted with AI-generated counterparts. The ultimate goal is to enable data-related and task-related ablation studies through the complete explainability of StarNet, thus offering radiologists clear guidance on which imaging sequences contribute to the task, to what extent, and at which stages of the process. As an example, we applied the proposed architecture to 2D slices extracted from 3D volumes acquired with multimodal magnetic resonance imaging (MRI), to assess: 1. The role of the imaging modalities used; 2. How that role changes when the radiological task changes; 3. The effects of synthetic data on the process. The results are presented and discussed.
Explainable multimodal brain imaging through a multiple-branch neural network
Giuseppe Placidi, Alessia Cipriani, Michele Nappi, Matteo Polsinelli
Pub Date: 2026-01-01. DOI: 10.1016/j.patrec.2025.11.030
Pattern Recognition Letters, Volume 199, Pages 254–260
Pub Date: 2026-01-01. Epub Date: 2025-11-04. DOI: 10.1016/j.patrec.2025.11.007
Xiaodong Han, Yibing Zhan, Jun Ni, Baosheng Yu, Dapeng Tao
Large language models (LLMs) are increasingly deployed in medical applications, yet the sensitivity of these systems to input data quality has been underexplored. To address this issue, this paper constructs a low-quality medical records (LQMR) dataset to systematically simulate three common categories of structured data anomalies: missing values, plausibility errors, and conformance violations. This resource is employed to evaluate the performance of both general-purpose LLMs (e.g., GPT, DeepSeek) and medicine-specific LLMs on diagnostic tasks under controlled data degradation. Our experiments show that data anomalies significantly degrade diagnostic accuracy, with plausibility errors having the most detrimental effect. For instance, performance drops by up to 16.79% when plausibility errors are introduced, while conformance violations cause an 11.45% drop. Moreover, we find that current LLMs, including domain-specific models, struggle to detect subtle yet critical errors in clinical records, often leading to incorrect diagnoses. These findings underscore the critical importance of data quality in medical AI applications. We also explore future directions, including the need for anomaly-aware training, data quality conditioning, and integrating symbolic medical knowledge to enhance model robustness and error detection in real-world clinical settings. We hope our findings could contribute to the development of more reliable and resilient medical AI systems.
Mind the data: Evaluating data quality sensitivity in medical LLMs
Pattern Recognition Letters, Volume 199, Pages 68–74
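The three anomaly categories can be illustrated with a toy record corruptor. The field names and corruption values below are invented examples, not the LQMR dataset's actual scheme.

```python
import random

def degrade_record(record, kind, seed=0):
    """Inject one of three anomaly categories into a structured record:
    'missing' (a value is dropped), 'plausibility' (well-formed but
    clinically impossible), 'conformance' (violates the field format)."""
    rng = random.Random(seed)
    rec = dict(record)                               # leave the original intact
    if kind == "missing":
        rec[rng.choice(sorted(rec))] = None          # missing value
    elif kind == "plausibility":
        rec["age"] = 230                             # valid syntax, impossible value
    elif kind == "conformance":
        rec["temperature_c"] = "38,5 degrees"        # breaks the numeric-field format
    return rec
```

Plausibility errors are the hardest of the three for a model to catch, since the degraded record still parses cleanly and only domain knowledge reveals that the value cannot be real.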
Pub Date: 2026-01-01. Epub Date: 2025-11-06. DOI: 10.1016/j.patrec.2025.11.003
Xuan Zhang, Ming Zhao, Lixiang Ma
The demand for fine-grained sketch-based image retrieval is rapidly growing. However, it faces two major challenges: the difficulty of capturing fine-grained details and the large domain gap between modalities. To address these challenges, we propose a novel framework, a collaborative feature alignment with global–local fusion network, comprising a fine-grained mask-based feature extraction module, a global–local adaptive normalization feature fusion module, a feature completion and augmentation module, and a collaborative feature alignment strategy. Specifically, we introduce a channel-attention-based mask to direct the network's focus towards detailed regions and capture fine-grained information. Then, a dual-level adaptive normalization fusion mechanism is employed to align style discrepancies at both global and local levels, facilitating more consistent representations. Features are disentangled into style-related and structure-related representations, and style-related information is supplemented across modalities to enhance feature expressiveness. Additionally, an alignment loss is introduced, enabling efficient retrieval while avoiding additional alignment during inference. Extensive experiments on the QMUL-ShoeV2 and QMUL-ChairV2 datasets validate the effectiveness of the proposed method.
Collaborative feature alignment with global–local fusion for fine-grained sketch-based image retrieval
Pattern Recognition Letters, Volume 199, Pages 135–141
Pub Date: 2026-01-01. Epub Date: 2025-11-11. DOI: 10.1016/j.patrec.2025.11.024
Qiang Fang, Xin Xu
Deep supervised learning has achieved remarkable success in many fields, but it often relies on a large amount of annotated data, leading to high costs. An alternative solution is active learning, which aims to enable models to achieve optimal performance with less annotated data. Most standard active learning methods focus on proposing better selection strategies for labeling representative samples while ignoring other unlabeled samples. Inspired by the fact that the reasonable utilization of unlabeled data can improve model performance, we present a novel framework for active learning with pseudo-labeling in this paper. The core of our approach is a novel pseudo-labeling method with an adaptive threshold. Extensive experiments on three typical image classification tasks demonstrate that our approach achieves state-of-the-art performance compared to existing baseline methods. Moreover, our approach is efficient, flexible, and task-agnostic, making it compatible with most standard active learning strategies. Our code will be available at https://github.com/nudtqiangfang/AdaPL.
AdaPL: Adaptive Pseudo Labeling for deep active learning in image classification
Pattern Recognition Letters, Volume 199, Pages 185–190
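One way to realise an adaptive pseudo-labelling threshold is sketched below, loosely following curriculum-style schemes such as FlexMatch: the per-class threshold is relaxed for classes the model rarely predicts, so they still receive pseudo-labels early in training. This concrete rule is an assumption for illustration, not AdaPL's actual method.

```python
import numpy as np

def adaptive_pseudo_labels(probs, base_tau=0.9):
    """Select unlabeled samples whose confidence clears a class-adaptive
    threshold.

    probs: (num_samples, num_classes) softmax outputs.
    Returns (indices, labels) of the accepted samples.
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    counts = np.bincount(preds, minlength=probs.shape[1])
    status = counts / max(counts.max(), 1)   # per-class "learning status" in [0, 1]
    tau = base_tau * status[preds]           # rarely-predicted classes get a lower bar
    accept = conf >= tau
    return np.nonzero(accept)[0], preds[accept]
```

The accepted (index, label) pairs are then mixed into the labeled pool, letting the model exploit unlabeled data on top of whatever selection strategy the active-learning loop uses for human annotation.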