TSMnet: Two-step separation pipeline based on threshold shrinkage memory network for weakly-supervised video anomaly detection
Qun Li, Peng Gu, Xinping Gao, Bir Bhanu
Pub Date: 2026-01-01 | Epub Date: 2025-10-30 | DOI: 10.1016/j.patrec.2025.10.017 | Pattern Recognition Letters 199: 13–20
Since anomalous events are much rarer than normal events in videos, current methods for Weakly Supervised Video Anomaly Detection (WSVAD) struggle to use both normal and abnormal data effectively, which blurs the normality-abnormality boundary. To tackle this, we propose a novel two-step separation pipeline based on a Threshold Shrinkage Memory network (TSMnet) for WSVAD that mimics the human visual system to better understand video anomalies. We introduce a threshold shrinkage memory module that emulates the human brain’s memory, storing patterns and reducing normal-memory redundancy via threshold-based shrinkage. A dual-branch contrastive learning module sharpens the normal-abnormal feature boundary for better classification, and a global-to-local spatio-temporal adapter captures both global and local spatio-temporal information. Experimental results show that our method outperforms state-of-the-art approaches.
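The threshold-shrinkage read described above lends itself to a compact sketch. The snippet below is a hypothetical reconstruction in the spirit of hard-shrinkage memory networks such as MemAE, not the authors' released code: a bank of normal-pattern slots is addressed by attention, addressing weights below a threshold are shrunk to zero to cut redundancy, and the feature is re-expressed from the surviving slots. The class name, slot count, and exact shrinkage rule are all assumptions.

```python
import torch
import torch.nn.functional as F

class ThresholdShrinkageMemory(torch.nn.Module):
    """Hypothetical memory read: attention over normal prototypes + hard shrinkage."""

    def __init__(self, num_slots: int = 100, dim: int = 512, threshold: float = 0.02):
        super().__init__()
        self.memory = torch.nn.Parameter(torch.randn(num_slots, dim))  # normal-pattern slots
        self.threshold = threshold  # roughly a few times 1/num_slots

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, dim). Cosine-similarity attention over the memory slots.
        w = F.softmax(F.normalize(z, dim=-1) @ F.normalize(self.memory, dim=-1).T, dim=-1)
        # Shrink small addressing weights to zero, then renormalize: redundant
        # slots stop contributing, so only strong normal patterns survive.
        w = torch.where(w >= self.threshold, w, torch.zeros_like(w))
        w = w / (w.sum(dim=-1, keepdim=True) + 1e-12)
        return w @ self.memory  # normality-projected feature
```

The intuition is that anomalous features reconstruct poorly from a purely normal memory, which is what widens the normality-abnormality gap the pipeline then exploits.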
{"title":"TSMnet: Two-step separation pipeline based on threshold shrinkage memory network for weakly-supervised video anomaly detection","authors":"Qun Li , Peng Gu , Xinping Gao , Bir Bhanu","doi":"10.1016/j.patrec.2025.10.017","DOIUrl":"10.1016/j.patrec.2025.10.017","url":null,"abstract":"<div><div>Since anomalous events are much rarer than normal events in videos, current methods for Weakly Supervised Video Anomaly Detection (WSVAD) struggle to use both normal and abnormal data effectively, blurring the normality-abnormality boundary. To tackle this, we propose a novel two-step separation pipeline based on Threshold Shrinkage Memory network (TSMnet) for WSVAD. It mimics the human visual system to better understand video anomalies. We introduce a threshold shrinkage memory module that emulates the human brain’s memory, storing patterns and reducing normal memory redundancy via threshold-based shrinkage. A dual-branch contrastive learning module sharpens the normal-abnormal feature boundary for better classification. A global-to-local spatio-temporal adapter captures both global and local spatio-temporal information. Experimental results show that our method outperforms the state-of-the-art works.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 13-20"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145420254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying real changes for height displaced buildings to aid in deep learning training sample generation
Haiyan Xu, Min Wang, Gang Xu, Qian Shen
Pub Date: 2026-01-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.patrec.2025.11.035 | Pattern Recognition Letters 199: 269–277
Deep learning-based change detection methods often rely on large numbers of annotations, and automated sample generation for change detection is usually implemented via pixelwise comparison after bitemporal image classification. In bitemporal images, high-rise buildings exhibit height displacements in different directions caused by differing viewing angles; this generally causes serious false alarms in samples generated automatically by post-classification comparison (PCC). In this study, by exploiting the roof textures and facade geometry of bitemporal buildings, high-rise building changes are discriminated automatically by matching building-roof features and comparing height-displacement triangles, which eliminates the false changes caused by building height displacement while preserving true changes. Validation experiments on high-resolution images of Nanjing and Suzhou, two Chinese cities, verify that the proposed method can automatically generate high-quality samples of buildings with height displacement, facilitating the training of deep learning-based change detection models.
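For context, the post-classification comparison (PCC) baseline whose false alarms the paper sets out to remove amounts to a pixelwise label diff; the sketch below uses illustrative names and is not the authors' code.

```python
import numpy as np

def pcc_change_mask(labels_t1: np.ndarray, labels_t2: np.ndarray,
                    building_class: int = 1) -> np.ndarray:
    """Flag pixels whose building label flips between the two dates."""
    b1 = labels_t1 == building_class
    b2 = labels_t2 == building_class
    return b1 != b2  # fires on displaced facades of unchanged high-rises
```

Because a displaced facade is labeled as building at one viewing angle but not the other, this naive diff marks unchanged high-rises as changed; the roof-feature matching and height-displacement triangle comparison are the filters that suppress exactly those pixels.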
{"title":"Identifying real changes for height displaced buildings to aid in deep learning training sample generation","authors":"Haiyan Xu , Min Wang , Gang Xu , Qian Shen","doi":"10.1016/j.patrec.2025.11.035","DOIUrl":"10.1016/j.patrec.2025.11.035","url":null,"abstract":"<div><div>Deep learning-based change detection methods often rely on many annotations, and automated sample generation methods for change detection are usually implemented via pixelwise comparisons after performing bitemporal image classification. In bitemporal images, high-rise buildings have different directional height displacements caused by different viewing angles; this issue generally causes serious false alarms in the automatic samples generated by post-classification comparison (PCC). In this study, by utilizing features such as the roof textures and facade geometry features of bitemporal buildings, automatic high-rise building change discrimination is implemented by matching the features of the building roofs and conducting height displacement triangle comparisons, which eliminates the false changes caused by building height displacements and preserves the true changes. Furthermore, method validation experiments were conducted on high-resolution images of Nanjing and Suzhou, two Chinese cities, and the results verify that the proposed method can automatically generate high-quality building samples with height displacement, which facilitates the training of deep learning-based change detection models.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 269-277"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regional patch-based MRI brain age modeling with an interpretable cognitive reserve proxy
Samuel Maddox, Lemuel Puglisi, Fatemeh Darabifard, Alzheimer’s Disease Neuroimaging Initiative, Australian Imaging Biomarkers and Lifestyle flagship study of aging, Saber Sami, Daniele Ravi
Pub Date: 2026-01-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.patrec.2025.11.027 | Pattern Recognition Letters 199: 219–224
Brain age predicted from MRI is a promising biomarker for brain health and neurodegenerative disease risk, but current deep learning models often lack anatomical specificity and clinical insight. We present a regional patch-based ensemble framework that trains 3D Convolutional Neural Networks (CNNs) on bilateral patches from ten subcortical structures, enhancing anatomical sensitivity. Ensemble predictions are combined with cognitive assessments to derive a cognitively informed proxy for cognitive reserve (CR-Proxy) that quantifies resilience to age-related brain changes. We train the framework on a large, multi-cohort dataset of healthy controls and test it on independent samples that include individuals with Alzheimer’s disease and mild cognitive impairment. The results demonstrate that our method achieves robust brain age prediction and provides a practical, interpretable CR-Proxy capable of distinguishing diagnostic groups and identifying individuals with high or low cognitive reserve. This pipeline offers a scalable, clinically accessible tool for early risk assessment and personalized brain health monitoring.
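The ensemble-and-gap logic can be sketched briefly. The residual-based definition of the reserve proxy below is an assumption (a common construction in the cognitive reserve literature); the paper's exact CR-Proxy formula may differ.

```python
import numpy as np

def brain_age_gap(regional_preds: np.ndarray, chron_age: np.ndarray) -> np.ndarray:
    """regional_preds: (n_subjects, n_regions) per-patch CNN age estimates."""
    brain_age = regional_preds.mean(axis=1)  # simple ensemble average over regions
    return brain_age - chron_age             # positive gap = an "older-looking" brain

def cr_proxy(cognitive_score: np.ndarray, gap: np.ndarray) -> np.ndarray:
    # Residual of cognition after regressing out the brain-age gap: a high
    # residual means better cognition than the gap predicts, i.e. high reserve.
    slope, intercept = np.polyfit(gap, cognitive_score, deg=1)
    return cognitive_score - (slope * gap + intercept)
```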
{"title":"Regional patch-based MRI brain age modeling with an interpretable cognitive reserve proxy","authors":"Samuel Maddox , Lemuel Puglisi , Fatemeh Darabifard , Alzheimer’s Disease Neuroimaging Initiative , Australian Imaging Biomarkers and Lifestyle flagship study of aging , Saber Sami , Daniele Ravi","doi":"10.1016/j.patrec.2025.11.027","DOIUrl":"10.1016/j.patrec.2025.11.027","url":null,"abstract":"<div><div>Accurate brain age prediction from MRI is a promising biomarker for brain health and neurodegenerative disease risk, but current deep learning models often lack anatomical specificity and clinical insight. We present a regional patch-based ensemble framework that uses 3D Convolutional Neural Networks (CNNs) trained on bilateral patches from ten subcortical structures, enhancing anatomical sensitivity. Ensemble predictions are combined with cognitive assessments to derive a cognitively informed proxy for cognitive reserve (CR-Proxy), quantifying resilience to age-related brain changes. We train our framework on a large, multi-cohort dataset of healthy controls and test it on independent samples that include individuals with Alzheimer’s disease and mild cognitive impairment. The results demonstrate that our method achieves robust brain age prediction and provides a practical, interpretable CR-Proxy capable of distinguishing diagnostic groups and identifying individuals with high or low cognitive reserve. This pipeline offers a scalable, clinically accessible tool for early risk assessment and personalized brain health monitoring.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 219-224"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing lightweight image super-resolution with hybrid convolution and attention
Hanwen Shi, Shubo Zhou, Yinghua Xie, Feng Pan, Zhijun Fang, Xue-Qin Jiang
Pub Date: 2026-01-01 | Epub Date: 2025-11-07 | DOI: 10.1016/j.patrec.2025.11.008 | Pattern Recognition Letters 199: 191–197
Transformer-based methods have achieved remarkable performance in single-image super-resolution, as the self-attention mechanism enables the modeling of long-range dependencies for better high-resolution image reconstruction. However, due to the computational cost of key matrix operations, most existing methods require substantial resources, making them difficult to deploy on low-power devices. In this paper, we propose a lightweight network that combines convolution operations with the attention mechanism, leveraging the strengths of both convolutional neural networks and Transformers. To model global and local features effectively, we design a convolution-attention fusion module (CAIM) tailored for single-image super-resolution that captures long-range dependencies while preserving fine-grained local textures. Furthermore, to enhance the representation of local information, we introduce a CNN-based module (LFEB) that encodes local contextual features while reducing computational complexity. Experimental results on several mainstream benchmark datasets demonstrate the effectiveness and efficiency of the proposed EHCA model, which shows strong capability in restoring high-resolution images with improved edge and texture fidelity.
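As an illustration only, one way to fuse a convolutional (local) branch with an attention (global) branch in the manner the abstract describes is sketched below; the abstract does not specify the internals of CAIM or LFEB, so this block is an assumption in their spirit rather than the paper's module.

```python
import torch
import torch.nn as nn

class ConvAttnFusion(nn.Module):
    """Sketch of a hybrid block: depthwise conv for textures, attention for range."""

    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # 1x1 conv merges branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (b, h*w, c) for attention
        g, _ = self.attn(tokens, tokens, tokens)       # long-range dependencies
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([self.local(x), g], dim=1)) + x  # residual fusion
```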
{"title":"Enhancing lightweight image super-resolution with hybrid convolution and attention","authors":"Hanwen Shi , Shubo Zhou , Yinghua Xie , Feng Pan , Zhijun Fang , Xue-Qin Jiang","doi":"10.1016/j.patrec.2025.11.008","DOIUrl":"10.1016/j.patrec.2025.11.008","url":null,"abstract":"<div><div>Transformer-based methods have achieved remarkable performance, as the self-attention mechanism enables the modeling of long-range dependencies for better high-resolution image reconstruction. However, due to the computational cost of key matrix operations, most existing methods require substantial resources, making them difficult to deploy on low-power devices. In this paper, we propose a lightweight network that combines convolution operations with the attention mechanism, leveraging the strengths of both convolutional neural networks and Transformers. To effectively model both global and local features for single-image super-resolution, we design a convolution-attention fusion module (CAIM) specifically tailored for single-image super-resolution, capturing long-range dependencies while preserving fine-grained local textures. Furthermore, to enhance the representation of local information, we introduce a CNN-based module (LFEB) to encode local contextual features while reducing computational complexity. Experimental results on several mainstream benchmark datasets demonstrate the effectiveness and efficiency of the proposed EHCA. Our model shows strong capability in restoring high-resolution images with improved edge and texture fidelity.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 191-197"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable data twinning
Sujay Mudalgi, Anh Tuan Bui
Pub Date: 2026-01-01 | Epub Date: 2025-10-27 | DOI: 10.1016/j.patrec.2025.10.015 | Pattern Recognition Letters 199: 34–39
Data splitting is imperative when building a statistical or machine learning model, among other use cases, and numerous methods have been proposed to obtain statistically representative samples. Twinning, the state-of-the-art method in this space, is based on minimizing the energy distance between the subsets and the original dataset. However, Twinning's execution speed is undesirable for large datasets. This article proposes scalable Twinning (s-Twinning) to improve data-splitting speed while maintaining accuracy. The performance lift of s-Twinning over state-of-the-art data-splitting methods on larger datasets is demonstrated through real examples.
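For readers unfamiliar with the criterion, the energy distance that Twinning minimizes has a direct, if O(n²), implementation; s-Twinning's contribution is precisely avoiding this quadratic cost, so the brute-force version below is for intuition only.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(subset: np.ndarray, full: np.ndarray) -> float:
    """2 E||x - y|| - E||x - x'|| - E||y - y'|| between a subset and the full data."""
    return (2.0 * cdist(subset, full).mean()
            - cdist(subset, subset).mean()
            - cdist(full, full).mean())

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))
print(energy_distance(data[:200], data))  # small value = representative subset
```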
{"title":"Scalable data twinning","authors":"Sujay Mudalgi, Anh Tuan Bui","doi":"10.1016/j.patrec.2025.10.015","DOIUrl":"10.1016/j.patrec.2025.10.015","url":null,"abstract":"<div><div>Data splitting is imperative when building a statistical or machine learning model, among other use cases. To obtain statistically representative samples, numerous methods have been proposed. Twinning is the state-of-the-art method in this space. It is based on minimizing energy distance between the subsets and original dataset. However, the execution speed of Twinning is not desirable for large datasets. This article proposes scalable Twinning (<em>s-</em>Twinning) to improve data splitting speed while maintaining accuracy. The performance lift for larger datasets from <em>s-</em>Twinning over the state-of-the-art data splitting methods is demonstrated through real examples.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 34-39"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145468559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SqCLIRIL: Spoken query cross-lingual information retrieval in Indian languages
Bhargav Dave, Prasenjit Majumder
Pub Date: 2026-01-01 | Epub Date: 2025-09-04 | DOI: 10.1016/j.patrec.2025.08.022 | Pattern Recognition Letters 199: 288–294
This paper presents SqCLIRIL (Spoken Query Cross-lingual Information Retrieval in Indian Languages), a comprehensive benchmark designed to evaluate spoken query-based cross-lingual retrieval across five Indian languages: Hindi, Gujarati, Bengali, Kannada, and English. The task encompasses monolingual and cross-lingual retrieval settings across five language pairs, incorporating spoken queries (male and female voices) and text document collections. The primary objective is to assess retrieval effectiveness in a low-resource, multilingual setting that reflects real-world language diversity and access constraints.
We investigate four retrieval architectures: (i) sparse lexical matching using BM25, (ii) dense semantic retrieval via bi-encoder models, (iii) hybrid ranking through Reciprocal Rank Fusion (RRF) of sparse and dense scores, and (iv) a Large Language Model (LLM)-based pointwise fusion strategy (LPF) that integrates generative semantic alignment. Experimental evaluations on the human-translated TREC DL’19 and DL’20 query sets in the above five languages show that while dense retrieval improves substantially over traditional sparse models, fusion-based approaches – particularly LPF – consistently yield superior nDCG scores across most query-document language pairs.
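Of the four architectures, Reciprocal Rank Fusion is simple enough to state exactly: each document's fused score sums 1/(k + rank) over the input rankings. The sketch below uses the common default k = 60, which is an assumption since the abstract does not report the constant used.

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """rankings: lists of doc ids ordered best-first (e.g., BM25 and dense runs)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf([["d3", "d1", "d2"], ["d1", "d3", "d4"]]))  # docs ranked high by both lead
```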
The results underscore the utility of generative AI in enhancing retrieval performance in multilingual, low-resource, speech-centric IR scenarios. This benchmark contributes to developing scalable, speech-first, and language-agnostic retrieval systems, with implications for inclusive information access in linguistically fragmented regions.
{"title":"SqCLIRIL: Spoken query cross-lingual information retrieval in Indian languages","authors":"Bhargav Dave , Prasenjit Majumder","doi":"10.1016/j.patrec.2025.08.022","DOIUrl":"10.1016/j.patrec.2025.08.022","url":null,"abstract":"<div><div>This paper presents SqCLIRIL (Spoken Query Cross-lingual Information Retrieval in Indian Languages), a comprehensive benchmark designed to evaluate spoken query-based cross-lingual retrieval across five Indian languages: Hindi, Gujarati, Bengali, Kannada, and English. The task encompasses monolingual and cross-lingual retrieval settings across five language pairs, incorporating spoken queries (male and female voices) and document collections in text. The primary objective is to assess retrieval effectiveness in a low-resource, multilingual setting that reflects real-world language diversity and access constraints.</div><div>We investigate four retrieval architectures: (i) sparse lexical matching using BM25, (ii) dense semantic retrieval via bi-encoder models, (iii) hybrid ranking through Reciprocal Rank Fusion (RRF) of sparse and dense scores, and (iv) a Large Language Model (LLM)-based pointwise fusion strategy (LPF) that integrates generative semantic alignment. Experimental evaluations on the human-translated TREC DL’19 and DL’20 query sets in the above five languages show that while dense retrieval improves substantially over traditional sparse models, fusion-based approaches – particularly LPF – consistently yield superior nDCG scores across most query-document language pairs.</div><div>The results underscore the utility of generative AI in enhancing retrieval performance in multilingual, low-resource, speech-centric IR scenarios. This benchmark contributes to developing scalable, speech-first, and language-agnostic retrieval systems, with implications for inclusive information access in linguistically fragmented regions.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 288-294"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145736503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Plug and play labeling strategies for boosting small brain lesion segmentation
Liang Shang, Zhengyang Lou, William A. Sethares, Andrew L. Alexander, Vivek Prabhakaran, Veena A. Nair, Nagesh Adluru
Pub Date: 2026-01-01 | Epub Date: 2025-10-21 | DOI: 10.1016/j.patrec.2025.10.011 | Pattern Recognition Letters 199: 90–97
Accurate segmentation of small brain lesions in magnetic resonance imaging (MRI) is essential for understanding neurological disorders and guiding clinical decisions. However, detecting small lesions remains challenging due to their low contrast and limited size. This study proposes two simple yet effective labeling strategies, Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), that integrate seamlessly into existing segmentation networks. MSL groups lesions by volume to enable size-aware learning, while DBL emphasizes lesion boundaries to enhance structural sensitivity. We evaluate our approach on two benchmark datasets: stroke lesion segmentation using the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset and multiple sclerosis lesion segmentation using the Multiple Sclerosis Lesion Segmentation (MSLesSeg) dataset. On ATLAS v2.0, our approach achieved higher Dice (+1.3%), F1 (+2.4%), precision (+7.2%), and recall (+3.6%) scores than the top-performing method from a previous challenge. On MSLesSeg, our approach achieved the highest Dice score (0.7146) and ranked first among 16 international teams. Additionally, we examined attention-based and Mamba-based segmentation models but found that our proposed labeling strategies yielded more consistent improvements. These findings demonstrate that MSL and DBL offer a robust and generalizable solution for enhancing small brain lesion segmentation across tasks and architectures. Our code is available at: https://github.com/nadluru/StrokeLesSeg.
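The MSL idea can be sketched in a few lines: relabel each connected lesion component by a volume bucket so that training targets carry size information. The bucket thresholds below are placeholders, not the paper's values; the repository linked above is authoritative.

```python
import numpy as np
from scipy import ndimage

def multi_size_labels(mask: np.ndarray, bounds=(50, 500)) -> np.ndarray:
    """Binary 3D lesion mask -> integer map with 1=small, 2=medium, 3=large."""
    comps, n = ndimage.label(mask)          # connected components = individual lesions
    out = np.zeros_like(comps)
    for i in range(1, n + 1):
        vol = int((comps == i).sum())       # voxel count of lesion i
        out[comps == i] = 1 if vol < bounds[0] else (2 if vol < bounds[1] else 3)
    return out
```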
{"title":"Plug and play labeling strategies for boosting small brain lesion segmentation","authors":"Liang Shang, Zhengyang Lou, William A. Sethares, Andrew L. Alexander, Vivek Prabhakaran, Veena A. Nair, Nagesh Adluru","doi":"10.1016/j.patrec.2025.10.011","DOIUrl":"10.1016/j.patrec.2025.10.011","url":null,"abstract":"<div><div>Accurate segmentation of small brain lesions in magnetic resonance imaging (MRI) is essential for understanding neurological disorders and guiding clinical decisions. However, detecting small lesions remains challenging due to low contrast and limited size. This study proposes two simple yet effective labeling strategies, Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), that can seamlessly integrate into existing segmentation networks. MSL groups lesions based on volume to enable size-aware learning, while DBL emphasizes lesion boundaries to enhance structural sensitivity. We evaluate our approach on two benchmark datasets: stroke lesion segmentation using the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset and multiple sclerosis lesion segmentation using the Multiple Sclerosis Lesion Segmentation (MSLesSeg) dataset. On ATLAS v2.0, our approach achieved higher Dice (+1.3%), F1 (+2.4%), precision (+7.2%), and recall (+3.6%) scores compared to the top-performing method from a previous challenge. On MSLesSeg, our approach achieved the highest Dice score (0.7146) and ranked first among 16 international teams. Additionally, we examined the effectiveness of attention-based and mamba-based segmentation models but found that our proposed labeling strategies yielded more consistent improvements. These findings demonstrate that MSL and DBL offer a robust and generalizable solution for enhancing small brain lesion segmentation across various tasks and architectures. Our code is available at: <span><span>https://github.com/nadluru/StrokeLesSeg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 90-97"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Channel scaling: An efficient feature representation to enhance the generalization of few-shot learning
Hongjie Chen, Pei Lu, Xiaoyong Liu, Yuan Ling
Pub Date: 2026-01-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.patrec.2025.11.010 | Pattern Recognition Letters 199: 163–169
In recent years, deep learning has achieved significant breakthroughs in image classification. However, many practical scenarios are severely constrained by limited labeled data. Few-shot learning has emerged to address this issue, whereby features are extracted from limited training data and generalized to new categories. Existing approaches rely primarily on features extracted from backbone networks that focus on local regions while neglecting global contextual relationships, limiting the model's ability to distinguish fine-grained features. This paper introduces a lightweight Channel Scaling Module (CSM) to address this limitation. The proposed CSM unfolds feature maps, applies channel scaling, and performs 3D convolution operations to enrich feature representations. This process compresses the number of feature channels while expanding feature dimensions, enhancing the expressiveness of the representations with minimal computational overhead and improving sensitivity to both local and global features. Comprehensive experiments were conducted on multiple datasets covering standard, fine-grained, and cross-domain few-shot classification. The results indicate that the proposed method consistently matches or surpasses current state-of-the-art approaches in most scenarios.
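One plausible reading of the unfold-scale-convolve pipeline is sketched below; the abstract does not pin down the exact tensor layout, so the channel-folding reshape is an assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ChannelScalingModule(nn.Module):
    """Sketch: compress channels by `scale` while adding a depth axis, then 3D-conv."""

    def __init__(self, channels: int = 64, scale: int = 4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        # the 3D convolution mixes the new depth axis created by folding channels
        self.conv3d = nn.Conv3d(channels // scale, channels // scale, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = x.view(b, c // self.scale, self.scale, h, w)  # fewer channels, extra dim
        z = self.conv3d(z)                                # local + cross-channel mixing
        return z.reshape(b, c, h, w)                      # restore backbone layout
```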
{"title":"Channel scaling: An efficient feature representation to enhance the generalization of few-shot learning","authors":"Hongjie Chen , Pei Lu , Xiaoyong Liu , Yuan Ling","doi":"10.1016/j.patrec.2025.11.010","DOIUrl":"10.1016/j.patrec.2025.11.010","url":null,"abstract":"<div><div>In recent years, deep learning has achieved significant breakthroughs in image classification. However, many practical scenarios are severely constrained by limited labeled data. To address this issue, few-shot learning has arisen as a solution, whereby features are extracted from limited training data and generalized to new categories. Existing approaches primarily rely on features extracted from backbone networks that predominantly focus on local regions while neglecting global contextual relationships, thereby limiting the model’s ability to distinguish fine-grained features. This paper introduces a lightweight Channel Scaling Module (CSM) to address this limitation. The proposed CSM operates by unfolding feature maps, applying channel scaling, and performing 3D convolution operations to enrich feature representations. This process simultaneously compresses the number of feature channels while expanding feature dimensions, enhancing the expressiveness of the representations with minimal computational overhead, and improving sensitivity to both local and global features. A series of comprehensive experiments were conducted, using multiple datasets covering standard few-shot classification, fine-grained few-shot classification, and cross-domain few-shot classification. The empirical results indicate that the proposed method consistently attains performance that is either comparable to or superior to that of current state-of-the-art approaches under the majority of scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 163-169"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decoding attention from the visual cortex: fMRI-based prediction of human saliency maps
Salvatore Calcagno, Marco Finocchiaro, Giovanni Bellitto, Concetto Spampinato, Federica Proietto Salanitri
Pub Date: 2026-01-01 | Epub Date: 2025-11-12 | DOI: 10.1016/j.patrec.2025.11.019 | Pattern Recognition Letters 199: 156–162
Modeling visual attention from brain activity offers a powerful route to understanding how spatial salience is encoded in the human visual system. While deep learning models can accurately predict fixations from image content, it remains unclear whether similar saliency maps can be reconstructed directly from neural signals. In this study, we investigate the feasibility of decoding high-resolution spatial attention maps from 3T fMRI data and are, to our knowledge, the first to demonstrate that high-resolution, behaviorally validated saliency maps can be decoded directly from 3T fMRI signals. We propose a two-stage decoder that transforms multivariate voxel responses from region-specific visual areas into spatial saliency distributions, using DeepGaze II maps as proxy supervision. Evaluation is conducted against new eye-tracking data collected on a held-out set of natural images. Results show that decoded maps correlate significantly with human fixations, particularly when using activity from early visual areas (V1–V4), which contribute most strongly to reconstruction accuracy; higher-level areas yield above-chance but weaker predictions. These findings suggest that spatial attention is robustly represented in early visual cortex and support fMRI-based decoding as a tool for probing the neural basis of salience in naturalistic viewing. Our code and eye-tracking annotations are available on GitHub.
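A minimal stand-in for stage one of such a decoder is a regularized linear map from voxel responses to a low-resolution saliency grid, trained against the proxy maps; the shapes and synthetic arrays below are placeholders for illustration, and the paper's actual two-stage architecture is richer.

```python
import numpy as np
from sklearn.linear_model import Ridge

n_images, n_voxels, grid = 200, 5000, 16
X = np.random.randn(n_images, n_voxels)    # stand-in for V1-V4 voxel betas
Y = np.random.rand(n_images, grid * grid)  # stand-in for DeepGaze II proxy maps

decoder = Ridge(alpha=1.0).fit(X[:150], Y[:150])         # fit on held-in images
pred = decoder.predict(X[150:]).reshape(-1, grid, grid)  # decoded saliency maps
```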
{"title":"Decoding attention from the visual cortex: fMRI-based prediction of human saliency maps","authors":"Salvatore Calcagno , Marco Finocchiaro , Giovanni Bellitto, Concetto Spampinato, Federica Proietto Salanitri","doi":"10.1016/j.patrec.2025.11.019","DOIUrl":"10.1016/j.patrec.2025.11.019","url":null,"abstract":"<div><div>Modeling visual attention from brain activity offers a powerful route to understanding how spatial salience is encoded in the human visual system. While deep learning models can accurately predict fixations from image content, it remains unclear whether similar saliency maps can be reconstructed directly from neural signals. In this study, we investigate the feasibility of decoding high-resolution spatial attention maps from 3T fMRI data. This study is the first to demonstrate that high-resolution, behaviorally-validated saliency maps can be decoded directly from 3T fMRI signals. We propose a two-stage decoder that transforms multivariate voxel responses from region-specific visual areas into spatial saliency distributions, using DeepGaze II maps as proxy supervision. Evaluation is conducted against new eye-tracking data collected on a held-out set of natural images. Results show that decoded maps significantly correlate with human fixations, particularly when using activity from early visual areas (V1–V4), which contribute most strongly to reconstruction accuracy. Higher-level areas yield above-chance performance but weaker predictions. These findings suggest that spatial attention is robustly represented in early visual cortex and support the use of fMRI-based decoding as a tool for probing the neural basis of salience in naturalistic viewing. Our code and eye-tracking annotations are available on <span><span>GitHub</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 156-162"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving out-of-domain generalization in Multiple Sclerosis detection and segmentation using Random Convolutions
Aswathi Varma, Daniel Scholz, Ayhan Can Erdur, Jan C. Peeken, Daniel Rueckert, Benedikt Wiestler
Pub Date: 2026-01-01 | Epub Date: 2025-11-07 | DOI: 10.1016/j.patrec.2025.11.013 | Pattern Recognition Letters 199: 98–105
Brain lesion segmentation is critical for diagnosing and monitoring neurological diseases such as Multiple Sclerosis (MS). However, lesion variability and differences in scanners and acquisition techniques pose a significant challenge to robust generalization of automated segmentation models beyond their training domain. Traditional augmentations, such as rotations, intensity shifts, and scaling, often fail to capture the wide diversity observed across patient cases, limiting model generalizability. Random Convolutions (RC) address this limitation by introducing diverse intensity variations while preserving anatomical structures. Using an nnUNet-based model enhanced with RC augmentations, we achieved 5th place in the MSLesSeg challenge, showing that RC augmentations offer competitive in-domain performance. Building on this, we further assess model performance, in terms of both lesion detection and segmentation, in- and out-of-domain. We compare RC with several state-of-the-art augmentation and domain generalization strategies and show that an nnUNet trained with RC augmentation is competitive in-domain and demonstrates superior generalization performance.
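The core augmentation is easy to reproduce: in the spirit of RandConv-style augmentation, each training batch is filtered with a freshly sampled random kernel, which perturbs intensity and texture statistics while leaving anatomical shape intact. Kernel-size and scaling choices below are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def random_conv_augment(img: torch.Tensor, max_kernel: int = 7) -> torch.Tensor:
    """img: (batch, channels, H, W) slice batch -> same shape, re-textured."""
    c = img.shape[1]
    k = int(torch.randint(1, max_kernel // 2 + 1, (1,))) * 2 + 1  # odd size: 3, 5, 7
    weight = torch.randn(c, c, k, k) / (c * k * k) ** 0.5         # variance-preserving
    return F.conv2d(img, weight, padding=k // 2)                  # shape-preserving filter
```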
{"title":"Improving out-of-domain generalization in Multiple Sclerosis detection and segmentation using Random Convolutions","authors":"Aswathi Varma , Daniel Scholz , Ayhan Can Erdur , Jan C. Peeken , Daniel Rueckert , Benedikt Wiestler","doi":"10.1016/j.patrec.2025.11.013","DOIUrl":"10.1016/j.patrec.2025.11.013","url":null,"abstract":"<div><div>Brain lesion segmentation is critical for diagnosing and monitoring neurological diseases such as Multiple Sclerosis (MS). However, lesion variability and differences in scanners and acquisition techniques pose a significant challenge to the robust generalization of automated segmentation models beyond their training domain. Traditional augmentations, such as rotation, intensity shifts, and scalings, often fail to capture the wide diversity observed across patient cases, limiting model generalizability. Random Convolutions (RC) address this limitation by introducing diverse intensity variations while preserving anatomical structures. Using an nnUNet-based model enhanced with RC augmentations, we achieved 5th place in the MSLesSeg challenge, highlighting that RC augmentations offer competitive in-domain performance. Building on this, we further assess model performance, both in terms of lesion detection and segmentation, in- and out-of-domain. We compare RC with several state-of-the-art augmentation and domain generalization strategies and show that an nnUNet trained with the RC augmentation is competitive in-domain and demonstrates superior generalization performance.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 98-105"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}