
Latest publications in Pattern Recognition Letters

TSMnet: Two-step separation pipeline based on threshold shrinkage memory network for weakly-supervised video anomaly detection
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-10-30 DOI: 10.1016/j.patrec.2025.10.017
Qun Li , Peng Gu , Xinping Gao , Bir Bhanu
Since anomalous events are much rarer than normal events in videos, current methods for Weakly Supervised Video Anomaly Detection (WSVAD) struggle to use both normal and abnormal data effectively, blurring the normality-abnormality boundary. To tackle this, we propose a novel two-step separation pipeline based on a Threshold Shrinkage Memory network (TSMnet) for WSVAD. It mimics the human visual system to better understand video anomalies. We introduce a threshold shrinkage memory module that emulates the human brain’s memory, storing patterns and reducing normal-memory redundancy via threshold-based shrinkage. A dual-branch contrastive learning module sharpens the normal-abnormal feature boundary for better classification. A global-to-local spatio-temporal adapter captures both global and local spatio-temporal information. Experimental results show that our method outperforms state-of-the-art methods.
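The threshold-shrinkage idea resembles memory-augmented autoencoders: addressing weights over a bank of stored normal prototypes are sparsified by a hard shrinkage before reconstruction. The abstract does not give TSMnet's exact formulation, so the following is only a minimal NumPy sketch of that general mechanism; the function name and the threshold value `lam` are illustrative choices.

```python
import numpy as np

def shrink_read(query, memory, lam=0.2):
    """Illustrative threshold-shrinkage memory read (MemAE-style sketch,
    not the paper's implementation): softmax addressing weights over the
    memory slots are shrunk by a threshold lam, zeroing weakly-activated
    (redundant) slots, then renormalized before reconstruction."""
    q = query / (np.linalg.norm(query) + 1e-8)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    w = np.exp(m @ q)                 # cosine-similarity addressing
    w = w / w.sum()                   # softmax over memory slots
    w = np.maximum(w - lam, 0.0)      # threshold shrinkage
    w = w / (w.sum() + 1e-8)          # renormalize surviving weights
    return w @ memory, w              # reconstructed feature + weights
```

Zeroing small weights forces the reconstruction to use only the few most relevant normal patterns, which is one way a memory can reduce redundancy.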
Citations: 0
Identifying real changes for height displaced buildings to aid in deep learning training sample generation
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-11-25 DOI: 10.1016/j.patrec.2025.11.035
Haiyan Xu , Min Wang , Gang Xu , Qian Shen
Deep learning-based change detection methods often rely on large numbers of annotations, and automated sample generation methods for change detection are usually implemented via pixelwise comparisons after bitemporal image classification. In bitemporal images, high-rise buildings exhibit height displacements in different directions because of differing viewing angles; this generally causes serious false alarms in samples generated automatically by post-classification comparison (PCC). In this study, we exploit features such as the roof textures and facade geometry of bitemporal buildings: high-rise building changes are discriminated automatically by matching building-roof features and comparing height-displacement triangles, which eliminates the false changes caused by building height displacements while preserving the true changes. Validation experiments on high-resolution images of two Chinese cities, Nanjing and Suzhou, verify that the proposed method can automatically generate high-quality building samples under height displacement, facilitating the training of deep learning-based change detection models.
Citations: 0
Regional patch-based MRI brain age modeling with an interpretable cognitive reserve proxy
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-11-14 DOI: 10.1016/j.patrec.2025.11.027
Samuel Maddox , Lemuel Puglisi , Fatemeh Darabifard , Alzheimer’s Disease Neuroimaging Initiative , Australian Imaging Biomarkers and Lifestyle flagship study of aging , Saber Sami , Daniele Ravi
Accurate brain age prediction from MRI is a promising biomarker for brain health and neurodegenerative disease risk, but current deep learning models often lack anatomical specificity and clinical insight. We present a regional patch-based ensemble framework that uses 3D Convolutional Neural Networks (CNNs) trained on bilateral patches from ten subcortical structures, enhancing anatomical sensitivity. Ensemble predictions are combined with cognitive assessments to derive a cognitively informed proxy for cognitive reserve (CR-Proxy), quantifying resilience to age-related brain changes. We train our framework on a large, multi-cohort dataset of healthy controls and test it on independent samples that include individuals with Alzheimer’s disease and mild cognitive impairment. The results demonstrate that our method achieves robust brain age prediction and provides a practical, interpretable CR-Proxy capable of distinguishing diagnostic groups and identifying individuals with high or low cognitive reserve. This pipeline offers a scalable, clinically accessible tool for early risk assessment and personalized brain health monitoring.
Citations: 0
Enhancing lightweight image super-resolution with hybrid convolution and attention
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-11-07 DOI: 10.1016/j.patrec.2025.11.008
Hanwen Shi , Shubo Zhou , Yinghua Xie , Feng Pan , Zhijun Fang , Xue-Qin Jiang
Transformer-based methods have achieved remarkable performance, as the self-attention mechanism enables the modeling of long-range dependencies for better high-resolution image reconstruction. However, due to the computational cost of key matrix operations, most existing methods require substantial resources, making them difficult to deploy on low-power devices. In this paper, we propose a lightweight network that combines convolution operations with the attention mechanism, leveraging the strengths of both convolutional neural networks and Transformers. To effectively model both global and local features for single-image super-resolution, we design a convolution-attention fusion module (CAIM) that captures long-range dependencies while preserving fine-grained local textures. Furthermore, to enhance the representation of local information, we introduce a CNN-based module (LFEB) to encode local contextual features while reducing computational complexity. Experimental results on several mainstream benchmark datasets demonstrate the effectiveness and efficiency of the proposed EHCA. Our model shows strong capability in restoring high-resolution images with improved edge and texture fidelity.
Citations: 0
Scalable data twinning
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-10-27 DOI: 10.1016/j.patrec.2025.10.015
Sujay Mudalgi, Anh Tuan Bui
Data splitting is imperative when building a statistical or machine learning model, among other use cases. To obtain statistically representative samples, numerous methods have been proposed. Twinning is the state-of-the-art method in this space. It is based on minimizing the energy distance between the subsets and the original dataset. However, Twinning’s execution speed is unsatisfactory on large datasets. This article proposes scalable Twinning (s-Twinning) to improve data splitting speed while maintaining accuracy. The performance lift of s-Twinning over state-of-the-art data splitting methods on larger datasets is demonstrated through real examples.
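The criterion Twinning minimizes can be stated directly: the empirical energy distance between a candidate subset and the full dataset. The brute-force O(n²) form shown in this NumPy sketch is precisely what makes the criterion expensive at scale; the function name is ours, and s-Twinning's actual speedup strategy is not described in the abstract.

```python
import numpy as np

def energy_distance(X, Y):
    """Empirical energy distance between two samples (rows = points):
    2*E||x - y|| - E||x - x'|| - E||y - y'||. Twinning selects subsets
    that keep this quantity small against the full dataset."""
    def mean_pdist(A, B):
        # mean pairwise Euclidean distance between rows of A and B
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return d.mean()
    return 2 * mean_pdist(X, Y) - mean_pdist(X, X) - mean_pdist(Y, Y)
```

The quantity is zero when the two samples coincide and grows as their distributions diverge, which is what makes it a natural representativeness criterion.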
Citations: 0
SqCLIRIL: Spoken query cross-lingual information retrieval in Indian languages
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-09-04 DOI: 10.1016/j.patrec.2025.08.022
Bhargav Dave , Prasenjit Majumder
This paper presents SqCLIRIL (Spoken Query Cross-lingual Information Retrieval in Indian Languages), a comprehensive benchmark designed to evaluate spoken query-based cross-lingual retrieval across five Indian languages: Hindi, Gujarati, Bengali, Kannada, and English. The task encompasses monolingual and cross-lingual retrieval settings across five language pairs, incorporating spoken queries (male and female voices) and document collections in text. The primary objective is to assess retrieval effectiveness in a low-resource, multilingual setting that reflects real-world language diversity and access constraints.
We investigate four retrieval architectures: (i) sparse lexical matching using BM25, (ii) dense semantic retrieval via bi-encoder models, (iii) hybrid ranking through Reciprocal Rank Fusion (RRF) of sparse and dense scores, and (iv) a Large Language Model (LLM)-based pointwise fusion strategy (LPF) that integrates generative semantic alignment. Experimental evaluations on the human-translated TREC DL’19 and DL’20 query sets in the above five languages show that while dense retrieval improves substantially over traditional sparse models, fusion-based approaches – particularly LPF – consistently yield superior nDCG scores across most query-document language pairs.
The results underscore the utility of generative AI in enhancing retrieval performance in multilingual, low-resource, speech-centric IR scenarios. This benchmark contributes to developing scalable, speech-first, and language-agnostic retrieval systems, with implications for inclusive information access in linguistically fragmented regions.
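Of the four architectures, Reciprocal Rank Fusion is simple enough to sketch exactly: each document's fused score is the sum of 1/(k + rank) over the input rankings. A minimal sketch follows; k = 60 is the constant from the original RRF paper, since the abstract does not specify the value used here.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1/(k + rank))
    over the input rankings (rank is 1-based). Documents missing from
    a ranking simply contribute nothing for that ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing a sparse (BM25) ranking with a dense one rewards documents that rank well in both lists without requiring their scores to be comparable.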
Citations: 0
Plug and play labeling strategies for boosting small brain lesion segmentation
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-10-21 DOI: 10.1016/j.patrec.2025.10.011
Liang Shang, Zhengyang Lou, William A. Sethares, Andrew L. Alexander, Vivek Prabhakaran, Veena A. Nair, Nagesh Adluru
Accurate segmentation of small brain lesions in magnetic resonance imaging (MRI) is essential for understanding neurological disorders and guiding clinical decisions. However, detecting small lesions remains challenging due to low contrast and limited size. This study proposes two simple yet effective labeling strategies, Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), that can seamlessly integrate into existing segmentation networks. MSL groups lesions based on volume to enable size-aware learning, while DBL emphasizes lesion boundaries to enhance structural sensitivity. We evaluate our approach on two benchmark datasets: stroke lesion segmentation using the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset and multiple sclerosis lesion segmentation using the Multiple Sclerosis Lesion Segmentation (MSLesSeg) dataset. On ATLAS v2.0, our approach achieved higher Dice (+1.3%), F1 (+2.4%), precision (+7.2%), and recall (+3.6%) scores compared to the top-performing method from a previous challenge. On MSLesSeg, our approach achieved the highest Dice score (0.7146) and ranked first among 16 international teams. Additionally, we examined the effectiveness of attention-based and mamba-based segmentation models but found that our proposed labeling strategies yielded more consistent improvements. These findings demonstrate that MSL and DBL offer a robust and generalizable solution for enhancing small brain lesion segmentation across various tasks and architectures. Our code is available at: https://github.com/nadluru/StrokeLesSeg.
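The abstract does not give DBL's exact formula, but a boundary-emphasizing label of the kind described can be sketched as a per-pixel distance to the nearest opposite-class pixel, weighted so that pixels on the lesion boundary count most. The function name, the Gaussian weighting, and sigma below are our illustrative choices, and the brute-force distance computation assumes tiny masks containing both classes.

```python
import numpy as np

def boundary_weights(mask, sigma=2.0):
    """Hypothetical distance-based labeling sketch: each pixel's weight
    decays with its distance to the lesion boundary, so boundary pixels
    (distance 1 to the opposite class) get the maximum weight of 1."""
    fg = np.argwhere(mask == 1)
    bg = np.argwhere(mask == 0)
    dist = np.zeros(mask.shape, dtype=float)
    for p in np.argwhere(mask >= 0):  # iterate over every pixel
        other = bg if mask[tuple(p)] == 1 else fg
        dist[tuple(p)] = np.sqrt(((other - p) ** 2).sum(axis=1)).min()
    return np.exp(-((dist - 1) ** 2) / (2 * sigma ** 2))
```

Multiplying a per-pixel loss by such weights pushes the network to spend capacity on the thin boundary region, where small lesions are won or lost.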
Citations: 0
Channel scaling: An efficient feature representation to enhance the generalization of few-shot learning
IF 3.3 Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 Epub Date: 2025-11-14 DOI: 10.1016/j.patrec.2025.11.010
Hongjie Chen , Pei Lu , Xiaoyong Liu , Yuan Ling
In recent years, deep learning has achieved significant breakthroughs in image classification. However, many practical scenarios are severely constrained by limited labeled data. To address this issue, few-shot learning has emerged as a solution, in which features are extracted from limited training data and generalized to new categories. Existing approaches primarily rely on features extracted from backbone networks that predominantly focus on local regions while neglecting global contextual relationships, thereby limiting the model’s ability to distinguish fine-grained features. This paper introduces a lightweight Channel Scaling Module (CSM) to address this limitation. The proposed CSM operates by unfolding feature maps, applying channel scaling, and performing 3D convolution operations to enrich feature representations. This process compresses the number of feature channels while expanding feature dimensions, enhancing the expressiveness of the representations with minimal computational overhead and improving sensitivity to both local and global features. A series of comprehensive experiments was conducted on multiple datasets covering standard few-shot classification, fine-grained few-shot classification, and cross-domain few-shot classification. The empirical results indicate that the proposed method consistently attains performance comparable or superior to current state-of-the-art approaches under the majority of scenarios.
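The first step the abstract describes, trading channel count for a new feature dimension that a 3D convolution can later mix across, amounts to a reshape. A NumPy sketch of that layout change (the factor r and the (C//r, r, H, W) layout are our assumptions about the described operation, not the paper's exact design):

```python
import numpy as np

def channel_scale(x, r=4):
    """Illustrative channel-scaling reshape: fold a factor r of the
    channel axis into a new depth axis, (C, H, W) -> (C // r, r, H, W),
    so a subsequent 3D convolution can mix information across depth."""
    c, h, w = x.shape
    assert c % r == 0, "channel count must be divisible by the scale factor"
    return x.reshape(c // r, r, h, w)
```

No values are changed, only the layout; the compression of channels and the expansion into depth happen simultaneously, which is why the step itself adds no computational cost.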
Decoding attention from the visual cortex: fMRI-based prediction of human saliency maps
IF 3.3, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-01, Epub Date: 2025-11-12, DOI: 10.1016/j.patrec.2025.11.019
Salvatore Calcagno , Marco Finocchiaro , Giovanni Bellitto, Concetto Spampinato, Federica Proietto Salanitri
Modeling visual attention from brain activity offers a powerful route to understanding how spatial salience is encoded in the human visual system. While deep learning models can accurately predict fixations from image content, it remains unclear whether similar saliency maps can be reconstructed directly from neural signals. In this study, we investigate the feasibility of decoding high-resolution spatial attention maps from 3T fMRI data and demonstrate, for the first time, that high-resolution, behaviorally validated saliency maps can be decoded directly from 3T fMRI signals. We propose a two-stage decoder that transforms multivariate voxel responses from region-specific visual areas into spatial saliency distributions, using DeepGaze II maps as proxy supervision. Evaluation is conducted against new eye-tracking data collected on a held-out set of natural images. Results show that decoded maps correlate significantly with human fixations, particularly when using activity from early visual areas (V1–V4), which contribute most strongly to reconstruction accuracy. Higher-level areas yield above-chance but weaker predictions. These findings suggest that spatial attention is robustly represented in early visual cortex and support the use of fMRI-based decoding as a tool for probing the neural basis of salience in naturalistic viewing. Our code and eye-tracking annotations are available on GitHub.
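The core decoding idea — map multivariate voxel responses to a spatial saliency distribution, supervised by proxy maps — can be illustrated with a single linear stage. A closed-form ridge regression from voxels to a flattened saliency map is an assumed stand-in here (the abstract does not specify the two-stage decoder's architecture), and names like `fit_saliency_decoder` are hypothetical.

```python
import numpy as np

def fit_saliency_decoder(V, S, lam=1.0):
    """Ridge regression from voxel responses V (n_trials, n_voxels) to
    flattened proxy saliency targets S (n_trials, H*W):
    W = (V^T V + lam*I)^(-1) V^T S."""
    n_vox = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(n_vox), V.T @ S)

def decode_saliency(V_new, W, shape):
    """Predict saliency for new voxel patterns and reshape each map to (H, W)."""
    return (V_new @ W).reshape((-1,) + shape)

rng = np.random.default_rng(0)
V = rng.standard_normal((50, 10))        # 50 trials, 10 voxels (synthetic)
W_true = rng.standard_normal((10, 12))   # hidden linear voxel-to-saliency mapping
S = V @ W_true                           # proxy targets (role of DeepGaze II maps)
W = fit_saliency_decoder(V, S, lam=1e-6)
pred = decode_saliency(V, W, (3, 4))     # (50, 3, 4) decoded saliency maps
```

On this synthetic linear problem the decoder recovers the mapping almost exactly; with real fMRI data the regularizer `lam` would be tuned, and evaluation would use held-out eye-tracking fixations as in the paper.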
{"title":"Decoding attention from the visual cortex: fMRI-based prediction of human saliency maps","authors":"Salvatore Calcagno ,&nbsp;Marco Finocchiaro ,&nbsp;Giovanni Bellitto,&nbsp;Concetto Spampinato,&nbsp;Federica Proietto Salanitri","doi":"10.1016/j.patrec.2025.11.019","DOIUrl":"10.1016/j.patrec.2025.11.019","url":null,"abstract":"<div><div>Modeling visual attention from brain activity offers a powerful route to understanding how spatial salience is encoded in the human visual system. While deep learning models can accurately predict fixations from image content, it remains unclear whether similar saliency maps can be reconstructed directly from neural signals. In this study, we investigate the feasibility of decoding high-resolution spatial attention maps from 3T fMRI data. This study is the first to demonstrate that high-resolution, behaviorally-validated saliency maps can be decoded directly from 3T fMRI signals. We propose a two-stage decoder that transforms multivariate voxel responses from region-specific visual areas into spatial saliency distributions, using DeepGaze II maps as proxy supervision. Evaluation is conducted against new eye-tracking data collected on a held-out set of natural images. Results show that decoded maps significantly correlate with human fixations, particularly when using activity from early visual areas (V1–V4), which contribute most strongly to reconstruction accuracy. Higher-level areas yield above-chance performance but weaker predictions. These findings suggest that spatial attention is robustly represented in early visual cortex and support the use of fMRI-based decoding as a tool for probing the neural basis of salience in naturalistic viewing. Our code and eye-tracking annotations are available on <span><span>GitHub</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 156-162"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving out-of-domain generalization in Multiple Sclerosis detection and segmentation using Random Convolutions
IF 3.3, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-01, Epub Date: 2025-11-07, DOI: 10.1016/j.patrec.2025.11.013
Aswathi Varma , Daniel Scholz , Ayhan Can Erdur , Jan C. Peeken , Daniel Rueckert , Benedikt Wiestler
Brain lesion segmentation is critical for diagnosing and monitoring neurological diseases such as Multiple Sclerosis (MS). However, lesion variability and differences in scanners and acquisition techniques pose a significant challenge to the robust generalization of automated segmentation models beyond their training domain. Traditional augmentations, such as rotation, intensity shifts, and scalings, often fail to capture the wide diversity observed across patient cases, limiting model generalizability. Random Convolutions (RC) address this limitation by introducing diverse intensity variations while preserving anatomical structures. Using an nnUNet-based model enhanced with RC augmentations, we achieved 5th place in the MSLesSeg challenge, highlighting that RC augmentations offer competitive in-domain performance. Building on this, we further assess model performance, both in terms of lesion detection and segmentation, in- and out-of-domain. We compare RC with several state-of-the-art augmentation and domain generalization strategies and show that an nnUNet trained with the RC augmentation is competitive in-domain and demonstrates superior generalization performance.
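The Random Convolutions (RC) idea — convolve the input with a randomly sampled kernel so intensities vary while anatomical structure is preserved — can be sketched as below. The blend weight and the re-standardization to the input's intensity statistics are illustrative choices, not necessarily the exact recipe used in the paper.

```python
import numpy as np

def random_conv_augment(img, k=3, mix=0.7, rng=None):
    """Random-convolution augmentation for a 2D slice: convolve with a
    random k x k kernel ('same' padding via reflection), rescale to the
    input's mean/std so intensities stay plausible, then blend with the
    original image to control augmentation strength."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = img.shape
    ker = rng.standard_normal((k, k))
    p = k // 2
    xp = np.pad(img, p, mode="reflect")
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * ker)
    # Match the input's intensity statistics, then blend.
    out = (out - out.mean()) / (out.std() + 1e-8) * img.std() + img.mean()
    return mix * out + (1.0 - mix) * img

slice_ = np.random.default_rng(2).standard_normal((8, 8))
aug = random_conv_augment(slice_, rng=np.random.default_rng(3))
```

Resampling the kernel for every training example yields a different intensity mapping each time, which is what pushes the segmentation model toward scanner- and contrast-invariant features.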
{"title":"Improving out-of-domain generalization in Multiple Sclerosis detection and segmentation using Random Convolutions","authors":"Aswathi Varma ,&nbsp;Daniel Scholz ,&nbsp;Ayhan Can Erdur ,&nbsp;Jan C. Peeken ,&nbsp;Daniel Rueckert ,&nbsp;Benedikt Wiestler","doi":"10.1016/j.patrec.2025.11.013","DOIUrl":"10.1016/j.patrec.2025.11.013","url":null,"abstract":"<div><div>Brain lesion segmentation is critical for diagnosing and monitoring neurological diseases such as Multiple Sclerosis (MS). However, lesion variability and differences in scanners and acquisition techniques pose a significant challenge to the robust generalization of automated segmentation models beyond their training domain. Traditional augmentations, such as rotation, intensity shifts, and scalings, often fail to capture the wide diversity observed across patient cases, limiting model generalizability. Random Convolutions (RC) address this limitation by introducing diverse intensity variations while preserving anatomical structures. Using an nnUNet-based model enhanced with RC augmentations, we achieved 5th place in the MSLesSeg challenge, highlighting that RC augmentations offer competitive in-domain performance. Building on this, we further assess model performance, both in terms of lesion detection and segmentation, in- and out-of-domain. We compare RC with several state-of-the-art augmentation and domain generalization strategies and show that an nnUNet trained with the RC augmentation is competitive in-domain and demonstrates superior generalization performance.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 98-105"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}