
Latest Publications in Pattern Recognition Letters

TSMnet: Two-step separation pipeline based on threshold shrinkage memory network for weakly-supervised video anomaly detection
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-30 | DOI: 10.1016/j.patrec.2025.10.017
Qun Li , Peng Gu , Xinping Gao , Bir Bhanu
Since anomalous events are much rarer than normal events in videos, current methods for Weakly Supervised Video Anomaly Detection (WSVAD) struggle to use both normal and abnormal data effectively, blurring the normality-abnormality boundary. To tackle this, we propose a novel two-step separation pipeline based on a Threshold Shrinkage Memory network (TSMnet) for WSVAD. It mimics the human visual system to better understand video anomalies. We introduce a threshold shrinkage memory module that emulates the human brain's memory, storing patterns and reducing normal memory redundancy via threshold-based shrinkage. A dual-branch contrastive learning module sharpens the normal-abnormal feature boundary for better classification. A global-to-local spatio-temporal adapter captures both global and local spatio-temporal information. Experimental results show that our method outperforms state-of-the-art methods.
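The threshold-shrinkage idea above echoes hard-shrinkage memory addressing from memory-augmented anomaly detectors. A minimal PyTorch sketch of that general mechanism, assuming a slot-based memory bank and a fixed shrinkage threshold (both illustrative choices; the paper's actual module may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThresholdShrinkageMemory(nn.Module):
    """Illustrative memory module: queries attend over a learned bank of
    normal-pattern prototypes; addressing weights below a threshold are
    shrunk to zero, suppressing redundant normal memory slots.
    A sketch of the general idea, not the authors' exact module."""

    def __init__(self, num_slots: int = 64, dim: int = 512, threshold: float = 0.02):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, dim))  # prototype bank
        self.threshold = threshold

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim) clip-level features
        attn = F.softmax(query @ self.memory.t(), dim=-1)   # (batch, num_slots)
        # hard shrinkage: zero out weak addressing weights, then renormalize
        attn = F.relu(attn - self.threshold)
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
        return attn @ self.memory                           # re-read from memory
```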
Citations: 0
MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-29 | DOI: 10.1016/j.patrec.2025.10.018
Bo Peng, Chao Ma, Yifan Chen, Mi Zhu, Ningsheng Liao
Most object detection models are trained under ideal lighting and weather conditions. However, when deployed in adverse weather conditions such as haze, rain, and snow, these models suffer from image quality degradation and target occlusion problems, leading to deteriorated detection performance. To address these challenges, this paper proposes MTW-DETR, a multi-task collaborative detection model that employs a dual-stream network architecture to achieve joint optimization of image restoration and object detection. The model enhances feature representation capabilities for low-quality images through a cross-task feature sharing mechanism and a feature enhancement module. Specifically, within the joint learning framework, we design three key components. First, a restoration subnetwork embedded with a Channel Pixel Attention module achieves fine-grained image restoration and adopts a dynamic feature calibration strategy, thereby improving degraded image quality. Furthermore, a Weight Space Reconstruction Module is integrated into the backbone network to enhance multi-scale feature representation capabilities. Finally, a Branch Shift Convolution Module is incorporated in the neck to improve global information extraction ability, enhance understanding of the overall image structure and feature representation. Experimental results demonstrate that on the real haze dataset RTTS, our model achieves 38% AP, representing a 3.7% improvement over the baseline model RT-DETR. In cross-domain evaluations on synthetic rain and fog datasets, the model shows significant accuracy improvements and exhibits excellent generalization ability across diverse weather scenarios.
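The joint optimization described here can be read as a weighted multi-task objective over a restoration stream and a detection stream. A hedged sketch under that assumption (the L1 restoration term, the fixed weight, and the externally computed detection loss are all illustrative, not the paper's exact formulation):

```python
import torch
import torch.nn as nn

class JointRestorationDetectionLoss(nn.Module):
    """Sketch of a dual-stream multi-task objective: a weighted sum of a
    restoration term (restored vs. clean image) and a detection term.
    The weighting scheme and loss choices are assumptions."""

    def __init__(self, restoration_weight: float = 0.5):
        super().__init__()
        self.restoration_weight = restoration_weight
        self.l1 = nn.L1Loss()

    def forward(self, restored: torch.Tensor, clean: torch.Tensor,
                detection_loss: torch.Tensor) -> torch.Tensor:
        # detection_loss: scalar produced elsewhere by the detector head
        # (e.g., a DETR-style set-prediction loss)
        restoration_loss = self.l1(restored, clean)
        return detection_loss + self.restoration_weight * restoration_loss
```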
Citations: 0
ADFNeT: Adaptive decomposition and fusion for color constancy
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-28 | DOI: 10.1016/j.patrec.2025.10.006
Zhuo-Ming Du , Hong-An Li , Qian Yu
Achieving color constancy is a critical yet challenging task, requiring the estimation of global illumination from a single RGB image to remove color casts caused by non-standard lighting. This paper introduces ADFNet (Adaptive Decomposition and Fusion Network), an end-to-end framework comprising two key modules: ADCL (Adaptive Decomposition and Coefficient Learning) and SWIP (Semantic Weighting for Illumination Prediction). ADCL decomposes the input image into three interpretable components (Mean Intensity, Variation Magnitude, and Variation Direction), while jointly learning adaptive weights and offsets for accurate recomposition. These components are fused into an HDR-like representation via an Adaptive Fusion Module. SWIP further refines this representation through semantic-aware weighting and predicts the global illumination using a lightweight convolutional network. Extensive experiments demonstrate that ADFNet achieves state-of-the-art accuracy and robustness, highlighting its potential for real-world applications such as photographic enhancement and vision-based perception systems.
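One plausible reading of the three-way decomposition named in the abstract treats Mean Intensity as the per-pixel channel mean, Variation Magnitude as the norm of the chromatic residual, and Variation Direction as its unit vector. A short PyTorch sketch under those assumed definitions:

```python
import torch

def decompose(image: torch.Tensor, eps: float = 1e-6):
    """Split an RGB image into the three components named in the abstract.
    The exact formulas here are assumptions, not the paper's definitions.

    image: (batch, 3, H, W), values in [0, 1]
    """
    mean_intensity = image.mean(dim=1, keepdim=True)          # (B, 1, H, W)
    residual = image - mean_intensity                         # chromatic residual
    variation_magnitude = residual.norm(dim=1, keepdim=True)  # (B, 1, H, W)
    variation_direction = residual / (variation_magnitude + eps)  # unit vectors
    return mean_intensity, variation_magnitude, variation_direction
```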
Citations: 0
Foreword to the Special Section on SIBGRAPI 2024
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-28 | DOI: 10.1016/j.patrec.2025.10.014
Jurandy Almeida, Carla M.D.S. Freitas, Nicu Sebe, Alexandru C. Telea
Citations: 0
Editorial: Special Section Forum for Information Retrieval Evaluation (FIRE) 2024
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-27 | DOI: 10.1016/j.patrec.2025.10.012
Thomas Mandl , Prasenjit Majumder
The Forum for Information Retrieval Evaluation (FIRE) is an evaluation initiative focused on resources for the languages of India. FIRE 2024 was the 16th edition and comprised 10 evaluation tracks run as shared tasks. Three contributions showcase the research contributed to these evaluation tracks of the FIRE conference. The first contribution concerns spoken-language retrieval for six languages of India. The second paper deals with source code retrieval for the C language and employs LLMs to generate good comments. The third contribution discusses performance patterns of classifiers on hate-speech collections for languages of India. Furthermore, connections to the Pattern Recognition community are discussed.
Citations: 0
Scalable data twinning
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-27 | DOI: 10.1016/j.patrec.2025.10.015
Sujay Mudalgi, Anh Tuan Bui
Data splitting is imperative when building a statistical or machine learning model, among other use cases. To obtain statistically representative samples, numerous methods have been proposed. Twinning is the state-of-the-art method in this space. It is based on minimizing the energy distance between the subsets and the original dataset. However, Twinning's execution speed is unsatisfactory for large datasets. This article proposes scalable Twinning (s-Twinning) to improve data-splitting speed while maintaining accuracy. The performance lift of s-Twinning over state-of-the-art data-splitting methods on larger datasets is demonstrated through real examples.
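The quantity Twinning minimizes, the energy distance between a candidate subset and the full dataset, has a standard sample estimator. A brute-force O(n²) sketch for illustration; avoiding exactly this cost is the point of s-Twinning:

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(subset: np.ndarray, full: np.ndarray) -> float:
    """Sample energy distance between a candidate split and the original
    dataset. Brute-force pairwise version for illustration only."""
    between = cdist(subset, full).mean()          # E||a - x||
    within_subset = cdist(subset, subset).mean()  # E||a - a'||
    within_full = cdist(full, full).mean()        # E||x - x'||
    return 2.0 * between - within_subset - within_full
```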
Citations: 0
MS2ADM-BTS: Multi-scale Dual Attention Guided Diffusion Model for Volumetric Brain Tumor Segmentation
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-22 | DOI: 10.1016/j.patrec.2025.10.010
Dolly Uppal, Surya Prakash
Accurate segmentation of brain tumors plays a vital role in diagnosis and clinical evaluation. Diffusion models have emerged as a promising approach in medical image segmentation, due to their ability to generate high-quality representations. Existing diffusion-based approaches exhibit limited integration of multimodal images across multiple scales, and effectively eliminating noise interference in brain tumor images remains a major limitation in segmentation tasks. Further, these methods face challenges in effectively balancing global and local feature extraction and integration, specifically in multi-label segmentation tasks. To address these challenges, we propose a Multi-scale Dual Attention Guided Diffusion Model, named MS2ADM-BTS, tailored for Volumetric Brain Tumor Segmentation in multimodal Magnetic Resonance Imaging (MRI) images. It consists of a Context-Aware Feature (CxAF) encoder and a Dual-Stage Multi-Scale Feature Fusion (DS-MSFF) denoising network that learns the denoising process to generate multi-label segmentation predictions. Further, the DS-MSFF denoising network includes an Attention-Guided Cross-Scale Feature Fusion (AGCS-FF) module that effectively models long-range dependencies in high-resolution feature maps and enhances feature representation and reconstruction quality. In addition, we introduce a novel inference-time sampling procedure that incorporates a Spectral-Guided Noise Initialization mechanism to mitigate the training-inference gap and Uncertainty-Guided Diffusion Sampling to provide robust segmentation outcomes. We evaluate the efficacy of the proposed approach using the benchmark datasets from the 2020 Multimodal Brain Tumor Segmentation (BraTS) Challenge and the Medical Segmentation Decathlon (MSD) BraTS dataset. The results show that the proposed approach outperforms existing state-of-the-art methods due to its effective denoising capability. The code is available at https://github.com/Dolly-Uppal/MS2ADM-BTS.
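Uncertainty-Guided Diffusion Sampling can be approximated in spirit by exploiting the stochasticity of reverse diffusion: repeated sampling passes form an ensemble whose variance flags unreliable voxels. A hedged sketch under that reading, where sample_fn is a hypothetical handle to one full reverse pass (the paper's actual procedure differs in its details):

```python
import torch

@torch.no_grad()
def uncertainty_guided_segmentation(sample_fn, image: torch.Tensor, runs: int = 5):
    """Because diffusion sampling is stochastic, repeated reverse passes
    give an ensemble whose mean is the segmentation and whose variance
    flags unreliable voxels. `sample_fn(image) -> (B, C, D, H, W) logits`
    is an assumed interface, not the paper's API."""
    probs = torch.stack([sample_fn(image).softmax(dim=1) for _ in range(runs)])
    mean_prob = probs.mean(dim=0)   # ensemble segmentation
    uncertainty = probs.var(dim=0)  # per-voxel disagreement across runs
    return mean_prob, uncertainty
```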
Citations: 0
Plug and play labeling strategies for boosting small brain lesion segmentation
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-21 | DOI: 10.1016/j.patrec.2025.10.011
Liang Shang, Zhengyang Lou, William A. Sethares, Andrew L. Alexander, Vivek Prabhakaran, Veena A. Nair, Nagesh Adluru
Accurate segmentation of small brain lesions in magnetic resonance imaging (MRI) is essential for understanding neurological disorders and guiding clinical decisions. However, detecting small lesions remains challenging due to low contrast and limited size. This study proposes two simple yet effective labeling strategies, Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), that can seamlessly integrate into existing segmentation networks. MSL groups lesions based on volume to enable size-aware learning, while DBL emphasizes lesion boundaries to enhance structural sensitivity. We evaluate our approach on two benchmark datasets: stroke lesion segmentation using the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset and multiple sclerosis lesion segmentation using the Multiple Sclerosis Lesion Segmentation (MSLesSeg) dataset. On ATLAS v2.0, our approach achieved higher Dice (+1.3%), F1 (+2.4%), precision (+7.2%), and recall (+3.6%) scores compared to the top-performing method from a previous challenge. On MSLesSeg, our approach achieved the highest Dice score (0.7146) and ranked first among 16 international teams. Additionally, we examined the effectiveness of attention-based and mamba-based segmentation models but found that our proposed labeling strategies yielded more consistent improvements. These findings demonstrate that MSL and DBL offer a robust and generalizable solution for enhancing small brain lesion segmentation across various tasks and architectures. Our code is available at: https://github.com/nadluru/StrokeLesSeg.
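Distance-Based Labeling plausibly amounts to carving each lesion mask into a boundary band and an interior via a distance transform, so that training can weight boundaries more heavily. A sketch under that assumption (the band width and the two-class encoding are illustrative, not the paper's scheme):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_based_labels(mask: np.ndarray, band: float = 2.0) -> np.ndarray:
    """Split a binary lesion mask into interior (1) and boundary band (2)
    using the Euclidean distance transform. `band` is in voxels and
    assumes isotropic spacing."""
    inside = distance_transform_edt(mask)          # distance to background
    boundary = (mask > 0) & (inside <= band)       # thin shell along the edge
    interior = (mask > 0) & ~boundary
    labels = np.zeros_like(mask, dtype=np.uint8)
    labels[interior] = 1
    labels[boundary] = 2
    return labels
```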
Citations: 0
Deep graph neural network architecture enhanced by self-attention aggregation mechanism
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-13 | DOI: 10.1016/j.patrec.2025.10.009
Wei Chen, Wenxu Yan, Wenyuan Wang
Adaptive aggregation in deep graph neural networks (DGNNs) enhances node representation by dynamically weighting neighbors at varying distances. However, existing methods often increase complexity and reduce interpretability. This paper proposes SAAG, a self-attention aggregation mechanism that effectively captures relationships across different-hop neighbors with fewer parameters and an expanded receptive field. Extensive experiments on node classification benchmarks demonstrate that SAAG-DGNN consistently outperforms state-of-the-art methods in accuracy.
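Adaptive weighting of neighbors at varying distances is commonly realized as attention over hop-wise node embeddings. A minimal PyTorch sketch of that generic mechanism, with a single shared scoring vector as an assumed parameterization (SAAG's exact design may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HopAttentionAggregator(nn.Module):
    """Attention-weighted aggregation across K-hop representations: each
    node scores its own hop-wise embeddings and mixes them adaptively."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1, bias=False)  # shared scoring vector

    def forward(self, hop_feats: torch.Tensor) -> torch.Tensor:
        # hop_feats: (num_nodes, K + 1, dim), embeddings from hop 0..K,
        # e.g., hop k obtained by propagating features k times with a
        # normalized adjacency matrix
        weights = F.softmax(self.score(hop_feats), dim=1)  # (N, K+1, 1)
        return (weights * hop_feats).sum(dim=1)            # (N, dim)
```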
Citations: 0
GCAESeg: Grouped Channel Attention Enhanced network for thyroid nodule segmentation in ultrasound images
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-11 | DOI: 10.1016/j.patrec.2025.10.008
Kang Kang , Anruo Wei , Xian-Xian Liu , Hong Chen , Jie Yang
In this paper, we propose a novel Grouped Channel Attention Enhanced network (GCAESeg) architecture for thyroid nodule segmentation, which aims to improve segmentation accuracy and facilitate early diagnosis of thyroid cancer. This architecture integrates the cross-modal features of visual-linguistic models with enhanced visual processing mechanisms in order to achieve accurate semantic predictions. The GCAESeg network consists of four core components: Enhanced Clip Feature Extractor (ECFE), Enhanced Vision Feature Extractor (EVFE), Grouped Channel Attention (GCA) module and Enhanced Prediction Head (EPH) module. The ECFE uses the CLIP backbone network to extract multi-scale semantic features and incorporates an adaptive attention mechanism. The EVFE captures spatially detailed features through a U-shaped coding and decoding structure. The GCA module performs selective feature enhancement before feature fusion, and the EPH implements cross-modal enhancement and realizes the refined fusion of cross-modal features. Experimental results show that the GCAESeg model outperforms other compared models in several performance metrics, particularly when dealing with boundary-ambiguous and tiny lesions. On the PKTN dataset, the GCAESeg model achieved IoU, Dice and foreground IoU metrics of 87.83%, 92.62%, and 78.08%, respectively, all of which were significantly improved compared to the baseline ClipTNSeg model. Ablation experiments verified the important role of the ECFE, EVFE and EPH modules and proved their contribution to improving the model's segmentation accuracy. This study is expected to promote the further development of early diagnosis technology for thyroid cancer and provide a more accurate and reliable basis for clinical diagnosis. Our code is available at https://github.com/Acekang/GCAESegNet.
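Grouped channel attention can be sketched in the squeeze-and-excitation style: split the channels into groups and recalibrate each group independently. A generic reading of the GCA module under that assumption, not the authors' exact design:

```python
import torch
import torch.nn as nn

class GroupedChannelAttention(nn.Module):
    """Channels are split into groups; each group is squeezed by global
    average pooling and re-weighted by its own gating MLP."""

    def __init__(self, channels: int, groups: int = 4, reduction: int = 8):
        super().__init__()
        assert channels % groups == 0, "channels must divide evenly into groups"
        self.groups = groups
        per_group = channels // groups
        self.fc = nn.Sequential(
            nn.Linear(per_group, per_group // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(per_group // reduction, per_group),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g = self.groups
        xg = x.view(b, g, c // g, h, w)
        squeezed = xg.mean(dim=(3, 4))   # (B, G, C/G) global average pool
        gates = self.fc(squeezed)        # per-group channel gates in (0, 1)
        return (xg * gates.unsqueeze(-1).unsqueeze(-1)).view(b, c, h, w)
```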
Citations: 0