TSMnet: Two-step separation pipeline based on threshold shrinkage memory network for weakly-supervised video anomaly detection
Pub Date: 2025-10-30 | DOI: 10.1016/j.patrec.2025.10.017
Qun Li, Peng Gu, Xinping Gao, Bir Bhanu
Since anomalous events are much rarer than normal events in videos, current methods for Weakly Supervised Video Anomaly Detection (WSVAD) struggle to use both normal and abnormal data effectively, blurring the normality-abnormality boundary. To tackle this, we propose a novel two-step separation pipeline based on a Threshold Shrinkage Memory network (TSMnet) for WSVAD. It mimics the human visual system to better understand video anomalies. We introduce a threshold shrinkage memory module that emulates the human brain’s memory, storing patterns and reducing normal memory redundancy via threshold-based shrinkage. A dual-branch contrastive learning module sharpens the normal-abnormal feature boundary for better classification. A global-to-local spatio-temporal adapter captures both global and local spatio-temporal information. Experimental results show that our method outperforms state-of-the-art methods.
{"title":"TSMnet: Two-step separation pipeline based on threshold shrinkage memory network for weakly-supervised video anomaly detection","authors":"Qun Li , Peng Gu , Xinping Gao , Bir Bhanu","doi":"10.1016/j.patrec.2025.10.017","DOIUrl":"10.1016/j.patrec.2025.10.017","url":null,"abstract":"<div><div>Since anomalous events are much rarer than normal events in videos, current methods for Weakly Supervised Video Anomaly Detection (WSVAD) struggle to use both normal and abnormal data effectively, blurring the normality-abnormality boundary. To tackle this, we propose a novel two-step separation pipeline based on Threshold Shrinkage Memory network (TSMnet) for WSVAD. It mimics the human visual system to better understand video anomalies. We introduce a threshold shrinkage memory module that emulates the human brain’s memory, storing patterns and reducing normal memory redundancy via threshold-based shrinkage. A dual-branch contrastive learning module sharpens the normal-abnormal feature boundary for better classification. A global-to-local spatio-temporal adapter captures both global and local spatio-temporal information. Experimental results show that our method outperforms the state-of-the-art works.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 13-20"},"PeriodicalIF":3.3,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145420254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection
Pub Date: 2025-10-29 | DOI: 10.1016/j.patrec.2025.10.018
Bo Peng, Chao Ma, Yifan Chen, Mi Zhu, Ningsheng Liao
Most object detection models are trained under ideal lighting and weather conditions. However, when deployed in adverse weather conditions such as haze, rain, and snow, these models suffer from image quality degradation and target occlusion, leading to deteriorated detection performance. To address these challenges, this paper proposes MTW-DETR, a multi-task collaborative detection model that employs a dual-stream network architecture to achieve joint optimization of image restoration and object detection. The model enhances feature representation for low-quality images through a cross-task feature sharing mechanism and a feature enhancement module. Specifically, within the joint learning framework, we design three key components. First, a restoration subnetwork embedded with a Channel Pixel Attention module achieves fine-grained image restoration and adopts a dynamic feature calibration strategy, thereby improving degraded image quality. Furthermore, a Weight Space Reconstruction Module is integrated into the backbone network to enhance multi-scale feature representation. Finally, a Branch Shift Convolution Module is incorporated in the neck to improve global information extraction and to enhance understanding of the overall image structure and feature representation. Experimental results demonstrate that on the real haze dataset RTTS, our model achieves 38% AP, representing a 3.7% improvement over the baseline model RT-DETR. In cross-domain evaluations on synthetic rain and fog datasets, the model shows significant accuracy improvements and exhibits excellent generalization ability across diverse weather scenarios.
{"title":"MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection","authors":"Bo Peng, Chao Ma, Yifan Chen, Mi Zhu, Ningsheng Liao","doi":"10.1016/j.patrec.2025.10.018","DOIUrl":"10.1016/j.patrec.2025.10.018","url":null,"abstract":"<div><div>Most object detection models are trained under ideal lighting and weather conditions. However, when deployed in adverse weather conditions such as haze, rain, and snow, these models suffer from image quality degradation and target occlusion problems, leading to deteriorated detection performance. To address these challenges, this paper proposes MTW-DETR, a multi-task collaborative detection model that employs a dual-stream network architecture to achieve joint optimization of image restoration and object detection. The model enhances feature representation capabilities for low-quality images through a cross-task feature sharing mechanism and a feature enhancement module. Specifically, within the joint learning framework, we design three key components. First, a restoration subnetwork embedded with a Channel Pixel Attention module achieves fine-grained image restoration and adopts a dynamic feature calibration strategy, thereby improving degraded image quality. Furthermore, a Weight Space Reconstruction Module is integrated into the backbone network to enhance multi-scale feature representation capabilities. Finally, a Branch Shift Convolution Module is incorporated in the neck to improve global information extraction ability, enhance understanding of the overall image structure and feature representation. Experimental results demonstrate that on the real haze dataset RTTS, our model achieves 38% AP, representing a 3.7% improvement over the baseline model RT-DETR. In cross-domain evaluations on synthetic rain and fog datasets, the model shows significant accuracy improvements and exhibits excellent generalization ability across diverse weather scenarios.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 7-12"},"PeriodicalIF":3.3,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145384598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ADFNet: Adaptive decomposition and fusion for color constancy
Pub Date: 2025-10-28 | DOI: 10.1016/j.patrec.2025.10.006
Zhuo-Ming Du, Hong-An Li, Qian Yu
Achieving color constancy is a critical yet challenging task, requiring the estimation of global illumination from a single RGB image to remove color casts caused by non-standard lighting. This paper introduces ADFNet (Adaptive Decomposition and Fusion Network), an end-to-end framework comprising two key modules: ADCL (Adaptive Decomposition and Coefficient Learning) and SWIP (Semantic Weighting for Illumination Prediction). ADCL decomposes the input image into three interpretable components (Mean Intensity, Variation Magnitude, and Variation Direction), while jointly learning adaptive weights and offsets for accurate recomposition. These components are fused into an HDR-like representation via an Adaptive Fusion Module. SWIP further refines this representation through semantic-aware weighting and predicts the global illumination using a lightweight convolutional network. Extensive experiments demonstrate that ADFNet achieves state-of-the-art accuracy and robustness, highlighting its potential for real-world applications such as photographic enhancement and vision-based perception systems.
{"title":"ADFNeT: Adaptive decomposition and fusion for color constancy","authors":"Zhuo-Ming Du , Hong-An Li , Qian Yu","doi":"10.1016/j.patrec.2025.10.006","DOIUrl":"10.1016/j.patrec.2025.10.006","url":null,"abstract":"<div><div>Achieving color constancy is a critical yet challenging task, requiring the estimation of global illumination from a single RGB image to remove color casts caused by non-standard lighting. This paper introduces ADFNet (Adaptive Decomposition and Fusion Network), an end-to-end framework comprising two key modules: ADCL (Adaptive Decomposition and Coefficient Learning) and SWIP (Semantic Weighting for Illumination Prediction). ADCL decomposes the input image into three interpretable components (Mean Intensity, Variation Magnitude, and Variation Direction), while jointly learning adaptive weights and offsets for accurate recomposition. These components are fused into an HDR-like representation via an Adaptive Fusion Module. SWIP further refines this representation through semantic-aware weighting and predicts the global illumination using a lightweight convolutional network. Extensive experiments demonstrate that ADFNet achieves state-of-the-art accuracy and robustness, highlighting its potential for real-world applications such as photographic enhancement and vision-based perception systems.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 1-6"},"PeriodicalIF":3.3,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145384597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Foreword to the Special Section on SIBGRAPI 2024
Pub Date: 2025-10-28 | DOI: 10.1016/j.patrec.2025.10.014
Jurandy Almeida, Carla M.D.S. Freitas, Nicu Sebe, Alexandru C. Telea
{"title":"Foreword to the Special Section on SIBGRAPI 2024","authors":"Jurandy Almeida, Carla M.D.S. Freitas, Nicu Sebe, Alexandru C. Telea","doi":"10.1016/j.patrec.2025.10.014","DOIUrl":"10.1016/j.patrec.2025.10.014","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 123-124"},"PeriodicalIF":3.3,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial: Special Section Forum for Information Retrieval Evaluation (FIRE) 2024
Pub Date: 2025-10-27 | DOI: 10.1016/j.patrec.2025.10.012
Thomas Mandl, Prasenjit Majumder
The Forum for Information Retrieval Evaluation (FIRE) is an evaluation initiative focused on resources for languages of India. FIRE 2024 was the 16th edition and comprised 10 evaluation tracks which ran as shared tasks. Three contributions showcase the research contributed to these evaluation tracks of the FIRE conference. The first contribution addresses spoken language retrieval for six languages of India. The second paper deals with source code retrieval for the C language and employs LLMs to generate good comments. The third contribution discusses performance patterns of classifiers for hate speech collections for languages of India. Furthermore, connections to the Pattern Recognition community are discussed.
{"title":"Editorial: Special Section Forum for Information Retrieval Evaluation (FIRE) 2024","authors":"Thomas Mandl , Prasenjit Majumder","doi":"10.1016/j.patrec.2025.10.012","DOIUrl":"10.1016/j.patrec.2025.10.012","url":null,"abstract":"<div><div>The Forum for Information Retrieval Evaluation (FIRE) is an evaluation initiative focused on resources for languages of India. FIRE 2024 was the 16th edition and comprised 10 evaluation tracks which ran as shared tasks. Three contributions showcase the research contributed to these evaluation tracks of the FIRE conference. The first contribution is related spoken language retrieval for six languages of India. The second paper deals with source code retrieval for the language C and employs LLMs for the task of generating good comments. The third contribution discusses performance patterns of classifiers for hate speech collections for languages of India. Furthermore, connections to the Pattern Recognition community are discussed.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 285-287"},"PeriodicalIF":3.3,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145736629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable data twinning
Pub Date: 2025-10-27 | DOI: 10.1016/j.patrec.2025.10.015
Sujay Mudalgi, Anh Tuan Bui
Data splitting is imperative when building a statistical or machine learning model, among other use cases. To obtain statistically representative samples, numerous methods have been proposed. Twinning is the state-of-the-art method in this space. It is based on minimizing the energy distance between the subsets and the original dataset. However, the execution speed of Twinning degrades on large datasets. This article proposes scalable Twinning (s-Twinning) to improve data splitting speed while maintaining accuracy. The performance lift of s-Twinning over state-of-the-art data splitting methods on larger datasets is demonstrated through real examples.
{"title":"Scalable data twinning","authors":"Sujay Mudalgi, Anh Tuan Bui","doi":"10.1016/j.patrec.2025.10.015","DOIUrl":"10.1016/j.patrec.2025.10.015","url":null,"abstract":"<div><div>Data splitting is imperative when building a statistical or machine learning model, among other use cases. To obtain statistically representative samples, numerous methods have been proposed. Twinning is the state-of-the-art method in this space. It is based on minimizing energy distance between the subsets and original dataset. However, the execution speed of Twinning is not desirable for large datasets. This article proposes scalable Twinning (<em>s-</em>Twinning) to improve data splitting speed while maintaining accuracy. The performance lift for larger datasets from <em>s-</em>Twinning over the state-of-the-art data splitting methods is demonstrated through real examples.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 34-39"},"PeriodicalIF":3.3,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145468559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MS2ADM-BTS: Multi-scale Dual Attention Guided Diffusion Model for Volumetric Brain Tumor Segmentation
Pub Date: 2025-10-22 | DOI: 10.1016/j.patrec.2025.10.010
Dolly Uppal, Surya Prakash
Accurate segmentation of brain tumors plays a vital role in diagnosis and clinical evaluation. Diffusion models have emerged as a promising approach in medical image segmentation, due to their ability to generate high-quality representations. Existing diffusion-based approaches exhibit limited integration of multimodal images across multiple scales, and effectively eliminating noise interference in brain tumor images remains a major limitation in segmentation tasks. Further, these methods face challenges in effectively balancing global and local feature extraction and integration, specifically in multi-label segmentation tasks. To address these challenges, we propose a Multi-scale Dual Attention Guided Diffusion Model, named MS2ADM-BTS, tailored for Volumetric Brain Tumor Segmentation in multimodal Magnetic Resonance Imaging (MRI) images. It consists of a Context-Aware Feature (CxAF) encoder and a Dual-Stage Multi-Scale Feature Fusion (DS-MSFF) denoising network that learns the denoising process to generate multi-label segmentation predictions. Further, the DS-MSFF denoising network includes an Attention-Guided Cross-Scale Feature Fusion (AGCS-FF) module that effectively models long-range dependencies in high-resolution feature maps and enhances feature representation and reconstruction quality. In addition, we introduce a novel inference-time sampling procedure that incorporates a Spectral-Guided Noise Initialization mechanism to mitigate the training-inference gap and Uncertainty-Guided Diffusion Sampling to provide robust segmentation outcomes. We evaluate the efficacy of the proposed approach using the benchmark datasets from the 2020 Multimodal Brain Tumor Segmentation (BraTS) Challenge and the Medical Segmentation Decathlon (MSD) BraTS dataset. The results show that the proposed approach outperforms existing state-of-the-art methods due to its effective denoising capability. The code is available at https://github.com/Dolly-Uppal/MS2ADM-BTS.
{"title":"MS2ADM-BTS: Multi-scale Dual Attention Guided Diffusion Model for Volumetric Brain Tumor Segmentation","authors":"Dolly Uppal, Surya Prakash","doi":"10.1016/j.patrec.2025.10.010","DOIUrl":"10.1016/j.patrec.2025.10.010","url":null,"abstract":"<div><div>Accurate segmentation of brain tumors plays a vital role in diagnosis and clinical evaluation. Diffusion models have emerged as a promising approach in medical image segmentation, due to their ability to generate high-quality representations. Existing diffusion-based approaches exhibit limited integration of multimodal images across multiple scales, and effectively eliminating noise interference in brain tumor images remains a major limitation in segmentation tasks. Further, these methods face challenges in effectively balancing global and local feature extraction and integration, specifically in multi-label segmentation tasks. To address these challenges, we propose a Multi-scale Dual Attention Guided Diffusion Model, named MS2ADM-BTS, tailored for Volumetric Brain Tumor Segmentation in multimodal Magnetic Resonance Imaging (MRI) images. It consists of a Context-Aware Feature (CxAF) encoder and a Dual-Stage Multi-Scale Feature Fusion (DS-MSFF) denoising network that learns the denoising process to generate multi-label segmentation predictions. Further, the DS-MSFF denoising network includes an Attention-Guided Cross-Scale Feature Fusion (AGCS-FF) module that effectively models long-range dependencies in high-resolution feature maps and enhances feature representation and reconstruction quality. In addition, we introduce a novel inference-time sampling procedure that incorporates a Spectral-Guided Noise Initialization mechanism to mitigate the training-inference gap and Uncertainty-Guided Diffusion Sampling to provide robust segmentation outcomes. We evaluate the efficacy of the proposed approach using the benchmark datasets from the 2020 Multimodal Brain Tumor Segmentation (BraTS) Challenge and the Medical Segmentation Decathlon (MSD) BraTS dataset. The results show that the proposed approach outperforms existing state-of-the-art methods due to its effective denoising capability. The code is available at <span><span>https://github.com/Dolly-Uppal/MS2ADM-BTS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 115-122"},"PeriodicalIF":3.3,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145417538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Plug and play labeling strategies for boosting small brain lesion segmentation
Pub Date: 2025-10-21 | DOI: 10.1016/j.patrec.2025.10.011
Liang Shang, Zhengyang Lou, William A. Sethares, Andrew L. Alexander, Vivek Prabhakaran, Veena A. Nair, Nagesh Adluru
Accurate segmentation of small brain lesions in magnetic resonance imaging (MRI) is essential for understanding neurological disorders and guiding clinical decisions. However, detecting small lesions remains challenging due to low contrast and limited size. This study proposes two simple yet effective labeling strategies, Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), that can seamlessly integrate into existing segmentation networks. MSL groups lesions based on volume to enable size-aware learning, while DBL emphasizes lesion boundaries to enhance structural sensitivity. We evaluate our approach on two benchmark datasets: stroke lesion segmentation using the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset and multiple sclerosis lesion segmentation using the Multiple Sclerosis Lesion Segmentation (MSLesSeg) dataset. On ATLAS v2.0, our approach achieved higher Dice (+1.3%), F1 (+2.4%), precision (+7.2%), and recall (+3.6%) scores compared to the top-performing method from a previous challenge. On MSLesSeg, our approach achieved the highest Dice score (0.7146) and ranked first among 16 international teams. Additionally, we examined the effectiveness of attention-based and mamba-based segmentation models but found that our proposed labeling strategies yielded more consistent improvements. These findings demonstrate that MSL and DBL offer a robust and generalizable solution for enhancing small brain lesion segmentation across various tasks and architectures. Our code is available at: https://github.com/nadluru/StrokeLesSeg.
{"title":"Plug and play labeling strategies for boosting small brain lesion segmentation","authors":"Liang Shang, Zhengyang Lou, William A. Sethares, Andrew L. Alexander, Vivek Prabhakaran, Veena A. Nair, Nagesh Adluru","doi":"10.1016/j.patrec.2025.10.011","DOIUrl":"10.1016/j.patrec.2025.10.011","url":null,"abstract":"<div><div>Accurate segmentation of small brain lesions in magnetic resonance imaging (MRI) is essential for understanding neurological disorders and guiding clinical decisions. However, detecting small lesions remains challenging due to low contrast and limited size. This study proposes two simple yet effective labeling strategies, Multi-Size Labeling (MSL) and Distance-Based Labeling (DBL), that can seamlessly integrate into existing segmentation networks. MSL groups lesions based on volume to enable size-aware learning, while DBL emphasizes lesion boundaries to enhance structural sensitivity. We evaluate our approach on two benchmark datasets: stroke lesion segmentation using the Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0 dataset and multiple sclerosis lesion segmentation using the Multiple Sclerosis Lesion Segmentation (MSLesSeg) dataset. On ATLAS v2.0, our approach achieved higher Dice (+1.3%), F1 (+2.4%), precision (+7.2%), and recall (+3.6%) scores compared to the top-performing method from a previous challenge. On MSLesSeg, our approach achieved the highest Dice score (0.7146) and ranked first among 16 international teams. Additionally, we examined the effectiveness of attention-based and mamba-based segmentation models but found that our proposed labeling strategies yielded more consistent improvements. These findings demonstrate that MSL and DBL offer a robust and generalizable solution for enhancing small brain lesion segmentation across various tasks and architectures. Our code is available at: <span><span>https://github.com/nadluru/StrokeLesSeg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 90-97"},"PeriodicalIF":3.3,"publicationDate":"2025-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep graph neural network architecture enhanced by self-attention aggregation mechanism
Pub Date: 2025-10-13 | DOI: 10.1016/j.patrec.2025.10.009
Wei Chen, Wenxu Yan, Wenyuan Wang
Adaptive aggregation in deep graph neural networks (DGNNs) enhances node representation by dynamically weighting neighbors at varying distances. However, existing methods often increase complexity and reduce interpretability. This paper proposes SAAG, a self-attention aggregation mechanism that effectively captures relationships across different-hop neighbors with fewer parameters and an expanded receptive field. Extensive experiments on node classification benchmarks demonstrate that SAAG-DGNN consistently outperforms state-of-the-art methods in accuracy.
{"title":"Deep graph neural network architecture enhanced by self-attention aggregation mechanism","authors":"Wei Chen, Wenxu Yan, Wenyuan Wang","doi":"10.1016/j.patrec.2025.10.009","DOIUrl":"10.1016/j.patrec.2025.10.009","url":null,"abstract":"<div><div>Adaptive aggregation in deep graph neural networks (DGNNs) enhances node representation by dynamically weighting neighbors at varying distances. However, existing methods often increase complexity and reduce interpretability. This paper proposes SAAG, a self-attention aggregation mechanism that effectively captures relationships across different-hop neighbors with fewer parameters and an expanded receptive field. Extensive experiments on node classification benchmarks demonstrate that SAAG-DGNN consistently outperforms state-of-the-art methods in accuracy.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 101-107"},"PeriodicalIF":3.3,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145364078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GCAESeg: Grouped Channel Attention Enhanced network for thyroid nodule segmentation in ultrasound images
Pub Date: 2025-10-11 | DOI: 10.1016/j.patrec.2025.10.008
Kang Kang, Anruo Wei, Xian-Xian Liu, Hong Chen, Jie Yang
In this paper, we propose a novel Grouped Channel Attention Enhanced network (GCAESeg) architecture for thyroid nodule segmentation, which aims to improve segmentation accuracy and facilitate early diagnosis of thyroid cancer. This architecture integrates the cross-modal features of visual-linguistic models with enhanced visual processing mechanisms in order to achieve accurate semantic predictions. The GCAESeg network consists of four core components: an Enhanced Clip Feature Extractor (ECFE), an Enhanced Vision Feature Extractor (EVFE), a Grouped Channel Attention (GCA) module, and an Enhanced Prediction Head (EPH) module. The ECFE uses the CLIP backbone network to extract multi-scale semantic features and incorporates an adaptive attention mechanism. The EVFE captures spatially detailed features through a U-shaped encoder-decoder structure. The GCA module performs selective feature enhancement before feature fusion, and the EPH implements cross-modal enhancement and realizes the refined fusion of cross-modal features. Experimental results show that the GCAESeg model outperforms other compared models in several performance metrics, particularly when dealing with boundary-ambiguous and tiny lesions. On the PKTN dataset, the GCAESeg model achieved IoU, Dice, and foreground IoU metrics of 87.83%, 92.62%, and 78.08%, respectively, all of which were significantly improved compared to the baseline ClipTNSeg model. Ablation experiments verified the important role of the ECFE, EVFE, and EPH modules and proved their contribution to improving the model’s segmentation accuracy. This study is expected to promote the further development of early diagnosis technology for thyroid cancer and provide a more accurate and reliable basis for clinical diagnosis. Our code is available at https://github.com/Acekang/GCAESegNet.
{"title":"GCAESeg: Grouped Channel Attention Enhanced network for thyroid nodule segmentation in ultrasound images","authors":"Kang Kang , Anruo Wei , Xian-Xian Liu , Hong Chen , Jie Yang","doi":"10.1016/j.patrec.2025.10.008","DOIUrl":"10.1016/j.patrec.2025.10.008","url":null,"abstract":"<div><div>In this paper, we propose a novel Grouped Channel Attention Enhanced network (GCAESeg) architecture for thyroid nodules, which aims to improve segmentation accuracy and facilitate early diagnosis of thyroid cancer. This architecture integrates the cross-modal features of visual-linguistic models with enhanced visual processing mechanisms in order to achieve accurate semantic predictions. The GCAESeg network consists of four core components: Enhanced Clip Feature Extractor (ECFE), Enhanced Vision Feature Extractor (EVFE), Grouped Channel Attention (GCA) module and Enhanced Prediction Head (EPH) module. The ECFE uses the CLIP backbone network to extract multi-scale semantic features and incorporates an adaptive attention mechanism. The EVFE captures spatially detailed features through a U-shaped coding and decoding structure. The GCA module performs selective feature enhancement before feature fusion, and the EPH implements cross-modal enhancement and realizes the refined fusion of cross-modal features. Experimental results show that the GCAESeg model outperforms other compared models in several performance metrics, particularly when dealing with boundary-ambiguous and tiny lesions. On the PKTN dataset, the GCAESeg model achieved IoU, Dice and foreground IoU metrics of 87.83%,92.62% and 78.08%, respectively, all of which were significantly improved compared to the baseline ClipTNSeg model. Ablation experiments verified the important role of the ECFE, EVFE and EPH modules and proved their contribution to improving the model’s segmentation accuracy. This study is expected to promote the further development of early diagnosis technology for thyroid cancer and provide a more accurate and reliable basis for clinical diagnosis. Our code are available at <span><span>https://github.com/Acekang/GCAESegNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 108-114"},"PeriodicalIF":3.3,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145364077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}