Multiplicative bias field correction-based hybrid active contour model for infrared image segmentation
Pub Date: 2026-03-15. Epub Date: 2025-12-31. DOI: 10.1016/j.dsp.2025.105854
Pengqiang Ge, Shuqing Cao, Xiaofang Kong, Guirong Weng, Guohua Gu, Qian Chen, Minjie Wan
Infrared (IR) image segmentation plays a vital role in applications such as marine search and rescue and military surveillance. The active contour model (ACM) is a widely used segmentation tool because of its ability to accurately delineate object boundaries. However, most existing ACMs rely on a local data-driven force (LDDF) and ignore the global data-driven force (GDDF), which causes segmentation errors when handling IR images with gray-level non-uniformity. Furthermore, the local fitting functions are repeatedly updated during level set evolution (LSE), making these models computationally expensive. To resolve these problems, a hybrid ACM driven by multiplicative bias field correction (MBFC) is proposed for IR image segmentation. First, a multi-feature (MF) GDDF uses the global averages of gray-level, roughness, and gradient features to prevent trapping in local minima. Next, the MF local fitting functions are pre-estimated before LSE to reduce the number of convolutions. An adaptive weight function (AWF) is then designed to properly fuse the MF GDDF and MF LDDF. During LSE, the evolution curve is smoothed and shortened by average filtering, and the range of the level set function (LSF) is normalized. Finally, the zero level set converges near the true target edge within a finite number of iterations. Compared with recently developed ACMs and deep learning-based models for IR image segmentation, the proposed model achieves clearly superior average accuracy in terms of intersection over union (IoU) and Dice similarity coefficient (DSC), and shows potential for generalization on the Berkeley segmentation dataset 500 (BSDS500).
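To make the fuse-update-smooth-normalize sequence concrete, here is a minimal Python sketch of one level set evolution step under assumptions of ours: `gddf` and `lddf` stand for precomputed global and local force fields and `w` for the adaptive weight, none of which reproduce the paper's actual equations.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lse_step(phi, gddf, lddf, w, dt=0.1, win=3):
    """One hypothetical level set evolution step. gddf/lddf are
    precomputed global/local data-driven force fields and w is the
    adaptive weight in [0, 1]; only the fuse -> update -> smooth ->
    normalize sequence follows the abstract, not the paper's formulas."""
    force = w * gddf + (1.0 - w) * lddf        # adaptive fusion of GDDF and LDDF
    phi = phi + dt * force                     # evolve the level set function
    phi = uniform_filter(phi, size=win)        # average filtering smooths/shortens the curve
    return phi / (np.abs(phi).max() + 1e-8)    # normalize the LSF range
```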
{"title":"Multiplicative bias field correction-based hybrid active contour model for infrared image segmentation","authors":"Pengqiang Ge , Shuqing Cao , Xiaofang Kong , Guirong Weng , Guohua Gu , Qian Chen , Minjie Wan","doi":"10.1016/j.dsp.2025.105854","DOIUrl":"10.1016/j.dsp.2025.105854","url":null,"abstract":"<div><div>Infrared (IR) image segmentation plays a vital role in various applications such as marine search and rescue, and military surveillance. Active contour model (ACM) is a commonly used tool for image segmentation due to its capability to accurately delineate object boundaries. However, most existing ACMs rely on local data-driven force (LDDF) and ignore global data-driven force (GDDF), causing segmentation errors while handling IR images with gray-level non-uniformity. Furthermore, the local fitting functions are repeatedly updated during level set evolution (LSE), rendering it computationally expensive. To resolve these problems, a hybrid ACM driven by multiplicative bias field correction (MBFC) for IR image segmentation is proposed. Firstly, the multi-feature (MF) GDDF utilizes the global averages of gray-level, roughness, and gradient features to prevent local minima trapping. Next, the MF local fitting functions are pre-estimated before LSE to reduce convolutions. After that, an adaptive weight function (AWF) is especially designed to fuse MF GDDF and MF LDDF properly. During LSE, the evolution curve is smoothed and shortened using average filtering. Meanwhile, the range of level set function (LSF) is normalized. Lastly, the zero level set converges near the actual target edge through finite iterations. Compared with recently developed ACMs and deep learning-based models for segmenting IR images, the segmentation accuracy, including intersection over union (IoU) and dice similarity coefficient (DSC), demonstrates obvious superiority on average and exhibits potential for generalization to the Berkeley segmentation dataset 500 (BSDS500).</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105854"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The sliding sigmoid filter
Pub Date: 2026-03-15. Epub Date: 2025-12-19. DOI: 10.1016/j.dsp.2025.105837
Naseem Alsadi, John Yawney, Mohammad Alshabi, S. Andrew Gadsden
Traditional state estimation methods often become unreliable in the presence of measurement anomalies, abrupt disturbances, and nonlinear dynamics. Such conditions are ubiquitous in high-stakes operational settings, including air traffic surveillance, autonomous systems, and advanced manufacturing. These challenges expose an enduring methodological gap: the inability to ensure both strong robustness to uncertainty and stable, continuous correction behaviour. This paper aims to address these limitations by developing estimation methods that maintain stability while adapting intelligently to uncertainty. To this end, we introduce the Sliding Sigmoid Filter (SSF), a novel estimator that combines sliding-mode robustness with a continuous sigmoid-based gain function, and further extend it to the Adaptive Sliding Sigmoid Filter (ASSF), which adjusts its gain online using recent innovation statistics for fault detection and adaptive correction. Using linear and nonlinear simulation benchmarks together with a full experimental pipeline involving physics-informed neural network parameter identification and SSF-based state estimation for a magnetorheological damper, we evaluate the performance of the proposed filters against classical methods. The results show that SSF and ASSF significantly reduce estimation error, attenuate outliers more smoothly than threshold-based approaches, and provide faster recovery under measurement faults. Overall, the findings demonstrate that the proposed filters offer a practical and theoretically grounded alternative for robust state estimation in uncertain and fault-prone environments.
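As an illustration of the core idea only, the sketch below shows a scalar correction step with a sigmoid-shaped gain; the function name and constants are hypothetical, and the published SSF/ASSF equations will differ.

```python
import numpy as np

def ssf_correct(x_pred, z, alpha=0.8, c=1.0):
    """Hypothetical scalar correction with a sigmoid-shaped gain: the
    update is roughly linear for small innovations and saturates
    smoothly for large (possibly anomalous) ones, avoiding both hard
    outlier-gating thresholds and the discontinuous sign() term of
    classic sliding-mode estimators."""
    innov = z - x_pred                               # measurement innovation
    return x_pred + alpha * c * np.tanh(innov / c)   # bounded, continuous correction
```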
{"title":"The sliding sigmoid filter","authors":"Naseem Alsadi , John Yawney , Mohammad Alshabi , S. Andrew Gadsden","doi":"10.1016/j.dsp.2025.105837","DOIUrl":"10.1016/j.dsp.2025.105837","url":null,"abstract":"<div><div>Traditional state estimation methods often become unreliable in the presence of measurement anomalies, abrupt disturbances, and nonlinear dynamics. Such conditions are ubiquitous in high-stakes operational settings, including air traffic surveillance, autonomous systems, and advanced manufacturing. These challenges expose an enduring methodological gap: the inability to ensure both strong robustness to uncertainty and stable, continuous correction behaviour. This paper aims to address these limitations by developing estimation methods that maintain stability while adapting intelligently to uncertainty. To this end, we introduce the Sliding Sigmoid Filter (SSF), a novel estimator that combines sliding-mode robustness with a continuous sigmoid-based gain function, and further extend it to the Adaptive Sliding Sigmoid Filter (ASSF), which adjusts its gain online using recent innovation statistics for fault detection and adaptive correction. Using linear and nonlinear simulation benchmarks together with a full experimental pipeline involving physics-informed neural network parameter identification and SSF-based state estimation for a magnetorheological damper, we evaluate the performance of the proposed filters against classical methods. The results show that SSF and ASSF significantly reduce estimation error, attenuate outliers more smoothly than threshold-based approaches, and provide faster recovery under measurement faults. Overall, the findings demonstrate that the proposed filters offer a practical and theoretically grounded alternative for robust state estimation in uncertain and fault-prone environments.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105837"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A dual track YOLO-based network with multi-scale feature fusion for fire and smoke detection
Pub Date: 2026-03-15. Epub Date: 2025-12-30. DOI: 10.1016/j.dsp.2025.105875
Nikhil S. Shinde, Tejas Sharma, Karthik R
Fire and smoke pose serious risks to human life, infrastructure, and ecosystems. The increasing prevalence of wildfires and urban fire incidents has resulted in significant loss of life, property damage, and environmental decline. Conventional sensor-based detection systems suffer from critical inadequacies, including delayed response times, false alarms, and restricted coverage, which underscores the need for advanced deep learning techniques. This research proposes a novel deep learning architecture for fire and smoke detection. To the best of our knowledge, this is the first attempt to integrate a dual-track backbone into an object detection framework specifically for fire and smoke detection. The backbone combines a Convolutional Neural Network (CNN) track, which excels at capturing detailed spatial patterns and low-level features, with a Swin Transformer track, which captures hierarchical contextual relationships across the image. Feature maps from both tracks are processed through a Spatial Pyramid Pooling Fast (SPPF) block to enhance multi-scale representation, and the backbone concatenates feature maps at three distinct scales. These feature maps are refined using Efficient Channel Attention (ECA), which enhances channel-wise feature representations with minimal computational overhead. The refined features are fused in the neck via a Bidirectional Feature Pyramid Network (BiFPN), and a decoupled head generates the final predictions. The proposed network was trained on the DFire dataset and obtained a mean Average Precision (mAP@0.5) of 81.0%. To evaluate its performance and generalizability on external datasets, it was tested on the DFS and Indoor Fire and Smoke Computer Vision datasets, achieving mean Average Precision values of 79.8% and 91.8%, respectively. These results indicate that the network can effectively detect fire- and smoke-related patterns.
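Of the named components, Efficient Channel Attention is a published module (Wang et al., CVPR 2020) with a standard form; a compact PyTorch version is sketched below for reference. The kernel size `k` is a hyperparameter, and the integration details of the proposed network are not reproduced here.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: channel attention via a 1-D
    convolution over globally pooled channel descriptors, adding
    almost no parameters or computation."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                               # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                          # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)        # 1-D conv across channels
        return x * torch.sigmoid(y)[:, :, None, None]   # channel-wise reweighting
```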
{"title":"A dual track YOLO-based network with multi-scale feature fusion for fire and smoke detection","authors":"Nikhil S. Shinde , Tejas Sharma , Karthik R","doi":"10.1016/j.dsp.2025.105875","DOIUrl":"10.1016/j.dsp.2025.105875","url":null,"abstract":"<div><div>Fire and smoke pose serious risks to human life, infrastructure, and the ecosystem. The increasing prevalence of wildfires and urban fire incidents has resulted in significant loss of life, property damage, and environmental decline. Conventional sensor-based detection systems exhibit critical inadequacies. These include delayed response times, false alarms, and restricted coverage. Such limitations underscore the need for exploration of advanced deep learning techniques. This research proposes a novel deep learning architecture for fire and smoke detection. To the best of our knowledge, this is the first attempt to integrate a dual-track backbone into an object detection framework specifically for fire and smoke detection. It combines a Convolutional Neural Network (CNN) track that extracts low-level features and a Swin transformer track that extracts global features. The CNN track excels at capturing detailed spatial patterns. The Swin transformer track captures hierarchical contextual relationships across the image. Feature maps from both tracks are processed through a Spatial Pyramid Pooling Fast (SPPF) block to enhance multi-scale representation. The backbone concatenates feature maps at three distinct scales. These feature maps are refined using Efficient Channel Attention (ECA), which enhances channel-wise feature representations with minimal computational overhead. The refined features are fused in the neck via a Bidirectional Feature Pyramid Network (BiFPN) to enhance multi-scale representation for robust detection. The head employs a decoupled design to generate final predictions for accurate fire and smoke detection. The proposed network was trained on the DFire dataset and obtained a mean Average Precision ([email protected]) of 81.0%. To evaluate the performance and generalizability of the proposed network on external datasets, it was tested on the DFS and Indoor Fire and Smoke Computer Vision datasets, achieving mean average precision values of 79.8% and 91.8%, respectively. These observations indicate that the network can effectively detect fire- and smoke-related patterns.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105875"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised medical image mapping between multiphoton microscopy and hematoxylin-eosin staining via an enhanced CycleGAN
Pub Date: 2026-03-15. Epub Date: 2025-12-27. DOI: 10.1016/j.dsp.2025.105863
Guimin Lin, Gangqin Xi, Chen Huang, Zuoyong Li
Multiphoton microscopy (MPM) enables high-resolution imaging of tissue microstructures, while hematoxylin and eosin (H&E) staining provides superior specificity for identifying nuclear atypia. Multimodal analysis combining MPM with H&E leverages these complementary advantages to characterize pathological features (e.g., structural, compositional, and cellular abnormalities) more comprehensively, thereby enriching diagnostic and mechanistic information for disease investigation. However, MPM imaging systems are prohibitively expensive, and paired MPM-H&E image datasets remain scarce. To address this, an unsupervised cross-modal medical image translation framework named CMGAN is proposed, based on the CycleGAN architecture, which performs bidirectional translation between the H&E and MPM modalities. Cycle-consistency constraints enable training on unpaired datasets. On this foundation, two regularization methods are introduced, deep feature consistency and salient component consistency, which guide the generator to synthesize realistic and reliable target-domain images. Furthermore, to capture direct associations between image distributions, a directional discriminator is incorporated during training to enhance recognition of inter-modal relationships. In MPM-to-H&E and H&E-to-MPM translation tasks, CMGAN is compared against state-of-the-art GAN models, and experimental results demonstrate its superior performance in both quantitative metrics and qualitative evaluations.
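The unpaired-training mechanism rests on the standard CycleGAN cycle-consistency term, sketched below; the generator names and the weight `lam` are placeholders, and the paper's additional consistency regularizers are not shown.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(real_he, real_mpm, G_he2mpm, G_mpm2he, lam=10.0):
    """Standard CycleGAN cycle-consistency term (generator names are
    placeholders): translating to the other modality and back should
    reproduce the input, which is what allows training on unpaired
    H&E / MPM images."""
    rec_he = G_mpm2he(G_he2mpm(real_he))     # H&E -> MPM -> H&E
    rec_mpm = G_he2mpm(G_mpm2he(real_mpm))   # MPM -> H&E -> MPM
    return lam * (F.l1_loss(rec_he, real_he) + F.l1_loss(rec_mpm, real_mpm))
```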
{"title":"Unsupervised medical image mapping between multiphoton microscopy and hematoxylin-eosin staining via an enhanced CycleGAN","authors":"Guimin Lin , Gangqin Xi , Chen Huang , Zuoyong Li","doi":"10.1016/j.dsp.2025.105863","DOIUrl":"10.1016/j.dsp.2025.105863","url":null,"abstract":"<div><div>Multiphoton microscopy (MPM) enables high-resolution imaging of tissue microstructures, while superior specificity for identifying nuclear atypia is provided by hematoxylin and eosin (H&E) staining. These complementary advantages are leveraged through multimodal analysis (MPM combined with H&E) to achieve a more comprehensive characterization of pathological features (e.g., structural, compositional, and cellular abnormalities), thereby enriching diagnostic and mechanistic information for disease investigation. However, MPM imaging systems are prohibitively expensive, and paired MPM-H&E image datasets remain scarce. To address this, an unsupervised cross-modal medical image translation framework named CMGAN is proposed, based on CycleGAN architecture, which facilitates bidirectional translation between H&E and MPM image modalities. Cycle-consistency constraints are employed to enable training on unpaired datasets. Building upon this foundation, two regularization methods are introduced: deep feature consistency and salient component consistency, through which the generator is guided to synthesize realistic and reliable target-domain images. Furthermore, to capture direct associations in image distributions, a directional discriminator is incorporated during training to enhance recognition of inter-modal relationships. In MPM-to-H&E and H&E-to-MPM translation tasks, CMGAN is compared against state-of-the-art GAN models. Experimental results demonstrate superior performance of CMGAN over benchmark methods in both quantitative metrics and qualitative evaluations.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105863"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-supervised denoising of multichannel images with mixture noise via neural gradient learning and spectral-spatial total variation regularization
Pub Date: 2026-03-15. Epub Date: 2025-12-29. DOI: 10.1016/j.dsp.2025.105856
Min Wang, Liang Zhong, Ran Zhang, Shuyi Zhang
Multichannel image denoising under mixture noise remains a significant challenge due to complex noise distributions and the need to preserve fine structural details. Traditional methods, such as Total Variation (TV)-based approaches, often rely on the sparsity of first-order gradients, which may not hold for real-world images rich in textures and edges, leading to oversmoothing and detail loss. In this paper, we propose a novel self-supervised denoising framework that leverages neural gradient learning and Spectral-Spatial Total Variation (SSTV) regularization to effectively handle mixture noise in multichannel images. The framework consists of two cooperative networks: one generates the denoised image by exploiting deep image priors, while the other predicts the corresponding gradient map to better capture edge and structure information. Unlike traditional TV-based methods, our approach learns gradient representations directly from noisy data and constrains the second-order gradients via SSTV to model their inherent sparsity. The entire framework is trained without clean references, making it highly adaptable to real-world applications. Extensive experiments on various multichannel datasets demonstrate that our method outperforms existing approaches in both quantitative metrics and visual quality under diverse noise conditions.
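A common anisotropic SSTV formulation penalizes the L1 norm of the spatial gradients of spectral (channel) differences, i.e., a mixed second-order gradient; the PyTorch sketch below shows that standard form, which may differ in detail from the paper's regularizer.

```python
import torch

def sstv(x):
    """Anisotropic spectral-spatial TV (a standard formulation; the
    paper's exact variant may differ): L1 norm of the spatial gradients
    of channel differences, which is typically sparse for multichannel
    images. x has shape (B, C, H, W)."""
    ds = x[:, 1:] - x[:, :-1]               # spectral difference -> (B, C-1, H, W)
    dh = ds[..., 1:, :] - ds[..., :-1, :]   # then vertical gradient
    dw = ds[..., :, 1:] - ds[..., :, :-1]   # then horizontal gradient
    return dh.abs().mean() + dw.abs().mean()
```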
{"title":"Self-supervised denoising of multichannel images with mixture noise via neural gradient learning and spectral-spatial total variation regularization","authors":"Min Wang, Liang Zhong, Ran Zhang, Shuyi Zhang","doi":"10.1016/j.dsp.2025.105856","DOIUrl":"10.1016/j.dsp.2025.105856","url":null,"abstract":"<div><div>Multichannel image denoising under mixture noise remains a significant challenge due to the complex noise distributions and the need to preserve fine structural details. Traditional methods, such as Total Variation (TV) based methods, often rely on the sparsity of first-order gradients, which may not hold for real-world images rich in textures and edges, leading to oversmoothing and detail loss. In this paper, we propose a novel self-supervised denoising framework that leverages neural gradient learning and Spectral-Spatial Total Variation (SSTV) regularization to effectively handle mixture noise in multichannel images. The framework consists of two cooperative networks: one generates the denoised image by exploiting deep image priors, while the other predicts the corresponding gradient map to better capture edge and structure information. Unlike traditional TV-based methods, our approach learns gradient representations directly from noisy data and constrains the second-order gradients via SSTV to model their inherent sparsity. The entire framework is trained without clean references, making it highly adaptable to real-world applications. Extensive experiments on various multichannel datasets demonstrate that our method outperforms existing approaches in both quantitative metrics and visual quality under diverse noise conditions.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105856"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HSPF-Net: Hybrid CNN-transformer with serial-parallel fusion for skin lesion segmentation
Pub Date: 2026-03-15. Epub Date: 2026-01-06. DOI: 10.1016/j.dsp.2026.105891
Hao Fang, Yu Sun, Shuai Zhang, Xuyang Teng, Xiaohui Li, Xiaodong Yu
Advances in medical imaging have made dermoscopic images a critical tool in clinical diagnosis. However, segmenting skin lesions remains challenging due to blurred boundaries, low contrast with healthy tissue, and interference from hair and vasculature. To overcome these challenges, we propose HSPF-Net, a novel serial-parallel hybrid network that combines the strengths of CNN and Transformer architectures for precise lesion segmentation. We propose a Multi-Receptive Field Fusion (MRFF) module that performs dual-branch feature fusion by computing attention across features extracted from multiple receptive fields. Furthermore, a Fine-Grained Spatial-Channel Attention Gate (FG-SCAG) is designed to dynamically suppress irrelevant information and enhance feature representation. Experiments demonstrate that HSPF-Net handles artefacts such as hair occlusion, illumination noise, and irregular lesion shapes. Evaluated on three public datasets (ISIC2017, ISIC2018, and PH²), our model achieves state-of-the-art performance, significantly improving segmentation accuracy in terms of the Dice coefficient and IoU compared with existing methods.
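As a rough illustration of multi-receptive-field fusion (not the authors' MRFF module, which additionally computes attention across branches), parallel dilated convolutions followed by a learned 1x1 fusion look like this:

```python
import torch
import torch.nn as nn

class MultiReceptiveField(nn.Module):
    """Illustrative stand-in for the multi-receptive-field idea:
    parallel 3x3 convolutions with different dilations give several
    receptive fields, and a learned 1x1 convolution fuses them."""
    def __init__(self, ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)  # 1x1 fusion

    def forward(self, x):
        feats = [b(x) for b in self.branches]       # multi-receptive-field features
        return self.fuse(torch.cat(feats, dim=1))   # channel-wise fusion
```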
{"title":"HSPF-Net: Hybrid CNN-transformer with serial-parallel fusion for skin lesion segmentation","authors":"Hao Fang , Yu Sun , Shuai Zhang , Xuyang Teng , Xiaohui Li , Xiaodong Yu","doi":"10.1016/j.dsp.2026.105891","DOIUrl":"10.1016/j.dsp.2026.105891","url":null,"abstract":"<div><div>Medical imaging advancements have made dermoscopic images a critical tool in clinical diagnosis. However, segmenting skin lesions remains challenging due to blurred boundaries, low contrast with healthy tissue, and interference from hair and vasculature. To overcome these challenges, we propose HSPF-Net, a novel series-parallel hybrid network that combines the strengths of CNN and Transformer architectures for precise lesion segmentation. We propose a Multi-Receptive Field Fusion Module (MRFF) that performs dual-branch feature fusion by computing attention across features extracted from multiple receptive fields. Furthermore, a Fine-Grained Spatial-Channel Attention Gate (FG-SCAG) is designed to dynamically suppress irrelevant information and enhance feature representation. Experiments demonstrate that HSPF-Net handles artefacts such as hair occlusion, illumination noise, and irregular lesion shapes. Evaluated on three public datasets-ISIC2017, ISIC2018, and PH². Our model achieves state-of-the-art performance, significantly improving segmentation accuracy in the Dice coefficient and IoU compared to existing methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105891"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-scale enhanced contextual transformer network for forest fire detection
Pub Date: 2026-03-15. Epub Date: 2025-12-27. DOI: 10.1016/j.dsp.2025.105850
Changhui Ding, Haiyan Li, Yajie Liu, Bingbing He, Xun Lang, Guanbo Wang
Problem
Remote sensing-based forest fire detection is critical for early warning and rapid response, yet existing methods struggle with detecting multi-scale fire instances, modeling long-range contextual dependencies, and effectively fusing hierarchical features, particularly in complex natural environments with varying illumination, terrain, and smoke-flame mixtures.
Aim
To address these challenges, this work aims to develop a robust and efficient deep learning framework that enhances multi-scale representation, strengthens contextual correlation, and optimizes feature fusion for improved detection accuracy and real-time applicability.
Method
We propose a Multi-Scale Enhanced Contextual Transformer Network (MECT-Net), a novel architecture integrating convolutional and Transformer-based components. First, the Contextual Transformer with Scale Attention (CTSA) module combines a Multi-Scale Cross-Channel Fusion (MSCCF) block with an Enhanced Neighborhood-Aware Transformer (ENATM) to simultaneously capture local details and global context. Second, the Feature Memory Augmentation Network (FMAN) leverages Multi-Scale Group Convolution (MGC) and hybrid attention (SE and CBAM) to model long-range channel dependencies and refine multi-scale features. Third, the Multi-Scale Enhancement Feature Pyramid Network (MSE-FPN) enables bidirectional feature propagation and aggregation for balanced fine-grained and semantic learning. To support training and evaluation, a hybrid dataset combining synthetic and real-world fire imagery is constructed.
Results
Extensive experiments on the proposed benchmark dataset demonstrate that MECT-Net achieves state-of-the-art detection performance. Specifically, MECT-Net (n) achieves a fire-class mAP50 of 91.1%, while MECT-Net (s) attains a comparable 91.0%, outperforming the majority of mainstream one-stage object detectors across multiple evaluation metrics. Notably, despite its competitive accuracy, MECT-Net has significantly reduced model complexity: its parameter count is substantially lower than that of architectures with similar performance. Furthermore, it maintains a high inference speed on an NVIDIA RTX 4070 Ti GPU, confirming its efficiency and suitability for real-time deployment in resource-constrained environments.
Conclusion
MECT-Net provides an effective and deployable solution for real-time forest fire detection in aerial remote sensing, advancing the integration of hybrid neural architectures for visual anomaly detection. The proposed modules and hybrid dataset offer valuable resources for future research in wildfire monitoring.
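Of the attention mechanisms the Method names for FMAN, Squeeze-and-Excitation (SE) has a standard published form (Hu et al., CVPR 2018); a compact PyTorch sketch is given below for reference. The reduction ratio `r` is a hyperparameter, and CBAM adds a spatial branch not shown here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: squeeze spatial
    dimensions with global average pooling, then excite channels
    through a small bottleneck MLP with a sigmoid gate."""
    def __init__(self, ch: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // r), nn.ReLU(inplace=True),
            nn.Linear(ch // r, ch), nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))      # squeeze: global average pool
        return x * w[:, :, None, None]       # excite: channel reweighting
```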
{"title":"Multi-scale enhanced contextual transformer network for forest fire detection","authors":"Changhui Ding , Haiyan Li , Yajie Liu , Bingbing He , Xun Lang , Guanbo Wang","doi":"10.1016/j.dsp.2025.105850","DOIUrl":"10.1016/j.dsp.2025.105850","url":null,"abstract":"<div><h3>Problem</h3><div>Remote sensing-based forest fire detection is critical for early warning and rapid response, yet existing methods struggle with detecting multi-scale fire instances, modeling long-range contextual dependencies, and effectively fusing hierarchical features-particularly in complex natural environments with varying illumination, terrain, and smoke-flame mixtures.</div></div><div><h3>Aim</h3><div>To address these challenges, this work aims to develop a robust and efficient deep learning framework that enhances multi-scale representation, strengthens contextual correlation, and optimizes feature fusion for improved detection accuracy and real-time applicability.</div></div><div><h3>Method</h3><div>We propose a Multi-Scale Enhanced Contextual Transformer Network (MECT-Net), a novel architecture integrating convolutional and Transformer-based components. First, the Contextual Transformer with Scale Attention (CTSA) module combines a Multi-Scale Cross-Channel Fusion (MSCCF) block with an Enhanced Neighborhood-Aware Transformer (ENATM) to simultaneously capture local details and global context. Second, the Feature Memory Augmentation Network (FMAN) leverages Multi-Scale Group Convolution (MGC) and hybrid attention (SE and CBAM) to model long-range channel dependencies and refine multi-scale features. Third, the Multi-Scale Enhancement Feature Pyramid Network (MSE-FPN) enables bidirectional feature propagation and aggregation for balanced fine-grained and semantic learning. To support training and evaluation, a hybrid dataset combining synthetic and real-world fire imagery is constructed.</div></div><div><h3>Results</h3><div>Extensive experiments conducted on the proposed benchmark dataset demonstrate that MECT-Net achieves state-of-the-art detection performance. Specifically, MECT-Net (n) achieves a mAP<sub>50</sub><sup>fire</sup> of 91.1%, while MECT-Net (s) attains a comparable mAP<sub>50</sub><sup>fire</sup> of 91.0%, with superior performance across multiple evaluation metrics compared to the majority of mainstream one-stage object detectors. Notably, despite its competitive accuracy, MECT-Net exhibits significantly reduced model complexity-its parameter count is substantially lower than that of architectures with similar performance. Furthermore, it maintains a high inference speed on NVIDIA RTX 4070 Ti, confirming its efficiency and suitability for real-time deployment in resource-constrained environments.</div></div><div><h3>Conclusion</h3><div>MECT-Net provides an effective and deployable solution for real-time forest fire detection in aerial remote sensing, advancing the integration of hybrid neural architectures for visual anomaly detection. 
The proposed modules and hybrid dataset offer valuable resources for future research in wildfire monitoring.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105850"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
APU-Net: A U-Net enhanced network with dynamic feature fusion and pyramid cross-attention mechanism for polyp segmentation
Pub Date: 2026-03-15. Epub Date: 2025-12-31. DOI: 10.1016/j.dsp.2025.105879
Fengyun Li, Lanping Xu, Zhendi Ma, Yuxin Zhao, Xiaobo Li
Early detection of colorectal polyps is critical for effective screening and prevention of colorectal cancer. However, accurate segmentation remains challenging due to blurred boundaries between polyps and the rectal wall, along with low contrast, which reduces the reliability of shallow features. Additionally, uneven illumination and background noise further degrade model performance by obscuring key regions. Existing methods also exhibit limitations in multi-scale feature fusion and long-range dependency modeling, leading to suboptimal delineation of global structure and boundaries, particularly for small polyps. To address these challenges, we propose APU-Net, a novel U-Net-based segmentation network incorporating Adaptive Weight Fusion (AWF) and Pyramid Dual Cross-Attention (PDCA) modules. The AWF module adaptively integrates multi-scale features through an image-aware weighting mechanism, enhancing contextual representation and the perception of salient regions. The PDCA module combines pyramid attention with dual cross-attention to enable hierarchical feature modeling and information interaction, improving global structural understanding and boundary delineation of polyps. Extensive experiments on five publicly available colonoscopy polyp datasets demonstrate that APU-Net outperforms several existing segmentation methods in both Dice and IoU metrics, with particularly strong performance on small polyps. Specifically, Dice scores increase by 9.1% and 10.2% on the ETIS and CVC-ColonDB datasets, respectively. On the CVC-300 dataset, the model achieves performance comparable to the current state of the art, confirming the effectiveness and robustness of the proposed network.
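One plausible reading of image-aware adaptive weight fusion, sketched under our own assumptions (the paper's AWF details are not reproduced): predict per-branch weights from the concatenated inputs with a small head, softmax them, and take a weighted sum of same-shaped multi-scale feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightFusion(nn.Module):
    """Hypothetical adaptive fusion: a 1x1 convolution predicts one
    scalar weight per branch from the input features themselves, so
    the fusion weights vary with the image content."""
    def __init__(self, ch: int, n_branches: int = 3):
        super().__init__()
        self.head = nn.Conv2d(ch * n_branches, n_branches, 1)  # image-aware weight head

    def forward(self, feats):                  # list of (B, C, H, W) tensors
        logits = self.head(torch.cat(feats, dim=1)).mean(dim=(2, 3))  # (B, n)
        w = F.softmax(logits, dim=1)           # adaptive, input-dependent weights
        return sum(w[:, i, None, None, None] * f for i, f in enumerate(feats))
```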
{"title":"APU-Net: A U-Net enhanced network with dynamic feature fusion and pyramid cross-attention mechanism for polyp segmentation","authors":"Fengyun Li , Lanping Xu , Zhendi Ma, Yuxin Zhao, Xiaobo Li","doi":"10.1016/j.dsp.2025.105879","DOIUrl":"10.1016/j.dsp.2025.105879","url":null,"abstract":"<div><div>Early detection of colorectal polyps is critical for effective screening and prevention of colorectal cancer. However, accurate segmentation remains challenging due to blurred boundaries between polyps and the rectal wall, along with low contrast, which reduce the reliability of shallow features. Additionally, uneven illumination and background noise further degrade model performance by obscuring key region identification. Existing methods also exhibit limitations in multi-scale feature fusion and long-range dependency modeling, leading to suboptimal global structure and boundary delineation—particularly for small polyps. To address these challenges, we propose APU-Net, a novel U-Net-based segmentation network incorporating Adaptive Weight Fusion (AWF) and Pyramid Dual Cross-Attention (PDCA) modules. The AWF module adaptively integrates multi-scale features through an image-aware weighting mechanism, enhancing contextual representation and salient region perception. The PDCA module combines pyramid attention with dual cross-attention to enable hierarchical feature modeling and information interaction, improving global structural understanding and boundary delineation of polyps. Extensive experiments on five publicly available colonoscopy polyp datasets demonstrate that APU-Net outperforms several existing segmentation methods in both Dice and IoU metrics, and shows particularly strong performance in segmenting small polyps. Specifically, Dice scores increase by 9.1% and 10.2% on the ETIS and CVC-ColonDB datasets, respectively. On the CVC-300 dataset, the model achieves performance comparable to the current state-of-the-art, confirming the effectiveness and robustness of the proposed network.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105879"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEFusion: Dynamic parameter tuning for infrared-visible image fusion in day-night alternating environments
Pub Date: 2026-03-15. Epub Date: 2026-01-02. DOI: 10.1016/j.dsp.2025.105874
Yaochen Liu, Mingyue Han, Jianwei Fan
Infrared and visible image fusion aims to generate fused images with rich texture detail around the clock. However, existing fusion methods adopt fixed fusion rules to integrate features from different modalities, making it difficult for them to adapt to the drastic illumination variations of day-night alternating scenes. To address this challenge, this paper proposes DEFusion, a dynamic parameter-tuning method for infrared and visible image fusion that flexibly adjusts network parameters according to differences in the information content of the input images, thereby adapting to the complex characteristics of alternating day-night scenes. Specifically, DEFusion designs dynamic parameter-tuning sub-networks that adjust the contribution of features from different modalities based on the feature information of the input image. Each layer of the network is equipped with an infrared-visible dual-information extraction module and a bidirectional cross-modal enhancement module: the former preserves the unique features of unimodal images, while the latter achieves feature complementation and enhancement between modalities by performing bidirectional cross-modal interactions in parallel. In addition, the network introduces a dynamic selection algorithm that adaptively adjusts the propagation weights of each module by sensing scene changes in real time, constructing the optimal fusion path for the current day-night scene. On the public MSRS and TNO datasets, the method achieves maximum improvements of 59.9% and 68.0% in the Average Gradient (AG) metric, and 32.3% and 37.4% in the Spatial Frequency (SF) metric, respectively. Both qualitative and quantitative evaluations demonstrate strong robustness in alternating day-night scenes.
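Both reported metrics have conventional closed-form definitions; the NumPy sketch below computes AG and SF for a single-channel image using those standard formulas (the paper presumably uses the same or a close variant).

```python
import numpy as np

def average_gradient(img):
    """Average Gradient (AG): mean magnitude of local intensity
    gradients; higher values indicate richer texture detail."""
    gx, gy = np.gradient(img.astype(np.float64))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def spatial_frequency(img):
    """Spatial Frequency (SF): RMS of row-wise and column-wise first
    differences, another standard sharpness measure for fused images."""
    img = img.astype(np.float64)
    rf = np.mean((img[:, 1:] - img[:, :-1]) ** 2)   # row frequency (squared)
    cf = np.mean((img[1:, :] - img[:-1, :]) ** 2)   # column frequency (squared)
    return np.sqrt(rf + cf)
```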
{"title":"DEFusion: Dynamic parameter tuning for infrared-visible image fusion in day-night alternating environments","authors":"Yaochen Liu, Mingyue Han, Jianwei Fan","doi":"10.1016/j.dsp.2025.105874","DOIUrl":"10.1016/j.dsp.2025.105874","url":null,"abstract":"<div><div>Infrared and visible image fusion aims to generate a fused image with rich texture detail information around the clock. However, existing fusion methods adopt fixed fusion to integrate features from different modalities, making them difficult to adapt to drastic illumination variations in day-night alternating scenes. To address this challenge, this paper proposes a dynamic parameter tuning for infrared and visible image fusion (DEFusion), which can flexibly adjust network parameters based on the differences in information of input images, thus effectively adapting to the complex characteristics of alternating day-night scenes. Specifically, DEFusion designs dynamic parameter tuning sub-networks that dynamically adjust the contribution of features from different modalities based on the feature information of the input image. Meanwhile, each layer of the network is equipped with an infrared and visible dual-information extraction module and a bidirectional cross-modal enhancement module. The former is responsible for preserving the unique features of unimodal images, while the latter achieves feature complementation and enhancement between modalities by performing bidirectional cross-modal interactions in parallel. In addition, the network introduces a dynamic selection algorithm, which adaptively adjusts the propagation weights of each module by sensing scene changes in real-time, so as to construct the optimal fusion path that fits the current day-night scene characteristics. On the public MSRS and TNO datasets, this method achieves maximum improvements of 59.9 % and 68.0 % in the Average Gradient (AG) metric, and 32.3 % and 37.4 % in the Spatial Frequency (SF) metric, respectively. Both qualitative and quantitative evaluations demonstrate that our model exhibits strong robustness in alternating day-night scenes.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105874"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MLME-Net: A high-accuracy model for surgical instrument detection via multi-level MixEnhance network
Pub Date: 2026-03-15. Epub Date: 2026-01-02. DOI: 10.1016/j.dsp.2025.105857
Haikun Chen, Shuwan Pan, Qin Ye, Yuanda Lin, Lixin Zheng
Accurate identification and tracking of surgical instruments are critical for computer-assisted minimally invasive surgery. To improve detection accuracy, we propose a Multi-Level MixEnhance Network (MLME-Net), whose core component is a novel Multi-branch Multi-Level MixEnhance (M2LME) module. The M2LME module employs a multi-level attention-guided architecture for weight redistribution, specifically designed to strengthen the extraction of discriminative, fine-grained features through multi-level feature integration. To further enhance performance, MLME-Net integrates two additional components: a Multi-Order Gated Aggregation Block (MOGAB) for cross-complexity feature interaction through gating mechanisms, and a Coordinate Attention (CA) module for accurate instrument localization in complex surgical environments. Additionally, we address class imbalance among surgical instruments by introducing an Adaptive Threshold Focal Loss (ATFL), which dynamically adjusts loss weights through an adaptive mechanism. Experimental results demonstrate that MLME-Net achieves a mean Average Precision at 50% IoU (mAP50) of 94.9% on the m2cai16-tool-locations dataset, outperforming the baseline by 1.1%. Notably, detection accuracy for the Grasper and Irrigator classes improves by 3.3% and 2.6%, respectively.
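ATFL builds on focal loss, whose standard binary form is sketched below; the adaptive-threshold reweighting itself is not reproduced, since its exact mechanism is specific to the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Standard binary focal loss (Lin et al., 2017), the usual
    starting point for imbalance-aware detection losses: easy,
    well-classified examples are down-weighted by (1 - p_t)^gamma."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)         # prob of the true class
    a_t = alpha * targets + (1 - alpha) * (1 - targets) # class-balance factor
    return (a_t * (1 - p_t) ** gamma * ce).mean()
```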
{"title":"MLME -Net: A high-accuracy model for surgical instrument detection via multi-level MixEnhance network","authors":"Haikun Chen , Shuwan Pan , Qin Ye , Yuanda Lin , Lixin Zheng","doi":"10.1016/j.dsp.2025.105857","DOIUrl":"10.1016/j.dsp.2025.105857","url":null,"abstract":"<div><div>Accurate identification and tracking of surgical instruments are critical for computer-assisted minimally invasive surgery. To improve the detection accuracy of surgical instruments, we propose a Multi-Level MixEnhance Network (MLME-Net), whose core component is a novel Multi-branch Multi-Level MixEnhance (M<sup>2</sup>LME) module. The M<sup>2</sup>LME module employs a multi-level attention-guided architecture for weight redistribution, specifically designed to strengthen discriminative feature extractive capabilities for fine-grained through multi-level feature integration. To further enhance performance, MLME-Net integrates two critical components: the Multi-Order Gated Aggregation Block (MOGAB) for cross-complexity feature interaction through gating mechanisms, and the Coordinate Attention (CA) module for accurate instrument localization in complex surgical environments. Additionally, we address class imbalance among surgical instruments by introducing Adaptive Threshold Focal Loss (ATFL), which dynamically adjusts loss weights through an adaptive mechanism. Experimental results demonstrate that MLME-Net achieves a mean Average Precision at 50% IoU (mAP50) of 94.9% on the m2cai16-tool-locations dataset, outperforming the baseline by 1.1%. Notably, detection accuracy of the Grasper and Irrigator classes has improved by 3.3% and 2.6%, respectively.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105857"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}