
Latest Publications in Digital Signal Processing

Multiplicative bias field correction-based hybrid active contour model for infrared image segmentation
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2025-12-31 | DOI: 10.1016/j.dsp.2025.105854
Pengqiang Ge , Shuqing Cao , Xiaofang Kong , Guirong Weng , Guohua Gu , Qian Chen , Minjie Wan
Infrared (IR) image segmentation plays a vital role in applications such as marine search and rescue and military surveillance. The active contour model (ACM) is a commonly used tool for image segmentation due to its ability to accurately delineate object boundaries. However, most existing ACMs rely on local data-driven force (LDDF) and ignore global data-driven force (GDDF), causing segmentation errors when handling IR images with gray-level non-uniformity. Furthermore, the local fitting functions are repeatedly updated during level set evolution (LSE), making it computationally expensive. To resolve these problems, a hybrid ACM driven by multiplicative bias field correction (MBFC) is proposed for IR image segmentation. First, the multi-feature (MF) GDDF uses the global averages of gray-level, roughness, and gradient features to prevent trapping in local minima. Next, the MF local fitting functions are pre-estimated before LSE to reduce the number of convolutions. After that, an adaptive weight function (AWF) is specifically designed to properly fuse the MF GDDF and MF LDDF. During LSE, the evolution curve is smoothed and shortened using average filtering, and the range of the level set function (LSF) is normalized. Lastly, the zero level set converges near the actual target edge within a finite number of iterations. Compared with recently developed ACMs and deep learning-based models for IR image segmentation, the proposed model achieves clearly superior average accuracy in terms of intersection over union (IoU) and Dice similarity coefficient (DSC), and shows potential for generalization to the Berkeley segmentation dataset 500 (BSDS500).
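The two accuracy metrics cited above, IoU and DSC, are simple set-overlap ratios on binary masks. A minimal sketch (not tied to the paper's code):

```python
import numpy as np

def iou_and_dsc(pred, gt):
    """Intersection over union and Dice similarity coefficient
    for binary segmentation masks (1 = target, 0 = background)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dsc = 2 * inter / total if total else 1.0
    return iou, dsc

# toy 4x4 masks: prediction misses one column of the target
pred = np.array([[1,1,0,0],[1,1,0,0],[0,0,0,0],[0,0,0,0]])
gt   = np.array([[1,1,1,0],[1,1,1,0],[0,0,0,0],[0,0,0,0]])
iou, dsc = iou_and_dsc(pred, gt)   # DSC is always >= IoU on the same pair
```

Note that DSC weights the intersection twice, so it is always at least as large as IoU for the same prediction.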
Citations: 0
The sliding sigmoid filter
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2025-12-19 | DOI: 10.1016/j.dsp.2025.105837
Naseem Alsadi , John Yawney , Mohammad Alshabi , S. Andrew Gadsden
Traditional state estimation methods often become unreliable in the presence of measurement anomalies, abrupt disturbances, and nonlinear dynamics. Such conditions are ubiquitous in high-stakes operational settings, including air traffic surveillance, autonomous systems, and advanced manufacturing. These challenges expose an enduring methodological gap: the inability to ensure both strong robustness to uncertainty and stable, continuous correction behaviour. This paper aims to address these limitations by developing estimation methods that maintain stability while adapting intelligently to uncertainty. To this end, we introduce the Sliding Sigmoid Filter (SSF), a novel estimator that combines sliding-mode robustness with a continuous sigmoid-based gain function, and further extend it to the Adaptive Sliding Sigmoid Filter (ASSF), which adjusts its gain online using recent innovation statistics for fault detection and adaptive correction. Using linear and nonlinear simulation benchmarks together with a full experimental pipeline involving physics-informed neural network parameter identification and SSF-based state estimation for a magnetorheological damper, we evaluate the performance of the proposed filters against classical methods. The results show that SSF and ASSF significantly reduce estimation error, attenuate outliers more smoothly than threshold-based approaches, and provide faster recovery under measurement faults. Overall, the findings demonstrate that the proposed filters offer a practical and theoretically grounded alternative for robust state estimation in uncertain and fault-prone environments.
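The core idea of the SSF, replacing the discontinuous signum of classical sliding-mode correction with a smooth sigmoid-shaped gain, can be illustrated with a toy scalar function. The `boundary` parameter here is a hypothetical smoothing width, not the authors' exact gain law:

```python
import numpy as np

def sigmoid_gain(innovation, boundary=1.0):
    """Smooth stand-in for the discontinuous signum used in classical
    sliding-mode correction: maps the innovation to (-1, 1), so small
    innovations produce gentle corrections while large ones saturate."""
    return 2.0 / (1.0 + np.exp(-innovation / boundary)) - 1.0

small = sigmoid_gain(0.1)    # gentle correction near zero innovation
large = sigmoid_gain(10.0)   # saturates toward +1, like sign()
```

A signum-based gain would jump between -1 and +1 at zero, causing chattering; the sigmoid passes through zero continuously, which is the stable-correction behaviour the abstract emphasizes.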
Citations: 0
A dual track YOLO-based network with multi-scale feature fusion for fire and smoke detection
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2025-12-30 | DOI: 10.1016/j.dsp.2025.105875
Nikhil S. Shinde , Tejas Sharma , Karthik R
Fire and smoke pose serious risks to human life, infrastructure, and the ecosystem. The increasing prevalence of wildfires and urban fire incidents has resulted in significant loss of life, property damage, and environmental degradation. Conventional sensor-based detection systems exhibit critical inadequacies, including delayed response times, false alarms, and restricted coverage. Such limitations underscore the need to explore advanced deep learning techniques. This research proposes a novel deep learning architecture for fire and smoke detection. To the best of our knowledge, this is the first attempt to integrate a dual-track backbone into an object detection framework specifically for fire and smoke detection. It combines a Convolutional Neural Network (CNN) track that extracts low-level features and a Swin transformer track that extracts global features. The CNN track excels at capturing detailed spatial patterns, while the Swin transformer track captures hierarchical contextual relationships across the image. Feature maps from both tracks are processed through a Spatial Pyramid Pooling Fast (SPPF) block to enhance multi-scale representation. The backbone concatenates feature maps at three distinct scales. These feature maps are refined using Efficient Channel Attention (ECA), which enhances channel-wise feature representations with minimal computational overhead. The refined features are fused in the neck via a Bidirectional Feature Pyramid Network (BiFPN) to strengthen multi-scale representation for robust detection. The head employs a decoupled design to generate final predictions for accurate fire and smoke detection. The proposed network was trained on the DFire dataset and obtained a mean average precision (mAP@0.5) of 81.0%.
To evaluate the performance and generalizability of the proposed network on external datasets, it was tested on the DFS and Indoor Fire and Smoke Computer Vision datasets, achieving mean average precision values of 79.8% and 91.8%, respectively. These observations indicate that the network can effectively detect fire- and smoke-related patterns.
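Efficient Channel Attention, used above to refine the concatenated feature maps, reduces to a global average pool, a cheap 1-D convolution across channels, and a sigmoid gate. A rough numpy sketch with untrained uniform kernel weights (the real module learns its kernel):

```python
import numpy as np

def eca_attention(x, k=3):
    """ECA-style channel attention sketch for a (C, H, W) feature map:
    pool each channel to a scalar, run a 1-D convolution of size k over
    the channel axis, squash through a sigmoid, and rescale channels."""
    c = x.shape[0]
    gap = x.mean(axis=(1, 2))                # (C,) per-channel descriptor
    kernel = np.full(k, 1.0 / k)             # untrained uniform weights
    pad = k // 2
    padded = np.pad(gap, pad, mode="edge")
    conv = np.array([np.dot(padded[i:i + k], kernel) for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))       # sigmoid gate in (0, 1)
    return x * gate[:, None, None]

x = np.ones((4, 2, 2))
out = eca_attention(x)                       # each channel rescaled by its gate
```

Because the gate only needs a 1-D convolution over C values, the overhead is negligible compared with the spatial convolutions around it, which is the "minimal computational overhead" claim in the abstract.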
Citations: 0
Unsupervised medical image mapping between multiphoton microscopy and hematoxylin-eosin staining via an enhanced CycleGAN
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2025-12-27 | DOI: 10.1016/j.dsp.2025.105863
Guimin Lin , Gangqin Xi , Chen Huang , Zuoyong Li
Multiphoton microscopy (MPM) enables high-resolution imaging of tissue microstructures, while hematoxylin and eosin (H&E) staining provides superior specificity for identifying nuclear atypia. Multimodal analysis (MPM combined with H&E) leverages these complementary advantages to characterize pathological features (e.g., structural, compositional, and cellular abnormalities) more comprehensively, thereby enriching diagnostic and mechanistic information for disease investigation. However, MPM imaging systems are prohibitively expensive, and paired MPM-H&E image datasets remain scarce. To address this, an unsupervised cross-modal medical image translation framework named CMGAN is proposed, based on the CycleGAN architecture, which enables bidirectional translation between the H&E and MPM image modalities. Cycle-consistency constraints allow training on unpaired datasets. Building on this foundation, two regularization methods are introduced, deep feature consistency and salient component consistency, which guide the generator to synthesize realistic and reliable target-domain images. Furthermore, to capture direct associations between image distributions, a directional discriminator is incorporated during training to enhance recognition of inter-modal relationships. In MPM-to-H&E and H&E-to-MPM translation tasks, CMGAN is compared against state-of-the-art GAN models. Experimental results demonstrate the superior performance of CMGAN over benchmark methods in both quantitative metrics and qualitative evaluations.
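The cycle-consistency constraint that lets CycleGAN-style models train on unpaired data penalizes the round trip through both generators. A toy sketch with hypothetical linear stand-ins for the two generators; they are chosen as exact inverses, so the loss vanishes, whereas imperfect networks incur a nonzero penalty:

```python
import numpy as np

# Hypothetical stand-ins for the two generators (H&E -> MPM and back);
# in CMGAN these are neural networks, here just toy affine maps.
def g_he_to_mpm(img):
    return 0.5 * img + 0.1

def g_mpm_to_he(img):
    return 2.0 * (img - 0.1)

def cycle_consistency_loss(img):
    """L1 cycle loss: translate to the other modality and back, then
    penalise the pixel-wise deviation from the original image."""
    reconstructed = g_mpm_to_he(g_he_to_mpm(img))
    return np.mean(np.abs(img - reconstructed))

img = np.random.default_rng(0).random((8, 8))
loss = cycle_consistency_loss(img)   # ~0: the toy maps invert each other
```

In training, this term is summed for both translation directions and added to the adversarial losses, which is what removes the need for paired MPM-H&E images.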
Citations: 0
Self-supervised denoising of multichannel images with mixture noise via neural gradient learning and spectral-spatial total variation regularization
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2025-12-29 | DOI: 10.1016/j.dsp.2025.105856
Min Wang, Liang Zhong, Ran Zhang, Shuyi Zhang
Multichannel image denoising under mixture noise remains a significant challenge due to the complex noise distributions and the need to preserve fine structural details. Traditional methods, such as Total Variation (TV) based methods, often rely on the sparsity of first-order gradients, which may not hold for real-world images rich in textures and edges, leading to oversmoothing and detail loss. In this paper, we propose a novel self-supervised denoising framework that leverages neural gradient learning and Spectral-Spatial Total Variation (SSTV) regularization to effectively handle mixture noise in multichannel images. The framework consists of two cooperative networks: one generates the denoised image by exploiting deep image priors, while the other predicts the corresponding gradient map to better capture edge and structure information. Unlike traditional TV-based methods, our approach learns gradient representations directly from noisy data and constrains the second-order gradients via SSTV to model their inherent sparsity. The entire framework is trained without clean references, making it highly adaptable to real-world applications. Extensive experiments on various multichannel datasets demonstrate that our method outperforms existing approaches in both quantitative metrics and visual quality under diverse noise conditions.
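The flavor of a spectral-spatial total variation regularizer can be shown with a first-order anisotropic variant: the L1 norm of spatial gradients within each channel plus the L1 norm of differences between adjacent channels. (The paper constrains second-order gradients of learned gradient maps; this simplification is only illustrative.)

```python
import numpy as np

def sstv(x):
    """First-order anisotropic spectral-spatial total variation of a
    multichannel cube x with shape (C, H, W): spatial L1 gradients per
    channel plus L1 differences between neighbouring channels."""
    spatial = np.abs(np.diff(x, axis=1)).sum() + np.abs(np.diff(x, axis=2)).sum()
    spectral = np.abs(np.diff(x, axis=0)).sum()
    return spatial + spectral

flat = np.ones((3, 4, 4))   # piecewise-constant cube: zero penalty
noisy = flat + 0.1 * np.random.default_rng(0).standard_normal(flat.shape)
```

A denoiser minimizing reconstruction error plus a weighted `sstv` term is pushed toward images that are smooth both spatially and across channels, while the L1 norm still tolerates sharp edges better than a quadratic penalty would.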
Citations: 0
HSPF-Net: Hybrid CNN-transformer with serial-parallel fusion for skin lesion segmentation
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2026-01-06 | DOI: 10.1016/j.dsp.2026.105891
Hao Fang , Yu Sun , Shuai Zhang , Xuyang Teng , Xiaohui Li , Xiaodong Yu
Advances in medical imaging have made dermoscopic images a critical tool in clinical diagnosis. However, segmenting skin lesions remains challenging due to blurred boundaries, low contrast with healthy tissue, and interference from hair and vasculature. To overcome these challenges, we propose HSPF-Net, a novel serial-parallel hybrid network that combines the strengths of CNN and Transformer architectures for precise lesion segmentation. We propose a Multi-Receptive Field Fusion Module (MRFF) that performs dual-branch feature fusion by computing attention across features extracted from multiple receptive fields. Furthermore, a Fine-Grained Spatial-Channel Attention Gate (FG-SCAG) is designed to dynamically suppress irrelevant information and enhance feature representation. Experiments demonstrate that HSPF-Net handles artefacts such as hair occlusion, illumination noise, and irregular lesion shapes. Evaluated on three public datasets (ISIC2017, ISIC2018, and PH²), our model achieves state-of-the-art performance, significantly improving segmentation accuracy in terms of the Dice coefficient and IoU compared with existing methods.
Citations: 0
Multi-scale enhanced contextual transformer network for forest fire detection
IF 3.0 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-15 | Epub Date: 2025-12-27 | DOI: 10.1016/j.dsp.2025.105850
Changhui Ding , Haiyan Li , Yajie Liu , Bingbing He , Xun Lang , Guanbo Wang

Problem

Remote sensing-based forest fire detection is critical for early warning and rapid response, yet existing methods struggle to detect multi-scale fire instances, model long-range contextual dependencies, and effectively fuse hierarchical features, particularly in complex natural environments with varying illumination, terrain, and smoke-flame mixtures.

Aim

To address these challenges, this work aims to develop a robust and efficient deep learning framework that enhances multi-scale representation, strengthens contextual correlation, and optimizes feature fusion for improved detection accuracy and real-time applicability.

Method

We propose a Multi-Scale Enhanced Contextual Transformer Network (MECT-Net), a novel architecture integrating convolutional and Transformer-based components. First, the Contextual Transformer with Scale Attention (CTSA) module combines a Multi-Scale Cross-Channel Fusion (MSCCF) block with an Enhanced Neighborhood-Aware Transformer (ENATM) to simultaneously capture local details and global context. Second, the Feature Memory Augmentation Network (FMAN) leverages Multi-Scale Group Convolution (MGC) and hybrid attention (SE and CBAM) to model long-range channel dependencies and refine multi-scale features. Third, the Multi-Scale Enhancement Feature Pyramid Network (MSE-FPN) enables bidirectional feature propagation and aggregation for balanced fine-grained and semantic learning. To support training and evaluation, a hybrid dataset combining synthetic and real-world fire imagery is constructed.

Results

Extensive experiments conducted on the proposed benchmark dataset demonstrate that MECT-Net achieves state-of-the-art detection performance. Specifically, MECT-Net (n) achieves an mAP50 of 91.1% on the fire class, while MECT-Net (s) attains a comparable 91.0%, outperforming the majority of mainstream one-stage object detectors across multiple evaluation metrics. Notably, despite its competitive accuracy, MECT-Net exhibits significantly reduced model complexity: its parameter count is substantially lower than that of architectures with similar performance. Furthermore, it maintains a high inference speed on an NVIDIA RTX 4070 Ti, confirming its efficiency and suitability for real-time deployment in resource-constrained environments.

Conclusion

MECT-Net provides an effective and deployable solution for real-time forest fire detection in aerial remote sensing, advancing the integration of hybrid neural architectures for visual anomaly detection. The proposed modules and hybrid dataset offer valuable resources for future research in wildfire monitoring.
Changhui Ding, Haiyan Li, Yajie Liu, Bingbing He, Xun Lang, Guanbo Wang, "Multi-scale enhanced contextual transformer network for forest fire detection," Digital Signal Processing, vol. 172, Article 105850, 2026. DOI: 10.1016/j.dsp.2025.105850
Citations: 0
APU-Net: A U-Net enhanced network with dynamic feature fusion and pyramid cross-attention mechanism for polyp segmentation
IF 3, Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2026-03-15, Epub Date: 2025-12-31. DOI: 10.1016/j.dsp.2025.105879
Fengyun Li , Lanping Xu , Zhendi Ma, Yuxin Zhao, Xiaobo Li
Early detection of colorectal polyps is critical for effective screening and prevention of colorectal cancer. However, accurate segmentation remains challenging due to blurred boundaries between polyps and the rectal wall, along with low contrast, both of which reduce the reliability of shallow features. Additionally, uneven illumination and background noise further degrade model performance by obscuring key region identification. Existing methods also exhibit limitations in multi-scale feature fusion and long-range dependency modeling, leading to suboptimal global structure and boundary delineation, particularly for small polyps. To address these challenges, we propose APU-Net, a novel U-Net-based segmentation network incorporating Adaptive Weight Fusion (AWF) and Pyramid Dual Cross-Attention (PDCA) modules. The AWF module adaptively integrates multi-scale features through an image-aware weighting mechanism, enhancing contextual representation and salient region perception. The PDCA module combines pyramid attention with dual cross-attention to enable hierarchical feature modeling and information interaction, improving global structural understanding and boundary delineation of polyps. Extensive experiments on five publicly available colonoscopy polyp datasets demonstrate that APU-Net outperforms several existing segmentation methods in both Dice and IoU metrics, and shows particularly strong performance in segmenting small polyps. Specifically, Dice scores increase by 9.1% and 10.2% on the ETIS and CVC-ColonDB datasets, respectively. On the CVC-300 dataset, the model achieves performance comparable to the current state-of-the-art, confirming the effectiveness and robustness of the proposed network.
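The abstract does not give the AWF formula; a minimal sketch of one common image-aware weighting scheme, a softmax over per-scale global statistics, illustrates the idea (the use of each scale's global mean activation as its weight logit is an assumption for illustration, not the authors' design):

```python
import numpy as np

def adaptive_weight_fusion(features):
    """Fuse same-shape multi-scale feature maps with image-dependent weights:
    each scale's logit is its global average activation, and a softmax
    normalizes the logits so the fusion weights sum to one."""
    logits = np.array([f.mean() for f in features])
    w = np.exp(logits - logits.max())   # numerically stable softmax
    w /= w.sum()
    fused = sum(wi * fi for wi, fi in zip(w, features))
    return fused, w

f1 = np.full((4, 8, 8), 1.0)   # weakly activated scale
f2 = np.full((4, 8, 8), 3.0)   # strongly activated scale
fused, w = adaptive_weight_fusion([f1, f2])
print(w[1] > w[0])  # True: the stronger scale gets the larger weight
```

Because the weights depend on the input itself, the fusion adapts per image rather than using fixed mixing coefficients.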
Fengyun Li, Lanping Xu, Zhendi Ma, Yuxin Zhao, Xiaobo Li, "APU-Net: A U-Net enhanced network with dynamic feature fusion and pyramid cross-attention mechanism for polyp segmentation," Digital Signal Processing, vol. 172, Article 105879, 2026. DOI: 10.1016/j.dsp.2025.105879
Citations: 0
DEFusion: Dynamic parameter tuning for infrared-visible image fusion in day-night alternating environments
IF 3, Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2026-03-15, Epub Date: 2026-01-02. DOI: 10.1016/j.dsp.2025.105874
Yaochen Liu, Mingyue Han, Jianwei Fan
Infrared and visible image fusion aims to generate a fused image with rich texture detail information around the clock. However, existing fusion methods adopt a fixed fusion strategy to integrate features from different modalities, which makes it difficult for them to adapt to drastic illumination variations in day-night alternating scenes. To address this challenge, this paper proposes a dynamic parameter tuning method for infrared and visible image fusion (DEFusion), which can flexibly adjust network parameters based on differences in the information of the input images, thus effectively adapting to the complex characteristics of alternating day-night scenes. Specifically, DEFusion designs dynamic parameter tuning sub-networks that dynamically adjust the contribution of features from different modalities based on the feature information of the input image. Meanwhile, each layer of the network is equipped with an infrared and visible dual-information extraction module and a bidirectional cross-modal enhancement module. The former is responsible for preserving the unique features of unimodal images, while the latter achieves feature complementation and enhancement between modalities by performing bidirectional cross-modal interactions in parallel. In addition, the network introduces a dynamic selection algorithm, which adaptively adjusts the propagation weights of each module by sensing scene changes in real time, so as to construct the optimal fusion path that fits the current day-night scene characteristics. On the public MSRS and TNO datasets, this method achieves maximum improvements of 59.9% and 68.0% in the Average Gradient (AG) metric, and 32.3% and 37.4% in the Spatial Frequency (SF) metric, respectively. Both qualitative and quantitative evaluations demonstrate that our model exhibits strong robustness in alternating day-night scenes.
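The exact parameter-tuning sub-network is not specified in the abstract; the toy sketch below conveys the gating idea using the visible image's mean intensity as a scene-brightness signal (both the brightness proxy and the sigmoid gate are assumed stand-ins for the learned sub-network, chosen only to show how weight can shift toward the infrared branch at night):

```python
import numpy as np

def dynamic_fusion(ir, vis):
    """Illumination-aware fusion sketch: a sigmoid of the visible image's
    mean intensity gates the per-modality contributions, so dark scenes
    lean on infrared and bright scenes lean on visible texture."""
    light = vis.mean()                                  # brightness proxy in [0, 1]
    w_vis = 1.0 / (1.0 + np.exp(-10.0 * (light - 0.5)))  # dark scene -> small w_vis
    fused = w_vis * vis + (1.0 - w_vis) * ir
    return fused, w_vis

ir = np.random.rand(32, 32)
night_vis = np.full((32, 32), 0.1)   # dark visible frame
fused, w_vis = dynamic_fusion(ir, night_vis)
print(w_vis < 0.5)  # True: infrared dominates in the dark
```

A learned version would replace the fixed sigmoid with a small network predicting the gate from richer input statistics.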
Yaochen Liu, Mingyue Han, Jianwei Fan, "DEFusion: Dynamic parameter tuning for infrared-visible image fusion in day-night alternating environments," Digital Signal Processing, vol. 172, Article 105874, 2026. DOI: 10.1016/j.dsp.2025.105874
Citations: 0
MLME-Net: A high-accuracy model for surgical instrument detection via multi-level MixEnhance network
IF 3, Tier 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2026-03-15, Epub Date: 2026-01-02. DOI: 10.1016/j.dsp.2025.105857
Haikun Chen , Shuwan Pan , Qin Ye , Yuanda Lin , Lixin Zheng
Accurate identification and tracking of surgical instruments are critical for computer-assisted minimally invasive surgery. To improve the detection accuracy of surgical instruments, we propose a Multi-Level MixEnhance Network (MLME-Net), whose core component is a novel Multi-branch Multi-Level MixEnhance (M2LME) module. The M2LME module employs a multi-level attention-guided architecture for weight redistribution, specifically designed to strengthen the extraction of discriminative, fine-grained features through multi-level feature integration. To further enhance performance, MLME-Net integrates two critical components: the Multi-Order Gated Aggregation Block (MOGAB) for cross-complexity feature interaction through gating mechanisms, and the Coordinate Attention (CA) module for accurate instrument localization in complex surgical environments. Additionally, we address class imbalance among surgical instruments by introducing Adaptive Threshold Focal Loss (ATFL), which dynamically adjusts loss weights through an adaptive mechanism. Experimental results demonstrate that MLME-Net achieves a mean Average Precision at 50% IoU (mAP50) of 94.9% on the m2cai16-tool-locations dataset, outperforming the baseline by 1.1%. Notably, detection accuracy of the Grasper and Irrigator classes has improved by 3.3% and 2.6%, respectively.
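ATFL's precise formulation is not given in the abstract; the sketch below combines the standard focal term with an extra attenuation for samples whose confidence already exceeds a threshold, with the threshold held fixed here rather than adapted during training (an illustrative simplification, not the paper's loss):

```python
import numpy as np

def atfl_sketch(p, gamma=2.0, thresh=0.5):
    """Focal-loss-style sketch: the focal term (1 - p)^gamma * -log(p)
    down-weights well-classified samples, and an extra attenuation is
    applied to easy samples above the confidence threshold. ATFL adapts
    this threshold dynamically; it is fixed here for illustration."""
    p = np.clip(p, 1e-7, 1.0)                 # avoid log(0)
    loss = -((1.0 - p) ** gamma) * np.log(p)  # standard focal term
    loss[p > thresh] *= 0.5                   # illustrative easy-sample damping
    return loss

probs = np.array([0.9, 0.5, 0.1])  # predicted probability of the true class
losses = atfl_sketch(probs)
print(losses[2] > losses[0])  # True: the hard sample dominates the loss
```

Focusing gradient mass on hard, rare classes this way is the usual remedy when instrument categories are heavily imbalanced.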
Haikun Chen, Shuwan Pan, Qin Ye, Yuanda Lin, Lixin Zheng, "MLME-Net: A high-accuracy model for surgical instrument detection via multi-level MixEnhance network," Digital Signal Processing, vol. 172, Article 105857, 2026. DOI: 10.1016/j.dsp.2025.105857
Citations: 0