Digital Signal Processing: Latest Articles

Performance analysis and robust DOA estimation using acoustic vector sensor array under non-orthogonal deviation
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105867
Weidong Wang, Tianyou Wang, Hui Li, Wentao Shi, Wasiq Ali
This paper systematically addresses direction of arrival (DOA) estimation under non-orthogonal deviation (NOD) in an acoustic vector sensor array (AVSA). First, by incorporating NOD information into the ideal AVSA model, two AVSA models with NOD are established. Subsequently, closed-form expressions for the DOA estimation bias, the Cramér-Rao lower bound (CRLB), and the root mean square error (RMSE) are derived analytically for scenarios where each AVS exhibits NOD, illustrating how NOD degrades DOA estimation accuracy. To mitigate the effect of NOD, an optimal modification matrix construction (OMMC) method is proposed. The NOD range of each AVS is first coarsely estimated using prior information from a known auxiliary source and the theoretical RMSE. Based on the estimated deviation range, an overcomplete redundant correction matrix is constructed and used to calibrate the measurement data of each AVS. The optimal correction matrix is selected by minimizing the deviation between the estimated and true DOAs, and a global correction matrix for the entire array is formed by extracting the optimal correction sub-matrix for each AVS, thereby enabling accurate array calibration. Extensive simulations show that the proposed OMMC method significantly outperforms existing techniques, especially in challenging environments with large NOD or limited snapshots.
Digital Signal Processing, vol. 172, Article 105867.
Citations: 0
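The selection step described in the abstract above (choose, from a redundant set of candidate corrections, the one that minimizes the DOA error on a known auxiliary source) can be illustrated with a toy example. This is only a sketch under strong assumptions that are not taken from the paper: a single 2-D vector sensor, noise-free data, a hypothetical NOD model in which only the y-axis is tilted by an angle `delta`, and a uniform candidate grid. All function names are illustrative.

```python
import math

def nod_matrix(delta):
    """Hypothetical 2-D NOD model: x-axis ideal, y-axis tilted by delta (rad)."""
    return [[1.0, 0.0], [-math.sin(delta), math.cos(delta)]]

def correction_matrix(delta):
    """Inverse of nod_matrix(delta); undoes the assumed axis tilt."""
    s, c = math.sin(delta), math.cos(delta)
    return [[1.0, 0.0], [s / c, 1.0 / c]]

def apply(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def measure(theta, delta):
    """Noise-free velocity-channel outputs for a source at azimuth theta."""
    return apply(nod_matrix(delta), [math.cos(theta), math.sin(theta)])

def estimate_doa(v):
    """Azimuth estimate from the two velocity channels."""
    return math.atan2(v[1], v[0])

def select_correction(v_aux, theta_aux, candidates):
    """Pick the candidate deviation whose correction minimizes the
    absolute DOA error on the known auxiliary source."""
    return min(
        candidates,
        key=lambda d: abs(estimate_doa(apply(correction_matrix(d), v_aux)) - theta_aux),
    )

true_delta = math.radians(4.0)            # unknown to the calibration step
theta_aux = math.radians(30.0)            # known auxiliary source direction
candidates = [math.radians(0.5 * i) for i in range(-20, 21)]  # -10..10 degrees
best = select_correction(measure(theta_aux, true_delta), theta_aux, candidates)

# Apply the selected correction to a different source.
theta_test = math.radians(70.0)
v_test = measure(theta_test, true_delta)
raw_deg = math.degrees(estimate_doa(v_test))
cal_deg = math.degrees(estimate_doa(apply(correction_matrix(best), v_test)))
```

In this toy setting the uncorrected estimate `raw_deg` is biased by the axis tilt, while `cal_deg` recovers the source direction because the true deviation lies on the candidate grid. With noise or an off-grid deviation the recovery would only be approximate.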
Multi-component LFM signal representation method under impulsive noise: Principle, method and application
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105868
Weiwei Shang, Yong Guo, Lidong Yang
In the presence of impulsive noise modeled by the α-stable distribution, conventional noise suppression methods inevitably introduce cross-terms when processing multi-component signals, leading to significant deviations in subsequent signal representation and parameter estimation. To address this issue, this paper develops an impulsive noise suppression technique based on K-medoids clustering (KMC) and proposes two representation methods for multi-component linear frequency modulation (LFM) signals under impulsive noise. First, the reason cross-terms arise is analyzed mathematically, and a KMC-based impulsive noise suppression technique is then developed. Second, the KMC-fractional Fourier transform (KMC-FRFT) and the KMC-synchrosqueezing transform (KMC-SST) are proposed, enabling precise characterization of multi-component LFM signals in the fractional domain and the time-frequency domain, respectively. Finally, KMC-FRFT is applied to the parameter estimation of multi-component LFM signals under impulsive noise. Simulation experiments demonstrate that, in both the fractional domain and the time-frequency domain, KMC not only suppresses high-amplitude burst impulsive noise but also completely resolves the cross-terms problem inherent in existing methods. On this basis, under impulsive noise, KMC-FRFT and KMC-SST effectively capture the fractional spectral characteristics and the time-frequency distribution of multi-component LFM signals from complementary perspectives. For both simulated and measured impulsive noise, the RMSE shows that KMC-FRFT can accurately estimate the parameters of weak signal components when GSNR ≥ 6 dB, addressing the incorrect parameter estimation caused by cross-term interference.
Digital Signal Processing, vol. 173, Article 105868.
Citations: 0
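The abstract above does not spell out how K-medoids clustering is used, so the following is only a generic illustration of the idea of separating high-amplitude impulses from ordinary samples by clustering. It assumes two clusters over sample magnitudes, absolute-difference distance, and min/max initialization; the function names and the final blanking step are hypothetical and are not claimed to be the paper's KMC technique.

```python
def two_medoids_1d(values, max_iter=100):
    """Two-cluster K-medoids on scalars with absolute-difference distance.
    Returns [low_medoid, high_medoid]."""
    medoids = [min(values), max(values)]
    for _ in range(max_iter):
        clusters = [[], []]
        for v in values:
            k = 0 if abs(v - medoids[0]) <= abs(v - medoids[1]) else 1
            clusters[k].append(v)
        # New medoid: the member with the smallest total distance to its
        # cluster; an empty cluster keeps its previous medoid.
        new = [
            min(c, key=lambda m: sum(abs(m - x) for x in c)) if c else medoids[k]
            for k, c in enumerate(clusters)
        ]
        if new == medoids:
            break
        medoids = new
    return medoids

def impulse_mask(samples):
    """Flag samples whose magnitude is closer to the high medoid."""
    mags = [abs(s) for s in samples]
    lo, hi = two_medoids_1d(mags)
    return [abs(m - hi) < abs(m - lo) for m in mags]

samples = [0.9, -1.1, 1.0, -0.95, 50.0, 1.05, -0.8, -40.0, 1.2]
mask = impulse_mask(samples)
# Illustrative suppression step (an assumption): blank the flagged samples.
cleaned = [0.0 if flagged else s for s, flagged in zip(samples, mask)]
```

Here the two large-magnitude samples are grouped with the high medoid and blanked, while the remaining samples pass through unchanged.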
Improved reinforcement learning-based joint decision-making of detection modes and transmit power for LPI radar
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105872
Huilong Tang, Wei Wang, Zhiwei Pu, Jianlin Wei, Wang Zhang
Modern airborne radar reconnaissance systems employ range-divided multi-stage operations (e.g., passive detection, active detection, and active identification). However, traditional low probability of intercept (LPI) radar designs focus on optimizing performance for individual reconnaissance stages, resulting in suboptimal overall detection capability. Meanwhile, multi-stage operations yield excessive invalid and suboptimal actions, creating action space redundancy that deteriorates learning efficiency. This paper proposes a reinforcement learning (RL)-based joint decision-making method for enhanced detection performance, incorporating improved RL exploration mechanisms to accelerate learning. Firstly, adversarial strategies from each stage are integrated to construct a joint decision-making framework for detection modes and transmit power (JD-DMTP). Based on this framework, the RL elements are designed to enhance detection performance under LPI constraints. Secondly, we propose the trainable suboptimal action mask (TSAM), equipped with suboptimal action elimination criteria, to filter out both invalid and suboptimal actions, thereby improving learning efficiency. Finally, the experimental results validate the effectiveness of the JD-DMTP, showing 6.46×/4.04× higher hit value ratio and 1.52×/1.32× better successful decision-making rate (ideal/non-ideal environment) compared to the minimum-transmit-power baseline. The TSAM achieves comparable performance to the trainable action mask (TAM) baseline with only 25% of the required training iterations.
Digital Signal Processing, vol. 172, Article 105872.
Citations: 0
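Action masking, which the abstract above builds on, is commonly implemented by removing disallowed actions from the policy distribution before sampling. The sketch below shows that generic mechanism only: the mask here is a fixed boolean list, whereas the TSAM described in the abstract is trainable and uses its own elimination criteria, which are not reproduced. The logits and the mask values are made-up inputs.

```python
import math

def masked_softmax(logits, valid):
    """Softmax over logits with disallowed actions forced to probability 0."""
    if not any(valid):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    peak = max(masked)                      # subtract the max for stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Four hypothetical (mode, power) actions; actions 1 and 3 are masked out.
logits = [2.0, 1.0, 0.5, -1.0]
valid = [True, False, True, False]
probs = masked_softmax(logits, valid)
```

Because masked actions receive zero probability, the agent never samples them, which shrinks the effective action space the policy has to explore.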
MLME-Net: A high-accuracy model for surgical instrument detection via multi-level MixEnhance network
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105857
Haikun Chen, Shuwan Pan, Qin Ye, Yuanda Lin, Lixin Zheng
Accurate identification and tracking of surgical instruments are critical for computer-assisted minimally invasive surgery. To improve the detection accuracy of surgical instruments, we propose a Multi-Level MixEnhance Network (MLME-Net), whose core component is a novel Multi-branch Multi-Level MixEnhance (M2LME) module. The M2LME module employs a multi-level attention-guided architecture for weight redistribution, designed to strengthen discriminative, fine-grained feature extraction through multi-level feature integration. To further enhance performance, MLME-Net integrates two critical components: the Multi-Order Gated Aggregation Block (MOGAB), which enables cross-complexity feature interaction through gating mechanisms, and the Coordinate Attention (CA) module, which supports accurate instrument localization in complex surgical environments. Additionally, we address class imbalance among surgical instruments by introducing Adaptive Threshold Focal Loss (ATFL), which dynamically adjusts loss weights through an adaptive mechanism. Experimental results demonstrate that MLME-Net achieves a mean Average Precision at 50% IoU (mAP50) of 94.9% on the m2cai16-tool-locations dataset, outperforming the baseline by 1.1%. Notably, detection accuracy for the Grasper and Irrigator classes improves by 3.3% and 2.6%, respectively.
Digital Signal Processing, vol. 172, Article 105857.
Citations: 0
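Two standard building blocks behind the quantities named in the abstract above are the IoU test used by mAP50 and the focal loss that ATFL extends. The sketch below shows the textbook forms only: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the focal loss uses fixed `gamma` and `alpha` values, so the adaptive thresholding of ATFL is not reproduced.

```python
import math

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_match_at_50(pred, truth):
    """A prediction counts as a hit for mAP50 when IoU is at least 0.5."""
    return iou(pred, truth) >= 0.5

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)                   # guard against log(0)
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
hit = is_match_at_50((10, 10, 50, 50), (12, 12, 52, 52))
easy = focal_loss(0.9, 1)   # confident, correct prediction
hard = focal_loss(0.1, 1)   # confident, wrong prediction
```

The `(1 - p_t) ** gamma` factor is what down-weights easy examples, so rare or hard classes contribute relatively more to the total loss.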
DEFusion: Dynamic parameter tuning for infrared-visible image fusion in day-night alternating environments
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105874
Yaochen Liu, Mingyue Han, Jianwei Fan
Infrared and visible image fusion aims to generate a fused image with rich texture detail around the clock. However, existing fusion methods adopt fixed fusion to integrate features from different modalities, making it difficult for them to adapt to drastic illumination variations in day-night alternating scenes. To address this challenge, this paper proposes a dynamic parameter tuning method for infrared and visible image fusion (DEFusion), which flexibly adjusts network parameters based on the differences in information between input images, thus adapting to the complex characteristics of alternating day-night scenes. Specifically, DEFusion designs dynamic parameter tuning sub-networks that adjust the contribution of features from different modalities based on the feature information of the input image. Meanwhile, each layer of the network is equipped with an infrared and visible dual-information extraction module and a bidirectional cross-modal enhancement module. The former preserves the unique features of unimodal images, while the latter achieves feature complementation and enhancement between modalities by performing bidirectional cross-modal interactions in parallel. In addition, the network introduces a dynamic selection algorithm, which adaptively adjusts the propagation weights of each module by sensing scene changes in real time, so as to construct the optimal fusion path for the current day-night scene characteristics. On the public MSRS and TNO datasets, this method achieves maximum improvements of 59.9% and 68.0% in the Average Gradient (AG) metric, and 32.3% and 37.4% in the Spatial Frequency (SF) metric, respectively. Both qualitative and quantitative evaluations demonstrate that the model exhibits strong robustness in alternating day-night scenes.
Digital Signal Processing, vol. 172, Article 105874.
Citations: 0
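The two metrics quoted in the abstract above, Average Gradient (AG) and Spatial Frequency (SF), both reward images with more local intensity variation. The sketch below uses commonly cited definitions on a grayscale image stored as a list of rows; normalization conventions differ between papers, so the exact constants here are an assumption and may not match those used by the authors.

```python
import math

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2), with row and column first differences."""
    rows, cols = len(img), len(img[0])
    rf2 = sum((img[i][j] - img[i][j - 1]) ** 2
              for i in range(rows) for j in range(1, cols)) / (rows * cols)
    cf2 = sum((img[i][j] - img[i - 1][j]) ** 2
              for i in range(1, rows) for j in range(cols)) / (rows * cols)
    return math.sqrt(rf2 + cf2)

def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = img[i][j + 1] - img[i][j]
            dy = img[i + 1][j] - img[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))

flat = [[5.0] * 4 for _ in range(3)]        # no texture at all
checker = [[0.0, 1.0], [1.0, 0.0]]          # maximal 2x2 contrast
sf_flat, ag_flat = spatial_frequency(flat), average_gradient(flat)
sf_checker, ag_checker = spatial_frequency(checker), average_gradient(checker)
```

A textureless image scores zero on both metrics, so higher AG and SF on a fused image indicate more preserved edges and detail.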
Automatic detection of multiscale defects in selective laser melting prepared 3D lattice structures: A model with improved attention mechanism
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105877
Yintang Wen, Shengli Xue, Yankai Feng, Yuyan Zhang
As 3D printing processes mature and lattice structures become more widely used, products that combine the two are succeeding in many fields. In Selective Laser Melting (SLM), an additive manufacturing technology, the defects that arise during fabrication are multiscale, highly random, and of multiple types. This study presents a new model that uses object detection to detect lattice structure defects in 3D printing with high accuracy. First, a new attention module, the Transformer Bottleneck Attention Module (TBAM), is proposed; it combines the multi-head self-attention mechanism with the channel attention module, which effectively addresses the difficulty of recognizing multi-scale information in lattice structure defects. Second, the Custom Spatial-Channel Down-sampling (C-SCDown) module is proposed to improve the processing of defect location information and channel information through down-sampling and residual connections, and to enhance the network's ability to adaptively extract features from different image regions or channels. Finally, the Attentional Scale Sequence Fusion Head (ASF-Head) framework is introduced to improve the segmentation accuracy and segmentation speed of the model and thereby enhance detection performance. The model, TCA-YOLO, is named for its featured modules: TBAM, C-SCDown, and ASF-Head. It detects geometric distortion, geometric fracture, and general types of defects, and achieves an average accuracy of 98.1% on 3D-printed lattice structure defects, demonstrating its effectiveness.
Digital Signal Processing, vol. 172, Article 105877.
Citations: 0
Tunable polarization detection in nonzero-mean environment: Theoretical derivation and performance analysis
IF 3 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-02 | DOI: 10.1016/j.dsp.2025.105864
Haoqi Wu, Hongzhi Guo, Zhihang Wang, Zishu He
This paper addresses polarimetric adaptive detection of targets embedded in a nonzero-mean Gaussian environment with unknown mean vector (MV) and covariance matrix (CM). By adopting the generalized likelihood ratio test (GLRT) criterion, we derive two nonzero-mean polarimetric detectors, and then design a nonzero-mean tunable detector that includes the one-step and two-step GLRT tests. The proposed detectors are assessed in terms of the probability of detection (Pd), for both non-fluctuating and fluctuating target models, and the probability of false alarm (Pfa). We derive theoretical expressions for Pfa and Pd, which demonstrate that the proposed detectors achieve the constant false alarm rate (CFAR) property with respect to the MV and CM. In simulation experiments, we use the theoretical values and numerical simulation results to show the improvement offered by the developed detectors in adaptive polarimetric detection. The results confirm the theoretical analyses of Pfa and Pd and verify that the developed nonzero-mean polarimetric detectors achieve a higher Pd than their zero-mean counterparts. Further, they reveal that the proposed tunable detector can adjust its robustness or selectivity to mismatched signals.
Digital Signal Processing, vol. 172, Article 105864.
Citations: 0
Robust adaptive beamforming via subspace-based diagonal loading in planar FDA-MIMO radar for main lobe interference suppression
IF 3; Zone 3 (Engineering & Technology); Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2026-01-01; DOI: 10.1016/j.dsp.2025.105869
Yumei Tan, Yong Li, Wei Cheng, Limeng Dong, Langhuan Geng, Muhammad Moin Akhtar
FDA-MIMO radar provides a powerful framework for joint range-angle beamforming and offers superior main lobe interference suppression. However, its effectiveness is limited by the system’s sensitivity to steering vector mismatches, which can cause substantial SINR degradation and main lobe distortion. Diagonal loading (DL) is commonly employed to enhance robustness, but conventional methods often rely on fixed or heuristic loading levels, limiting adaptability and potentially introducing signal distortion under strong interference. To this end, this paper proposes a robust adaptive subspace-based diagonal loading (ASDL) beamforming method for planar FDA-MIMO radar. The beamformer is constrained to a mismatch-resilient subspace constructed from neighboring steering vectors to capture target location uncertainty. A closed-form solution is derived by minimizing a quadratic cost under a distortionless constraint, with the DL factor adaptively computed via Capon spectral estimation. Furthermore, the subspace dimension is automatically optimized based on the eigenvalue distribution of the sample covariance matrix, enabling a robust trade-off between mismatch tolerance and main lobe fidelity. Simulation results show that ASDL consistently delivers superior SINR performance and main lobe interference mitigation, outperforming conventional robust beamformers under various mismatch conditions.
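As a point of reference for the abstract above, the conventional fixed-level diagonal loading that it contrasts with ASDL can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a half-wavelength uniform linear array rather than a planar FDA-MIMO geometry, it takes the loading level as a fixed input instead of computing it adaptively, and the function names (`steering_vector`, `solve`, `diagonally_loaded_weights`) are hypothetical.

```python
import cmath
import math

def steering_vector(n, theta_rad, spacing=0.5):
    """Steering vector of an n-element uniform linear array (assumed geometry).

    spacing is the element spacing in wavelengths.
    """
    phase = 2.0 * math.pi * spacing * math.sin(theta_rad)
    return [cmath.exp(1j * phase * k) for k in range(n)]

def solve(A, b):
    """Solve the complex linear system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or badly conditioned")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def diagonally_loaded_weights(R, a, loading):
    """Capon (MVDR) weights with a fixed diagonal loading level.

    w = (R + loading*I)^{-1} a / (a^H (R + loading*I)^{-1} a),
    which satisfies the distortionless constraint w^H a = 1.
    """
    n = len(a)
    R_loaded = [[R[i][j] + (loading if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    z = solve(R_loaded, a)  # (R + loading*I)^{-1} a
    denom = sum(ai.conjugate() * zi for ai, zi in zip(a, z))
    return [zi / denom for zi in z]
```

With an identity covariance matrix the weights reduce to the steering vector divided by the number of elements, whatever the loading level. The abstract's point is that a fixed `loading` value like the one passed here limits adaptability, which is what its adaptive computation of the DL factor is meant to address.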
Digital Signal Processing, vol. 172, Article 105869. Published 2026-01-01.
Citations: 0
Detecting low-rate DDoS attacks using Rényi entropy and trans-KAN hybrid model in SDN
IF 3; Zone 3 (Engineering & Technology); Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-12-31; DOI: 10.1016/j.dsp.2025.105859
Shiyue Na, Lin Chen, Muyu Lin
Software-Defined Networking (SDN) centralizes network control, enhancing management efficiency but increasing vulnerability to Distributed Denial-of-Service (DDoS) attacks. Low-rate DDoS (LDDoS) attacks are particularly challenging to detect, as their sporadic traffic closely mimics legitimate flows. Existing hybrid detection approaches often employ static fusion strategies that fail to adapt to the diverse characteristics of different LDDoS variants. This paper proposes a novel two-stage detection framework that fundamentally advances hybrid detection through adaptive feature fusion. The first stage utilizes Rényi entropy to efficiently filter 98.77% of benign traffic while retaining potential attack signatures. The second stage employs Trans-KAN, an innovative hybrid model that integrates Kolmogorov-Arnold Networks with Transformer architecture via an adaptive gating mechanism that dynamically balances their contributions through learnable weight matrices based on traffic characteristics. On custom SDN datasets, the proposed framework achieves 98.56% detection accuracy, 97.05% precision, 99.12% recall, and a 98.08% F1-Score with only 1.23% false positives, demonstrating improvements of 3.24% in accuracy over standalone Transformer and 5.48% over KAN. The synergistic combination of entropy-based pre-filtering and adaptive deep learning fusion establishes a new paradigm for LDDoS detection, offering theoretical insights into dynamic feature fusion and Kolmogorov-Arnold representations for hybrid deep learning, with practical applicability for next-generation network security systems.
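The first-stage idea in the abstract above, an entropy measure used as a cheap pre-filter, can be illustrated with a short sketch. The abstract does not say which traffic feature, entropy order, or threshold the authors use, so everything specific here is an assumption: the feature (destination addresses in a window), the order `alpha=2`, the low-entropy decision rule, and the names `renyi_entropy`, `window_entropy` and `is_suspicious`.

```python
import math
from collections import Counter

def renyi_entropy(counts, alpha=2.0):
    """Rényi entropy (in bits) of the empirical distribution given by counts.

    H_alpha = log2(sum_i p_i**alpha) / (1 - alpha) for alpha != 1;
    the limit alpha -> 1 is the Shannon entropy.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("window contains no observations")
    probs = [c / total for c in counts if c > 0]
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def window_entropy(dst_addresses, alpha=2.0):
    """Entropy of the destination addresses seen in one traffic window."""
    return renyi_entropy(list(Counter(dst_addresses).values()), alpha)

def is_suspicious(dst_addresses, threshold, alpha=2.0):
    """Hypothetical pre-filter rule: flag a window whose destination
    entropy falls below the threshold (traffic concentrated on few hosts)."""
    return window_entropy(dst_addresses, alpha) < threshold
```

A window spread evenly over four destinations has an entropy of 2 bits at any order, while a window aimed at a single destination has an entropy of 0, so a threshold between the two separates these extreme cases. Only windows flagged by such a filter would be passed on to a second-stage classifier.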
Digital Signal Processing, vol. 172, Article 105859. Published 2025-12-31.
Citations: 0
Cross-backbone pixel consistency and layer-wise attention fusion for weakly supervised semantic segmentation
IF 3; Zone 3 (Engineering & Technology); Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-12-31; DOI: 10.1016/j.dsp.2025.105832
Mengya Liu, Lei Zhu, Jiahui Cheng
Weakly Supervised Semantic Segmentation (WSSS) reduces labeling costs by simplifying annotation requirements. Most existing studies focus on downstream aspects, such as leveraging features provided by foundation models to generate better Class Activation Maps (CAMs). However, there has been limited exploration of optimization efforts tailored for the foundation model components themselves within the WSSS framework. Recently, hybrid backbone designs combining Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown advantages in vision tasks. However, we observed that architectural feature misalignment and ViT-induced feature over-smoothing represent critical bottlenecks in WSSS performance, further highlighting the insufficient investigation of hybrid architecture designs for WSSS. To mitigate this issue, we propose an approach that enhances the accuracy and consistency of CAMs by aligning pixel-level features in the overlapping regions of cropped images processed by both backbone architectures. In parallel, we introduce a cross-backbone pixel consistency loss, which maximizes the cosine similarity between corresponding pixel features extracted by CNNs and ViTs, further improving the expressiveness of activation maps. To address the over-smoothing problem in ViT, we propose a Deep Layer Attention Modulation (DLAM) module. It dynamically adjusts and optimizes deep attention weights, selectively integrating intermediate-layer semantic information to improve the discriminability of target regions while effectively suppressing noise caused by over-smoothing. Our experiments on the PASCAL VOC and MS COCO 2014 datasets show that our network improves the CNN-ViT hybrid model by addressing feature misalignment and over-smoothing. It achieves competitive WSSS performance, as evidenced by both the high-quality masks generated and the improved segmentation with pseudo labels.
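The cross-backbone pixel consistency loss mentioned in the abstract above is described only as maximizing the cosine similarity between corresponding CNN and ViT pixel features. A minimal sketch of one loss with that property is given below. It assumes the two feature sets have already been spatially aligned on the overlapping region and projected to the same channel dimension, and the exact form (one minus the mean cosine similarity) and the function names are illustrative assumptions rather than the authors' formulation.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)  # eps guards against zero vectors

def pixel_consistency_loss(cnn_feats, vit_feats):
    """One minus the mean per-pixel cosine similarity.

    cnn_feats and vit_feats are lists of per-pixel feature vectors for the
    same (assumed pre-aligned) pixels; minimizing the loss maximizes the
    similarity between the two backbones' features.
    """
    if not cnn_feats or len(cnn_feats) != len(vit_feats):
        raise ValueError("feature lists must be non-empty and of equal length")
    sims = [cosine_similarity(u, v) for u, v in zip(cnn_feats, vit_feats)]
    return 1.0 - sum(sims) / len(sims)
```

The loss is 0 when every pair of pixel features points in the same direction, 1 when they are orthogonal, and 2 when they are opposite, so it depends on feature direction only and not on feature magnitude.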
Digital Signal Processing, vol. 171, Article 105832. Published 2025-12-31.
Citations: 0