Digital Signal Processing: Latest Articles

Hybrid transfer semantic segmentation architecture for hyperspectral image classification
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-22 | DOI: 10.1016/j.dsp.2025.105852
Huaiping Yan, Yupeng Hou, Chengcai Leng, Yilin Li, Yang Li
Hyperspectral image (HSI) classification is a research hotspot in remote sensing image processing, and deep learning-based methods have become one of the mainstream approaches. However, these methods still face the challenge of insufficient training samples. Transfer learning is regarded as an effective way to alleviate this problem, but hyperspectral image data is scarce and offers little foundation for pre-training high-quality models. This paper proposes a Hybrid Transfer Semantic Segmentation Architecture (HTSSA), which transfers knowledge from different datasets by adopting different network structures. The model adopts a triple-branch network architecture. The three branches respectively use a vision transformer (ViT) classification model pre-trained on ImageNet, a Deeplabv3 semantic segmentation model pre-trained on the PASCAL VOC 2012 dataset, and a convolutional neural network (CNN) model pre-trained on a source hyperspectral image dataset. All three branches are fine-tuned on the target hyperspectral image dataset, and mapping modules are designed to handle heterogeneous data migration. The ViT branch uses the Transformer to extract global spatial context features, the Deeplabv3 branch uses the feature pyramid to extract local multi-scale spatial features, and the CNN branch uses a 3D-CNN to extract the spectral features of hyperspectral images. The final classification result is obtained from the fused features of the three branches. Extensive experiments on public datasets verify that HTSSA alleviates the negative impact of sample scarcity to a certain extent, enhances the representation ability of the model, and improves the final classification performance.
Digital Signal Processing, Volume 174, Article 105852
Citations: 0
FETrack: One-stream framework-based feature enhancement for object tracking
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-22 | DOI: 10.1016/j.dsp.2026.105935
Yue Chen, Huiying Xu, Xinzhong Zhu, Xuedong He, Hongbo Li, Yi Li
Vision Transformer (ViT)-based one-stream architectures have emerged as the dominant framework for object tracking. However, their performance is hampered by interference from similar objects and by background distractions. To address these limitations, this paper proposes FETrack, a one-stream tracker designed to enhance feature discriminability for improved object tracking. The core innovations of FETrack are as follows: 1) Global Enhancement (GE) and Cross-Depth Template Fusion (CDTF) modules, where the GE module adopts a novel global feature extraction mechanism to suppress background interference, and the CDTF module ensures efficient propagation of contextual information via cross-depth template fusion. 2) An unsupervised hard sample learning strategy, which introduces contrastive learning and treats each candidate token as an independent instance by leveraging its inherent hard sample properties, thereby enhancing feature discriminability. 3) A distillation-based fine-tuning approach that guides parameter optimization for the entire backbone network through feature distillation, enabling efficient tuning of newly integrated modules and ensuring their synergy with the original architecture. Experimental results on six benchmark datasets demonstrate the effectiveness of FETrack and confirm its state-of-the-art performance. Furthermore, experiments validate that the proposed approaches transfer to other one-stream trackers and enhance them.
Digital Signal Processing, Volume 173, Article 105935
Citations: 0
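The FETrack abstract above mentions contrastive learning in which each candidate token is treated as its own instance. A minimal sketch of that general idea is the InfoNCE loss below. It is not the paper's loss: the two-view setup, the cosine similarity, the `temperature` value, and the function names are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: token i in view_a is positive only with token i in view_b."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical token embeddings (assumed values, two dimensions for readability).
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce(tokens, tokens)  # every token matches its own copy
shuffled = info_nce(tokens, [tokens[1], tokens[2], tokens[0]])  # positives mismatched
```

The loss is small when each token is closest to its own counterpart and grows when the positives are mismatched, which is the property an instance-discrimination objective relies on.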
Multiparameter estimation for bistatic EMVS-FDA-MIMO radar with arbitrarily configured arrays
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-22 | DOI: 10.1016/j.dsp.2026.105928
Huihui Ma, Haihong Tao, Yaxing Yue, Tiantian Zhong, Yunfei Fang, Le Wang
This study explores the multiparameter estimation problem in a bistatic frequency diverse array multiple-input multiple-output (FDA-MIMO) radar system that employs arbitrarily configured electromagnetic vector sensor (EMVS) arrays. The signal reception model for the presented radar architecture is established. Building on this foundation, a subspace-based algorithm is proposed to achieve accurate estimation of spatial-polarization angles and ranges. First, rotation-invariant structures in the spatial domain are formed by constructing several virtual steering matrices, from which the normalized electromagnetic field vectors are derived. Then the two-dimensional direction-of-departure (2D-DOD) and two-dimensional direction-of-arrival (2D-DOA) estimates are computed through a vector cross-product operation. Thereafter, the polarization angles are determined using a least squares (LS) approach. Finally, the range estimates are obtained by compensating the steering matrix with the estimated 2D-DOD. Furthermore, the developed framework is evaluated for its identifiability, flexibility, computational demands, and Cramér-Rao bound (CRB). It successfully estimates the targets' spatial-polarization angles and ranges, while also achieving automatic parameter pairing. Simulation results demonstrate the validity of the developed approach.
Digital Signal Processing, Volume 174, Article 105928
Citations: 0
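The EMVS-FDA-MIMO abstract above computes direction estimates from normalized electromagnetic field vectors through a vector cross product. The sketch below illustrates only that single step for one noise-free plane wave: the normalized cross product of the electric field with the conjugated magnetic field points along the propagation direction. The sign convention (propagation direction versus direction toward the source), the test angles, and all function names are assumptions, not details taken from the paper.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors (entries may be complex)."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def norm(v):
    """Euclidean norm of a real or complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def direction_from_fields(e, h):
    """Unit direction from the normalized cross product e x conj(h)."""
    p = cross(e, [x.conjugate() for x in h])
    scale = norm(e) * norm(h)
    return [(x / scale).real for x in p]


# Assumed ground truth: elevation 60 degrees, azimuth 30 degrees.
theta, phi = math.radians(60.0), math.radians(30.0)
k = [math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta)]
unit_theta = [math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta)]
unit_phi = [-math.sin(phi), math.cos(phi), 0.0]

# Elliptically polarized electric field transverse to k; magnetic field h = k x e.
e = [0.6 * t + 0.8j * p for t, p in zip(unit_theta, unit_phi)]
h = cross(k, e)

u = direction_from_fields(e, h)
est_theta = math.acos(max(-1.0, min(1.0, u[2])))
est_phi = math.atan2(u[1], u[0])
```

Because the cross product yields the direction cosines directly, elevation and azimuth come out of the same vector, so no separate pairing step is needed between them in this simplified setting.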
KADNet: Low SNR automatic modulation classification via SNR aware deformable convolution and Kolmogorov-Arnold networks
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-21 | DOI: 10.1016/j.dsp.2026.105942
Run Wang, Jizhe Li, Youze Yang, Shasha Wang, Bing Zheng
The proliferation of modern communication technologies has produced increasingly complex electromagnetic environments, demanding more rigorous performance from Automatic Modulation Classification (AMC) systems, especially in low signal-to-noise ratio (SNR) scenarios where conventional approaches struggle with feature extraction and classification fidelity. In response, we propose KADNet, a novel architecture tailored for AMC in low-SNR scenarios. KADNet comprises two key components: a Signal Enhancement Module (SEM) and an SNR-Aware Deformable Convolutional Network (SADCN). In the SEM, time-domain I/Q samples are first projected into the frequency domain via the fast Fourier transform (FFT). A spectral weighting mask is then generated by a Kolmogorov-Arnold Network (KAN), enabling precise attenuation of noise and amplification of decision-relevant signal components. Subsequently, the SADCN employs a lightweight subnetwork to estimate a soft SNR map, which is then fused into deformable convolution operations via a Signal Quality Spatial Attention (SQSA) mechanism. This fusion produces secondary spatial offsets and modulation-adaptive weights, allowing sampling grids to adjust dynamically in response to local signal quality. Extensive experiments on the RADIOML 2016.10A/B benchmarks demonstrate the effectiveness of our design: KADNet achieves mean classification accuracies of 64.66 percent and 65.58 percent, corresponding to improvements of 2.04 percent and 0.56 percent over baseline methods. Moreover, within the extremely low-SNR range of -20 dB to -2 dB, KADNet attains average accuracies of 36.86 percent and 37.92 percent, surpassing the current state of the art by 3.0 percent to 3.8 percent. This improvement over the current state of the art in the most challenging SNR conditions confirms that KADNet is a superior AMC method in low-SNR conditions.
Digital Signal Processing, Volume 174, Article 105942
Citations: 0
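The Signal Enhancement Module in the KADNet abstract above transforms I/Q samples to the frequency domain and applies a spectral weighting mask. The sketch below shows that pipeline shape only. The mask here is a hand-written Wiener-style gain standing in for the KAN-generated mask, the median-based noise floor is an assumption, and returning to the time domain with an inverse transform is also an assumption (the abstract does not say what follows the masking). A naive DFT is used so the example needs only the standard library.

```python
import cmath
import math
import random
import statistics


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward, or inverse with 1/n scaling)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def enhance(iq):
    """Weight each frequency bin by a soft gain in [0, 1), then transform back."""
    spectrum = dft(iq)
    power = [abs(v) ** 2 for v in spectrum]
    noise_floor = statistics.median(power)  # assumed noise estimate, not from the paper
    mask = [p / (p + noise_floor) for p in power]  # stand-in for the learned mask
    return dft([m * v for m, v in zip(mask, spectrum)], inverse=True)


def mse(a, b):
    """Mean squared error between two complex sequences."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
n = 32
clean = [cmath.exp(2j * math.pi * 3 * t / n) for t in range(n)]  # single complex tone
noisy = [c + complex(random.gauss(0, 0.5), random.gauss(0, 0.5)) for c in clean]
enhanced = enhance(noisy)

err_before = mse(noisy, clean)
err_after = mse(enhanced, clean)
```

The strong tone bin receives a gain close to one while noise-dominated bins are attenuated, which is the behavior a learned mask would be trained to produce more precisely.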
Adaptive sparse graph for multi-view clustering
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-21 | DOI: 10.1016/j.dsp.2026.105944
Haoyan Yang, Qianyin Wei, Tianchuan Yang, Jipeng Guo
Graph-based multi-view clustering (MVGC) has attracted interest because it can exploit consistent and complementary information from multiple perspectives. The quality of the constructed similarity graph largely determines the clustering performance of MVGC. Many existing methods apply the learned similarity graph directly to spectral clustering, ignoring the large number of inter-cluster similarities in the graph, which degrade the cluster partition. Constructing a k-nearest neighbors (KNN) sparse graph to remove inter-cluster similarities is a common improvement. However, the KNN graph requires extensive tuning of the parameter k. To solve this, we propose a graph-based multi-view clustering method based on the adaptive sparse graph (MNV-MC). Specifically, an initial similarity graph is obtained by a low-rank tensor learning framework. Then a heuristic method, Mutual Nearest Neighbor Value (MNV), is proposed to adaptively select the optimal k based on density changes and thereby construct a high-quality sparse similarity graph. After processing by the fusion mechanism, the graph is input into spectral clustering to obtain the clustering results. Experiments indicate that MNV-MC achieves outstanding performance, and the effectiveness of MNV for adaptive k-value selection in the KNN graph is verified. Specifically, MNV-MC achieves average improvements of 7.79% in ACC and 5.16% in NMI over the second-best method across eight datasets, and gains of 7.29% and 5.79% on four additional large-scale datasets. Notably, as a parameter-free post-processing step, MNV can be easily integrated into other MVGC methods. Experiments show that MVGC methods significantly improve their performance after applying MNV. The code is publicly available at https://github.com/ytccyw/MNVMC.
Digital Signal Processing, Volume 173, Article 105944
Citations: 0
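The MNV-MC abstract above builds on KNN sparsification of a similarity graph to remove inter-cluster similarities. The sketch below shows plain mutual-KNN sparsification with a fixed, hand-picked `k`. It does not implement the MNV heuristic, whose density-based rule for choosing k is not described in the abstract; the toy data, the Gaussian similarity, and the function names are assumptions.

```python
import math


def knn_indices(row, i, k):
    """Indices of the k most similar nodes to node i, excluding i itself."""
    order = sorted(
        (j for j in range(len(row)) if j != i),
        key=lambda j: row[j],
        reverse=True,
    )
    return set(order[:k])


def mutual_knn_graph(sim, k):
    """Keep sim[i][j] only when i and j are in each other's k-nearest-neighbor sets."""
    n = len(sim)
    neighbors = [knn_indices(sim[i], i, k) for i in range(n)]
    return [
        [sim[i][j] if (j in neighbors[i] and i in neighbors[j]) else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Two well-separated 1-D clusters (assumed toy data) with a Gaussian similarity.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
sim = [[math.exp(-((a - b) ** 2)) for b in points] for a in points]
sparse = mutual_knn_graph(sim, k=2)
```

In the dense graph every pair of points has a small nonzero similarity, including pairs from different clusters. After sparsification only within-cluster edges survive here, which is the effect the abstract attributes to KNN graphs when k is chosen well.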
Harnessing structure-aware graph representation and adaptive anchor graph learning for multi-view clustering
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-21 | DOI: 10.1016/j.dsp.2026.105937
Xiaoran Li, Jinglei Liu
Multi-view clustering (MVC) aims to enhance clustering performance through the effective integration of complementary information derived from multiple data sources. Nevertheless, current approaches frequently fall short of fully modeling the global topological characteristics and local similarity connections of multi-view data. In addition, adaptively learning representative anchors that align with the inherent data structure is another challenge for conventional anchor-based multi-view clustering (AMVC) techniques. To solve these problems, we propose a novel MVC framework that integrates structure-aware graph representation and adaptive anchor graph learning (SAGA²G). Specifically, the SAGA²G approach achieves unified modeling of multi-level structures by preserving neighborhood structure features through local similarity constraints and topological consistency through anchor-based global reconstruction. Simultaneously, we develop a dynamic anchor optimization approach that raises the expressive power of the data by automatically aligning the anchor distribution with the underlying cluster structure. Furthermore, an efficient alternating optimization algorithm is used to solve the proposed model, with theoretical guarantees of linear time complexity and convergence. Finally, extensive experiments performed on eight benchmark datasets demonstrate that SAGA²G significantly surpasses the current state-of-the-art techniques.
Digital Signal Processing, Volume 173, Article 105937
Citations: 0
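The SAGA²G abstract above relies on anchor graphs, which relate each sample to a small set of anchors instead of to every other sample. The sketch below builds a basic anchor graph with fixed, hand-picked anchors and a Gaussian kernel. It does not reproduce the paper's adaptive anchor learning or its structure-aware constraints; the anchors, the `bandwidth` parameter, and the function name are assumptions.

```python
import math


def anchor_graph(samples, anchors, bandwidth=1.0):
    """Return an n x m matrix of row-normalized Gaussian similarities to the anchors."""
    graph = []
    for x in samples:
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, anchor)) / (2.0 * bandwidth ** 2))
            for anchor in anchors
        ]
        total = sum(weights)
        graph.append([w / total for w in weights])  # each row sums to 1
    return graph


# Assumed toy data: two clusters in the plane and one anchor placed near each.
samples = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
anchors = [[0.1, 0.0], [5.0, 5.0]]
Z = anchor_graph(samples, anchors)
```

With n samples and m anchors the graph has n × m entries, so building it costs time linear in n for a fixed m. That is the usual reason anchor-based methods scale better than full n × n similarity graphs.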
MA-YOLO: Enhanced multi-scale attentional remote sensing detector
IF 3 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-01-21 | DOI: 10.1016/j.dsp.2026.105948
Zikai Chen, Degang Yang, Tingting Song, Yichen Ye, Yongli Liu, Xin Zhang
With the continuous development of deep learning technology, object detection in remote sensing images has received increasing attention. However, due to the diversity of object scales and the complexity of background environments, current detectors often struggle to control computational costs while maintaining high performance. To address these challenges, we design a remote sensing image object detector called MA-YOLO, which integrates multi-scale features and attention mechanisms. We design the mixed receptive field attention convolution (MRFAConv) module to strengthen the backbone network, which is a non-parametric shared convolution that takes into account both spatial and channel attention. Moreover, a multi-scale receptive field downsampling module (MRFD) is proposed, which can extract rich feature information from different receptive fields while effectively reducing information loss. Finally, a lightweight multi-scale attention module (LMSA) is designed and integrated into the neck network to further improve feature fusion. Extensive experiments conducted on the DIOR and TGRS-HRRSD datasets reveal that MA-YOLO improves the mAP by 2.1% and 5.3%, respectively, compared to the baseline model YOLOv8n, while slightly reducing computational overhead and decreasing the number of parameters by 6.7%. These experimental results demonstrate the effectiveness of the proposed method in improving detection accuracy on remote sensing images. The code will be available at https://github.com/Zikai-Chen/MA-YOLO.
Digital Signal Processing, vol. 174, Article 105948
Citations: 0
EnFuseNet: A Dual-Module approach combining tail-Class enhancement and dynamic fusion for long-Tail skin lesion diagnosis
IF 3 Zone 3 (Engineering Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-21 DOI: 10.1016/j.dsp.2026.105929
Yongcai Tao , Renwei Xiao , Yucheng Shi , Zhe Li , Qing Zhang , Xiaotian Yuan , Lei Shi
The low incidence of skin diseases leads to a highly imbalanced class distribution, which complicates computer-aided diagnosis. While supervised contrastive learning has been applied to address this long-tail distribution, two challenges remain: first, the significant variation between intra-class and inter-class feature distributions, which hampers effective sample discrimination; and second, the insufficient number of tail-class samples, which limits their representation and impedes improvements in diagnostic accuracy. To address these challenges, we propose EnFuseNet, a novel contrastive learning framework. EnFuseNet incorporates two key modules: the Dual-view Interactive Fusion (DIF) module and the Tail Representation Enhancement (TREM) module. The DIF module enhances intra-class compactness and inter-class separability by combining dual-view features through a channel- and spatially interactive attention mechanism. The TREM module mitigates the issue of limited tail-class samples by generating and dynamically updating prototypes for these classes using a sliding window mechanism. Additionally, the Stage-Adaptive Weighted Cross-Entropy (SAW-CE) loss function, based on curriculum learning and dynamic weighting, guides the model toward more balanced inter-class learning, thereby alleviating diagnosis difficulties during training. Experimental results on the ISIC2018 and ISIC2019 skin disease datasets demonstrate that EnFuseNet achieves accuracy and AUC values of 86%-88% and 97%, respectively, outperforming state-of-the-art methods. These results highlight the potential of EnFuseNet in diagnosing rare and long-tail skin diseases. The source code is available on GitHub.
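The abstract does not say how TREM builds or updates its prototypes. Purely as an illustration, one simple way to keep a per-class prototype over a sliding window is to average the most recent feature vectors seen for that class. The class name `SlidingPrototypes`, the window size, and the plain mean are all assumptions made for this sketch and should not be read as the authors' implementation.

```python
from collections import defaultdict, deque


class SlidingPrototypes:
    """Per-class prototype = mean of the most recent `window` feature vectors.

    Hypothetical helper: the paper's TREM module may differ in every detail.
    """

    def __init__(self, window=4):
        # One bounded queue per class; old features fall out automatically.
        self.buffers = defaultdict(lambda: deque(maxlen=window))

    def update(self, label, feature):
        """Store a new feature vector (a list of floats) for `label`."""
        self.buffers[label].append(list(feature))

    def prototype(self, label):
        """Return the element-wise mean of the buffered features, or None."""
        buf = self.buffers.get(label)
        if not buf:
            return None
        dim = len(buf[0])
        return [sum(f[i] for f in buf) / len(buf) for i in range(dim)]


protos = SlidingPrototypes(window=2)
protos.update("rare_class", [1.0, 0.0])
protos.update("rare_class", [3.0, 2.0])
protos.update("rare_class", [5.0, 4.0])  # evicts the first vector
print(protos.prototype("rare_class"))  # [4.0, 3.0], mean of the last two
```

A bounded `deque` keeps memory constant per class, which matters most for tail classes whose few samples arrive rarely during training.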
Digital Signal Processing, vol. 173, Article 105929
Citations: 0
Physical layer security analysis and resource optimization for satellite-terrestrial multi-antenna systems
IF 3 Zone 3 (Engineering Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-20 DOI: 10.1016/j.dsp.2026.105949
Kexin Wang , Jian Zhang , Gang Xin , Jun Gao , Chengxian Ge , Yan Li
This paper proposes a multi-antenna-based physical layer secure communication scheme and conducts a quantitative analysis of its secret key rate (SKR). First, to meet the secure communication needs of long-distance terminals, a Rician fading multi-antenna secure communication model that conforms to the characteristics of satellite broadcast channels is established. Second, the upper and lower bounds of the SKR, as well as its asymptotic limit under large-scale eavesdropping scenarios, are derived, and the quantitative impacts of the antenna number ratio and channel gain ratio on the SKR limit are elucidated. To address the problem of limited antenna resources at the satellite, an optimal antenna number allocation strategy among legitimate terminals is further proposed. Simulations verify that tilting the antenna allocation toward legitimate terminals with more prominent channel advantages can maximize the SKR. Compared with uniform antenna allocation, the optimal allocation strategy can improve the SKR by up to 4.6% under certain scenario conditions. In addition, this strategy can achieve a positive SKR with only half the number of antennas. This scheme addresses the issues of insufficient model adaptability and low resource efficiency in existing studies on satellite scenarios, providing a theoretical basis for the design of satellite secure communication systems.
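For readers unfamiliar with the channel model named above, a Rician fading coefficient is commonly written as a fixed line-of-sight term plus a scattered complex Gaussian term, weighted by the K-factor. The sketch below draws such coefficients with unit average power. The function name `rician_channel`, the K-factor value, and the zero-phase line-of-sight term are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def rician_channel(num_antennas, k_factor, rng):
    """Draw one Rician-fading coefficient per antenna with E[|h|^2] = 1.

    h = sqrt(K/(K+1)) * LOS + sqrt(1/(K+1)) * CN(0, 1), with LOS = 1 here
    (an assumption; a real array would carry per-antenna phase terms).
    """
    los_gain = math.sqrt(k_factor / (k_factor + 1.0))
    scatter_gain = math.sqrt(1.0 / (k_factor + 1.0))
    coeffs = []
    for _ in range(num_antennas):
        # CN(0, 1): real and imaginary parts each have variance 1/2.
        scatter = complex(rng.gauss(0.0, math.sqrt(0.5)),
                          rng.gauss(0.0, math.sqrt(0.5)))
        coeffs.append(los_gain + scatter_gain * scatter)
    return coeffs


rng = random.Random(0)
samples = rician_channel(num_antennas=20000, k_factor=5.0, rng=rng)
mean_power = sum(abs(h) ** 2 for h in samples) / len(samples)
print(round(mean_power, 2))  # close to 1 for a large sample
```

A larger K-factor shifts power toward the line-of-sight term, which is why this model is often used for satellite links with a strong direct path.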
Digital Signal Processing, vol. 173, Article 105949
Citations: 0
MSFI Net Pro: A tiny slender crack detection model with stronger feature utilization capability
IF 3 Zone 3 (Engineering Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-20 DOI: 10.1016/j.dsp.2026.105931
Fan Yang , Junzhou Huo , Zhenxiang Guan , Hua Li , Zhang Cheng
To address the limitations of conventional object detection neck structures, particularly insufficient utilization of deep features and inadequate multi-scale interactions in micro-crack detection, this paper introduces the Multi-scale Feature Stereo Interaction Network, MSFI Net (Pro). At its core lies the Multi-scale Stereo Feature Extraction (MSFE) Block, which constructs parallel spatial interaction pathways across three adjacent scales to integrate feature maps from shallow, intermediate, and deep layers. It also introduces a gradient enhancement mechanism through dual hybrid residual connections, preserving gradient flow and feature integrity. MSFI Net (Pro) further uses a dual-stage fusion pipeline that cascades two MSFE Blocks, the first for coarse refinement and the second for fine-tuning of features. This works together with dense cross-layer connectivity to strengthen information propagation. Moreover, the network incorporates shallower P2-layer feature maps, injecting less noisy geometric information that improves the recognition of slender cracks. Validation on enhanced micro-crack datasets and the Severstal steel defect dataset shows that MSFI Net (Pro) consistently improves the performance of baseline models. Specifically, under micro-crack test conditions, it achieves a 0.144 improvement in AP50-95 for YOLOv11n, while also boosting recall and prediction confidence for micro-crack targets. Compared to mainstream SOTA neck-optimized models, MSFI Net (Pro) maintains significant performance advantages in detection precision, classification accuracy, and localization efficacy.
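The AP50-95 figure quoted above is, in the widely used COCO convention, the mean of average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. Assuming the paper follows that convention, the sketch below shows the IoU computation and the averaging step. The helper names `box_iou` and `ap_50_95` are hypothetical, and no AP values from the paper are reproduced.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Ten IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def ap_50_95(ap_per_threshold):
    """Average the AP values computed at each of the ten IoU thresholds."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


# A thin, crack-like box shifted by one pixel loses a lot of overlap.
print(box_iou((0, 0, 100, 4), (0, 1, 100, 5)))  # 0.6
```

The example box illustrates why slender targets are hard under strict thresholds: a one-pixel vertical shift on a four-pixel-tall box already drops the IoU to 0.6, which fails every threshold from 0.65 upward.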
Digital Signal Processing, vol. 173, Article 105931
Citations: 0