
Digital Signal Processing: Latest Publications

Deformable convolution and transformer hybrid network for hyperspectral image classification
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-29 · DOI: 10.1016/j.dsp.2026.105962
Xiang Chen, Shuzhen Zhang, Hailong Song, Qi Yan
Recently, deformable convolutions based on convolutional neural networks have been widely used in hyperspectral image (HSI) classification due to their flexible geometric adaptability and superior local feature extraction capabilities. However, they still face significant challenges in establishing long-range dependencies and capturing global contextual information among pixel sequences. To address these challenges, a novel deformable convolution and Transformer hybrid network (DTHNet) is proposed for HSI classification. Specifically, PCA is first employed to reduce the dimensionality of the original HSI, and a group depth joint convolution block (GDJCB) is used to capture the spectral-spatial features of the reduced HSI patches, preventing certain spectral bands from being neglected. Second, a parallel architecture composed of a designed deformable convolution and a Transformer is used to jointly extract local-global spectral-spatial features and long-range dependencies in HSI. In the deformable convolution branch, a simple parameter-free attention (SimAM) enhanced spectral-spatial convolution block (SSCB) is designed to effectively prevent the loss of key information and the generation of redundant features during convolution. In the Transformer branch, the deep integration of the convolutional operation and the self-attention mechanism further promotes more effective extraction of HSI features. Finally, the features from the two branches are fused to obtain a more accurate HSI classification. Experimental results on three widely used HSI datasets demonstrate that the proposed DTHNet outperforms several state-of-the-art HSI classification networks.
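For readers unfamiliar with this kind of dual-branch design, the following is a minimal, hypothetical PyTorch sketch of a parallel convolution/self-attention block whose outputs are fused; the module names, channel sizes, and the plain convolution standing in for the deformable SimAM-enhanced SSCB are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ParallelConvTransformerBlock(nn.Module):
    """Illustrative two-branch block: a convolutional branch for local
    spectral-spatial features and a self-attention branch for long-range
    dependencies, fused by 1x1 convolution (sizes are hypothetical)."""
    def __init__(self, channels=64, num_heads=4):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        local = self.conv_branch(x)             # local features
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) pixel sequence
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))

feats = torch.randn(2, 64, 9, 9)                # e.g. 9x9 HSI patches after PCA
out = ParallelConvTransformerBlock()(feats)     # (2, 64, 9, 9)
```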
Citations: 0
End-to-end target speaker speech recognition with voice activity detection fusion
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-29 · DOI: 10.1016/j.dsp.2026.105966
Zhentao Lin, Bi Zeng, Song Wen, Zihao Chen, Huiting Hu
Traditional Voice Activity Detection (VAD)-based systems frequently encounter challenges in handling speaker overlap within multi-speaker environments, particularly in the context of target speaker Automatic Speech Recognition (ASR). This difficulty arises predominantly from the limitations of front-end VAD modules, which are independently trained to distinguish noise from speech but often introduce insertion and deletion errors, adversely affecting the overall performance of the ASR system. To address this lack of coupling, we propose an End-to-End Streaming Personal target speaker ASR (SP-ASR) framework that fuses the VAD and ASR components in a streaming fashion. Our architecture introduces two key innovations. First, a Streaming Personal VAD (SP-VAD) module functions as a neural gatekeeper, segmenting audio streams while emphasizing target speaker characteristics through its Contextual Attention and Target Speaker Attention (CA-TSA) mechanism. Second, a Streaming Mask-based ASR (SM-ASR) model is employed, which is integrated with SP-VAD and fine-tuned using both coarse-grained and fine-grained speaker information to extract speaker-specific transcriptions. Our experiments reveal a remarkable reduction in the concatenated target-speaker Word Error Rate (ctWER), showcasing the superiority of the End-to-End SP-ASR fusion system over conventional ASR systems, especially under conditions with significant speech overlap and noise.
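The following is a toy sketch of the general idea of gating acoustic features with a frame-level target-speech probability before they reach an ASR encoder; the feature and speaker-embedding dimensions and the scoring network are hypothetical and do not reproduce the authors' SP-VAD or CA-TSA modules.

```python
import torch
import torch.nn as nn

class FrameGate(nn.Module):
    """Toy frame-level gate: a per-frame target-speech probability scales the
    acoustic features before ASR. Illustrative of VAD/ASR fusion only;
    not the authors' SP-VAD architecture."""
    def __init__(self, feat_dim=80, spk_dim=192):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim + spk_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, feats, spk_emb):            # feats: (B, T, F), spk_emb: (B, S)
        spk = spk_emb.unsqueeze(1).expand(-1, feats.size(1), -1)
        p = torch.sigmoid(self.score(torch.cat([feats, spk], dim=-1)))  # (B, T, 1)
        return feats * p, p.squeeze(-1)           # gated features, speech probabilities

gate = FrameGate()
gated, probs = gate(torch.randn(2, 300, 80), torch.randn(2, 192))
```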
Citations: 0
EMFNet: An efficient multi-scale fusion network for UAV small object detection
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-27 · DOI: 10.1016/j.dsp.2026.105952
Mingquan Wang, Huiying Xu, Yiming Sun, Hongbo Li, Zeyu Wang, Yi Li, Ruidong Wang, Xinzhong Zhu
Object detection in UAV aerial images holds significant application value in traffic monitoring, precision agriculture, and other fields. However, this task faces numerous challenges, including significant variations in object size, complex background interference, high object density, and class imbalance. Additionally, processing high-resolution aerial images involves disturbances such as uneven lighting and weather variations. To address these challenges, we propose the EMFNet model. This model addresses the difficulties of object detection in drone aerial images by enhancing the response to object areas under different lighting and weather conditions, suppressing interference from complex backgrounds, and improving adaptability to changes in object size. Specifically, the lightweight vision transformer architecture RepViT is first used as the backbone of EMFNet, combined with Dual Cross-Stage Partial Attention (DCPA) to optimize multi-scale feature fusion and background suppression, thereby enhancing small object feature extraction under varying lighting and weather conditions. Second, we propose the Context Guided Downsample Block (CGDB) to improve the downsampling process and mitigate the loss of feature information. Finally, the DyHead detection head, which uses a three-level attention mechanism, receives three appropriately located prediction heads for classification and localization, improving the detection accuracy of dense and rare objects. Experiments on the VisDrone and UAVDT datasets demonstrate that EMFNet, with 6.76M parameters, achieves AP improvements of 7.5% and 15.2% over the baseline models, respectively.
Citations: 0
Capturing HDR video in challenging light conditions by beam-splitting ratio variable multi-sensor system
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-25 · DOI: 10.1016/j.dsp.2026.105956
Zhangchi Qiao, Hongwei Yi, Desheng Wen, Yong Han
Recording video in HDR scenes is challenging because it is always limited by the potential well capacity and sampling rate of the imaging sensor. The essence of this problem is how to balance the relationship between temporal resolution, spatial resolution, and dynamic range. To solve this, we designed a variable beam-splitting ratio multi-sensor system (BRVMS) to capture both long- and short-exposure frames. It supports a variety of configurations to meet changing light conditions. In addition, we considered motion blur from long exposures before synthesising the HDR frames. We proposed a method to estimate the blur kernel using short-exposure frame constraints and added a mask to remove outliers in the overexposed area. Finally, we proposed a match-fusion method based on the two-layer 3D patch (2L3DP) to generate high-quality, detail-rich HDR frames. Extensive experiments and ablation studies were performed to show the effectiveness of the system. By combining the BRVMS with the 2L3DP match-fusion method, we have enhanced the adaptability and performance of the vision system in high-speed, high-dynamic-range scenes to meet the growing demands of vision applications.
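As a point of reference, the snippet below shows the textbook way to merge one short and one long exposure into a linear HDR estimate by exposure-time normalization with saturation-aware weights; it is a simplified stand-in under assumed exposure times, not the paper's 2L3DP match-fusion method, and it ignores motion blur and registration.

```python
import numpy as np

def merge_two_exposures(short, long_, t_short, t_long, sat=0.95):
    """Toy linear HDR merge of registered short/long exposures (values in [0, 1]).
    Each frame is divided by its exposure time to estimate radiance; long-exposure
    pixels approaching saturation are down-weighted in favour of the short frame."""
    short = short.astype(np.float64)
    long_ = long_.astype(np.float64)
    w_long = np.clip((sat - long_) / sat, 0.0, 1.0)   # 0 where the long frame saturates
    w_short = 1.0 - w_long
    return w_long * long_ / t_long + w_short * short / t_short

# Example: frames captured at 1 ms and 16 ms (hypothetical exposure times)
hdr = merge_two_exposures(np.random.rand(480, 640), np.random.rand(480, 640),
                          t_short=1e-3, t_long=16e-3)
```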
Citations: 0
LPID-DAFT-YOLOv8: A lightweight high-precision contraband detection framework for X-ray security inspection
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-24 · DOI: 10.1016/j.dsp.2026.105957
Fanyi Kong, Dongming Liu, Dan Shan, Hui Cao
To address the challenge of detecting small, overlapping, and occluded contraband items in complex X-ray security imagery, this paper proposes LPID-DAFT-YOLOv8, a lightweight object detection framework. The framework is designed to improve detection accuracy while maintaining real-time performance. First, a Deformable AIFI Encoder is introduced to replace the original SPPF module in YOLOv8, reducing computational overhead while enhancing semantic feature representation. Second, a Cross-Scale Fourier Convolution (CSFC) module is designed to improve multi-scale feature modeling. The CSFC integrates Multi-order Fractional Fourier Convolution (MFRFC) to jointly capture spatial structures and frequency-domain information. Third, an Inner-IoU loss function is adopted to adapt the bounding box regression scale according to IoU values, with the goal of improving localization accuracy and robustness. The proposed LPID-DAFT-YOLOv8 is evaluated under identical training conditions on a custom dual-energy X-ray dataset consisting of 20,000 annotated pseudo-colored images. The model achieves a mean Average Precision (mAP50) of 96.7% with an inference speed of 172.8 FPS. Comparative experiments indicate that LPID-DAFT-YOLOv8 achieves a balance between detection accuracy and inference efficiency, supporting its application in real-time contraband detection for high-throughput security screening scenarios.
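For context, Inner-IoU generally means computing IoU over auxiliary boxes scaled about each box's center, so that the gradient behaviour of the regression loss can be tuned for small or large boxes. The sketch below follows that generic definition; the scale ratio and the exact variant used in the paper are assumptions.

```python
import torch

def inner_iou(box1, box2, ratio=0.8, eps=1e-7):
    """Schematic Inner-IoU: IoU computed on auxiliary boxes obtained by
    scaling each box's width/height about its center by `ratio`
    (ratio < 1 shrinks, > 1 enlarges). Boxes are (x1, y1, x2, y2).
    Generic sketch; the paper's exact variant may differ."""
    def scaled(b):
        cx, cy = (b[..., 0] + b[..., 2]) / 2, (b[..., 1] + b[..., 3]) / 2
        w, h = (b[..., 2] - b[..., 0]) * ratio, (b[..., 3] - b[..., 1]) * ratio
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    x1a, y1a, x2a, y2a = scaled(box1)
    x1b, y1b, x2b, y2b = scaled(box2)
    iw = (torch.min(x2a, x2b) - torch.max(x1a, x1b)).clamp(min=0)
    ih = (torch.min(y2a, y2b) - torch.max(y1a, y1b)).clamp(min=0)
    inter = iw * ih
    union = (x2a - x1a) * (y2a - y1a) + (x2b - x1b) * (y2b - y1b) - inter
    return inter / (union + eps)

# Regression loss for one predicted/target box pair
loss = 1.0 - inner_iou(torch.tensor([[10., 10., 50., 50.]]),
                       torch.tensor([[12., 14., 48., 52.]]))
```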
Citations: 0
A communication signal recognition method based on multi-scale feature fusion
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-24 · DOI: 10.1016/j.dsp.2026.105950
Yaoyi He, An Gong, Yunlu Ge, Xiaolei Zhao, Ning Ding
Communication signal recognition is a critical technology for ensuring the security and intelligent management of wireless communication systems, with broad applications in spectrum monitoring, electronic warfare, unmanned communication, and cognitive radio. Traditional neural networks often struggle to extract signal features across different scales, leading to low recognition accuracy. This paper introduces a new model designed to solve this issue by fusing multi-scale features. The model uses a dual-branch architecture. One branch employs the Discrete Wavelet Transform (DWT) to capture features from both low and high signal frequencies. The second branch is a Bidirectional Long Short-Term Memory (BiLSTM) network that extracts temporal patterns. A gating mechanism, a bidirectional structure, and a global timestep attention mechanism all enhance the BiLSTM module’s performance. Finally, the system combines these distinct features to enable effective signal detection and recognition. Tests conducted with the Panoradio HF dataset confirm our model’s capabilities. Our proposed method attained an average recognition accuracy of 79.52%, which surpasses competing baseline models by 4.51%.
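As an illustration of the dual-branch idea (frequency features from a wavelet decomposition alongside temporal features from a BiLSTM), here is a minimal sketch using PyWavelets and PyTorch; the frame length, wavelet, hidden size, and pooling/fusion scheme are assumptions rather than the paper's configuration.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn

# Hypothetical I/Q frame: 2 channels (I and Q) x 1024 samples
iq = np.random.randn(2, 1024).astype(np.float32)

# Branch 1: one-level discrete wavelet transform splits low/high frequency content
cA, cD = pywt.dwt(iq, 'db4', axis=-1)                    # approximation / detail coefficients
wavelet_feat = np.concatenate([cA, cD], axis=0).astype(np.float32)

# Branch 2: a BiLSTM over the raw I/Q sequence captures temporal patterns
bilstm = nn.LSTM(input_size=2, hidden_size=64, batch_first=True, bidirectional=True)
seq = torch.from_numpy(iq.T).unsqueeze(0)                # (1, 1024, 2)
temporal_feat, _ = bilstm(seq)                           # (1, 1024, 128)

# Fusion: pool each branch to a vector and concatenate for a classifier head
fused = torch.cat([torch.from_numpy(wavelet_feat).mean(dim=-1),   # (4,)
                   temporal_feat.mean(dim=1).squeeze(0)], dim=0)  # (4 + 128,)
```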
Citations: 0
Enhanced feature fusion and detail-preserving network for small object detection in medical microscopic images
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-23 · DOI: 10.1016/j.dsp.2026.105938
Runtian Zheng, Congpeng Zhang, Ying Liu
Accurately detecting tiny targets in microscopic images is critical for tuberculosis screening yet remains difficult due to large shape variation, dense instances with weak semantics, and cluttered backgrounds. We curate a Mycobacterium tuberculosis dataset of 5,842 microscopic images and present EFDNet, an Enhanced Feature Fusion and Detail-Preserving detector. EFDNet combines an Adaptive Feature Enhancement module that dynamically shifts convolutional sampling to capture irregular, fine-grained patterns, a Cross-Stage Enhanced Feature Pyramid Network that fuses semantic and localization cues across scales to withstand crowding and background clutter, and a lightweight shared Detail-Enhanced detection head that preserves high-frequency structure through differential convolutions and shared parameters, together with a Normalized Wasserstein Distance loss that reduces localization sensitivity for small boxes. On our dataset, the Tuberculosis-Phonecamera dataset, and the cross-domain BBBC041 blood-cell benchmark, EFDNet achieves AP50 of 81.9%, 87.6%, and 95.2%, outperforming a strong baseline by +5.7, +3.2, and +3.9 points, respectively, while maintaining low computational cost. These results indicate robust small-object detection under varied microscopy conditions and support the practical utility of EFDNet for automated screening.
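For reference, the published Normalized Wasserstein Distance models each box as a 2D Gaussian and maps the squared 2-Wasserstein distance between the Gaussians through exp(-d/C); the sketch below follows that formulation, with the normalizing constant and the (cx, cy, w, h) box format taken as assumptions rather than this paper's exact settings.

```python
import torch

def nwd_loss(pred, target, c=12.8, eps=1e-7):
    """Normalized Wasserstein Distance loss for small boxes given as (cx, cy, w, h).
    Each box is modeled as a 2D Gaussian; the squared 2-Wasserstein distance
    between the Gaussians is normalized by a constant C and mapped to (0, 1]
    via exp(-d/C). Sketch of the published NWD metric; the constant and the
    box pairing strategy are assumptions here."""
    dcx = pred[..., 0] - target[..., 0]
    dcy = pred[..., 1] - target[..., 1]
    dw = (pred[..., 2] - target[..., 2]) / 2
    dh = (pred[..., 3] - target[..., 3]) / 2
    w2 = dcx ** 2 + dcy ** 2 + dw ** 2 + dh ** 2   # squared 2-Wasserstein distance
    nwd = torch.exp(-torch.sqrt(w2 + eps) / c)
    return (1.0 - nwd).mean()

loss = nwd_loss(torch.tensor([[30., 30., 8., 6.]]),
                torch.tensor([[32., 29., 7., 6.]]))
```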
Citations: 0
Hybrid transfer semantic segmentation architecture for hyperspectral image classification
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-22 · DOI: 10.1016/j.dsp.2025.105852
Huaiping Yan, Yupeng Hou, Chengcai Leng, Yilin Li, Yang Li
Hyperspectral image (HSI) classification is a research hotspot in the field of remote sensing image processing. Deep learning-based methods have gradually become one of the mainstream approaches in HSI classification. However, deep learning-based HSI classification methods still face the challenge of insufficient training samples. Transfer learning is regarded as an effective way to alleviate the problem of insufficient samples, but hyperspectral image data are scarce, lacking the foundation for pre-training high-quality models. In this paper, a Hybrid Transfer Semantic Segmentation Architecture (HTSSA) is proposed, which transfers knowledge from different datasets by adopting different network structures. The proposed model adopts a triple-branch network architecture. The three branches respectively use the vision transformer (ViT) classification model pre-trained on ImageNet, the Deeplabv3 semantic segmentation model pre-trained on the PASCAL VOC 2012 dataset, and the convolutional neural network (CNN) model pre-trained on the source hyperspectral image dataset. The three branch networks are fine-tuned on the target hyperspectral image dataset. Mapping modules are designed to handle the problem of heterogeneous data transfer. The ViT branch uses the Transformer to extract global spatial context features. The Deeplabv3 branch uses the feature pyramid to extract local multi-scale spatial features. The CNN branch uses 3D-CNN to extract the spectral features of hyperspectral images. Finally, the classification result is obtained by fusing the features of the three branches. Extensive experiments on public datasets verify that the proposed Hybrid Transfer Semantic Segmentation Architecture alleviates the negative impact of sample scarcity to a certain extent, enhances the representation ability of the model, and improves the final classification performance.
Citations: 0
FETrack: One-stream framework-based feature enhancement for object tracking
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-22 · DOI: 10.1016/j.dsp.2026.105935
Yue Chen, Huiying Xu, Xinzhong Zhu, Xuedong He, Hongbo Li, Yi Li
Vision Transformer (ViT)-based one-stream architectures have emerged as the dominant framework for object tracking. However, their performance is hampered by similar object interference and background distractions. To address these limitations, this paper proposes FETrack, a one-stream tracker designed to enhance feature discriminability for improved object tracking. The core innovations of FETrack are as follows: 1) Global Enhancement (GE) and Cross-Depth Template Fusion (CDTF) modules, where the GE module adopts a novel global feature extraction mechanism to suppress background interference, and the CDTF module ensures efficient propagation of contextual information via cross-depth template fusion. 2) An unsupervised hard sample learning strategy, which introduces contrastive learning and treats each candidate token as an independent instance by leveraging its inherent hard sample properties, thereby enhancing feature discriminability. 3) A distillation-based fine-tuning approach that guides parameter optimization for the entire backbone network through feature distillation, enabling efficient tuning of newly integrated modules and ensuring their synergy with the original architecture. Experimental results on six benchmark datasets demonstrate the effectiveness of FETrack and confirm its state-of-the-art performance. Furthermore, the transferability of the proposed approaches for enhancing other one-stream trackers is validated.
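To make the distillation-based fine-tuning concrete, below is a minimal, generic sketch in which a frozen copy of the original backbone supervises the augmented backbone through an MSE feature loss added to the task loss; the weighting, the single matched feature map, and the stand-in backbones are assumptions, not FETrack's actual training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(student, teacher, x, task_loss_fn, targets, alpha=0.5):
    """Generic feature-distillation step: the frozen original backbone (teacher)
    guides the backbone augmented with new modules (student) via an MSE feature
    loss that is added to the downstream task loss."""
    with torch.no_grad():
        t_feat = teacher(x)                      # reference features, no gradient
    s_feat = student(x)                          # features from the augmented model
    return task_loss_fn(s_feat, targets) + alpha * F.mse_loss(s_feat, t_feat)

# Toy usage with stand-in backbones and a placeholder task loss
teacher = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1)).eval()
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1))
loss = distillation_step(student, teacher, torch.randn(1, 3, 64, 64),
                         task_loss_fn=lambda f, t: f.abs().mean(), targets=None)
loss.backward()
```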
Citations: 0
KADNet: Low-SNR automatic modulation classification via SNR-aware deformable convolution and Kolmogorov-Arnold networks
IF 3 · CAS Zone 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2026-01-21 · DOI: 10.1016/j.dsp.2026.105942
Run Wang, Jizhe Li, Youze Yang, Shasha Wang, Bing Zheng
The proliferation of modern communication technologies has precipitated increasingly sophisticated electromagnetic environments, demanding more rigorous performance from Automatic Modulation Classification (AMC) systems, especially in low signal-to-noise ratio (SNR) scenarios where conventional approaches struggle with feature extraction and classification fidelity. In response, we propose KADNet, a novel architecture tailored for AMC in low-SNR scenarios. KADNet comprises two key components: a Signal Enhancement Module (SEM) and an SNR-Aware Deformable Convolutional Network (SADCN). In the SEM, time-domain I/Q samples are first projected into the frequency domain via the fast Fourier transform (FFT). A spectral weighting mask is then generated by a Kolmogorov-Arnold Network (KAN), enabling precise attenuation of noise and amplification of decision-relevant signal components. Subsequently, the SADCN employs a lightweight subnetwork to estimate a soft SNR map, which is then fused into deformable convolution operations via a Signal Quality Spatial Attention (SQSA) mechanism. This fusion produces secondary spatial offsets and modulation-adaptive weights, allowing sampling grids to adjust dynamically in response to local signal quality. Extensive experiments on the RADIOML 2016.10A/B benchmarks demonstrate the effectiveness of our design: KADNet achieves mean classification accuracies of 64.66 percent and 65.58 percent, corresponding to improvements of 2.04 percent and 0.56 percent over baseline methods. Moreover, within the extremely low-SNR range of -20 dB to -2 dB, KADNet attains average accuracies of 36.86 percent and 37.92 percent, surpassing the current state of the art by 3.0 percent to 3.8 percent. This significant improvement over the current state of the art in the most challenging SNR conditions confirms that KADNet is a superior AMC method in low-SNR conditions.
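To illustrate the frequency-domain enhancement step, the sketch below projects an I/Q frame to the spectrum with an FFT, predicts a per-bin weighting mask, and transforms back; a small MLP stands in for the Kolmogorov-Arnold Network, and the frame length and layer sizes are assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class SpectralMaskEnhancer(nn.Module):
    """Toy signal-enhancement front end in the spirit of the SEM: FFT the I/Q
    frame, predict a per-bin weighting mask from the magnitude spectrum, and
    transform back. A small MLP stands in for the KAN; sizes are illustrative."""
    def __init__(self, n_fft=128):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Linear(n_fft, 256), nn.ReLU(), nn.Linear(256, n_fft), nn.Sigmoid()
        )

    def forward(self, iq):                        # iq: (B, 2, N) real I/Q samples
        z = torch.complex(iq[:, 0], iq[:, 1])     # (B, N) complex baseband signal
        spec = torch.fft.fft(z, dim=-1)           # frequency-domain representation
        mask = self.mask_net(spec.abs())          # (B, N) weights in (0, 1)
        enhanced = torch.fft.ifft(spec * mask, dim=-1)
        return torch.stack([enhanced.real, enhanced.imag], dim=1)  # back to (B, 2, N)

x = torch.randn(4, 2, 128)
y = SpectralMaskEnhancer(n_fft=128)(x)            # (4, 2, 128)
```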
Citations: 0