Pub Date: 2026-04-01 Epub Date: 2026-01-06 DOI: 10.1016/j.dsp.2025.105876
Wei Lu, Junzheng Jiang, Yaojun Wu, Yinghui Quan
The high sidelobes of radar echo signals after pulse compression have been shown to adversely affect anti-jamming performance. This paper proposes a waveform design method based on a proximal optimization framework. The methodology begins by establishing a model for minimizing the integrated sidelobe level (ISL), leveraging its direct correlation with waveform sidelobe energy. The alternating direction method of multipliers (ADMM) algorithm is employed to solve the formulated problem. During ADMM iterations, the proximal operator quantizes the waveform’s phase components to predefined discrete phases. Experimental results demonstrate that the proposed algorithm achieves superior optimization performance compared to existing design techniques.
Title: Quantized phase-coded waveform design by using ADMM with proximal operator. Journal: Digital Signal Processing, vol. 173, Article 105876.
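The two quantities the abstract relies on, the integrated sidelobe level and the projection of phases onto a discrete set, can be illustrated with a minimal sketch. It assumes the aperiodic autocorrelation and a uniform phase alphabet; the function names are hypothetical, and the paper's actual ADMM updates and proximal operator are not reproduced here.

```python
import cmath
import math


def autocorrelation(x):
    """Aperiodic autocorrelation r[k] = sum_n x[n+k] * conj(x[n]) for k = 0..N-1."""
    n = len(x)
    return [
        sum(x[i + k] * complex(x[i]).conjugate() for i in range(n - k))
        for k in range(n)
    ]


def integrated_sidelobe_level(x):
    """ISL: total energy of the nonzero lags (factor 2 counts negative lags)."""
    r = autocorrelation(x)
    return 2.0 * sum(abs(v) ** 2 for v in r[1:])


def quantize_phase(x, levels):
    """Map each entry to the nearest of `levels` uniformly spaced unit-modulus phases."""
    step = 2.0 * math.pi / levels
    return [cmath.exp(1j * step * round(cmath.phase(v) / step)) for v in x]


# Barker-13 code: its sidelobes have magnitude 0 or 1.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
isl = integrated_sidelobe_level(barker13)
```

In an ADMM-style scheme, a projection like `quantize_phase` would be applied to an auxiliary variable at each iteration; how the paper couples it with the ISL objective is not stated in the abstract.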
Pub Date: 2026-04-01 Epub Date: 2026-01-20 DOI: 10.1016/j.dsp.2026.105943
Su Nguyen Quoc, Phan Van Tri, Ba Cao Nguyen, Bui Vu Minh, Nguyen Huu Khanh Nhan
This paper proposes the use of multiple antennas (MA) and rate-splitting multiple access (RSMA) to enhance the secrecy performance of a multi-user (MU) wireless system under practical constraints, including imperfect successive interference cancellation (iSIC) and imperfect channel state information (iCSI). Closed-form and analytical expressions of the secrecy outage probability (SOP) and ergodic secrecy capacity (ESC) of both common and private messages are obtained and validated through extensive Monte-Carlo simulations. The results reveal several key insights into the influence of system parameters on secrecy performance. In particular, it is shown that iCSI at the eavesdropper can significantly enhance secrecy, particularly when the estimation error is large, as it limits the eavesdropper’s decoding capability. Furthermore, increasing the number of transmit antennas substantially improves the SOP and ESC of private messages due to enhanced spatial diversity. However, the ESC of the common message does not always benefit from additional antennas; instead, it reaches an optimal value beyond which performance may degrade due to signal leakage or increased interference. Additionally, the impacts of imperfect SIC, power allocation, bandwidth, and operating frequency on the SOP and ESC are thoroughly examined. The findings highlight the importance of jointly using key system parameters such as transmit power, bandwidth, frequency allocation, and antenna configuration while accounting for CSI imperfections, to ensure secure and reliable RSMA-based communication.
Title: Enhancing secrecy performance in multi-user multi-antenna systems using rate-splitting multiple access. Journal: Digital Signal Processing, vol. 173, Article 105943.
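For readers unfamiliar with the two metrics, the sketch below estimates a secrecy outage probability by Monte-Carlo simulation for a deliberately simplified link: single antenna, Rayleigh fading, perfect CSI, and no rate splitting. These simplifications, and all function and parameter names, are assumptions for illustration; this is not the paper's system model or its closed-form expressions.

```python
import math
import random


def secrecy_capacity(snr_legit, snr_eve):
    """Instantaneous secrecy capacity in bit/s/Hz, floored at zero."""
    return max(0.0, math.log2(1.0 + snr_legit) - math.log2(1.0 + snr_eve))


def estimate_sop(mean_snr_legit, mean_snr_eve, target_rate, trials=20000, seed=0):
    """Fraction of fading realisations whose secrecy capacity is below target_rate."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # Rayleigh fading gives exponentially distributed instantaneous SNR.
        snr_b = rng.expovariate(1.0 / mean_snr_legit)
        snr_e = rng.expovariate(1.0 / mean_snr_eve)
        if secrecy_capacity(snr_b, snr_e) < target_rate:
            outages += 1
    return outages / trials


sop_strong = estimate_sop(mean_snr_legit=100.0, mean_snr_eve=1.0, target_rate=1.0)
sop_weak = estimate_sop(mean_snr_legit=1.0, mean_snr_eve=1.0, target_rate=1.0)
```

Averaging `secrecy_capacity` over the same draws instead of counting outages would give a corresponding estimate of the ergodic secrecy capacity.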
Pub Date: 2026-04-01 Epub Date: 2026-01-20 DOI: 10.1016/j.dsp.2026.105946
Xiaoruo Li, Hongwei Ding, Yuanjing Zhu
A fundamental challenge in aerial image object detection lies in accurately identifying multi-scale features within complex backgrounds, which are characterized by substantial scale variations and inconsistent object distribution. Existing approaches frequently fail to incorporate edge information effectively, even though it is critical for precise object localization, and this remains a major obstacle to improving detection accuracy in aerial imagery. To address this challenge, we propose the Wavelet-guided Multi-scale Edge Fusion Network (WMEFNet) for aerial object detection. Our method begins with the construction of a Feature Edge Perception Backbone Network (FEPBN), in which an edge extractor is embedded into the shallow layers to enhance fine-grained feature representations through a cross-channel fusion strategy. Subsequently, we introduce the Wavelet-Context Fusion Pyramid Network (WCFPN), which integrates edge-aware cues and semantic features from diverse receptive fields, thereby improving the model’s contextual understanding and its adaptability to scale and resolution variations. Furthermore, we design the Wavelet Upsampling Feature Fusion Module (WUFF) and the Wavelet Downsampling Module (WDM), which minimize information loss during sampling operations, enhance the model’s sensitivity to small targets, and preserve crucial edge details. Collectively, the proposed architecture substantially enhances the model’s capability to capture and fuse multi-scale edge features. Extensive experiments show that WMEFNet improves mAP50 by 2.2 percentage points (39.1% vs. 36.9%) over RT-DETR on the VisDrone2019-test dataset while maintaining real-time performance. Further results on multiple benchmarks confirm its high accuracy, efficiency, and practical utility for aerial object detection.
Title: Wavelet-guided multi-scale edge fusion network for aerial object detection. Journal: Digital Signal Processing, vol. 173, Article 105946.
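The idea behind wavelet-based downsampling, halving resolution while keeping edge detail in separate sub-bands rather than discarding it, can be sketched with a one-level 2D Haar transform. The Haar wavelet and the function name are assumptions; the abstract does not say which wavelet WDM or WUFF use, and the learned layers of those modules are not modelled.

```python
def haar_dwt2(image):
    """One-level orthonormal 2D Haar transform.

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution sub-bands: (low, horiz, vert, diag).
    """
    low, horiz, vert, diag = [], [], [], []
    for r in range(0, len(image), 2):
        low_row, horiz_row, vert_row, diag_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            low_row.append((a + b + d + e) / 2)    # local average (coarse image)
            horiz_row.append((a + b - d - e) / 2)  # top minus bottom: horizontal edges
            vert_row.append((a - b + d - e) / 2)   # left minus right: vertical edges
            diag_row.append((a - b - d + e) / 2)   # diagonal detail
        low.append(low_row)
        horiz.append(horiz_row)
        vert.append(vert_row)
        diag.append(diag_row)
    return low, horiz, vert, diag


low, horiz, vert, diag = haar_dwt2([[1, 2], [3, 4]])
```

Because the transform is orthonormal, the four sub-bands together carry the same energy as the input, which is the sense in which no information is lost by this kind of downsampling.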
Pub Date: 2026-04-01 Epub Date: 2026-01-21 DOI: 10.1016/j.dsp.2026.105944
Haoyan Yang, Qianyin Wei, Tianchuan Yang, Jipeng Guo
Graph-based multi-view clustering (MVGC) has attracted interest because it can exploit consistent and complementary information from multiple perspectives. The quality of the constructed similarity graph largely determines the clustering performance of MVGC. Many existing methods apply the acquired similarity graph directly to spectral clustering, ignoring the many inter-cluster similarities in the graph, which degrade the cluster partition. Constructing a k-nearest neighbors (KNN) sparse graph to remove inter-cluster similarities is a common improvement. However, the KNN graph requires extensive tuning of the parameter k. To solve this, we propose a graph-based multi-view clustering method based on the adaptive sparse graph (MNV-MC). Specifically, an initial similarity graph is obtained by a low-rank tensor learning framework. Then, a heuristic method, Mutual Nearest Neighbor Value (MNV), is proposed to adaptively select the optimal k based on density changes and so construct a high-quality sparse similarity graph. After processing by the fusion mechanism, the graph is input into spectral clustering to obtain the clustering results. Experiments indicate that MNV-MC achieves outstanding performance, and the effectiveness of MNV for adaptive k-value selection for the KNN graph is verified. Specifically, MNV-MC achieves average improvements of 7.79% in ACC and 5.16% in NMI over the second-best method across eight datasets, and gains of 7.29% and 5.79% on four additional large-scale datasets. Notably, as a parameter-free post-processing step, MNV can be easily integrated into other MVGC methods. Experiments show that MVGC methods significantly improve their performance after applying MNV. The code is publicly available at https://github.com/ytccyw/MNVMC.
Title: Adaptive sparse graph for multi-view clustering. Journal: Digital Signal Processing, vol. 173, Article 105944.
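To make the sparsification step concrete, the sketch below builds a mutual k-nearest-neighbor graph from a dense similarity matrix for a fixed, user-supplied k. The function name is hypothetical. The MNV rule that chooses k from density changes is not described in the abstract, so it is not implemented here; the authors' repository linked above is the reference for that.

```python
def mutual_knn_graph(similarity, k):
    """Keep edge (i, j) only if i and j are each among the other's k most similar nodes."""
    n = len(similarity)
    neighbors = []
    for i in range(n):
        # Rank all other nodes by similarity to node i, most similar first.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        neighbors.append(set(order[:k]))
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            if i in neighbors[j]:
                graph[i][j] = similarity[i][j]
    return graph


# Two obvious clusters {0, 1} and {2, 3} with weak cross-cluster similarities.
S = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
G = mutual_knn_graph(S, k=1)
```

With k = 1 the weak cross-cluster entries are removed and only the two within-cluster edges survive; a larger k would let some of them back in, which is why the choice of k matters.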
Ocean information perception based on artificial intelligence is driving innovative advances in comprehensive sea observation. Underwater acoustic communication, as the neural link for ocean information interconnection, is susceptible to various interferences such as complex ocean environments and unstable communications. Considering the measurement errors caused by noisy hydroacoustic signals, this paper proposes a tensor low-rank sparse representation by nonconvex regularization (TLSRNR) model for hydroacoustic intelligent recovery. Firstly, the hydroacoustic original tensor mapped from multidimensional hydroacoustic data is decomposed into a hydroacoustic sparse tensor and a hydroacoustic target tensor, the latter obtained as the t-product of a hydroacoustic dictionary tensor and a coefficient tensor. Secondly, a nonconvex penalty function is introduced to reduce the approximation error in the tubal rank of the coefficient tensor, while the inherent deviation of the hydroacoustic sparse tensor is addressed with the smoothly clipped absolute deviation (SCAD) penalty. Thirdly, the alternating direction method of multipliers is employed to solve the proposed TLSRNR model efficiently and recover the hydroacoustic target tensor. Through simulation experiments and platform lake trials, the recovery performance on noisy hydroacoustic data is evaluated under different algorithms, demonstrating that the proposed model achieves superior accuracy and robustness.
Title: Intelligent recovery of low-rank sparse tensor for noisy hydroacoustic with use of nonconvex regularization. Authors: Yuhang Mei, Chengming Luo, Jinqing Cao, Zizhuo Liu, Yongshuai Fei, Fantong Kong, Biao Wang. Journal: Digital Signal Processing, vol. 173, Article 105927. Pub Date: 2026-04-01. DOI: 10.1016/j.dsp.2026.105927.
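The smoothly clipped absolute deviation penalty mentioned above has a well-known element-wise thresholding rule, sketched below with the conventional default a = 3.7. How this rule is embedded in the TLSRNR updates is not given in the abstract, so the function is shown in isolation and its name is hypothetical.

```python
def scad_threshold(x, lam, a=3.7):
    """SCAD thresholding of a scalar x with threshold lam > 0 and shape a > 2."""
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    mag = abs(x)
    sign = 1.0 if x >= 0 else -1.0
    if mag <= 2 * lam:
        # Small values: ordinary soft thresholding.
        return sign * max(mag - lam, 0.0)
    if mag <= a * lam:
        # Intermediate values: shrinkage tapers off linearly.
        return ((a - 1) * x - sign * a * lam) / (a - 2)
    # Large values pass through unchanged, avoiding the bias of soft thresholding.
    return float(x)


shrunk = [scad_threshold(v, lam=1.0) for v in (0.5, 1.5, 3.0, 5.0)]
```

Unlike the soft threshold used with an l1 penalty, large entries are left untouched, which is the usual motivation for choosing SCAD when a sparse component should be estimated without systematic shrinkage.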
Pub Date: 2026-04-01 Epub Date: 2026-01-24 DOI: 10.1016/j.dsp.2026.105957
Fanyi Kong, Dongming Liu, Dan Shan, Hui Cao
To address the challenge of detecting small, overlapping, and occluded contraband items in complex X-ray security imagery, this paper proposes LPID-DAFT-YOLOv8, a lightweight object detection framework. The framework is designed to improve detection accuracy while maintaining real-time performance. First, a Deformable AIFI Encoder is introduced to replace the original SPPF module in YOLOv8, reducing computational overhead while enhancing semantic feature representation. Second, a Cross-Scale Fourier Convolution (CSFC) module is designed to improve multi-scale feature modeling. The CSFC integrates Multi-order Fractional Fourier Convolution (MFRFC) to jointly capture spatial structures and frequency-domain information. Third, an Inner-IoU loss function is adopted to adapt the bounding box regression scale according to IoU values, with the goal of improving localization accuracy and robustness. The proposed LPID-DAFT-YOLOv8 is evaluated under identical training conditions on a custom dual-energy X-ray dataset consisting of 20,000 annotated pseudo-colored images. The model achieves a mean Average Precision (mAP50) of 96.7% with an inference speed of 172.8 FPS. Comparative experiments indicate that LPID-DAFT-YOLOv8 achieves a balance between detection accuracy and inference efficiency, supporting its application in real-time contraband detection for high-throughput security screening scenarios.
Title: LPID-DAFT-YOLOv8: A lightweight high-precision contraband detection framework for X-ray security inspection. Journal: Digital Signal Processing, vol. 173, Article 105957.
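As a rough illustration of the Inner-IoU idea, the sketch below computes IoU on auxiliary boxes obtained by scaling each box about its own centre. The fixed `ratio` argument and the function names are assumptions; the abstract says the scale is adapted according to IoU values but does not give the rule, so no adaptation is attempted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Scale a box's width and height by `ratio`, keeping its centre fixed."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * ratio / 2
    half_h = (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(pred, target, ratio):
    """IoU of the two auxiliary boxes scaled by `ratio` (ratio = 1 gives plain IoU)."""
    return iou(scale_box(pred, ratio), scale_box(target, ratio))


pred, target = (0.0, 0.0, 4.0, 4.0), (2.0, 0.0, 6.0, 4.0)
plain = iou(pred, target)
shrunk = inner_iou(pred, target, ratio=0.5)
grown = inner_iou(pred, target, ratio=1.5)
```

A ratio below 1 makes the overlap measure stricter and a ratio above 1 makes it more lenient; a regression loss would typically be formed as one minus this value.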
Pub Date: 2026-04-01 Epub Date: 2026-01-13 DOI: 10.1016/j.dsp.2026.105913
Yihan Wang, Yongfang Wang, Zhijun Fang, Tengyao Cui
Existing Point Cloud Geometry Compression (PCGC) methods often inadequately handle non-uniform point density and fail to fully exploit multi-scale contextual features, limiting their efficiency and reconstruction quality. To bridge this gap, we argue that an effective solution must jointly address local geometric adaptation and the aggregation of multi-scale contextual features. Accordingly, we propose a novel PCGC method consisting of a Global-Local Feature Extraction Network (GLFE-Net), a Multi-scale Feature Enhancement Network (MFE-Net), and Coordinates Reconstruction based on Offset (CRO). The GLFE-Net incorporates Local Adaptive Density (LAD) to address the non-uniform density distribution and a Global-Local Context Differential (GLCD) module to fuse local and global features. The MFE-Net employs the Feature Extraction based on Offset-attention (FEO) module to enhance feature expressiveness, and utilizes the Multi-scale Semantics Fusion (MSF) module to optimize multi-scale feature fusion. The CRO module utilizes a learnable offset mechanism for high-fidelity reconstruction. Experimental results demonstrate that our method achieves significant improvements, with Peak Signal-to-Noise Ratio (PSNR) gains of up to 29.25 dB (D1) and 27.31 dB (D2) over the existing PCGC methods. This work provides an effective solution for high-performance PCGC by jointly addressing the key challenges of density adaptation and multi-scale feature learning.
Title: Towards point cloud geometry compression via global-local and multi-scale feature learning. Journal: Digital Signal Processing, vol. 173, Article 105913.
Pub Date: 2026-04-01 Epub Date: 2026-01-20 DOI: 10.1016/j.dsp.2026.105932
Shuai Shi, Li Zhang
Oriented object detection in aerial images is challenging because of the arbitrary orientation, dense distribution, and large scale variations of objects. Although recent end-to-end models based on DEtection TRansformer (DETR) have achieved excellent performance for oriented object detection, they suffer from slow inference speed. To address this issue, this study proposes a Multi-scale Enhanced DETR (ME-DETR) to achieve efficient and effective oriented object detection for aerial images. ME-DETR is an end-to-end detection model that consists of three parts: backbone, encoder and decoder. For the encoder part, we design a novel multi-scale enhanced (ME) encoder that can effectively and efficiently fuse multi-scale features. The ME encoder mainly contains three modules related to multi-scale information fusion: Fine-grained Enhanced Intra-scale Feature Interaction (FEIFI), Multi-scale Feature Fusion (MFF), and Multi-receptive Field Feature Extraction (MRFE). Specifically, the FEIFI module combines low-level features to enrich the intra-scale feature interaction process and then outputs features with abundant fine-grained information; the MFF module implements the multi-scale feature fusion, effectively enhancing the detailed information in high-level features and reducing background interference; the MRFE module uses convolutions of different sizes to extract features with rich multi-scale information. To further enhance performance without affecting inference speed, we present a training scheme of Low-quality Query Filter DeNoising (LQFDN), which adaptively filters out low-quality denoised positive queries. Extensive experiments are conducted on three oriented object detection datasets (DOTA-v1.0, DOTA-v1.5 and DIOR-R). Specifically, when ResNet50 is used as the backbone, ME-DETR achieves 78.35% mAP on DOTA-v1.0 at a speed of 15.2 FPS, and 71.28% mAP on DIOR-R at a speed of 18.2 FPS.
Title: ME-DETR: A multi-scale enhanced detection Transformer with low-quality query filter denoising for aerial oriented object detection. Journal: Digital Signal Processing, vol. 173, Article 105932.
Pub Date: 2026-04-01 Epub Date: 2026-01-20 DOI: 10.1016/j.dsp.2026.105939
Zilong Li, Jing Zhang, Jiashuai Xiao
Cross-modal learning models like Contrastive Language-Image Pre-training (CLIP) have demonstrated remarkable performance in various downstream tasks. However, applying CLIP to person re-identification (ReID) reveals key limitations, particularly its emphasis on global semantic features while neglecting fine-grained local features and spatial relationships critical for distinguishing identities. To overcome these challenges, we propose Multi-Scale Dilated Fusion Attention (MDFA), a novel framework that enhances the CLIP visual encoder with spatial and channel attention mechanisms combined with global context modeling and multi-scale dilated convolutions. By integrating multiple dilation rates, MDFA effectively aggregates information across varied receptive fields, enabling the model to gather fine-grained local details alongside broader contextual information. This design allows the model to capture richer identity cues and better handle complex scenarios such as occlusion and background clutter, effectively addressing the lack of local discrimination and contextual awareness in CLIP-based ReID models. Extensive experiments demonstrate that MDFA achieves superior performance over existing methods, offering a robust and scalable solution for real-world ReID applications such as surveillance and autonomous driving.
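The remark about aggregating information across varied receptive fields can be illustrated with the standard formula for the span of a dilated kernel, `dilation * (kernel_size - 1) + 1`. The sketch below is not the MDFA implementation: the 3-tap kernel and the dilation rates 1, 2 and 4 are assumed values chosen only for illustration, and the helper names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input samples, along one axis) covered by one dilated kernel."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return dilation * (kernel_size - 1) + 1


def branch_spans(kernel_size: int, dilations) -> dict:
    """Map each dilation rate to the span of a branch that uses it."""
    return {d: effective_kernel_size(kernel_size, d) for d in dilations}


# Assumed rates (not taken from the paper): three parallel 3-tap branches.
spans = branch_spans(3, (1, 2, 4))
for rate, span in spans.items():
    print(f"dilation={rate}: each output sees {span} input positions per axis")
```

With the same number of weights per branch, the larger rates cover a wider context while the rate-1 branch keeps local detail, which is the trade-off the abstract attributes to combining multiple dilation rates.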
{"title":"Multi-scale dilated fusion attention for CLIP-based person re-identification","authors":"Zilong Li, Jing Zhang, Jiashuai Xiao","doi":"10.1016/j.dsp.2026.105939","DOIUrl":"10.1016/j.dsp.2026.105939","url":null,"abstract":"<div><div>Cross-modal learning models like Contrastive Language-Image Pre-training (CLIP) have demonstrated remarkable performance in various downstream tasks. However, applying CLIP to person re-identification (ReID) reveals key limitations, particularly its emphasis on global semantic features while neglecting fine-grained local features and spatial relationships critical for distinguishing identities. To overcome these challenges, we propose Multi-Scale Dilated Fusion Attention (MDFA), a novel framework that enhances the CLIP visual encoder with spatial and channel attention mechanisms combined with global context modeling and multi-scale dilated convolutions. By integrating multiple dilation rates, MDFA effectively aggregates information across varied receptive fields, enabling the model to gather fine-grained local details alongside broader contextual information. This design allows the model to capture richer identity cues and better handle complex scenarios such as occlusion and background clutter, effectively addressing the lack of local discrimination and contextual awareness in CLIP-based ReID models. 
Extensive experiments demonstrate that MDFA achieves superior performance over existing methods, offering a robust and scalable solution for real-world ReID applications such as surveillance and autonomous driving.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105939"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 Epub Date: 2025-10-22 DOI: 10.1016/j.dsp.2025.105664
Tao Chen, Yihao Luo, Yuwei Yu
In this paper, we propose a gridless joint estimation algorithm for direction of arrival (DoA) and polarization parameters based on a fractal economy polarization-sensitive array. The method introduces the array economy factor to ensure a maximum continuous aperture and essential element utilization. It also leverages the self-similarity of fractal arrays to expand the aperture recursively, generating a large difference coarray and significantly increasing the degrees of freedom (DOF) relative to the original subarray. At the algorithmic level, the Fractal Economy Polarization-Sensitive Array Atomic Norm Minimization (FEPSA-ANM) algorithm achieves gridless DoA estimation by combining covariance matrix summation and vectorization-based dimensionality reduction over the orthogonal dipole subarrays with polynomial rooting on the optimal solution of the atomic norm dual model. The polarization parameters are then decoupled using the least squares method, enabling joint estimation of DoA and polarization parameters. The method, validated through 500 Monte Carlo experiments on a second-order fractal economy array, demonstrates excellent estimation performance, particularly in low signal-to-noise ratio (SNR) scenarios. It achieves a lower root mean square error (RMSE) than existing techniques and operates without a priori knowledge of the source number. These attributes highlight its potential for applications in radar, wireless communications, and remote sensing.
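The claim that a sparse geometry yields a large difference coarray and more degrees of freedom can be checked numerically. The sketch below does not reproduce the fractal economy array, whose geometry is not given in this abstract. As a stand-in it uses a small two-level nested array with assumed sensor positions `[1, 2, 3, 6]` (in units of half a wavelength), and the function names are hypothetical.

```python
from itertools import product


def difference_coarray(positions):
    """Sorted set of all pairwise position differences (the coarray lags)."""
    return sorted({a - b for a, b in product(positions, repeat=2)})


def contiguous_span(lags):
    """Largest M such that every integer lag in -M..M is present."""
    lag_set = set(lags)
    m = 0
    while (m + 1) in lag_set and -(m + 1) in lag_set:
        m += 1
    return m


# Assumed stand-in geometry: a 4-sensor nested array, not the paper's array.
nested = [1, 2, 3, 6]
uniform = [0, 1, 2, 3]  # 4-sensor uniform linear array for comparison

for name, pos in (("nested", nested), ("uniform", uniform)):
    lags = difference_coarray(pos)
    print(f"{name}: {len(lags)} distinct lags, contiguous up to ±{contiguous_span(lags)}")
```

Both arrays use four sensors, but the nested layout fills every lag from -5 to 5 while the uniform one stops at ±3. The larger hole-free coarray is what allows coarray-based methods to resolve more sources than physical sensors.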
{"title":"A gridless joint DoA and polarization parameter estimation method based on fractal economy polarization-sensitive array","authors":"Tao Chen, Yihao Luo, Yuwei Yu","doi":"10.1016/j.dsp.2025.105664","DOIUrl":"10.1016/j.dsp.2025.105664","url":null,"abstract":"<div><div>In this paper, we propose a gridless joint estimation algorithm for direction of arrival (DoA) and polarization parameters based on a fractal economy polarization-sensitive array. This method ensures a maximum continuous aperture and essential element utilization by introducing the array economy factor. Additionally, it leverages the self-similarity of fractal arrays to recursively expand the aperture, generating a large difference coarray and significantly increasing the degrees of freedom (DOF) relative to the original subarray. At the algorithmic level, the Fractal Economy Polarization-Sensitive Array Atomic Norm Minimization (FEPSA-ANM) algorithm is proposed to achieve gridless DoA estimation through covariance matrix summation and vectorization dimensionality reduction of orthogonal dipole subarrays and polynomial rooting by combining with optimal solution of the atomic norm dual model. Furthermore, polarization parameters are decoupled using the least squares method, enabling joint estimation of DoA and polarization parameters. The method, validated through 500 Monte Carlo experiments on a second-order fractal economy array, demonstrates excellent estimation performance, particularly in low signal-to-noise ratio(SNR) scenarios. It achieves improvement in Root Mean Square Error (RMSE) compared to existing techniques and offers the critical advantage of operating without a priori knowledge of the source number. 
These attributes highlight its substantial potential for advanced applications in radar, wireless communications, and remote sensing.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105664"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}