Pub Date: 2026-03-15; Epub Date: 2026-01-08; DOI: 10.1016/j.dsp.2026.105900
Nguyen Hong Kiem, Bui Anh Duc, Nguyen Tuan Minh, Le T.T. Huyen, Tran Manh Hoang
This paper investigates the outage probability (OP) and ergodic capacity (EC) of a reconfigurable intelligent surface (RIS) assisted two-user rate-splitting multiple access (RSMA) communication system. Closed-form expressions for OP and EC are derived over Rayleigh fading channels and validated through extensive Monte Carlo simulations. A comprehensive performance comparison is conducted between the proposed RIS-assisted RSMA scheme and two benchmark systems: RIS-assisted non-orthogonal multiple access (NOMA) and relay-assisted RSMA. Simulation results demonstrate that the proposed scheme significantly outperforms both benchmarks in terms of OP and EC, regardless of fading conditions. The influence of key system parameters, including the number of RIS reflecting elements, transmit power, power allocation factors, and the required rate of the common stream, is thoroughly examined. The results reveal that optimal power allocation between streams is essential for minimizing OP. These findings confirm that integrating RSMA with RIS provides a robust and efficient solution for enhancing communication reliability and spectral efficiency in future 6G wireless networks, especially in challenging non-line-of-sight environments.
Title: Outage probability and ergodic capacity of RIS-assisted RSMA communication system. Journal: Digital Signal Processing, vol. 172, Article 105900.
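The abstract above reports that the closed-form OP expressions are validated by Monte Carlo simulation over Rayleigh fading. The sketch below illustrates what such a validation loop can look like for a much simpler setting than the paper's: a single user, no direct link, unit-variance Rayleigh channels on both RIS hops, and ideal phase alignment at the RIS. All of these are assumptions made for illustration, and the function name and parameters are hypothetical; the two-user RSMA stream structure is not modeled.

```python
import math
import random


def outage_probability(n_elements, snr_db, rate_bps_hz, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[log2(1 + SNR) < rate] for one RIS-aided link.

    Assumptions (not from the paper): single user, no direct path,
    unit-variance Rayleigh fading on each hop, ideal RIS phase alignment.
    """
    rng = random.Random(seed)
    snr_lin = 10 ** (snr_db / 10)        # transmit SNR on a linear scale
    threshold = 2 ** rate_bps_hz - 1     # received SNR needed for the target rate
    outages = 0
    for _ in range(trials):
        # |h|^2 ~ Exp(1) for a unit-variance Rayleigh channel, so |h| = sqrt(Exp(1)).
        # With ideal phase alignment the cascaded amplitudes add coherently.
        amplitude = sum(
            math.sqrt(rng.expovariate(1.0)) * math.sqrt(rng.expovariate(1.0))
            for _ in range(n_elements)
        )
        if snr_lin * amplitude ** 2 < threshold:
            outages += 1
    return outages / trials


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, outage_probability(n, snr_db=-10, rate_bps_hz=1.0))
```

In this toy setting the estimated OP falls as the number of reflecting elements grows, which is the qualitative trend the abstract attributes to the RIS size.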
Pub Date: 2026-03-15; Epub Date: 2026-01-05; DOI: 10.1016/j.dsp.2026.105888
Amin A. Maggang, David B. Tay, Jinzhe Gong
In water distribution networks, pressure monitoring in terms of nodal head values is essential for gaining insight into the operational state of the network and for optimizing performance. Installing pressure sensors at pipe junctions is usually challenging in practice and costly. Nodal heads at most locations are therefore usually estimated from sensor measurements at a limited number of locations. This task can be formulated as a signal reconstruction problem, where the nodal head is treated as a graph signal. In this work, we develop a graph signal processing based algorithm for nodal head reconstruction. The algorithm is developed in the graph spectral domain but can be implemented in the vertex domain, without the need for eigendecomposition. The algorithm exploits the smoothness that is usually observed in steady-state nodal heads, but it can also handle any low-level high-frequency information that may be present in the signal. An extensive performance evaluation of the proposed algorithm using a realistic water network model, together with a comparison with other algorithms, is presented.
Title: SBLR: A high-accuracy graph signal processing algorithm for nodal head reconstruction in water networks. Journal: Digital Signal Processing, vol. 172, Article 105888.
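To make the idea of smoothness-based reconstruction in the vertex domain concrete, here is a minimal sketch of a generic baseline, not the SBLR algorithm itself (whose details are not given in the abstract). It fills unknown nodes by repeated neighbor averaging, which minimizes the graph Laplacian quadratic form with the sensor values held fixed and needs no eigendecomposition. The graph, the sensor placement, and the function name are hypothetical.

```python
def reconstruct_heads(neighbors, known, iterations=500):
    """Fill unknown node values by repeated neighbor averaging (Jacobi iteration).

    neighbors: dict mapping node -> list of adjacent nodes (unweighted graph)
    known:     dict mapping sensor node -> measured value
    """
    mean_known = sum(known.values()) / len(known)
    values = {v: known.get(v, mean_known) for v in neighbors}
    for _ in range(iterations):
        updated = dict(values)
        for v, nbrs in neighbors.items():
            if v not in known and nbrs:
                # At the minimum of x^T L x with sensors fixed, every free
                # node equals the average of its neighbors.
                updated[v] = sum(values[u] for u in nbrs) / len(nbrs)
        values = updated
    return values


if __name__ == "__main__":
    # Hypothetical five-junction pipeline in series with sensors at both ends.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    heads = reconstruct_heads(path, known={0: 50.0, 4: 42.0})
    print([round(heads[v], 2) for v in sorted(heads)])
```

On this series pipeline the iteration recovers a linear head profile between the two sensors. A purely smooth baseline like this one cannot represent the low-level high-frequency content that the abstract says the proposed algorithm also handles.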
Pub Date: 2026-03-15; Epub Date: 2025-12-31; DOI: 10.1016/j.dsp.2025.105862
Zhengyu Li, Yanjun Peng, Keqiong Wang, Nan Lv
Kidney cancer is one of the top ten cancers in the world. Early detection of renal tumors can significantly improve patient survival rates. Automatic segmentation of kidneys and renal lesions from CT images is crucial for the treatment of renal cancer. However, due to the irregular and uneven distribution of renal tumor growth, diverse physiological morphologies, and the extreme difficulty of detecting small tumors, achieving complete detection of tumors remains challenging. Therefore, this paper proposes an unsupervised kidney and renal tumor segmentation network that integrates image enhancement and transfer-based restoration mechanisms. First, a Markov image enhancement method is designed to generate synthetic tumor images with more realistic structures and textures through pixel-level dependency modeling. These synthetic images are used as pseudo-labels during training to enrich data diversity. Second, an unsupervised saliency segmentation network is constructed to adaptively extract salient regions by leveraging feature differences between foreground and background areas, enabling fine segmentation of tumors and cysts. Finally, a transfer restoration mechanism is introduced, which reconstructs overlapping regions of the kidney and lesions based on spatial consistency constraints between pixels and their neighborhoods, effectively completing incomplete labels and further improving the accuracy and integrity of kidney structure segmentation. Our method achieved strong results on the Kits2019, Kits2021, and Kits2023 datasets, with the results on Kits2021 and Kits2023 surpassing the current first-place entries in the competitions.
Title: UEM-Net: Unsupervised segmentation network for kidneys and renal tumors based on image enhancement and migration repair mechanisms. Journal: Digital Signal Processing, vol. 172, Article 105862.
With the emergence of sixth-generation (6G) networks, unmanned aerial vehicles (UAVs) are expected to play a key role in enhancing coverage, connectivity, and capacity, particularly in highly dynamic environments. However, UAV-based communication systems face significant challenges, including severe Doppler effects, interference management, and link reliability, because high mobility makes the wireless channel doubly selective, that is, selective in both time and frequency. To address these issues, this paper proposes a downlink transmission framework that integrates rate-splitting multiple access (RSMA) with orthogonal time frequency space (OTFS) modulation. This combination enhances communication reliability and spectral efficiency in UAV-assisted networks. The channel is modeled using integer-valued delay-Doppler parameters to reflect realistic high-mobility conditions. To reduce inter-symbol interference (ISI) and enhance detection performance, a low-complexity equalization algorithm based on QR decomposition with Givens rotations is introduced, tailored to the sparse structure of the OTFS channel matrix. Additionally, a bi-objective power allocation strategy for RSMA is formulated to simultaneously minimize bit error rate (BER) and maximize throughput. The non-dominated sorting genetic algorithm II (NSGA-II) is used to find Pareto-optimal solutions suited to varying system demands. Comprehensive simulations compare the proposed OTFS-RSMA system with benchmark schemes, namely OTFS-NOMA and OFDMA-RSMA, under identical conditions, showing superior sum-rate and BER performance.
Title: Joint equalization and power allocation for UAV-assisted RSMA-OTFS transmission over doubly dispersive channels. Authors: Ghania Khraimech, Fatiha Merazka, Mustapha Benssalah. Pub Date: 2026-03-15; DOI: 10.1016/j.dsp.2025.105870. Journal: Digital Signal Processing, vol. 172, Article 105870.
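The equalizer in the abstract above is built on QR decomposition with Givens rotations. The following is a minimal sketch of that building block on a small real-valued matrix, followed by a back-substitution solve. It is not the authors' algorithm: the actual OTFS channel matrix is complex-valued and structured, noise is ignored here, and the function names and the example matrix are hypothetical. The point it illustrates is that a rotation is only needed where an entry below the diagonal is non-zero, which is why sparsity reduces the work.

```python
import math


def givens_qr(a):
    """Return (q, r) with a = q @ r and r upper triangular, via Givens rotations."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):
            if r[i][j] == 0.0:
                continue  # already zero: no rotation needed (sparsity saves work)
            h = math.hypot(r[i - 1][j], r[i][j])
            c, s = r[i - 1][j] / h, r[i][j] / h
            # Rotate rows i-1 and i of r so that r[i][j] becomes zero.
            for k in range(n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r


def solve_via_qr(h, y):
    """Solve h x = y for a square, non-singular h: form q^T y, then back-substitute."""
    q, r = givens_qr(h)
    n = len(h)
    z = [sum(q[k][i] * y[k] for k in range(n)) for i in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))) / r[i][i]
    return x


if __name__ == "__main__":
    channel = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]]  # toy sparse matrix
    received = [3.0, -2.0, 3.0]  # equals channel @ [1, -1, 1] without noise
    print([round(v, 6) for v in solve_via_qr(channel, received)])
```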
Pub Date: 2026-03-15; Epub Date: 2026-01-02; DOI: 10.1016/j.dsp.2025.105872
Huilong Tang, Wei Wang, Zhiwei Pu, Jianlin Wei, Wang Zhang
Modern airborne radar reconnaissance systems employ range-divided multi-stage operations (e.g., passive detection, active detection, and active identification). However, traditional low probability of intercept (LPI) radar designs focus on optimizing performance for individual reconnaissance stages, resulting in suboptimal overall detection capability. Meanwhile, multi-stage operations yield excessive invalid and suboptimal actions, creating action space redundancy that degrades learning efficiency. This paper proposes a reinforcement learning (RL)-based joint decision-making method for enhanced detection performance, incorporating improved RL exploration mechanisms to accelerate learning. First, adversarial strategies from each stage are integrated to construct a joint decision-making framework for detection modes and transmit power (JD-DMTP). Based on this framework, the RL elements are designed to enhance detection performance under LPI constraints. Second, we propose the trainable suboptimal action mask (TSAM), equipped with suboptimal action elimination criteria, to filter out both invalid and suboptimal actions, thereby improving learning efficiency. Finally, the experimental results validate the effectiveness of the JD-DMTP, showing 6.46×/4.04× higher hit value ratio and 1.52×/1.32× better successful decision-making rate (ideal/non-ideal environment) compared to the minimum-transmit-power baseline. The TSAM achieves comparable performance to the trainable action mask (TAM) baseline with only 25% of the required training iterations.
Title: Improved reinforcement learning-based joint decision-making of detection modes and transmit power for LPI radar. Journal: Digital Signal Processing, vol. 172, Article 105872.
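The abstract above relies on action masking to remove invalid and suboptimal actions from the policy's choices. As a rough illustration of the general mechanism only, the sketch below applies a fixed boolean mask inside a softmax so that masked actions receive zero probability. The TSAM described in the abstract learns its mask and uses elimination criteria that the abstract does not spell out; none of that is reproduced here, and the function name and example values are hypothetical.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over actions where mask[i] is True; masked actions get probability 0."""
    allowed = [x for x, keep in zip(logits, mask) if keep]
    if not allowed:
        raise ValueError("at least one action must remain unmasked")
    peak = max(allowed)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) if keep else 0.0 for x, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical policy logits for four (mode, power) actions; action 1 is masked out.
    probs = masked_softmax([2.0, 1.0, 0.5, -1.0], [True, False, True, True])
    print([round(p, 3) for p in probs])
```

Because masked actions are never sampled, the agent spends no exploration on them, which is the efficiency argument the abstract makes for masking.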
Urban traffic surveillance often faces challenges such as vehicle occlusion, scale variation, and complex background interference, leading to missed detections and false positives. To address these challenges, this paper presents YOLOv11-W, an enhanced object detection network built upon YOLOv11. The proposed model improves three key aspects: reducing detection errors, strengthening perception of small-angle objects, and enhancing feature extraction. A C3k2_SimAM module is introduced to apply secondary weighting during feature fusion, thereby improving target saliency. At the backbone endpoint, the GAM module forms a Multi-Point Attention Enhancement (MPAE) mechanism, enabling stronger perception of critical regions and more effective global modeling. For feature aggregation, the sparsity-constrained SPPCSPC module replaces the conventional SPPF, enhancing multi-scale contextual awareness while minimizing redundancy. In the detection head, WIoU v2 serves as the bounding-box regression loss, improving localization accuracy for occluded and small objects while accelerating convergence. To mitigate motion blur, the network employs an attention-based multi-scale reweighting mechanism that reinforces blurred edge and texture features across scales, effectively preserving structural details. Through the cooperation of these modules, YOLOv11-W achieves greater representational power and adaptability in complex multi-object traffic scenarios. Experimental results confirm its effectiveness: recognition accuracies reach 97.5%, 96.2%, and 96.2% in free flow, synchronous flow, and blocking flow conditions, representing gains of 0.1%, 0.9%, and 2.4% over YOLOv11. Meanwhile, the optimized design preserves real-time performance, achieving 66.7, 64.4, and 67.3 FPS across the different traffic states. These results demonstrate that YOLOv11-W balances accuracy and efficiency in urban traffic detection.
Title: An enhanced traffic object detection network based on multi-point attention and sparse feature aggregation. Authors: Xun Li, Qidi Wang, Ruixue Shi, Ruibo Nui, Weizhong Chen. Pub Date: 2026-03-15; DOI: 10.1016/j.dsp.2026.105897. Journal: Digital Signal Processing, vol. 172, Article 105897.
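The abstract above names a C3k2_SimAM module that reweights features to improve target saliency. How SimAM is wired into C3k2 is not described, so the sketch below only illustrates the SimAM-style weighting itself on a single channel, following the commonly used closed-form formulation as best understood here: each activation is scaled by a sigmoid of its squared deviation from the channel mean, normalized by the channel variance. The function name, the regularizer value `lam`, and the toy feature map are assumptions.

```python
import math


def simam(channel, lam=1e-4):
    """Apply SimAM-style parameter-free attention to one channel (a list of rows).

    Assumes the channel holds at least two values; lam is a small regularizer.
    """
    flat = [v for row in channel for v in row]
    n = len(flat) - 1                       # number of "other" positions
    mu = sum(flat) / len(flat)
    sq_dev = [(v - mu) ** 2 for v in flat]
    var = sum(sq_dev) / n
    width = len(channel[0])
    out = []
    for idx, v in enumerate(flat):
        # Inverse energy: activations far from the channel mean score higher.
        e_inv = sq_dev[idx] / (4 * (var + lam)) + 0.5
        weight = 1 / (1 + math.exp(-e_inv))  # sigmoid
        if idx % width == 0:
            out.append([])
        out[-1].append(v * weight)
    return out


if __name__ == "__main__":
    # Hypothetical 3x3 feature map with one salient activation in the center.
    fmap = [[1.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]
    for row in simam(fmap):
        print([round(v, 3) for v in row])
```

The salient center value keeps a larger fraction of its magnitude than the background values, which is the saliency effect the abstract refers to.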
Pub Date: 2026-03-15; Epub Date: 2025-12-31; DOI: 10.1016/j.dsp.2025.105871
Dan Xu, Jiaao Wang, Yang Zhou, Qiang Qian, Jinlong Shi
Self-supervised learning has emerged as a promising approach in monocular depth estimation due to its independence from ground-truth depth annotations. However, its reliance on photometric consistency between adjacent frames as the supervisory signal makes it particularly vulnerable to illumination changes, occlusions, and dynamic objects. These limitations often result in unstable supervision and blurred depth predictions, especially near object boundaries. Furthermore, the absence of geometric constraints, which are typically provided by stereo or multi-view systems, hinders the accurate modeling of scene structure, compromising spatial coherence and geometric fidelity. To address these challenges, we propose a novel monocular depth estimation framework that combines a hierarchical decoder with a relative distance cost volume. The proposed hierarchical decoder employs Laplacian pyramid residuals to enhance high-frequency details, while a residual mean feature strengthens edge and texture representation during decoding, effectively reducing boundary blurring caused by photometric inconsistencies. Additionally, we introduce a global-local decoupling structure within a Transformer-based architecture to construct the relative distance cost volume. By integrating global depth representations with local query mechanisms, our method captures intricate geometric relationships and improves scene understanding. Extensive experiments on the KITTI and Make3D datasets demonstrate that our framework achieves state-of-the-art performance across all evaluation metrics, while preserving fine-grained details and exhibiting strong generalization capability. Our source code is available at https://github.com/jiaaw1/LPD-DCV-depth.
Title: Self-supervised monocular depth estimation using a hierarchical decoder and relative cost volume. Journal: Digital Signal Processing, vol. 172, Article 105871.
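The hierarchical decoder in the abstract above uses Laplacian pyramid residuals to carry high-frequency detail. The sketch below shows the underlying decomposition on a one-dimensional profile: each level stores what is lost by downsampling, and adding the residuals back recovers the input. It simplifies heavily and is not the paper's decoder: pair averaging and sample repetition stand in for the usual Gaussian filtering and interpolation, a 1-D list stands in for a depth map, and all function names are hypothetical.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each sample."""
    return [v for v in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Return (residuals, coarsest), with residuals ordered from fine to coarse."""
    residuals, current = [], list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        # The residual holds the high-frequency detail lost by downsampling.
        residuals.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Invert build_pyramid by upsampling and adding residuals, coarse to fine."""
    current = list(coarsest)
    for residual in reversed(residuals):
        current = [a + b for a, b in zip(upsample(current), residual)]
    return current


if __name__ == "__main__":
    # Hypothetical depth profile with two sharp boundaries.
    profile = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 2.0, 2.0]
    residuals, coarsest = build_pyramid(profile, levels=2)
    print(residuals[0])
    print(reconstruct(residuals, coarsest))
```

The finest residual is non-zero only around the first boundary, which shows where the edge information lives that a decoder of this kind can reinject.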
Pub Date: 2026-03-15; Epub Date: 2025-12-29; DOI: 10.1016/j.dsp.2025.105860
Liwen Jiang, Li Chai
It is well known that Generative Adversarial Networks (GANs) are difficult to train, and great effort has been devoted to analyzing and stabilizing their training dynamics. Recent works have analyzed the factors influencing the convergence of regularized GANs and revealed that the training dynamics tend to converge locally near equilibrium points. In this paper, we study the convergence rate of regularized GANs using the theory of dissipative dynamical systems, which can be viewed as a generalization of passivity theory and the small-gain theory of nonlinear systems. We analyze the impact of the learning rate on the training process of regularized GANs. We prove the convergence of the training process for regularized GANs without eigen-analysis of the Jacobian matrix. We derive the system's geometric convergence rate and identify the optimal learning rate that leads to the fastest convergence. We have conducted extensive experiments on several datasets to verify our theoretical results.
Title: On the convergence rate of regularized GANs training: A dissipative dynamical system perspective. Journal: Digital Signal Processing, vol. 172, Article 105860.
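The abstract above concerns how regularization and the learning rate govern the geometric convergence rate of GAN training. The sketch below is a toy illustration on a two-parameter bilinear min-max game with a quadratic penalty on the discriminator, not the paper's dissipativity analysis. The objective, the simultaneous update rule, and all names are assumptions. The eigenvalue-based contraction factor is used only to check the toy; the paper states that its proof avoids such eigen-analysis.

```python
import math


def train(lr, gamma, steps=200, theta=1.0, psi=1.0):
    """Simultaneous gradient descent-ascent on f = theta*psi - (gamma/2)*psi**2.

    Returns the distance to the equilibrium (0, 0) after `steps` updates.
    """
    for _ in range(steps):
        grad_theta = psi                   # generator descends on f
        grad_psi = theta - gamma * psi     # discriminator ascends on f
        theta, psi = theta - lr * grad_theta, psi + lr * grad_psi
    return math.hypot(theta, psi)


def contraction_factor(lr, gamma):
    """Spectral radius of the update matrix [[1, -lr], [lr, 1 - lr*gamma]]."""
    trace = 2 - lr * gamma
    det = 1 - lr * gamma + lr ** 2
    disc = trace ** 2 - 4 * det
    if disc < 0:                           # complex pair: modulus is sqrt(det)
        return math.sqrt(det)
    root = math.sqrt(disc)
    return max(abs(trace + root), abs(trace - root)) / 2


if __name__ == "__main__":
    print("unregularized:", train(lr=0.1, gamma=0.0))   # drifts away from (0, 0)
    print("regularized:  ", train(lr=0.1, gamma=1.0))   # contracts toward (0, 0)
    grid = [k / 100 for k in range(1, 100)]
    best = min(grid, key=lambda lr: contraction_factor(lr, 1.0))
    print("fastest learning rate on the grid:", best)
```

In this toy game the unregularized updates spiral outward, the penalized updates contract geometrically, and the contraction factor is smallest at an intermediate learning rate, which mirrors the existence of an optimal learning rate claimed in the abstract.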
The rapid advancement of satellite and sensor technologies has facilitated the acquisition of remote sensing images, yet the efficient and accurate extraction of valuable information from high-resolution data, particularly for object detection, remains challenging. While deep learning-based algorithms have shown promise in automatic feature learning, single-scale feature layers struggle with large scale variations, and existing attention mechanisms and convolutional modules are insufficient for remote sensing tasks. To address these issues, this paper proposes a novel Multi-scale Perception and Detail Learning YOLO (MPDL-YOLO) model for remote sensing object detection. First, we propose a Median-pooling Space and Channel Attention Block (MPCS), integrating global average, max, and median pooling to create a multi-dimensional attention mechanism that reduces noise while preserving edge details. Second, we design a Depthwise Separable Lightweight Inception Convolution (DWInceptionLite) by combining depthwise separable convolutions and Inception structures, significantly reducing computational complexity. Finally, we employ an inverted Residual Mobile Block (iRMB) to construct a Hierarchical Feature Fusion Block (HFFB), enhancing feature extraction and detail precision. Experimental results demonstrate that, compared to YOLOv8, MPDL-YOLO achieves improvements of 1.2% and 3.2% in mAP@0.5, and 1.5% and 1.8% in mAP@0.5:0.95, on the DIOR and RSOD datasets, respectively, thus validating the effectiveness and superiority of the proposed algorithm.
{"title":"MPDL-YOLO: A multidimensional attention and lightweight convolution framework for remote sensing object detection","authors":"Pengyu Chen, Lie Wang, Yuman Liang, Zunmin Hou, Guangbin He, Hongshuai Chen","doi":"10.1016/j.dsp.2025.105873","DOIUrl":"10.1016/j.dsp.2025.105873","url":null,"abstract":"<div><div>The rapid advancement of satellite and sensor technologies has facilitated the acquisition of remote sensing images, yet the efficient and accurate extraction of valuable information from high-resolution data, particularly for object detection, remains challenging. While deep learning-based algorithms have shown promise in automatic feature learning, single-scale feature layers struggle with large-scale variations, and existing attention mechanisms and convolutional modules are insufficient for remote sensing tasks. To address these issues, this paper proposes a novel Multi-scale Perception and Detail Learning YOLO (MPDL-YOLO) model for remote sensing object detection. First, we propose a Median-pooling Space and Channel Attention Block (MPCS), integrating global average, max, and median pooling to create a multi-dimensional attention mechanism that reduces noise while preserving edge details. Second, we design a Depthwise Separable Lightweight Inception Convolution (DWInceptionLite) by combining depthwise separable convolutions and Inception structures, significantly reducing computational complexity. Finally, we employ an inverted Residual Mobile Block (iRMB) to construct a Hierarchical Feature Fusion Block (HFFB), enhancing feature extraction and detail precision. 
Experimental results demonstrate that compared to YOLOv8, MPDL-YOLO achieves improvements of 1.2% and 3.2% in mAP@0.5, and 1.5% and 1.8% in mAP@0.5:0.95 on the DIOR and RSOD datasets, respectively, thus validating the effectiveness and superiority of the proposed algorithm.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105873"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
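The MPDL-YOLO abstract above describes an attention block that integrates global average, max, and median pooling. The following is a minimal sketch of that pooling idea only, not the authors' MPCS implementation: the function names `channel_descriptors` and `channel_attention` are hypothetical, and combining the three descriptors by a plain sum followed by a sigmoid is an assumption made here for illustration (the abstract does not say how they are fused).

```python
import math
from statistics import median


def channel_descriptors(feature_map):
    """Return one (average, max, median) tuple per channel.

    feature_map is a nested list shaped channels x height x width.
    """
    descriptors = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        descriptors.append((sum(values) / len(values), max(values), median(values)))
    return descriptors


def channel_attention(feature_map):
    """Rescale each channel by a weight in (0, 1).

    ASSUMPTION: the weight is sigmoid(avg + max + median). The abstract
    does not specify the fusion rule; this sum is illustrative only.
    """
    rescaled = []
    for channel, (avg, mx, med) in zip(feature_map, channel_descriptors(feature_map)):
        weight = 1.0 / (1.0 + math.exp(-(avg + mx + med)))
        rescaled.append([[weight * v for v in row] for row in channel])
    return rescaled


# One 2x2 channel that is flat except for a single outlier pixel.
noisy_channel = [[[0.0, 0.0], [0.0, 8.0]]]
print(channel_descriptors(noisy_channel))
```

In this toy channel the outlier pulls the average and the max upward while the median stays at the background level, which illustrates why a median descriptor can be less sensitive to isolated noise than the other two.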
Pub Date : 2026-03-15 Epub Date: 2026-01-06 DOI: 10.1016/j.dsp.2026.105894
Yahya M. Al-Moliki , Ali H. Alqahtani , Yahya Al-Harthi , Mohammed T. Alresheedi
This paper presents a novel deep learning (DL)-enabled receiver architecture for orthogonal frequency-division multiplexing (OFDM)-based time-domain generalized spatial modulation (TD-GSM) in optical multiple-input multiple-output (MIMO) systems. Unlike prior deep learning studies that focused on frequency-domain GSM (FD-GSM) or pulse-amplitude-modulation-based GSM under perfect channel assumptions, this work is the first to integrate a conditional denoising autoencoder (DAE), a multilayer perceptron (MLP), and a convolutional neural network (CNN) into an OFDM-based TD-GSM framework under realistic intensity modulation/direct detection (IM/DD) and imperfect channel state information (CSI) conditions. The proposed architecture uniquely combines: (i) a conditional DAE for adaptive channel estimation across varying signal-to-noise ratios, (ii) an MLP classifier for accurate spatial index classification, and (iii) a CNN classifier for robust constellation symbol recovery under nonlinear optical distortions. This design not only improves robustness but also reduces hardware complexity compared to FD-GSM, since TD-GSM requires only a single OFDM chain while still embedding spatial information in the time domain. Simulation results confirm that the proposed DL-based receiver achieves bit-error-rate (BER) performance gains of up to 9 dB over maximum-likelihood detection at a BER of $3 \times 10^{-3}$, demonstrating both the scalability and generalizability of the approach. By explicitly addressing IM/DD biasing constraints and channel estimation imperfections, this architecture advances the methodological capabilities of optical digital signal processing and provides a practical, high-efficiency solution for future optical wireless communication systems.
{"title":"Deep learning-enabled receiver for OFDM-Based time-domain generalized spatial modulation in optical MIMO systems","authors":"Yahya M. Al-Moliki , Ali H. Alqahtani , Yahya Al-Harthi , Mohammed T. Alresheedi","doi":"10.1016/j.dsp.2026.105894","DOIUrl":"10.1016/j.dsp.2026.105894","url":null,"abstract":"<div><div>This paper presents a novel deep learning (DL)-enabled receiver architecture for orthogonal frequency-division multiplexing (OFDM)-based time-domain generalized spatial modulation (TD-GSM) in optical multiple-input multiple-output (MIMO) systems. Unlike prior deep learning studies that focused on frequency-domain GSM (FD-GSM) or pulse-amplitude-modulation-based GSM under perfect channel assumptions, this work is the first to integrate a conditional denoising autoencoder (DAE), a multilayer perceptron (MLP), and a convolutional neural network (CNN) into an OFDM-based TD-GSM framework under realistic intensity modulation/direct detection (IM/DD) and imperfect channel state information (CSI) conditions. The proposed architecture uniquely combines: (i) a conditional DAE for adaptive channel estimation across varying signal-to-noise ratios, (ii) an MLP classifier for accurate spatial index classification, and (iii) a CNN classifier for robust constellation symbol recovery under nonlinear optical distortions. This design not only improves robustness but also reduces hardware complexity compared to FD-GSM, since TD-GSM requires only a single OFDM chain while still embedding spatial information in the time domain. Simulation results confirm that the proposed DL-based receiver achieves bit-error-rate (BER) performance gains of up to 9 dB over maximum-likelihood detection at a BER of <span><math><mrow><mn>3</mn><mspace></mspace><mo>×</mo><msup><mrow><mn>10</mn></mrow><mrow><mo>−</mo><mn>3</mn></mrow></msup></mrow></math></span>, demonstrating both the scalability and generalizability of the approach. 
By explicitly addressing IM/DD biasing constraints and channel estimation imperfections, this architecture advances the methodological capabilities of optical digital signal processing and provides a practical, high-efficiency solution for future optical wireless communication systems.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105894"},"PeriodicalIF":3.0,"publicationDate":"2026-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}