Pub Date: 2026-04-01 | Epub Date: 2026-01-11 | DOI: 10.1016/j.dsp.2026.105896
Mukul Chauhan, Waseem Z. Lone, Amit K. Verma
This paper introduces a novel time–frequency distribution, referred to as the two-dimensional non-separable quadratic-phase Wigner distribution (2D-NSQPWD), formulated within the framework of the two-dimensional non-separable quadratic-phase Fourier transform (2D-NSQPFT). The proposed distribution extends the classical two-dimensional Wigner distribution (2D-WD) through a convolution-based formulation that incorporates the structural characteristics of the 2D-NSQPFT, thereby enabling an effective representation of complex, non-separable signal structures. We rigorously establish several key properties of the 2D-NSQPWD, including time and frequency shift invariance, marginal behavior, conjugate symmetry, convolution relations, and Moyal’s identity. The effectiveness of the distribution is demonstrated through its application to single-, bi-, and tri-component two-dimensional linear frequency-modulated (2D-LFM) signals. Finally, simulations show that the proposed distribution exhibits superior performance in cross-term suppression and signal localization compared to existing transforms.
{"title":"A novel two-dimensional Wigner distribution framework via the quadratic phase Fourier transform with a non-separable kernel","authors":"Mukul Chauhan, Waseem Z. Lone, Amit K. Verma","doi":"10.1016/j.dsp.2026.105896","DOIUrl":"10.1016/j.dsp.2026.105896","url":null,"abstract":"<div><div>This paper introduces a novel time–frequency distribution, referred to as the two-dimensional non-separable quadratic-phase Wigner distribution (2D-NSQPWD), formulated within the framework of the two-dimensional non-separable quadratic-phase Fourier transform (2D-NSQPFT). The proposed distribution extends the classical two-dimensional Wigner distribution (2D-WD) through a convolution-based formulation that incorporates the structural characteristics of the 2D-NSQPFT, thereby enabling an effective representation of complex, non-separable signal structures. We rigorously establish several key properties of the 2D-NSQPWD, including time and frequency shift invariance, marginal behavior, conjugate symmetry, convolution relations, and Moyal’s identity. The effectiveness of the distribution is demonstrated through its application to single-, bi-, and tri-component two-dimensional linear frequency-modulated (2D-LFM) signals. 
Finally, simulations show that the proposed transform exhibits superior performance in cross-term suppression and signal localization compared to existing transforms.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105896"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.dsp.2026.105937
Xiaoran Li, Jinglei Liu
Multi-view clustering (MVC) aims to enhance clustering performance through the effective integration of complementary information derived from multiple data sources. Nevertheless, current approaches frequently fall short of fully modeling the global topological characteristics and local similarity connections of multi-view data. In addition, adaptively learning representative anchors that align with the inherent data structure is another challenge for conventional anchor-based multi-view clustering (AMVC) techniques. To solve the above problems, we propose a novel MVC framework that integrates structure-aware graph representation and adaptive anchor graph learning (SAGA2G). Specifically, the SAGA2G approach achieves unified modeling of multi-level structures by preserving neighborhood structure features through local similarity constraints and topological consistency through anchor-based global reconstruction. Simultaneously, we develop a dynamic anchor optimization approach that enhances the expressive power of the representation by automatically aligning the anchor distribution with the underlying cluster structure. Furthermore, an efficient alternating optimization algorithm is developed to solve the proposed model, with theoretical guarantees of linear time complexity and convergence. Finally, extensive experiments performed on eight benchmark datasets demonstrate that SAGA2G significantly surpasses the current state-of-the-art techniques.
{"title":"Harnessing structure-aware graph representation and adaptive anchor graph learning for multi-view clustering","authors":"Xiaoran Li, Jinglei Liu","doi":"10.1016/j.dsp.2026.105937","DOIUrl":"10.1016/j.dsp.2026.105937","url":null,"abstract":"<div><div>Multi-view clustering (MVC) aims to enhance clustering performance through the effective integration of complementary information derived from multiple data sources. Nevertheless, current approaches frequently fall short of fully modeling the global topological characteristics and local similarity connections of multi-view data. In addition, adaptively learning representative anchors that align with the inherent data structure is another challenge for conventional anchor-based multi-view clustering (AMVC) techniques. To solve the above problems, we propose a novel MVC framework that integrates structure-aware graph representation and adaptive anchor graph learning (SAGA<sup>2</sup>G). Specifically, the SAGA<sup>2</sup>G approach achieves unified modeling of multi-level structures by preserving neighborhood structure features utilizing local similarity constraints and topological consistency through anchor-based global reconstruction. Simultaneously, we develop a dynamic anchor optimization approach that raises the expressive power of the data by automatically aligning the anchor distribution with the underlying cluster structure. Furthermore, an efficient alternating optimization algorithm is utilized to address the proposed approach, with theoretical guarantees of linear time complexity and convergence. 
Finally, extensive experiments performed on eight benchmark datasets demonstrate that SAGA<sup>2</sup>G significantly surpasses the current state-of-the-art techniques.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105937"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.dsp.2026.105908
Z.M. Kurdoshev , E.A. Pchelintsev
The paper considers the optimal filtering of square-integrable signals in Gaussian noise of small intensity. The problem is studied under the condition that the observed process is available only at discrete time moments. This study aims to develop an automated and data-driven model selection procedure (MSP) based on sharp oracle inequalities for optimal estimation of an unknown signal by determining the best combination of smoothness parameters that minimizes the mean square error. We propose a novel hybrid neural network architecture that combines statistical estimation theory with deep learning. A dedicated neural MSP layer is designed to generate a wide range of candidate parameter combinations. For each combination, a weighted least squares estimate of the signal is calculated. A gating network, inspired by the mixture-of-experts paradigm, is then used to dynamically select the most accurate estimate from this set of candidates. The entire system is trained on a variety of synthetic datasets of clean and noisy signal pairs containing different waveforms, using the mean square error as the loss function. The proposed MSP demonstrates high performance over a wide range of noise levels. The mean square error for elementary signals remained below 0.5 even in high-noise scenarios. The method also proved robust for complex signal combinations, hybrid waveforms, and ECG and CWRU signals, successfully reconstructing them with satisfactory accuracy. The gating network effectively learned to set optimal parameters by consistently selecting values within stable ranges. The developed MSP-NN system provides a robust automated solution for nonparametric signal estimation from noisy discrete observations. It successfully bridges the gap between theoretical statistical efficiency and practical application by automating the important and previously manual step of parameter selection. This work paves the way for the development of intelligent, data-driven signal processing systems that can operate reliably in the presence of noise uncertainty.
{"title":"Model selection method based on the neural networks for signal processing","authors":"Z.M. Kurdoshev , E.A. Pchelintsev","doi":"10.1016/j.dsp.2026.105908","DOIUrl":"10.1016/j.dsp.2026.105908","url":null,"abstract":"<div><div>The paper considers the optimal filtering of square integrable signals in Gaussian noise of small intensity. The problem is studied under the condition that the observed process is available only at discrete time moments. This study aims to develop an automated and data-driven model selection procedure (MSP) based on sharp oracle inequalities for optimal estimation of an unknown signal by determining the best combination of smoothness parameters that minimizes the mean square error. We propose a novel hybrid neural network architecture that combines statistical estimation theory with deep learning. A dedicated neural MSP layer is designed to generate a wide range of potential parameter combinations. For each combination, a weighted least squares estimate of the signal is calculated. A gateway network, inspired by the mixture of experts paradigm, is then used to dynamically select the most accurate estimate from this set of candidates. The entire system is trained on a variety of synthetic datasets of clean and noisy signal pairs containing different waveforms, using the mean square error. The proposed MSP demonstrates high performance over a wide range of noise levels. The mean square error for elementary signals remained below 0.5 even in high-noise scenarios. The method also proved to be robust for complex signal combinations, hybrid waveforms, ECG and CWRU signals, successfully reconstructing them with satisfactory accuracy. The gating network effectively learned to set optimal parameters by continuously selecting values within stable ranges. The developed MSP-NN system provides a robust automated solution for nonparametric signals estimation from noisy discrete observations. 
It successfully bridges the gap between theoretical statistical efficiency and practical application by automating the important and previously manual step of parameter selection. This work paves the way for the development of intelligent data-driven signal processing systems that can operate reliably in the presence of noise uncertainty.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105908"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wideband DOA estimation has become a significant concern in communication, navigation, and radar systems. Previous approaches employed the frequency-domain focusing method to alleviate the wideband impact, but it was constrained by its reliance on prior DOA knowledge. Time-domain wideband DOA estimation methods have also been explored, but often suffered from high-dimensional complexity. This work introduces a time-domain energy focusing (TDEF) scheme that leverages the known waveform, eliminates the reliance on prior DOA information, and reduces the high-dimensional complexity. TDEF consists of multi-channel matched filtering and joint power-peak detection. The former concentrates signal energy in the time domain, while the latter mitigates peak migration induced by the wideband scenario. Through this process, the wideband scenario is transformed into an equivalent narrowband counterpart, enabling the application of narrowband DOA estimation techniques. Using matrix-perturbation analysis, we establish the theoretical asymptotic MSE equivalence between the TDEF scheme and frequency-domain focusing. The numerical simulations show that the TDEF-based method achieves asymptotic performance approaching the CRLB without prior DOA information, improved resolution for closely spaced sources with different TOAs, and lower computational complexity, especially compared to time-domain sparsity-recovery methods.
{"title":"Wideband DOA estimation based on time-domain energy focusing","authors":"Yuxiang Jiang , Qing Shen , Kejiang Wu , Zexiang Zhang , Chenxi Liao , Shuyuan Xu","doi":"10.1016/j.dsp.2026.105903","DOIUrl":"10.1016/j.dsp.2026.105903","url":null,"abstract":"<div><div>Wideband DOA estimation has become a significant concern in communication, navigation, and radar systems. Previous approaches employed the frequency-domain focusing method to alleviate the wideband impact, but it was constrained by its reliance on prior DOA knowledge. The time-domain wideband DOA estimation methods have also been explored, but often suffered from high-dimensional complexity. This work introduces a time-domain energy focusing (TDEF) scheme that leverages the known waveform and eliminates the reliance on prior DOA information and reduce the high-dimensional complexity. TDEF consists of multi-channel matched filtering and joint power-peak detection. The former concentrates signal energy in the time domain, while the latter mitigates peak migration induced by the wideband scenario. Through this process, the wideband scenario is transformed into an equivalent narrowband counterpart, enabling the application of narrowband DOA estimation techniques. Using matrix-perturbation analysis, we establish the theoretically asymptotic MSE equivalence between TDEF scheme and frequency-domain focusing. 
The numerical simulations show that the TDEF-based method achieves asymptotic performance approaching the CRLB without prior DOA information, improved resolution for closely spaced sources with different TOAs, and lower computational complexity, especially compared to time-domian sparsity-recovery methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105903"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-10 | DOI: 10.1016/j.dsp.2026.105904
Hao Ming , Hanping Hu , Jun Zheng
For chaotic cryptography to advance toward practical deployment, it is necessary to pay attention not only to the security issues of chaotic systems but also to problems such as dynamical degradation in the digital domain and system synchronization. Regarding the security of the chaotic system itself, its characteristic information (including parameters, the structure of coupled chaotic systems, etc.) provides critical entry points for attackers. If these characteristics remain static, chaotic cryptography becomes increasingly vulnerable to cryptanalysis. In this paper, a time-variant stream cipher based on a nondegenerate and coupled chaotic system is proposed. An analog-digital hybrid technique is employed to overcome dynamical degradation in the digital domain, and digital adaptive pulse control is adopted for synchronization. The coupling structure, delay, and parameters of the coupled chaotic system are dynamically varied following a time-variant mechanism to enhance security. The practical effectiveness is demonstrated by an FPGA-FPAA collaborative hardware design, wherein an event-triggered synchronization scheme is also presented for hardware implementation. Experimental results and theoretical analyses show that the proposed cipher can provide high-quality and robust keystreams for a wide range of cryptographic applications. The construction strategy and components of the proposed cryptosystem can serve to motivate future chaotic cipher designs and applications.
{"title":"Design and hardware implementation of a dynamically variable chaotic stream cipher system with analog-Digital hybrid control and synchronization","authors":"Hao Ming , Hanping Hu , Jun Zheng","doi":"10.1016/j.dsp.2026.105904","DOIUrl":"10.1016/j.dsp.2026.105904","url":null,"abstract":"<div><div>For chaotic cryptography to advance toward practical deployment, it is necessary to pay attention not only to the security issues of chaotic systems but also to problems such as the actual degradation of digital performance and system synchronization. Regarding the security of the chaotic system itself, its characteristic information (including parameters, the structure of coupled chaotic systems, etc.) provides critical entry points for attackers. If these characteristics remain static, chaotic cryptography becomes increasingly vulnerable to cryptanalysis. In this paper, a time-variant stream cipher based on a nondegenerate and coupled chaotic system is proposed. The analog-digital hybrid technique is employed to solve the dynamical degradation in the digital field, and digital adaptive pulse control for synchronization. The coupling structure, delay, and parameter of the coupled chaos are dynamically varied following a time-variant mechanism to enhance the security. The practical effectiveness is demonstrated by FPGA-FPAA collaborative hardware design, wherein an event-triggered synchronization scheme is also presented for hardware implementation. Experimental results and theoretical analyses show that the proposed cipher can provide high-quality and robust keystreams for wide cryptographic applications. 
The construction strategy and components of the proposed cryptosystem are beneficial to motivate chaotic cipher designs and applications.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105904"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-24 | DOI: 10.1016/j.dsp.2026.105950
Yaoyi He , An Gong , Yunlu Ge , Xiaolei Zhao , Ning Ding
Communication signal recognition is a critical technology for ensuring the security and intelligent management of wireless communication systems, with broad applications in spectrum monitoring, electronic warfare, unmanned communication, and cognitive radio. Traditional neural networks often struggle to extract signal features across different scales, leading to low recognition accuracy. This paper introduces a new model designed to solve this issue by fusing multi-scale features. The model uses a dual-branch architecture. One branch employs the Discrete Wavelet Transform (DWT) to capture features from both low and high signal frequencies. The second branch is a Bidirectional Long Short-Term Memory (BiLSTM) network that extracts temporal patterns. A gating mechanism, a bidirectional structure, and a global timestep attention mechanism all enhance the BiLSTM module’s performance. Finally, the system combines these distinct features to enable effective signal detection and recognition. Tests conducted with the Panoradio HF dataset confirm our model’s capabilities. Our proposed method attained an average recognition accuracy of 79.52%, which surpasses competing baseline models by 4.51%.
{"title":"A communication signal recognition method based on multi-scale feature fusion","authors":"Yaoyi He , An Gong , Yunlu Ge , Xiaolei Zhao , Ning Ding","doi":"10.1016/j.dsp.2026.105950","DOIUrl":"10.1016/j.dsp.2026.105950","url":null,"abstract":"<div><div>Communication signal recognition is a critical technology for ensuring the security and intelligent management of wireless communication systems, with broad applications in spectrum monitoring, electronic warfare, unmanned communication, and cognitive radio. Traditional neural networks often struggle to extract signal features across different scales, leading to low recognition accuracy. This paper introduces a new model designed to solve this issue by fusing multi-scale features. The model uses a dual-branch architecture. One branch employs the Discrete Wavelet Transform (DWT) to capture features from both low and high signal frequencies. The second branch is a Bidirectional Long Short-Term Memory (BiLSTM) network that extracts temporal patterns. A gating mechanism, a bidirectional structure, and a global timestep attention mechanism all enhance the BiLSTM module’s performance. Finally, the system combines these distinct features to enable effective signal detection and recognition. Tests conducted with the Panoradio HF dataset confirm our model’s capabilities. 
Our proposed method attained an average recognition accuracy of 79.52%, which surpasses competing baseline models by 4.51%.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105950"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.dsp.2026.105935
Yue Chen , Huiying Xu , Xinzhong Zhu , Xuedong He , Hongbo Li , Yi Li
Vision Transformer (ViT)-based one-stream architectures have emerged as the dominant framework for object tracking. However, their performance is hampered by similar object interference and background distractions. To address these limitations, this paper proposes FETrack, a one-stream tracker designed to enhance feature discriminability for improved object tracking. The core innovations of FETrack are as follows: 1) Global Enhancement (GE) and Cross-Depth Template Fusion (CDTF) modules, where the GE module adopts a novel global feature extraction mechanism to suppress background interference, and the CDTF module ensures efficient propagation of contextual information via cross-depth template fusion. 2) An unsupervised hard sample learning strategy, which introduces contrastive learning and treats each candidate token as an independent instance by leveraging its inherent hard sample properties, thereby enhancing feature discriminability. 3) A distillation-based fine-tuning approach that guides parameter optimization for the entire backbone network through feature distillation, enabling efficient tuning of newly integrated modules and ensuring their synergy with the original architecture. Experimental results on six benchmark datasets demonstrate the effectiveness of FETrack and confirm its state-of-the-art performance. Furthermore, the transferability of the proposed approaches for enhancing other one-stream trackers is validated.
{"title":"FETrack: One-stream framework-based feature enhancement for object tracking","authors":"Yue Chen , Huiying Xu , Xinzhong Zhu , Xuedong He , Hongbo Li , Yi Li","doi":"10.1016/j.dsp.2026.105935","DOIUrl":"10.1016/j.dsp.2026.105935","url":null,"abstract":"<div><div>Vision Transformer (ViT)-based one-stream architectures have emerged as the dominant framework for object tracking. However, their performance is hampered by similar object interference and background distractions. To address these limitations, this paper proposes FETrack, a one-stream tracker designed to enhance feature discriminability for improved object tracking. The core innovations of FETrack are as follows: 1) Global Enhancement (GE) and Cross-Depth Template Fusion (CDTF) modules, where the GE module adopts a novel global feature extraction mechanism to suppress background interference, and the CDTF module ensures efficient propagation of contextual information via cross-depth template fusion. 2) An unsupervised hard sample learning strategy, which introduces contrastive learning and treats each candidate token as an independent instance by leveraging its inherent hard sample properties, thereby enhancing feature discriminability. 3) A distillation-based fine-tuning approach that guides parameter optimization for the entire backbone network through feature distillation, enabling efficient tuning of newly integrated modules and ensuring their synergy with the original architecture. Experimental results on six benchmark datasets demonstrate the effectiveness of FETrack and confirm its state-of-the-art performance. 
Furthermore, the transferability of the proposed approaches for enhancing other one-stream trackers is validated.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105935"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-15 | DOI: 10.1016/j.dsp.2025.105861
Panigrahi Srikanth, Chandan Kumar Behera
Audio-based diagnostics are rapidly emerging as non-invasive and accessible tools for identifying respiratory diseases. Medical acoustic signals such as coughs, breaths, and lung sounds carry clinically relevant information with strong potential for disease detection and monitoring. In this context, we introduce SWaRaA, a novel multi-modal deep learning framework that leverages the complementary characteristics of two distinct types of respiratory sound representations. The framework integrates Mel-spectrogram-based image features and Wav2Vec 2.0 embeddings of medical acoustic signals to enhance classification accuracy by capturing both spectral and contextual information. SWaRaA consists of two parallel processing pathways. The first extracts spectral-temporal features using a proposed lightweight CNN-Transformer model comprising Depth-Wise Separable Convolution (DSC), Parallel Convolution Series (PCS), Serial Convolution Series (SCS), and Transformer blocks (TR). The second processes raw acoustic signals through the Wav2Vec 2.0 model to capture deep contextual and temporal features. These representations are fused through a dedicated integration module and passed to a classification head for final prediction. The proposed framework effectively captures both local and long-range dependencies, enabling robust respiratory disease classification. Through extensive experiments across three benchmark datasets and 15 medical acoustic tasks, we establish SWaRaA as a state-of-the-art multi-modal acoustic classification model, offering a scalable and high-performance solution for real-world healthcare applications.
{"title":"SWaRaA: A multi-modal deep learning framework for the diagnosis and classification of respiratory diseases using medical acoustic representations","authors":"Panigrahi Srikanth, Chandan Kumar Behera","doi":"10.1016/j.dsp.2025.105861","DOIUrl":"10.1016/j.dsp.2025.105861","url":null,"abstract":"<div><div>Audio-based diagnostics are rapidly emerging as non-invasive and accessible tools for identifying respiratory diseases. Medical acoustic signals such as coughs, breaths, and lung sounds carry clinically relevant information with strong potential for disease detection and monitoring. In this context, we introduce SWaRaA, a novel multi-modal deep learning framework that leverages the complementary characteristics of two distinct types of respiratory sound representations. The framework integrates Mel-spectrogram-based image features and Wav2Vec 2.0 embeddings of medical acoustic signals to enhance classification accuracy by capturing both spectral and contextual information. SWaRaA consists of two parallel processing pathways. The first extracts spectral-temporal features using a proposed lightweight CNN-Transformer model comprising Depth-Wise Separable Convolution (DSC), Parallel Convolution Series (PCS), Serial Convolution Series (SCS), and Transformer blocks (TR). The second processes raw acoustic signals through the Wav2Vec 2.0 model to capture deep contextual and temporal features. These representations are fused through a dedicated integration module and passed to a classification head for final prediction. The proposed framework effectively captures both local and long-range dependencies, enabling robust respiratory disease classification. 
Through extensive experiments across three benchmark datasets and 15 medical acoustic tasks, we establish SWaRaA as a state-of-the-art multi-modal acoustic classification model, offering a scalable and high-performance solution for real-world healthcare applications.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105861"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-19 | DOI: 10.1016/j.dsp.2026.105934
Tao Cao , Baojian Ren , Zhengyang Zhang , Hongfei Cao , Xinglin Zhang , Shuchen Bai
The acquisition of signals via physical image sensors under low-light conditions constitutes a classic ill-posed inverse problem in digital signal processing. The severe signal-to-noise ratio (SNR) degradation, stemming from stochastic processes like photon shot noise and non-ideal characteristics of the sensor's signal processing pipeline, poses a significant challenge. Conventional supervised restoration algorithms are often constrained by the "sensor reality gap," where models trained on synthetic data fail to generalize to the complex, non-linear degradation profiles of real-world hardware. Meanwhile, unsupervised methods frequently suffer from unstable convergence due to the absence of reliable optimization constraints. To address this fundamental issue, we propose the Adaptive Reality Correction Network (ARC-Net), a novel self-guided refinement framework. Without requiring paired data, ARC-Net formulates the unknown physical sensor corruption as a degradation residual. This residual is iteratively estimated from real-world, unpaired samples and injected back into the training stream as a learned prior through a self-correction loop. This mechanism adaptively forces the network to learn the inverse mapping of authentic sensor artifacts. Furthermore, we introduce stochastic information occlusion as a robust regularization strategy, which enhances the network's ability to reconstruct signals from severely corrupted regions by emulating photon starvation. Extensive experiments demonstrate the state-of-the-art performance of ARC-Net. It not only surpasses the leading supervised method by over 1.4 dB in PSNR on a standard paired dataset but, more critically, it successfully restores fine-grained signal details and color fidelity in extreme real-world scenarios where most contemporary algorithms fail. 
This validates the framework's superiority in addressing complex, authentic signal processing challenges and highlights its significant potential for improving the reliability of sensor-based systems.
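The abstract leaves the self-correction loop at a conceptual level. As a hedged illustration only, the NumPy sketch below shows two of the stated ideas in miniature: estimating the unknown sensor corruption as a degradation residual from an unpaired real capture, and stochastic information occlusion as a masking regularizer emulating photon starvation. All function names, shapes, and the toy degradation model are our own assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_occlusion(img, n_patches=4, patch=8):
    """Zero out random square patches to emulate photon starvation
    (illustrative masking regularizer; patch geometry is assumed)."""
    out = img.copy()
    h, w = img.shape
    for _ in range(n_patches):
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - patch)
        out[y:y + patch, x:x + patch] = 0.0
    return out

def degradation_residual(real_dark, restored):
    """Estimate the unknown sensor corruption as the residual between an
    unpaired real low-light capture and its current restoration."""
    return real_dark - restored

def inject_residual(clean, residual, strength=1.0):
    """Corrupt a clean training image with the estimated residual prior,
    closing the self-correction loop on the training stream."""
    return np.clip(clean + strength * residual, 0.0, 1.0)

# Toy demo: a crude linear-attenuation-plus-noise stand-in for the sensor.
clean = rng.random((32, 32))
real_dark = np.clip(0.2 * clean + 0.05 * rng.normal(size=clean.shape), 0, 1)
restored = 0.2 * clean                      # stand-in for the network output
res = degradation_residual(real_dark, restored)
corrupted = inject_residual(clean, res)     # residual-injected training input
masked = stochastic_occlusion(corrupted)    # occlusion-regularized input
```

In a full training loop, `corrupted`/`masked` pairs would be fed to the restoration network so it learns the inverse mapping of the real residual rather than of a hand-crafted synthetic noise model.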
{"title":"Bridging the sensor reality gap: Adaptive learning from implicit degradation priors for low-light image enhancement","authors":"Tao Cao , Baojian Ren , Zhengyang Zhang , Hongfei Cao , Xinglin Zhang , Shuchen Bai","doi":"10.1016/j.dsp.2026.105934","DOIUrl":"10.1016/j.dsp.2026.105934","url":null,"abstract":"<div><div>The acquisition of signals via physical image sensors under low-light conditions constitutes a classic ill-posed inverse problem in digital signal processing. The severe signal-to-noise ratio (SNR) degradation, stemming from stochastic processes like photon shot noise and non-ideal characteristics of the sensor's signal processing pipeline, poses a significant challenge. Conventional supervised restoration algorithms are often constrained by the \"sensor reality gap,\" where models trained on synthetic data fail to generalize to the complex, non-linear degradation profiles of real-world hardware. Meanwhile, unsupervised methods frequently suffer from unstable convergence due to the absence of reliable optimization constraints. To address this fundamental issue, we propose the Adaptive Reality Correction Network (ARC-Net), a novel self-guided refinement framework. Without requiring paired data, ARC-Net formulates the unknown physical sensor corruption as a degradation residual. This residual is iteratively estimated from real-world, unpaired samples and injected back into the training stream as a learned prior through a self-correction loop. This mechanism adaptively forces the network to learn the inverse mapping of authentic sensor artifacts. Furthermore, we introduce stochastic information occlusion as a robust regularization strategy, which enhances the network's ability to reconstruct signals from severely corrupted regions by emulating photon starvation. Extensive experiments demonstrate the state-of-the-art performance of ARC-Net. 
It not only surpasses the leading supervised method by over 1.4 dB in PSNR on a standard paired dataset but, more critically, it successfully restores fine-grained signal details and color fidelity in extreme real-world scenarios where most contemporary algorithms fail. This validates the framework's superiority in addressing complex, authentic signal processing challenges and highlights its significant potential for improving the reliability of sensor-based systems.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105934"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01 | Epub Date: 2026-01-05 | DOI: 10.1016/j.dsp.2025.105853
Qingfeng Zeng , Yanfeng Geng , Shu Jiang , Weiliang Wang
Mud pulse telemetry (MPT) enables real-time transmission of downhole data during drilling operations. As the transmission distance increases, the received continuous pressure signals undergo significant attenuation. Moreover, strong periodic pump interference, random noise, and complex multipath propagation in the MPT system introduce three major challenges: (1) dynamic spectral overlap between signal and noise, (2) periodic disturbances with spectral drift, and (3) complex multi-scale temporal-frequency characteristics of the noise. These effects severely degrade signal quality, making accurate recovery particularly difficult for traditional model-based and learning-based denoising methods. To address these challenges, a lightweight neural network architecture named WaveU-Net is proposed. The architecture comprises three key components: (1) To address dynamic spectral overlap between signal and noise, a learnable wavelet denoising network (LWDNet) is incorporated. By adaptively learning wavelet filters, LWDNet enables the model to track and separate time-varying overlapping frequency bands, thereby enhancing the extraction of weak signals from strong, spectrally mixed interference; (2) To cope with periodic noise and spectral drift, a frequency-domain contrast regularization (FCR) loss is introduced. This loss explicitly enforces separation between signal and noise in the frequency domain, improving the model’s ability to distinguish useful components even under shifting interference; (3) To effectively exploit information at multiple temporal and frequency scales, a compact U-Net architecture with frequency-aware skip connections is employed, which facilitates adaptive multi-scale feature fusion, further improving denoising performance. Experimental results on field-collected datasets demonstrate that WaveU-Net achieves an average reduction of 38.85% in mean squared error (MSE) compared to standard U-Net models.
Moreover, WaveU-Net outperforms recent state-of-the-art (SOTA) models in terms of signal reconstruction quality, while requiring significantly fewer parameters and reducing computational complexity.
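The abstract does not give the functional form of the FCR loss. The sketch below is one plausible NumPy illustration of such a contrast term: the cosine similarity between the magnitude spectra of an estimated signal and an estimated noise component, which is small when the two occupy disjoint frequency bands. The loss form, the toy pulse model, and the pump frequency are assumptions for illustration only, not the paper's definition.

```python
import numpy as np

def fcr_loss(est_signal, est_noise, eps=1e-8):
    """Illustrative frequency-domain contrast term: cosine similarity
    between the magnitude spectra of the signal and noise estimates.
    Driving this toward 0 encourages spectral separation."""
    S = np.abs(np.fft.rfft(est_signal))
    N = np.abs(np.fft.rfft(est_noise))
    return float(S @ N / (np.linalg.norm(S) * np.linalg.norm(N) + eps))

t = np.linspace(0, 1, 1024, endpoint=False)
pulse = np.sin(2 * np.pi * 5 * t)           # slow pressure pulse (toy)
pump = 0.8 * np.sin(2 * np.pi * 120 * t)    # periodic pump interference (toy)

well_separated = fcr_loss(pulse, pump)      # near 0: disjoint spectra
fully_overlapping = fcr_loss(pulse, pulse)  # 1: identical spectra
```

In training, such a term would be added to the reconstruction loss so the denoiser is penalized whenever its residual noise estimate drifts into the bands occupied by the recovered pressure signal.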
{"title":"WaveU-Net: Multi-scale wavelet framework for robust recovery of continuous pressure signals in mud pulse telemetry","authors":"Qingfeng Zeng , Yanfeng Geng , Shu Jiang , Weiliang Wang","doi":"10.1016/j.dsp.2025.105853","DOIUrl":"10.1016/j.dsp.2025.105853","url":null,"abstract":"<div><div>Mud pulse telemetry (MPT) enables real-time transmission of downhole data during drilling operations. As the transmission distance increases, the received continuous pressure signals undergo significant attenuation. Moreover, strong periodic pump interference, random noise, and complex multipath propagation in the MPT system introduce three major challenges: (1) dynamic spectral overlap between signal and noise, (2) periodic disturbances with spectral drift, and (3) complex multi-scale temporal-frequency characteristics of the noise. These effects severely degrade signal quality, making accurate recovery particularly difficult for traditional model-based and learning-based denoising methods. To address these challenges, a lightweight neural network architecture named WaveU-Net is proposed. The architecture comprises three key components: (1) To address dynamic spectral overlap between signal and noise, a learnable wavelet denoising network (LWDNet) is incorporated. By adaptively learning wavelet filters, LWDNet enables the model to track and separate time-varying overlapping frequency bands, thereby enhancing the extraction of weak signals from strong, spectrally mixed interference; (2) To cope with periodic noise and spectral drift, a frequency-domain contrast regularization (FCR) loss is introduced. 
This loss explicitly enforces separation between signal and noise in the frequency domain, improving the model’s ability to distinguish useful components even under shifting interference; (3) To effectively exploit information at multiple temporal and frequency scales, a compact U-Net architecture with frequency-aware skip connections is employed, which facilitates adaptive multi-scale feature fusion, further improving denoising performance. Experimental results on field-collected datasets demonstrate that WaveU-Net achieves an average reduction of 38.85% in mean squared error (MSE) compared to standard U-Net models. Moreover, WaveU-Net outperforms recent state-of-the-art (SOTA) models in terms of signal reconstruction quality, while requiring significantly fewer parameters and reducing computational complexity.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105853"},"PeriodicalIF":3.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}