Pub Date: 2026-04-01; Epub Date: 2026-01-06; DOI: 10.1016/j.dsp.2026.105881
Yuhan Zhang, Ka-Fai Cedric Yiu, Zhibao Li
Microphone arrays are widely utilized in various speech-related applications. However, using all available microphones enlarges the number of filter coefficients to be estimated, thereby increasing the computational burden without benefitting the overall performance. Consequently, selecting an optimal subset of microphones is crucial for enhancing beamformer performance. This problem is inherently combinatorial and conventionally solved through greedy-based methodologies. In this paper, we propose a novel microphone subset selection problem for beamforming and reformulate the combinatorial constraints into algebraic constraints, thereby transforming the problem into a novel mixed-integer linear programming (MILP) problem. The optimal subset is derived from a multi-objective optimization problem that maximizes beamforming performance while minimizing the number of selected microphones. The branch-and-bound method is employed to guarantee global optimality. Numerical experiments demonstrate the proposed method achieves similar beamforming performance to the greedy method and genetic algorithm (GA) while utilizing fewer microphones. This makes it particularly valuable in applications where hardware scale is strictly constrained.
Title: Optimal microphone subset selection for beamforming
Journal: Digital Signal Processing, Volume 173, Article 105881
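One common way to turn a subset-selection requirement into algebraic constraints is a big-M coupling between binary selection variables and the filter coefficients. The block below is a generic sketch of that idea and is not taken from the paper: the symbols $z_m$ (1 if microphone $m$ is selected), $w_{m,k}$ (the $k$-th real-valued filter coefficient of microphone $m$), the bound $B$, the trade-off weight $\lambda$, and the linear performance measure $f$ are all assumptions introduced for illustration.

```latex
\begin{aligned}
\min_{\mathbf{w},\,\mathbf{z}} \quad & f(\mathbf{w}) + \lambda \sum_{m=1}^{M} z_m \\
\text{s.t.} \quad & -B\, z_m \le w_{m,k} \le B\, z_m, \qquad m = 1,\dots,M,\; k = 1,\dots,K, \\
& z_m \in \{0,1\}, \qquad m = 1,\dots,M.
\end{aligned}
```

Setting $z_m = 0$ forces every coefficient of microphone $m$ to zero, so the sum $\sum_m z_m$ counts the active microphones, and $\lambda$ scalarizes the two objectives. When $f$ and all remaining constraints are linear, a problem of this shape is a MILP that branch-and-bound can solve to global optimality.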
Pub Date: 2026-04-01; Epub Date: 2026-01-13; DOI: 10.1016/j.dsp.2026.105914
Yang He, Ning Cao, Hao Lu, Can Hu, Yajuan Guo
Continuous Phase Modulation (CPM) signals offer excellent spectral efficiency and constant envelope properties for wireless communications, but traditional detection methods suffer from prohibitive computational complexity. This paper presents CPMNet, a novel deep learning-based detection framework that addresses these limitations through an enhanced residual network architecture incorporating spatial attention mechanisms, multi-scale feature fusion, and bidirectional LSTM networks. CPMNet performs sequence-to-sequence detection without requiring channel estimation or equalization. Experimental results on Advanced Range Telemetry (ARTM) Tier 2 signals show performance varies with modulation complexity: while exhibiting 2–4 dB gaps compared to Maximum Likelihood Sequence Detection (MLSD) in high signal-to-noise ratio (SNR) AWGN channels for lower-order modulations, CPMNet maintains robust performance for high-order modulations where MLSD becomes impractical. In multipath fading channels, CPMNet significantly outperforms MLSD by 3–6 dB across various conditions, demonstrating superior resilience to channel impairments. The framework exhibits excellent generalization with only 1–2 dB degradation in unseen environments. Most critically, CPMNet maintains constant computational complexity regardless of CPM parameters, contrasting sharply with MLSD’s exponential complexity growth, making it particularly advantageous for high-order CPM signals that are computationally prohibitive for traditional methods.
Title: CPMNet: an enhanced residual network for continuous phase modulation signal detection
Journal: Digital Signal Processing, Volume 173, Article 105914
Pub Date: 2026-04-01; Epub Date: 2026-01-17; DOI: 10.1016/j.dsp.2026.105926
Zeyad A.H. Qasem, Xingbin Tu, Chunyi Song, Hamada Esmaiel, Waheb A. Jabbar, Fengzhong Qu
Although orthogonal signal division multiplexing (OSDM) offers improved performance for underwater acoustic communication (UWAC), it still faces two major challenges related to the high peak-to-average power ratio (PAPR) and increased sensitivity to inter-vector interference (IVI). This paper proposes a novel OSDM system, termed precoded unique word OSDM based on unitary neural network (UW-OSDM-UNN), to address these issues effectively. The proposed scheme embeds the guard interval within the fast Fourier transform duration to mitigate inter-symbol interference and employs a UNN-based precoder at the transmitter to reduce PAPR and significantly overcome the IVI sensitivity. The UNN-based transmitter is completely independent of the UWAC channel, eliminating the need for receiver-side training or additional testing-stage training. Furthermore, zero vectors and frequency-shifted Chu sequences are incorporated to enable robust Doppler shift estimation and multipath compensation, respectively. The Chu sequences are inserted in the frequency domain to generate deterministic sequences within the guard interval without introducing additional inter-symbol interference. The system is validated through both simulations and real-world sea trials over a 300-meter underwater connection. Results show that the proposed scheme achieves up to a 4 dB PAPR reduction, a 5 dB improvement in bit error rate (BER), and superior robustness against challenging UWAC channel conditions compared to state-of-the-art OSDM-based systems.
Title: Unique word orthogonal signal division multiplexing with complex unitary neural network for underwater acoustic communication
Journal: Digital Signal Processing, Volume 173, Article 105926
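The abstract relies on two standard building blocks: Chu (Zadoff-Chu) sequences and the peak-to-average power ratio. The sketch below shows only the textbook definitions, assuming a plain Zadoff-Chu sequence with root `root` and length `length`; it does not reproduce the frequency shifting, the guard-interval placement, or the UNN precoder described in the paper, and the function names are hypothetical.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Return a Zadoff-Chu sequence; root and length must be coprime."""
    if math.gcd(root, length) != 1:
        raise ValueError("root and length must be coprime")
    cf = length % 2  # 1 for odd lengths, 0 for even lengths
    return [
        cmath.exp(-1j * math.pi * root * n * (n + cf) / length)
        for n in range(length)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a sample block, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


def circular_autocorr(seq, lag):
    """Periodic autocorrelation of seq at the given lag."""
    n = len(seq)
    return sum(seq[(i + lag) % n] * seq[i].conjugate() for i in range(n))


zc = zadoff_chu(root=1, length=13)
print(round(papr_db(zc), 6))              # constant envelope, so close to 0 dB
print(abs(circular_autocorr(zc, 3)))      # close to 0 at any non-zero lag
```

The constant envelope and the vanishing periodic autocorrelation at non-zero lags are the properties that make such sequences attractive as deterministic pilots.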
Pub Date: 2026-04-01; Epub Date: 2026-01-12; DOI: 10.1016/j.dsp.2026.105895
Lei Gao, Taichang Tian, Luosheng Wen
Time-series clustering is a widely used data-mining technique. However, traditional clustering algorithms operate directly on the raw time-series data, which exposes them to the curse of dimensionality. Neighbor information offers an effective way to capture the local features of time-series data. In this paper, we propose a hierarchical graph clustering algorithm (CTNG) based on common tightest neighbors (CTN). CTNG uses the ratio of common tightest neighbors between two data points to decide whether they are connected by an edge in the tightest neighbors graph (TNG), which allows it to cluster various kinds of complex streaming data and noisy data. To address the curse of dimensionality, we combine CTNG with the locally linear embedding (LLE) algorithm and propose a time-series clustering algorithm, LLE_CTNG, which exploits the local structure of the data for both dimensionality reduction and clustering. Extensive experiments show that the algorithm achieves superior and stable clustering performance, offers some advantage in running speed, and is robust to the choice of the number of tightest neighbors.
Title: Time-series clustering algorithm based on common tightest neighbors and local embedding
Journal: Digital Signal Processing, Volume 173, Article 105895
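The edge rule in the abstract (connect two points when the ratio of their common neighbors is high enough) can be illustrated with a small sketch. The abstract does not define "tightest neighbors", so this example substitutes ordinary k-nearest neighbors under Euclidean distance; the parameters `k` and `ratio`, the function names, and the use of connected components as clusters are all assumptions, and the LLE step and the hierarchical part of CTNG are omitted.

```python
import math


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def shared_neighbor_clusters(points, k=3, ratio=0.5):
    """Link points whose shared-neighbor ratio is >= ratio; label components."""
    nbrs = knn_sets(points, k)
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if len(nbrs[i] & nbrs[j]) / k >= ratio:
                parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
print(shared_neighbor_clusters(points, k=3, ratio=0.5))
```

On these two well-separated squares, points inside a square share two of their three neighbors while points from different squares share none, so the two squares come out as separate clusters.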
Pub Date: 2026-04-01; Epub Date: 2026-01-13; DOI: 10.1016/j.dsp.2026.105909
Ziqi Yan, Zhichao Zhang
To address the limitations of the graph fractional Fourier transform (GFRFT) Wiener filtering and the traditional joint time-vertex fractional Fourier transform (JFRFT) Wiener filtering, this study proposes a filtering method based on the hyper-differential form of the JFRFT. The gradient backpropagation mechanism is employed to establish the adaptive selection of transform order pair and filter coefficients. First, leveraging the hyper-differential form of the GFRFT and the fractional Fourier transform, the hyper-differential form of the JFRFT is constructed and its properties are analyzed. Second, time-varying graph signals are divided into dynamic graph sequences of equal span along the temporal dimension. A spatiotemporal joint representation is then established through vectorized reorganization, followed by the joint time-vertex Wiener filtering. Furthermore, by rigorously proving the differentiability of the transform orders, both the transform orders and filter coefficients are embedded as learnable parameters within a neural network architecture. Through gradient backpropagation, their synchronized iterative optimization is achieved, constructing a parameters-adaptive learning filtering framework. This method leverages a model-driven approach to learn the optimal transform order pair and filter coefficients. Experimental results indicate that the proposed framework improves the time-varying graph signals denoising performance, while reducing the computational burden of the traditional grid search strategy.
Title: Trainable joint time-vertex fractional Fourier transform
Journal: Digital Signal Processing, Volume 173, Article 105909
Pub Date: 2026-04-01; Epub Date: 2026-01-21; DOI: 10.1016/j.dsp.2026.105929
Yongcai Tao, Renwei Xiao, Yucheng Shi, Zhe Li, Qing Zhang, Xiaotian Yuan, Lei Shi
The low incidence of skin diseases leads to a highly imbalanced class distribution, which complicates computer-aided diagnosis. While supervised contrastive learning has been applied to address this long-tail distribution, two challenges remain: first, the significant variation between intra-class and inter-class feature distributions, which hampers effective sample discrimination; and second, the insufficient number of tail-class samples, which limits their representation and impedes improvements in diagnostic accuracy. To address these challenges, we propose EnFuseNet, a novel contrastive learning framework. EnFuseNet incorporates two key modules: the Dual-view Interactive Fusion (DIF) module and the Tail Representation Enhancement (TREM) module. The DIF module enhances intra-class compactness and inter-class separability by combining dual-view features through a channel- and spatially interactive attention mechanism. The TREM module mitigates the issue of limited tail-class samples by generating and dynamically updating prototypes for these classes using a sliding window mechanism. Additionally, the Stage-Adaptive Weighted Cross-Entropy (SAW-CE) loss function, based on curriculum learning and dynamic weighting, guides the model toward more balanced inter-class learning, thereby alleviating diagnosis difficulties during training. Experimental results on the ISIC2018 and ISIC2019 skin disease datasets demonstrate that EnFuseNet achieves accuracy and AUC values of 86%-88% and 97%, respectively, outperforming state-of-the-art methods. These results highlight the potential of EnFuseNet in diagnosing rare and long-tail skin diseases. The source code is available on GitHub.
Title: EnFuseNet: A Dual-Module approach combining tail-Class enhancement and dynamic fusion for long-Tail skin lesion diagnosis
Journal: Digital Signal Processing, Volume 173, Article 105929
Pub Date: 2026-04-01; Epub Date: 2026-01-15; DOI: 10.1016/j.dsp.2026.105911
Khalid Abdullah M. Salih, Ismail Amin Ali, Ramadhan J. Mstafa
Efficient video streaming over IP networks faces significant challenges due to packet loss and network congestion, particularly when using the User Datagram Protocol (UDP), which lacks inherent error correction mechanisms. This study provides a comprehensive framework for selecting HEVC encoding configurations based on motion content and network conditions. The paper evaluates the packet loss resilience of various HEVC encoding configurations across video content with high-motion, intermediate-motion, and low-motion activity. Using UDP streaming with the MPEG Transport Stream (MPEG-TS) container, video quality was quantified using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) under packet loss rates of up to 1.0%. Three HEVC encoding configurations were assessed: IPPP, periodic I, and periodic IDR. The results indicate that periodic IDR, with its closed GOP structure, achieves the highest resilience to packet loss, rendering it ideal for unreliable networks. Specifically, for high-motion video content, periodic IDR limited PSNR degradation to 6.97 dB (from 28.78 dB to 21.87 dB) under a 0.5% packet loss rate. For intermediate-motion content (Mobcal), PSNR decreased by 9.26 dB (from 34.85 dB to 25.23 dB), and for low-motion content (FourPeople), PSNR degraded by 6.96 dB (from 40.87 dB to 33.91 dB), consistently outperforming the other configurations. In contrast, periodic I demonstrated moderate resilience, with PSNR degradation of 9.6 dB for high-motion content, up to 14.36 dB for intermediate-motion content, and approximately 11.46 dB for low-motion content. The IPPP configuration exhibited the greatest vulnerability, with PSNR degradations of 12.66 dB, 18.7 dB, and 11.95 dB for Crowd_run, Mobcal, and FourPeople, respectively, due to extensive error propagation inherent in its open GOP structure. The findings advance the understanding of error resilience in video compression and offer practical guidelines for maximizing video quality in real-world streaming scenarios over lossy IP networks.
Title: Performance of HEVC video coding for delivery over IP networks
Journal: Digital Signal Processing, Volume 173, Article 105911
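The quality figures in this abstract are PSNR values and differences between them. As a reminder of how such numbers are obtained, here is a minimal sketch of the standard PSNR definition for 8-bit samples; the function names and the flat-list frame representation are assumptions for illustration and do not reflect the authors' measurement tooling.

```python
import math


def psnr_db(reference, degraded, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if not reference or len(reference) != len(degraded):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_drop_db(psnr_clean, psnr_lossy):
    """Degradation in dB between a loss-free and a lossy measurement."""
    return psnr_clean - psnr_lossy


# Low-motion (FourPeople) figures quoted in the abstract for periodic IDR.
print(round(psnr_drop_db(40.87, 33.91), 2))
```

With the FourPeople values quoted above (40.87 dB without loss, 33.91 dB at 0.5% loss), the drop evaluates to 6.96 dB, matching the figure reported in the abstract.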
Pub Date: 2026-03-15; Epub Date: 2026-01-02; DOI: 10.1016/j.dsp.2025.105867
Weidong Wang, Tianyou Wang, Hui Li, Wentao Shi, Wasiq Ali
In this paper, the problem of direction of arrival (DOA) estimation under the non-orthogonal deviation (NOD) in an acoustic vector sensor array (AVSA) is systematically addressed. First, by incorporating NOD information into the ideal AVSA model, two AVSA models with NOD are established. Subsequently, closed-form expressions for DOA estimation bias, the Cramér-Rao lower bound (CRLB), and the root mean square error (RMSE) are analytically derived for scenarios where each AVS exhibits NOD to illustrate the degrading influence of NOD on DOA estimation accuracy. To mitigate the effect of NOD, an innovative optimal modification matrix construction (OMMC) method is proposed. The NOD range of each AVS is initially coarsely estimated using prior information from a known auxiliary source and the theoretical RMSE. Based on the estimated deviation range, an overcomplete redundant correction matrix is constructed, which is used to calibrate the measurement data of each AVS. The optimal correction matrix is selected by minimizing the deviation between the estimated and true DOAs, and a global correction matrix for the entire array is formed by extracting the optimal correction sub-matrix for each AVS, thereby enabling accurate array calibration. A comprehensive performance evaluation is conducted through extensive simulations, where the proposed OMMC method is demonstrated to significantly outperform existing techniques, especially in challenging environments with large NOD or a limited number of snapshots.
Title: Performance analysis and robust DOA estimation using acoustic vector sensor array under non-orthogonal deviation
Journal: Digital Signal Processing, Volume 172, Article 105867
Pub Date: 2026-03-15 | Epub Date: 2025-12-31 | DOI: 10.1016/j.dsp.2025.105859
Shiyue Na, Lin Chen, Muyu Lin
Software-Defined Networking (SDN) centralizes network control, enhancing management efficiency but increasing vulnerability to Distributed Denial-of-Service (DDoS) attacks. Low-rate DDoS (LDDoS) attacks are particularly challenging to detect, as their sporadic traffic closely mimics legitimate flows. Existing hybrid detection approaches often employ static fusion strategies that fail to adapt to the diverse characteristics of different LDDoS variants. This paper proposes a novel two-stage detection framework that fundamentally advances hybrid detection through adaptive feature fusion. The first stage utilizes Rényi entropy to efficiently filter 98.77% of benign traffic while retaining potential attack signatures. The second stage employs Trans-KAN, an innovative hybrid model that integrates Kolmogorov-Arnold Networks with Transformer architecture via an adaptive gating mechanism that dynamically balances their contributions through learnable weight matrices based on traffic characteristics. On custom SDN datasets, the proposed framework achieves 98.56% detection accuracy, 97.05% precision, 99.12% recall, and a 98.08% F1-Score with only 1.23% false positives, demonstrating improvements of 3.24% in accuracy over standalone Transformer and 5.48% over KAN. The synergistic combination of entropy-based pre-filtering and adaptive deep learning fusion establishes a new paradigm for LDDoS detection, offering theoretical insights into dynamic feature fusion and Kolmogorov-Arnold representations for hybrid deep learning, with practical applicability for next-generation network security systems.
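The entropy-based first stage can be illustrated with a minimal sketch of the standard Rényi entropy of order α, H_α = log2(Σ p_i^α) / (1 − α), applied to one traffic window. The choice of feature (destination addresses), the order `alpha`, the threshold, and the rule that low entropy marks a window as suspicious are all assumptions made for illustration; the abstract does not specify them, and the names `window_entropy` and `is_suspicious` are hypothetical.

```python
import math
from collections import Counter


def renyi_entropy(counts, alpha=2.0):
    """Rényi entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    probs = [c / total for c in counts if c > 0]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 recovers the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def window_entropy(dst_addresses, alpha=2.0):
    """Entropy of the destination-address distribution in one window.

    Using destination addresses as the feature is an assumption.
    """
    return renyi_entropy(list(Counter(dst_addresses).values()), alpha)


def is_suspicious(dst_addresses, threshold_bits, alpha=2.0):
    """Hypothetical pre-filter rule: pass a window to the second stage
    when its entropy falls below a threshold (traffic concentrated on
    few destinations). Direction and value of the threshold are assumed.
    """
    return window_entropy(dst_addresses, alpha) < threshold_bits
```

In a two-stage design of this kind, only windows flagged by the cheap entropy test would be forwarded to the more expensive classifier; the threshold would have to be tuned on real traffic.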
{"title":"Detecting low-rate DDoS attacks using Rényi entropy and trans-KAN hybrid model in SDN","authors":"Shiyue Na, Lin Chen, Muyu Lin","doi":"10.1016/j.dsp.2025.105859","journal":{"name":"Digital Signal Processing","volume":"172","pages":"Article 105859"},"publicationDate":"2026-03-15","publicationTypes":"Journal Article"}
Pub Date: 2026-03-15 | Epub Date: 2026-01-08 | DOI: 10.1016/j.dsp.2026.105900
Nguyen Hong Kiem, Bui Anh Duc, Nguyen Tuan Minh, Le T.T. Huyen, Tran Manh Hoang
This paper investigates outage probability (OP) and ergodic capacity (EC) of a reconfigurable intelligent surface (RIS) assisted two-user rate-splitting multiple access (RSMA) communication system. Closed-form expressions for OP and EC are derived over Rayleigh fading channels, and validated through extensive Monte Carlo simulations. A comprehensive performance comparison is conducted between the proposed RIS-assisted RSMA scheme and two benchmark systems: RIS-assisted non-orthogonal multiple access (NOMA) and relay-assisted RSMA. Simulation results demonstrate that the proposed scheme significantly outperforms both benchmarks in terms of OP and EC, regardless of fading conditions. The influence of the critical system parameters, including the number of RIS reflecting elements, transmit power, power allocation factors, and the required rate of the common stream, is thoroughly examined. The results reveal that optimal power allocation between streams is essential for minimizing OP. These findings confirm that integrating RSMA with RIS provides a robust and efficient solution for enhancing communication reliability and spectral efficiency in future 6G wireless networks, especially in challenging non-line-of-sight environments.
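The idea of validating a closed-form outage probability by Monte Carlo simulation can be sketched on a much simpler case than the paper's system. The example below assumes a single point-to-point link with unit-power Rayleigh fading, for which the outage probability at target rate R and average SNR γ is the well-known 1 − exp(−(2^R − 1)/γ). It is not the RIS-assisted RSMA expression derived in the paper, and all function names and parameter values are hypothetical.

```python
import math
import random


def outage_closed_form(snr_linear, rate_bps_hz):
    """Closed-form outage probability of a single Rayleigh-faded link.

    Outage occurs when log2(1 + snr * |h|^2) < rate, with |h|^2 ~ Exp(1).
    """
    threshold = (2.0 ** rate_bps_hz - 1.0) / snr_linear
    return 1.0 - math.exp(-threshold)


def outage_monte_carlo(snr_linear, rate_bps_hz, trials=200_000, seed=0):
    """Empirical outage probability from simulated channel realizations."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    outages = 0
    for _ in range(trials):
        # Channel power gain of unit-power Rayleigh fading is Exp(1).
        gain = rng.expovariate(1.0)
        if math.log2(1.0 + snr_linear * gain) < rate_bps_hz:
            outages += 1
    return outages / trials


if __name__ == "__main__":
    snr = 10.0   # average SNR on a linear scale (assumed value)
    rate = 1.0   # target rate in bit/s/Hz (assumed value)
    print("closed form:", outage_closed_form(snr, rate))
    print("monte carlo:", outage_monte_carlo(snr, rate))
```

Agreement between the two numbers, within the sampling error of the simulation, is the kind of check the abstract refers to; the paper's own expressions would replace `outage_closed_form` and the channel model in the loop.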
{"title":"Outage probability and ergodic capacity of RIS-assisted RSMA communication system","authors":"Nguyen Hong Kiem, Bui Anh Duc, Nguyen Tuan Minh, Le T.T. Huyen, Tran Manh Hoang","doi":"10.1016/j.dsp.2026.105900","journal":{"name":"Digital Signal Processing","volume":"172","pages":"Article 105900"},"publicationDate":"2026-03-15","publicationTypes":"Journal Article"}