HAIR-GLMB: Hybrid appearance-IoU reinforced GLMB filter for UAV-based multi-target tracking
Pub Date: 2026-01-13  DOI: 10.1016/j.dsp.2026.105906
Haiyi Tong, Dekang Zhu, Zhou Zhang
This paper presents HAIR-GLMB, a Hybrid Appearance and IoU Reinforced Generalized Labeled Multi-Bernoulli (GLMB) filter tailored for multi-target tracking in challenging unmanned aerial vehicle (UAV) scenarios. To address frequent association ambiguities caused by dense target distributions, we propose an adaptive hybrid cost matrix that integrates Intersection-over-Union (IoU) spatial cues with appearance similarity. Specifically, an entropy-based adaptive weighting mechanism dynamically balances spatial and appearance information, thereby enhancing association reliability. We further develop a reinforced likelihood computation within the GLMB recursion, explicitly embedding spatial and appearance information into the update process. A motion-aware adaptive survival probability model is also proposed, effectively sustaining track continuity for inward-moving targets near the boundaries of the camera’s field of view. To improve efficiency, the Gibbs sampler is initialized with an assignment obtained by the Hungarian algorithm on the hybrid cost matrix, placing the Markov chain near high-probability regions and reducing sampling overhead under a limited computational budget. Experiments on challenging UAV benchmarks (VisDrone2019, UAVDT) show that HAIR-GLMB consistently outperforms a GLMB baseline relying only on IoU, yielding higher tracking accuracy, fewer identity switches, and reduced fragmentation.
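As a rough illustration of the hybrid association cost described above, the sketch below fuses an IoU matrix with an appearance-similarity matrix using an entropy-based weight and seeds the assignment with the Hungarian algorithm. The weighting rule, function names, and toy data are assumptions made for illustration, not the paper's exact formulation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(tracks, dets):
    # Pairwise IoU between [x1, y1, x2, y2] boxes.
    ious = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            x1, y1 = max(t[0], d[0]), max(t[1], d[1])
            x2, y2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = (t[2] - t[0]) * (t[3] - t[1]) + (d[2] - d[0]) * (d[3] - d[1]) - inter
            ious[i, j] = inter / union if union > 0 else 0.0
    return ious

def mean_row_entropy(sim):
    # Mean Shannon entropy of row-normalized similarity rows, scaled to [0, 1].
    p = sim / (sim.sum(axis=1, keepdims=True) + 1e-12)
    h = -(p * np.log(p + 1e-12)).sum(axis=1)
    return float(h.mean() / np.log(max(sim.shape[1], 2)))

def hybrid_cost(tracks, dets, track_emb, det_emb):
    iou = iou_matrix(tracks, dets)
    app = np.clip(track_emb @ det_emb.T, 0.0, None)      # cosine similarity (unit-norm embeddings)
    # Assumed rule: the less ambiguous (lower-entropy) cue receives the larger weight.
    w_iou, w_app = 1.0 - mean_row_entropy(iou), 1.0 - mean_row_entropy(app)
    alpha = w_iou / (w_iou + w_app + 1e-12)
    return -(alpha * iou + (1.0 - alpha) * app)           # cost = negated hybrid similarity

tracks = np.array([[0., 0., 10., 10.], [20., 20., 30., 30.]])
dets = np.array([[1., 1., 11., 11.], [19., 21., 31., 31.]])
cost = hybrid_cost(tracks, dets, np.eye(2), np.eye(2))
rows, cols = linear_sum_assignment(cost)                  # Hungarian seed for the Gibbs sampler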
{"title":"HAIR-GLMB: Hybrid appearance-IoU reinforced GLMB filter for UAV-based multi-target tracking","authors":"Haiyi Tong, Dekang Zhu, Zhou Zhang","doi":"10.1016/j.dsp.2026.105906","DOIUrl":"10.1016/j.dsp.2026.105906","url":null,"abstract":"<div><div>This paper presents HAIR-GLMB, a Hybrid Appearance and IoU Reinforced Generalized Labeled Multi-Bernoulli (GLMB) filter tailored for multi-target tracking in challenging unmanned aerial vehicle (UAV) scenarios. To address frequent association ambiguities caused by dense target distributions, we propose an adaptive hybrid cost matrix that integrates Intersection-over-Union (IoU) spatial cues with appearance similarity. Specifically, an entropy-based adaptive weighting mechanism dynamically balances spatial and appearance information, thereby enhancing association reliability. We further develop a reinforced likelihood computation within the GLMB recursion, explicitly embedding spatial and appearance information into the update process. A motion-aware adaptive survival probability model is also proposed, effectively sustaining track continuity for inward-moving targets near the boundaries of the camera’s field of view. To improve efficiency, the Gibbs sampler is initialized with an assignment obtained by the Hungarian algorithm on the hybrid cost matrix, placing the Markov chain near high-probability regions and reducing sampling overhead under a limited computational budget. Experiments on challenging UAV benchmarks (VisDrone2019, UAVDT) show that HAIR-GLMB consistently outperforms a GLMB baseline relying only on IoU, yielding higher tracking accuracy, fewer identity switches, and reduced fragmentation.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105906"},"PeriodicalIF":3.0,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards point cloud geometry compression via global-local and multi-scale feature learning
Pub Date: 2026-01-13  DOI: 10.1016/j.dsp.2026.105913
Yihan Wang, Yongfang Wang, Zhijun Fang, Tengyao Cui
Existing Point Cloud Geometry Compression (PCGC) methods often inadequately handle non-uniform point density and fail to fully exploit multi-scale contextual features, limiting their efficiency and reconstruction quality. To bridge this gap, we argue that an effective solution must jointly address local geometric adaptation and the aggregation of multi-scale contextual features. Accordingly, we propose a novel PCGC method consisting of a Global-Local Feature Extraction Network (GLFE-Net), a Multi-scale Feature Enhancement Network (MFE-Net), and Coordinates Reconstruction based on Offset (CRO). The GLFE-Net incorporates Local Adaptive Density (LAD) to address the non-uniform density distribution and a Global-Local Context Differential (GLCD) module to fuse local and global features. The MFE-Net employs a Feature Extraction based on Offset-attention (FEO) module to enhance feature expressiveness and a Multi-scale Semantics Fusion (MSF) module to optimize multi-scale feature fusion. The CRO module uses a learnable offset mechanism for high-fidelity reconstruction. Experimental results demonstrate that our method achieves significant improvements, with Peak Signal-to-Noise Ratio (PSNR) gains of up to 29.25 dB (D1) and 27.31 dB (D2) over existing PCGC methods. This work provides an effective solution for high-performance PCGC by jointly addressing the key challenges of density adaptation and multi-scale feature learning.
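The coordinate-reconstruction-by-offset idea (CRO) can be pictured with a minimal sketch: a small per-point network predicts an offset that refines coarse decoded coordinates. Layer sizes and names below are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class OffsetRefine(nn.Module):
    # Per-point MLP that predicts a 3-D offset added to coarse decoded coordinates.
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, coarse_xyz, feats):
        return coarse_xyz + self.mlp(feats)   # refined coordinates

# Toy usage: refine 1024 coarse points toward a ground-truth geometry.
coarse, feats, gt = torch.rand(1024, 3), torch.rand(1024, 32), torch.rand(1024, 3)
model = OffsetRefine()
loss = nn.functional.mse_loss(model(coarse, feats), gt)
loss.backward()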
{"title":"Towards point cloud geometry compression via global-local and multi-scale feature learning","authors":"Yihan Wang , Yongfang Wang , Zhijun Fang , Tengyao Cui","doi":"10.1016/j.dsp.2026.105913","DOIUrl":"10.1016/j.dsp.2026.105913","url":null,"abstract":"<div><div>Existing Point Cloud Geometry Compression (PCGC) methods often inadequately handle non-uniform point density and fail to fully exploit multi-scale contextual features, limiting their efficiency and reconstruction quality. To bridge this gap, we argue that an effective solution must jointly addresses local geometric adaptation and the aggregation of multi-scale contextual features. Accordingly, we propose a novel PCGC method, consisting of Global-Local Feature Extraction Network (GLFE-Net), Multi-scale Feature Enhancement Network (MFE-Net), and Coordinates Reconstruction based on Offset (CRO). The GLFE-Net incorporates Local Adaptive Density (LAD) to address the non-uniform density distribution and Global-Local Context Differential (GLCD) module to fuse local and global features. The MFE-Net employs the Feature Extraction based on Offset-attention (FEO) module to enhance the feature expression ability, and utilizes the Multi-scale Semantics Fusion (MSF) module to optimize the multi-scale feature fusion. The CRO module utilizes the learnable offset mechanism for high-fidelity reconstruction. Experimental results demonstrate that our method achieves significant improvements, with Peak Signal-to-Noise Ratio (PSNR) gains of up to 29.25 dB (D1) and 27.31 dB (D2) over the existing PCGC methods. This work provides an effective solution for high performance PCGC method by jointly addressing the key challenges of density adaptation and multi-scale feature learning.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105913"},"PeriodicalIF":3.0,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trainable joint time-vertex fractional Fourier transform
Pub Date: 2026-01-13  DOI: 10.1016/j.dsp.2026.105909
Ziqi Yan, Zhichao Zhang
To address the limitations of graph fractional Fourier transform (GFRFT) Wiener filtering and traditional joint time-vertex fractional Fourier transform (JFRFT) Wiener filtering, this study proposes a filtering method based on the hyper-differential form of the JFRFT. A gradient backpropagation mechanism is employed to enable adaptive selection of the transform order pair and the filter coefficients. First, leveraging the hyper-differential form of the GFRFT and the fractional Fourier transform, the hyper-differential form of the JFRFT is constructed and its properties are analyzed. Second, time-varying graph signals are divided into dynamic graph sequences of equal span along the temporal dimension. A spatiotemporal joint representation is then established through vectorized reorganization, followed by joint time-vertex Wiener filtering. Furthermore, by rigorously proving the differentiability of the transform orders, both the transform orders and the filter coefficients are embedded as learnable parameters within a neural network architecture. Through gradient backpropagation, their synchronized iterative optimization is achieved, yielding a parameter-adaptive learned filtering framework. This method leverages a model-driven approach to learn the optimal transform order pair and filter coefficients. Experimental results indicate that the proposed framework improves denoising performance for time-varying graph signals while reducing the computational burden of the traditional grid-search strategy.
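A minimal sketch of the core idea, treating a transform order as a learnable parameter optimized by backpropagation. For simplicity it uses a fractional power of a symmetric graph Laplacian (real spectrum) instead of the JFRFT itself; the point is only that the order enters differentiably through exp(a·log λ) and can be recovered by gradient descent. Graph, data, and hyperparameters are illustrative assumptions.

import torch

torch.manual_seed(0)
N = 8
A = torch.rand(N, N)
A = ((A + A.T) > 1.0).float()
A.fill_diagonal_(0)
L = torch.diag(A.sum(1)) - A                      # symmetric graph Laplacian
lam, V = torch.linalg.eigh(L)                     # fixed spectral basis
lam = lam.clamp_min(1e-6)

def frac_op(a):
    # L^a = V diag(lam^a) V^T; differentiable with respect to the order a.
    return V @ torch.diag(torch.exp(a * torch.log(lam))) @ V.T

x = torch.randn(N)
target = frac_op(torch.tensor(0.7)) @ x           # data generated with order 0.7

a = torch.nn.Parameter(torch.tensor(0.2))         # learnable transform order
opt = torch.optim.Adam([a], lr=0.05)
for _ in range(300):
    loss = ((frac_op(a) @ x - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(a))                                   # should move close to the generating order 0.7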
{"title":"Trainable joint time-vertex fractional Fourier transform","authors":"Ziqi Yan , Zhichao Zhang","doi":"10.1016/j.dsp.2026.105909","DOIUrl":"10.1016/j.dsp.2026.105909","url":null,"abstract":"<div><div>To address the limitations of the graph fractional Fourier transform (GFRFT) Wiener filtering and the traditional joint time-vertex fractional Fourier transform (JFRFT) Wiener filtering, this study proposes a filtering method based on the hyper-differential form of the JFRFT. The gradient backpropagation mechanism is employed to establish the adaptive selection of transform order pair and filter coefficients. First, leveraging the hyper-differential form of the GFRFT and the fractional Fourier transform, the hyper-differential form of the JFRFT is constructed and its properties are analyzed. Second, time-varying graph signals are divided into dynamic graph sequences of equal span along the temporal dimension. A spatiotemporal joint representation is then established through vectorized reorganization, followed by the joint time-vertex Wiener filtering. Furthermore, by rigorously proving the differentiability of the transform orders, both the transform orders and filter coefficients are embedded as learnable parameters within a neural network architecture. Through gradient backpropagation, their synchronized iterative optimization is achieved, constructing a parameters-adaptive learning filtering framework. This method leverages a model-driven approach to learn the optimal transform order pair and filter coefficients. Experimental results indicate that the proposed framework improves the time-varying graph signals denoising performance, while reducing the computational burden of the traditional grid search strategy.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105909"},"PeriodicalIF":3.0,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel two-dimensional Wigner distribution framework via the quadratic phase Fourier transform with a non-separable kernel
Pub Date: 2026-01-11  DOI: 10.1016/j.dsp.2026.105896
Mukul Chauhan, Waseem Z. Lone, Amit K. Verma
This paper introduces a novel time–frequency distribution, referred to as the two-dimensional non-separable quadratic-phase Wigner distribution (2D-NSQPWD), formulated within the framework of the two-dimensional non-separable quadratic-phase Fourier transform (2D-NSQPFT). The proposed distribution extends the classical two-dimensional Wigner distribution (2D-WD) through a convolution-based formulation that incorporates the structural characteristics of the 2D-NSQPFT, thereby enabling an effective representation of complex, non-separable signal structures. We rigorously establish several key properties of the 2D-NSQPWD, including time and frequency shift invariance, marginal behavior, conjugate symmetry, convolution relations, and Moyal’s identity. The effectiveness of the distribution is demonstrated through its application to single-, bi-, and tri-component two-dimensional linear frequency-modulated (2D-LFM) signals. Finally, simulations show that the proposed distribution exhibits superior performance in cross-term suppression and signal localization compared to existing transforms.
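For reference, the classical two-dimensional Wigner distribution that the 2D-NSQPWD generalizes has the standard form (up to a normalization constant)

W_f(\mathbf{x}, \boldsymbol{\omega}) = \int_{\mathbb{R}^{2}} f\!\left(\mathbf{x} + \tfrac{\boldsymbol{\tau}}{2}\right) \overline{f\!\left(\mathbf{x} - \tfrac{\boldsymbol{\tau}}{2}\right)}\, e^{-\mathrm{i}\, \boldsymbol{\omega} \cdot \boldsymbol{\tau}}\, \mathrm{d}\boldsymbol{\tau},

and, as described in the abstract, the proposed 2D-NSQPWD replaces the Fourier kernel above with the non-separable quadratic-phase kernel of the 2D-NSQPFT.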
{"title":"A novel two-dimensional Wigner distribution framework via the quadratic phase Fourier transform with a non-separable kernel","authors":"Mukul Chauhan, Waseem Z. Lone, Amit K. Verma","doi":"10.1016/j.dsp.2026.105896","DOIUrl":"10.1016/j.dsp.2026.105896","url":null,"abstract":"<div><div>This paper introduces a novel time–frequency distribution, referred to as the two-dimensional non-separable quadratic-phase Wigner distribution (2D-NSQPWD), formulated within the framework of the two-dimensional non-separable quadratic-phase Fourier transform (2D-NSQPFT). The proposed distribution extends the classical two-dimensional Wigner distribution (2D-WD) through a convolution-based formulation that incorporates the structural characteristics of the 2D-NSQPFT, thereby enabling an effective representation of complex, non-separable signal structures. We rigorously establish several key properties of the 2D-NSQPWD, including time and frequency shift invariance, marginal behavior, conjugate symmetry, convolution relations, and Moyal’s identity. The effectiveness of the distribution is demonstrated through its application to single-, bi-, and tri-component two-dimensional linear frequency-modulated (2D-LFM) signals. Finally, simulations show that the proposed transform exhibits superior performance in cross-term suppression and signal localization compared to existing transforms.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105896"},"PeriodicalIF":3.0,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correct estimation of higher-order spectra: From theoretical challenges to practical multi-channel implementation in SignalSnap
Pub Date: 2026-01-10  DOI: 10.1016/j.dsp.2026.105893
Markus Sifft, Armin Ghorbanietemad, Fabian Wagner, Daniel Hägele
Higher-order spectra (Brillinger’s polyspectra) offer powerful methods for solving critical problems in signal processing and data analysis. Despite their significant potential, their practical use has remained limited due to unresolved mathematical issues in spectral estimation, including the absence of unbiased and consistent estimators and the high computational cost associated with evaluating multidimensional spectra. Consequently, existing tools frequently produce artifacts (no existing software library correctly implements Brillinger’s cumulant-based trispectrum) or fail to scale effectively to real-world data volumes, leaving crucial applications like multi-detector spectral analysis largely unexplored.
In this paper, we revisit higher-order spectra from a modern perspective, addressing the root causes of their historical underuse. We reformulate higher-order spectral estimation using recently derived multivariate k-statistics, yielding unbiased and consistent estimators that eliminate spurious artifacts and precisely align with Brillinger’s theoretical definitions. Our methodology covers single- and multi-channel spectral analysis up to the bispectrum (third order) and trispectrum (fourth order), enabling robust investigations of inter-frequency coupling, non-Gaussian behavior, and time-reversal symmetry breaking. Additionally, we introduce quasi-polyspectra to uncover non-stationary, time-dependent higher-order features. We implement these new estimators in SignalSnap, an open-source GPU-accelerated library capable of efficiently analyzing datasets exceeding hundreds of gigabytes within minutes.
In applications such as continuous quantum measurements, SignalSnap’s rigorous estimators enable precise quantitative matching between experimental data and theoretical models. With detailed derivations and illustrative examples, this work provides the theoretical and computational foundation necessary for establishing higher-order spectra as a reliable, standard tool in modern signal analysis.
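For readers unfamiliar with polyspectra, the following segment-averaged estimator illustrates what a bispectrum measures. It is the naive, biased textbook estimator, not the unbiased k-statistics estimators implemented in SignalSnap, and the test signal and its parameters are arbitrary.

import numpy as np

def naive_bispectrum(x, seg_len=256):
    # Segment-averaged estimate of B(f1, f2) = E[X(f1) X(f2) conj(X(f1 + f2))].
    acc = np.zeros((seg_len, seg_len), dtype=complex)
    segs = len(x) // seg_len
    idx = (np.arange(seg_len)[:, None] + np.arange(seg_len)[None, :]) % seg_len
    for s in range(segs):
        seg = x[s * seg_len:(s + 1) * seg_len]
        X = np.fft.fft(seg - seg.mean())
        acc += X[:, None] * X[None, :] * np.conj(X[idx])
    return acc / segs

# Quadratically phase-coupled triplet: f3 = f1 + f2 with aligned phases.
n = np.arange(4096)
f1, f2 = 16 / 256, 32 / 256
x = (np.cos(2 * np.pi * f1 * n) + np.cos(2 * np.pi * f2 * n)
     + 0.5 * np.cos(2 * np.pi * (f1 + f2) * n)
     + 0.1 * np.random.default_rng(0).standard_normal(len(n)))
B = np.abs(naive_bispectrum(x))                   # should peak near frequency bins (16, 32)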
{"title":"Correct estimation of higher-order spectra: From theoretical challenges to practical multi-channel implementation in SignalSnap","authors":"Markus Sifft, Armin Ghorbanietemad, Fabian Wagner, Daniel Hägele","doi":"10.1016/j.dsp.2026.105893","DOIUrl":"10.1016/j.dsp.2026.105893","url":null,"abstract":"<div><div>Higher-order spectra (Brillinger’s polyspectra) offer powerful methods for solving critical problems in signal processing and data analysis. Despite their significant potential, their practical use has remained limited due to unresolved mathematical issues in spectral estimation, including the absence of unbiased and consistent estimators and the high computational cost associated with evaluating multidimensional spectra. Consequently, existing tools frequently produce artifacts-no existing software library correctly implements Brillinger’s cumulant-based trispectrum-or fail to scale effectively to real-world data volumes, leaving crucial applications like multi-detector spectral analysis largely unexplored.</div><div>In this paper, we revisit higher-order spectra from a modern perspective, addressing the root causes of their historical underuse. We reformulate higher-order spectral estimation using recently derived multivariate k-statistics, yielding unbiased and consistent estimators that eliminate spurious artifacts and precisely align with Brillinger’s theoretical definitions. Our methodology covers single- and multi-channel spectral analysis up to the bispectrum (third order) and trispectrum (fourth order), enabling robust investigations of inter-frequency coupling, non-Gaussian behavior, and time-reversal symmetry breaking. Additionally, we introduce quasi-polyspectra to uncover non-stationary, time-dependent higher-order features. We implement these new estimators in SignalSnap, an open-source GPU-accelerated library capable of efficiently analyzing datasets exceeding hundreds of gigabytes within minutes.</div><div>In applications such as continuous quantum measurements, SignalSnap’s rigorous estimators enable precise quantitative matching between experimental data and theoretical models. With detailed derivations and illustrative examples, this work provides the theoretical and computational foundation necessary for establishing higher-order spectra as a reliable, standard tool in modern signal analysis.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105893"},"PeriodicalIF":3.0,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A multi-stage path aggregation module for small object detection on drone-captured scenarios
Pub Date: 2026-01-10  DOI: 10.1016/j.dsp.2026.105901
Wenyuan Fan, Xuemei Xu, Zhaohui Jiang, Zehan Zhu
Small object detection remains a critical challenge due to limited pixel representation and uneven spatial distribution. In the absence of sufficient contextual information, it is difficult to extract discriminative and complete features for accurate detection. By analyzing multi-scale feature fusion within modern detectors, we propose a Multi-stage Path Aggregation Module (MPAM) composed of a Parallel Residual Fusion Module (PRFM) and a Differential Path Channel Aggregation Module (DPCAM). By decomposing the path aggregation operation into multiple stages, MPAM significantly enhances the feature maps’ capacity to accommodate and process contextual information. PRFM captures texture and semantic information from the multi-scale feature maps through skip connections; moreover, a channel branch is added to enable the dynamic distribution of attention weights across both the channel and spatial dimensions. DPCAM balances channel and spatial information from different feature maps through a channel expansion operation. Additionally, Deep-wise Partial Attention (DPA) is designed to enhance the representation of small-object features within complex backgrounds by balancing weights between local and global information. Integrated into popular detectors, our method delivers consistent gains: compared with YOLOv8s, the mAP50:95 of our method improves by 3.7% on VisDrone and 3.2% on MS COCO. Experimental results validate the effectiveness of the proposed module in significantly enhancing small object detection accuracy.
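As a generic illustration of distributing attention weights across the channel and spatial dimensions, the sketch below uses a standard sigmoid-gated channel branch followed by a spatial branch; it is a common pattern and not the paper's PRFM/DPCAM implementation, and all layer choices are assumptions.

import torch
import torch.nn as nn

class ChannelSpatialGate(nn.Module):
    def __init__(self, c):
        super().__init__()
        # Channel branch: global pooling -> bottleneck -> per-channel weights.
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(c, c // 4, 1), nn.ReLU(),
                                     nn.Conv2d(c // 4, c, 1), nn.Sigmoid())
        # Spatial branch: 7x7 convolution -> per-position weights.
        self.spatial = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)      # re-weight channels
        return x * self.spatial(x)   # re-weight spatial positions

y = ChannelSpatialGate(64)(torch.rand(1, 64, 40, 40))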
{"title":"A multi-stage path aggregation module for small object detection on drone-captured scenarios","authors":"Wenyuan Fan , Xuemei Xu , Zhaohui Jiang , Zehan Zhu","doi":"10.1016/j.dsp.2026.105901","DOIUrl":"10.1016/j.dsp.2026.105901","url":null,"abstract":"<div><div>Small object detection remains a critical challenge due to limited pixel representation and uneven spatial distribution. In the absence of sufficient contextual information, it is difficult to extract discriminative and complete features for accurate detection. By analyzing multi-scale feature fusion within modern detectors, we proposed a Multi-stage Path Aggregation module(MPAM) composed of the Parallel Residual Fusion Module(PRFM) and the Differential Path Channel Aggregation Module(DPCAM). Through decomposing the path aggregation operation into multiple stages, MPAM significantly enhanced the feature maps’ capacity to accommodate and process contextual information. PRFM captured texture and semantic information from the multi-scale feature maps through skip connections. Moreover, a channel branch was added to enable the dynamic distribution of attention weights across both the channel and spatial dimensions. DPCAM is proposed to balance channel and spatial information from different feature maps through channel expansion operation. Additionally, Deep-wise Partial Attention(DPA) is designed to enhance the ability of representing features for small objects within complex backgrounds by balancing weights between local and global information. Integrated into popular detectors, our method delivers consistent gains. Compared with yolov8s, mAP50:95 of our method improved by 3.7% on VisDrone and 3.2% on MS COCO, respectively. Experimental results validate the effectiveness of the proposed module in significantly enhancing small object detection accuracy.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105901"},"PeriodicalIF":3.0,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPF-GAN: An unsupervised generative adversarial network for joint gradient and pixel-constrained fusion of infrared and visible images
Pub Date: 2026-01-09  DOI: 10.1016/j.dsp.2026.105902
Pengpeng Xie, Ziyang Ding, Qianfan Li, Cong Shi, Shibo Bin
Current image fusion algorithms often face modality preference issues: they either depend excessively on the thermal radiation features of infrared images, leading to the loss of visible-light texture details, or they prioritize visible-light images, which undermines infrared target detection. This makes it challenging to achieve a dynamic balance and collaborative optimization of information from both modalities in complex scenarios. Such asymmetric fusion makes it difficult to preserve sensitivity to thermal radiation targets while maintaining the ability to resolve texture details under extreme lighting conditions. To address this, the paper proposes an infrared and visible image fusion model that incorporates a gradient-pixel joint constraint. Our approach eliminates the complexity and uncertainty associated with manual feature extraction, while effectively leveraging shallow features through multiple shortcut connections. Within the framework of Generative Adversarial Networks, we design a gradient-pixel joint loss function that strikes a balance between preserving significant targets in the infrared image and maintaining the texture structure in the visible image, thereby enhancing image detail and retaining high-contrast information. To thoroughly evaluate the performance of the proposed method, we conducted systematic experiments on the TNO and RoadScene benchmark datasets, comparing it with eleven state-of-the-art fusion algorithms. The experimental results demonstrate that the proposed method offers significant advantages in both subjective visual quality and objective evaluation metrics. In terms of qualitative evaluation, the fusion results not only preserve natural lighting transitions but, more importantly, accentuate thermal radiation targets from the infrared image while fully retaining the texture details of the visible image. Quantitative analysis reveals that the proposed method significantly improves metrics such as Mutual Information (MI) and Spatial Frequency (SF). This provides new insights into multimodal image fusion and contributes to balancing the complementary advantages of different modality features.
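A hedged sketch of a gradient-pixel joint loss in the spirit described above: the pixel term pulls the fused image toward the element-wise brighter intensities (favoring thermal targets), while the gradient term preserves the stronger local texture (favoring visible detail). The specific targets, weights, and the adversarial term of GPF-GAN are not reproduced here; all names and values are assumptions.

import torch
import torch.nn.functional as F

def sobel_grad(img):
    # Absolute Sobel gradient magnitude (horizontal + vertical), single-channel input.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3).contiguous()
    return F.conv2d(img, kx, padding=1).abs() + F.conv2d(img, ky, padding=1).abs()

def gradient_pixel_loss(fused, ir, vis, lam=10.0):
    pixel = F.l1_loss(fused, torch.maximum(ir, vis))                   # intensity fidelity
    grad = F.l1_loss(sobel_grad(fused),
                     torch.maximum(sobel_grad(ir), sobel_grad(vis)))   # texture fidelity
    return pixel + lam * grad

loss = gradient_pixel_loss(torch.rand(1, 1, 64, 64),
                           torch.rand(1, 1, 64, 64),
                           torch.rand(1, 1, 64, 64))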
{"title":"GPF-GAN: An unsupervised generative adversarial network for joint gradient and pixel-constrained fusion of infrared and visible images","authors":"Pengpeng Xie, Ziyang Ding, Qianfan Li, Cong Shi, Shibo Bin","doi":"10.1016/j.dsp.2026.105902","DOIUrl":"10.1016/j.dsp.2026.105902","url":null,"abstract":"<div><div>Current image fusion algorithms often face modality preference issues: they either excessively depend on the thermal radiation features of infrared images, leading to the loss of visible light texture details, or they prioritize visible light images, which undermines infrared target detection. This makes it challenging to achieve a dynamic balance and collaborative optimization of information from both modalities in complex scenarios. This asymmetric fusion approach makes it difficult for the system to simultaneously preserve sensitivity to thermal radiation targets while maintaining the ability to resolve texture details under extreme lighting conditions. To address this, the paper proposes an infrared and visible light fusion model that incorporates a gradient-pixel joint constraint. Our approach eliminates the complexity and uncertainty associated with manual feature extraction, while effectively leveraging shallow features through multiple shortcut connections. Within the framework of Generative Adversarial Networks, we design a gradient-pixel joint loss function that strikes a balance between preserving significant targets in the infrared image and maintaining the texture structure in the visible light image, thereby enhancing image detail and retaining high-contrast information. To thoroughly evaluate the performance of the proposed method, we conducted systematic experiments using the TNO and RoadScene benchmark datasets, comparing it with eleven state-of-the-art fusion algorithms. The experimental results demonstrate that the proposed method offers significant advantages in both subjective visual quality and objective evaluation metrics. In terms of qualitative evaluation, the fusion results not only preserve natural lighting transitions but, more importantly, accentuate thermal radiation targets in the infrared image while fully retaining the texture details of the visible light image. Quantitative analysis reveals that the proposed method significantly improves metrics such as Mutual Information (MI) and Spatial Frequency (SF). This provides new insights in the field of multimodal image fusion and contributes to balancing the complementary advantages of different modality features.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105902"},"PeriodicalIF":3.0,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145950131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Logarithmic-sum function constrained set-membership FxNLMS algorithm for active noise control
Pub Date: 2026-01-09  DOI: 10.1016/j.dsp.2026.105905
Weigang Chen, Zhiyong Chen
In the field of active noise control (ANC), the traditional filtered-x normalized least mean square (FxNLMS) algorithm does not exploit the sparsity of the adaptive filter's weight vector, resulting in poor noise reduction performance. Additionally, when the reverberation time is long, the FxNLMS algorithm suffers from an excessive computational load. To address these two shortcomings, this paper proposes a logarithmic-sum function constrained set-membership FxNLMS (LSF-SM-FxNLMS) algorithm, which introduces a set-membership constraint and a logarithmic-sum function penalty into the cost function of the FxNLMS algorithm to reduce the computational load and exploit the sparsity of the adaptive filter's weight vector. A hardware-in-the-loop test bench was constructed to measure the actual primary and secondary paths. The proposed algorithm is described and derived in detail, and its performance is analyzed through computer simulations based on the measured primary and secondary paths. Simulation results show that the proposed algorithm outperforms traditional algorithms in terms of noise reduction.
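The flavor of such an update can be sketched as follows: a filtered-x NLMS recursion that (i) updates only when the error magnitude exceeds a set-membership bound gamma and (ii) adds a log-sum (reweighted l1) zero attractor to exploit sparsity. Path coefficients, step sizes, and the bound are illustrative choices, not the paper's derivation.

import numpy as np

rng = np.random.default_rng(0)
L, mu, gamma, rho, eps_w = 64, 0.5, 0.02, 1e-5, 0.05
p = np.zeros(64); p[[5, 9, 20]] = [0.8, -0.4, 0.2]          # sparse primary path
s = np.array([0.0, 0.9, 0.4, 0.2])                          # known secondary path

x = rng.standard_normal(30000)                               # reference noise
d = np.convolve(x, p)[:len(x)]                               # disturbance at the error microphone
xf = np.convolve(x, s)[:len(x)]                              # filtered reference x'

w = np.zeros(L)
xbuf, xfbuf, ybuf = np.zeros(L), np.zeros(L), np.zeros(len(s))
e = np.zeros(len(x))
for n in range(len(x)):
    xbuf = np.roll(xbuf, 1);   xbuf[0] = x[n]
    xfbuf = np.roll(xfbuf, 1); xfbuf[0] = xf[n]
    ybuf = np.roll(ybuf, 1);   ybuf[0] = w @ xbuf            # anti-noise sample
    e[n] = d[n] - s @ ybuf                                   # residual at the error microphone
    if abs(e[n]) > gamma:                                    # set-membership: sparse updates
        step = mu * (1.0 - gamma / abs(e[n]))
        w += step * e[n] * xfbuf / (xfbuf @ xfbuf + 1e-8)
    w -= rho * np.sign(w) / (eps_w + np.abs(w))              # log-sum (reweighted l1) zero attractor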
{"title":"Logarithmic-sum function constrained set-membership FxNLMS algorithm for active noise control","authors":"Weigang Chen, Zhiyong Chen","doi":"10.1016/j.dsp.2026.105905","DOIUrl":"10.1016/j.dsp.2026.105905","url":null,"abstract":"<div><div>In the field of active noise control (ANC), the traditional filtered-x normalized least mean square (FxNLMS) algorithm does not utilize the sparsity of the adaptive filter's weight vector, resulting in poor noise reduction performance. Additionally, when the reverberation time is long, the FxNLMS algorithm suffers from excessive computational load. To address the above two shortcomings of the FxNLMS algorithm, this paper proposes a logarithmic-sum function constrained set-membership FxNLMS (LSF-SM-FxNLMS) algorithm, which introduces a constraint and a logarithmic-sum function penalty to the cost function of the FxNLMS algorithm to reduce the computational load and utilize the sparsity of the adaptive filter's weight vector. A hardware-in-the-loop test bench was constructed to measure the actual primary and secondary paths. In this paper, the proposed algorithm is described and derived in detail, and its performance is analyzed through computer simulations based on the actual primary and secondary paths. Simulation results show that the proposed algorithm outperforms the traditional algorithms in terms of the noise reduction.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105905"},"PeriodicalIF":3.0,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Outage probability and ergodic capacity of RIS-assisted RSMA communication system
Pub Date: 2026-01-08  DOI: 10.1016/j.dsp.2026.105900
Nguyen Hong Kiem, Bui Anh Duc, Nguyen Tuan Minh, Le T.T. Huyen, Tran Manh Hoang
This paper investigates outage probability (OP) and ergodic capacity (EC) of a reconfigurable intelligent surface (RIS) assisted two-user rate-splitting multiple access (RSMA) communication system. Closed-form expressions for OP and EC are derived over Rayleigh fading channels, and validated through extensive Monte Carlo simulations. A comprehensive performance comparison is conducted between the proposed RIS-assisted RSMA scheme and two benchmark systems: RIS-assisted non-orthogonal multiple access (NOMA) and relay-assisted RSMA. Simulation results demonstrate that the proposed scheme significantly outperforms both benchmarks in terms of OP and EC, regardless of fading conditions. The influence of the critical system parameters, including the number of RIS reflecting elements, transmit power, power allocation factors, and the required rate of the common stream, is thoroughly examined. The results reveal that optimal power allocation between streams is essential for minimizing OP. These findings confirm that integrating RSMA with RIS provides a robust and efficient solution for enhancing communication reliability and spectral efficiency in future 6G wireless networks, especially in challenging non-line-of-sight environments.
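Closed-form outage expressions of this kind are typically checked against Monte Carlo simulation. The sketch below estimates the outage probability of an idealized phase-aligned RIS link over Rayleigh fading; the signal model and parameter values are simplified assumptions, not the paper's RSMA system model or its closed-form result.

import numpy as np

rng = np.random.default_rng(1)
n_elem, snr, rate_th, trials = 8, 1.0, 4.0, 200_000          # RIS elements, linear SNR, bit/s/Hz, samples

# Rayleigh fading on both hops; the RIS is assumed to align all element phases.
h1 = (rng.standard_normal((trials, n_elem)) + 1j * rng.standard_normal((trials, n_elem))) / np.sqrt(2)
h2 = (rng.standard_normal((trials, n_elem)) + 1j * rng.standard_normal((trials, n_elem))) / np.sqrt(2)
gain = np.abs(h1 * h2).sum(axis=1) ** 2                       # coherent combining gain
outage_prob = np.mean(np.log2(1.0 + snr * gain) < rate_th)    # OP = Pr{rate < required rate}
print("estimated outage probability:", outage_prob)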
{"title":"Outage probability and ergodic capacity of RIS-assisted RSMA communication system","authors":"Nguyen Hong Kiem , Bui Anh Duc , Nguyen Tuan Minh , Le T.T. Huyen , Tran Manh Hoang","doi":"10.1016/j.dsp.2026.105900","DOIUrl":"10.1016/j.dsp.2026.105900","url":null,"abstract":"<div><div>This paper investigates outage probability (OP) and ergodic capacity (EC) of a reconfigurable intelligent surface (RIS) assisted two-user rate-splitting multiple access (RSMA) communication system. Closed-form expressions for OP and EC are derived over Rayleigh fading channels, and validated through extensive Monte Carlo simulations. A comprehensive performance comparison is conducted between the proposed RIS-assisted RSMA scheme and two benchmark systems: RIS-assisted non-orthogonal multiple access (NOMA) and relay-assisted RSMA. Simulation results demonstrate that the proposed scheme significantly outperforms both benchmarks in terms of OP and EC, regardless of fading conditions. The influence of the critical system parameters, including the number of RIS reflecting elements, transmit power, power allocation factors, and the required rate of the common stream, is thoroughly examined. The results reveal that optimal power allocation between streams is essential for minimizing OP. These findings confirm that integrating RSMA with RIS provides a robust and efficient solution for enhancing communication reliability and spectral efficiency in future 6G wireless networks, especially in challenging non-line-of-sight environments.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"172 ","pages":"Article 105900"},"PeriodicalIF":3.0,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on infrared small target detection technology based on DCS-YOLO algorithm
Pub Date: 2026-01-08  DOI: 10.1016/j.dsp.2026.105898
Meng Yin, Binghe Sun, Rugang Wang, Yuanyuan Wang, Feng Zhou, Xuesheng Bian
To address the challenges of weak features and susceptibility to complex background interference in infrared small targets, as well as the high computational cost of existing specialized detection models, this paper proposes the Dual-Domain Fusion and Class-Aware Self-supervised YOLO (DCS-YOLO). This framework leverages dual-domain feature fusion and class-aware self-supervised learning for semantic enhancement. During feature extraction, a Class-aware Self-supervised Semantic Fusion Module (CSSFM) utilizes a class-aware self-supervised architecture as a deep semantic guide for generating discriminative semantic features, thereby enhancing the perception of faint target characteristics. Additionally, a Dual-domain Aware Enhancement Module (A2C2f_DDA) is designed, which analyzes the high-frequency components of small targets and employs a spatial-frequency domain feature complementary fusion strategy to sharpen feature capture while suppressing background clutter. For feature upsampling and fusion, a Multi-dimensional Selective Feature Pyramid Network (MSFPN) employs a frequency-domain, spatial, and channel three-dimensional cooperative selection mechanism, integrated with deep semantic information, to enhance feature integration across dimensions and improve detection performance in complex scenes. Furthermore, lightweight components including GSConv, VoVGSCSP, and LSCD-Detect are incorporated to reduce computational complexity and model parameters. Comprehensive evaluations on the IRSTD-1K, RealScene-ISTD, and SIRST-v2 datasets demonstrate the effectiveness of the proposed algorithm, achieving mAP@0.5 scores of 80.7%, 90.2%, and 93.3%, respectively. The results indicate that the algorithm effectively utilizes frequency-domain analysis and semantic enhancement, providing a powerful and efficient solution for infrared small target detection in complex scenarios while maintaining a favorable balance between accuracy and computational cost.
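The frequency-domain intuition behind the dual-domain module can be illustrated with a toy high-pass step: small, sharp targets concentrate in high spatial frequencies, so suppressing low frequencies emphasizes them against smooth clutter. This is only an illustration with arbitrary values, not the A2C2f_DDA module itself.

import numpy as np

def highpass(img, cutoff=0.1):
    # Zero out normalized spatial frequencies below `cutoff` and invert the FFT.
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h // 2, -w // 2:w // 2]
    mask = (np.sqrt((yy / h) ** 2 + (xx / w) ** 2) > cutoff).astype(float)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

img = np.linspace(0.0, 1.0, 128)[None, :] * np.ones((128, 128))   # smooth background gradient
img[60:63, 60:63] += 1.0                                           # a 3x3 "small target"
hf = highpass(img)                                                 # target preserved, gradient suppressed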
{"title":"Research on infrared small target detection technology based on DCS-YOLO algorithm","authors":"Meng Yin , Binghe Sun , Rugang Wang , Yuanyuan Wang , Feng Zhou , Xuesheng Bian","doi":"10.1016/j.dsp.2026.105898","DOIUrl":"10.1016/j.dsp.2026.105898","url":null,"abstract":"<div><div>To address the challenges of weak features, susceptibility to complex background interference in infrared small targets, and the high computational cost of existing specialized detection models, this paper proposes the Dual-Domain Fusion and Class-Aware Self-supervised YOLO (DCS-YOLO). This framework leverages dual-domain feature fusion and class-aware self-supervised learning for semantic enhancement. During feature extraction, a Class-aware Self-supervised Semantic Fusion Module (CSSFM) utilizes a class-aware self-supervised architecture as a deep semantic guide for generating discriminative semantic features, thereby enhancing the perception of faint target characteristics. Additionally, a Dual-domain Aware Enhancement Module (A2C2f_DDA) is designed, which analyzes the high-frequency components of small targets and employs a spatial-frequency domain feature complementary fusion strategy to sharpen feature capture while suppressing background clutter. For feature upsampling and fusion, a Multi-dimensional Selective Feature Pyramid Network (MSFPN) employs a frequency-domain, spatial, and channel three-dimensional cooperative selection mechanism, integrated with deep semantic information, to enhance feature integration across dimensions and improve detection performance in complex scenes. Furthermore, lightweight components including GSConv, VoVGSCSP, and LSCD-Detect are incorporated to reduce computational complexity and model parameters. Comprehensive evaluations on the IRSTD-1K, RealScene-ISTD, and SIRST-v2 datasets demonstrate the effectiveness of the proposed algorithm, achieving [email protected] scores of 80.7%, 90.2%, and 93.3%, respectively. The results indicate that the algorithm effectively utilizes frequency-domain analysis and semantic enhancement, providing a powerful and efficient solution for infrared small target detection in complex scenarios while maintaining a favorable balance between accuracy and computational cost.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"173 ","pages":"Article 105898"},"PeriodicalIF":3.0,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}