Digital Signal Processing: Latest Articles

HAIR-GLMB: Hybrid appearance-IoU reinforced GLMB filter for UAV-based multi-target tracking
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-13 Volume 173, Article 105906 DOI: 10.1016/j.dsp.2026.105906
Haiyi Tong, Dekang Zhu, Zhou Zhang
This paper presents HAIR-GLMB, a Hybrid Appearance and IoU Reinforced Generalized Labeled Multi-Bernoulli (GLMB) filter tailored for multi-target tracking in challenging unmanned aerial vehicle (UAV) scenarios. To address frequent association ambiguities caused by dense target distributions, we propose an adaptive hybrid cost matrix that integrates Intersection-over-Union (IoU) spatial cues with appearance similarity. Specifically, an entropy-based adaptive weighting mechanism dynamically balances spatial and appearance information, thereby enhancing association reliability. We further develop a reinforced likelihood computation within the GLMB recursion, explicitly embedding spatial and appearance information into the update process. A motion-aware adaptive survival probability model is also proposed, effectively sustaining track continuity for inward-moving targets near the boundaries of the camera’s field of view. To improve efficiency, the Gibbs sampler is initialized with an assignment obtained by the Hungarian algorithm on the hybrid cost matrix, placing the Markov chain near high-probability regions and reducing sampling overhead under a limited computational budget. Experiments on challenging UAV benchmarks (VisDrone2019, UAVDT) show that HAIR-GLMB consistently outperforms a GLMB baseline relying only on IoU, yielding higher tracking accuracy, fewer identity switches, and reduced fragmentation.
Citations: 0
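The HAIR-GLMB abstract above describes an entropy-weighted hybrid cost matrix whose Hungarian solution seeds the Gibbs sampler. The sketch below only illustrates that idea and is not the authors' formulation: the weighting rule in `hybrid_cost` (a lower-entropy row earns a larger weight) is an assumption, all function names are hypothetical, and an exhaustive search over permutations stands in for the Hungarian algorithm so that the example needs only the standard library.

```python
import math
from itertools import permutations


def normalized_entropy(scores):
    """Shannon entropy of a score row normalized to sum to 1, scaled to [0, 1]."""
    total = sum(scores)
    if total <= 0 or len(scores) < 2:
        return 1.0  # no usable information: treat the row as maximally ambiguous
    probs = [s / total for s in scores]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(scores))


def hybrid_cost(iou, app):
    """Blend IoU and appearance similarity per track (hypothetical weighting rule)."""
    cost = []
    for iou_row, app_row in zip(iou, app):
        # Assumption: confidence in a cue is one minus its normalized entropy.
        conf_iou = max(0.0, 1.0 - normalized_entropy(iou_row))
        conf_app = max(0.0, 1.0 - normalized_entropy(app_row))
        total = conf_iou + conf_app
        w = 0.5 if total == 0 else conf_iou / total
        cost.append([1.0 - (w * i + (1.0 - w) * a) for i, a in zip(iou_row, app_row)])
    return cost


def best_assignment(cost):
    """Exhaustive minimum-cost assignment on a square matrix (stand-in for Hungarian)."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda p: sum(cost[r][p[r]] for r in range(n)))


# Two overlapping targets: IoU cannot separate them, appearance can.
iou = [[0.50, 0.50], [0.50, 0.50]]
app = [[0.90, 0.10], [0.20, 0.80]]
assignment = best_assignment(hybrid_cost(iou, app))
print(assignment)  # track r is matched to detection assignment[r]
```

In this toy case the IoU rows carry no information, so the assumed rule shifts all weight to appearance. The exhaustive search is only practical for a handful of targets; a real tracker would use a polynomial-time assignment solver.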
Towards point cloud geometry compression via global-local and multi-scale feature learning
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-13 Volume 173, Article 105913 DOI: 10.1016/j.dsp.2026.105913
Yihan Wang , Yongfang Wang , Zhijun Fang , Tengyao Cui
Existing Point Cloud Geometry Compression (PCGC) methods often handle non-uniform point density inadequately and fail to fully exploit multi-scale contextual features, which limits their efficiency and reconstruction quality. To bridge this gap, we argue that an effective solution must jointly address local geometric adaptation and the aggregation of multi-scale contextual features. Accordingly, we propose a novel PCGC method consisting of a Global-Local Feature Extraction Network (GLFE-Net), a Multi-scale Feature Enhancement Network (MFE-Net), and Coordinates Reconstruction based on Offset (CRO). The GLFE-Net incorporates Local Adaptive Density (LAD) to address the non-uniform density distribution and a Global-Local Context Differential (GLCD) module to fuse local and global features. The MFE-Net employs the Feature Extraction based on Offset-attention (FEO) module to enhance feature expressiveness and uses the Multi-scale Semantics Fusion (MSF) module to optimize multi-scale feature fusion. The CRO module uses a learnable offset mechanism for high-fidelity reconstruction. Experimental results demonstrate that our method achieves significant improvements, with Peak Signal-to-Noise Ratio (PSNR) gains of up to 29.25 dB (D1) and 27.31 dB (D2) over existing PCGC methods. By jointly addressing the key challenges of density adaptation and multi-scale feature learning, this work provides an effective solution for high-performance PCGC.
Citations: 0
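The PCGC abstract above reports PSNR gains under the D1 metric, which is conventionally the point-to-point geometry error. As a rough illustration of what such a number measures, here is a minimal sketch under stated assumptions: nearest neighbours are found by brute force, the symmetric error takes the larger of the two directional MSEs, and `peak` is a caller-supplied signal peak (conventions for the peak value differ between evaluation tools). The function names are hypothetical, and D2 (point-to-plane) is not shown because it needs surface normals.

```python
import math


def nearest_sq_dist(p, cloud):
    """Squared Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def d1_mse(src, dst):
    """Mean squared point-to-point error from every point in src to dst."""
    return sum(nearest_sq_dist(p, dst) for p in src) / len(src)


def d1_psnr(original, decoded, peak):
    """Symmetric D1 PSNR in dB, using the larger of the two directional MSEs."""
    mse = max(d1_mse(original, decoded), d1_mse(decoded, original))
    if mse == 0:
        return math.inf  # identical geometry
    return 10.0 * math.log10(peak ** 2 / mse)


original = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
decoded = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 2)]  # one point displaced
psnr = d1_psnr(original, decoded, peak=10.0)
print(round(psnr, 2))
```

The brute-force search is quadratic in the number of points; real evaluations use a spatial index such as a k-d tree.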
Trainable joint time-vertex fractional Fourier transform
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-13 Volume 173, Article 105909 DOI: 10.1016/j.dsp.2026.105909
Ziqi Yan , Zhichao Zhang
To address the limitations of graph fractional Fourier transform (GFRFT) Wiener filtering and traditional joint time-vertex fractional Fourier transform (JFRFT) Wiener filtering, this study proposes a filtering method based on the hyper-differential form of the JFRFT. Gradient backpropagation is employed to select the transform order pair and the filter coefficients adaptively. First, leveraging the hyper-differential forms of the GFRFT and the fractional Fourier transform, the hyper-differential form of the JFRFT is constructed and its properties are analyzed. Second, time-varying graph signals are divided into dynamic graph sequences of equal span along the temporal dimension. A spatiotemporal joint representation is then established through vectorized reorganization, followed by joint time-vertex Wiener filtering. Furthermore, by rigorously proving differentiability with respect to the transform orders, both the transform orders and the filter coefficients are embedded as learnable parameters within a neural network architecture. Gradient backpropagation then optimizes them jointly and iteratively, yielding a parameter-adaptive learning filtering framework. This method leverages a model-driven approach to learn the optimal transform order pair and filter coefficients. Experimental results indicate that the proposed framework improves denoising performance on time-varying graph signals while reducing the computational burden of the traditional grid search strategy.
Citations: 0
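The trainable JFRFT abstract above hinges on the transform orders being differentiable, so that they can be learned by backpropagation instead of grid search. The following is a sketch of why an exponential (hyper-differential style) form makes that possible. The notation is assumed here and is not taken from the paper: $\mathbf{F}^{\alpha}$ is the temporal fractional Fourier matrix of order $\alpha$, $\mathbf{F}_G^{\beta}$ the graph fractional Fourier matrix of order $\beta$, $\mathbf{T}$ a fixed generator matrix determined by the graph, $\mathbf{h}$ the filter coefficients, and $\mathbf{y}$ the noisy signal vectorized by stacking time snapshots.

```latex
% Assumed notation, for illustration only.
\begin{align}
  \mathbf{F}_J^{(\alpha,\beta)} &= \mathbf{F}^{\alpha} \otimes \mathbf{F}_G^{\beta}, \\
  \hat{\mathbf{x}} &= \bigl(\mathbf{F}_J^{(\alpha,\beta)}\bigr)^{-1}
      \operatorname{diag}(\mathbf{h})\, \mathbf{F}_J^{(\alpha,\beta)}\, \mathbf{y}, \\
  \mathbf{F}_G^{\beta} &= \exp\!\Bigl(-\mathrm{j}\,\tfrac{\pi\beta}{2}\,\mathbf{T}\Bigr)
      \;\Longrightarrow\;
      \frac{\partial \mathbf{F}_G^{\beta}}{\partial \beta}
      = -\mathrm{j}\,\tfrac{\pi}{2}\,\mathbf{T}\,\mathbf{F}_G^{\beta}, \\
  (\alpha^{\star}, \beta^{\star}, \mathbf{h}^{\star}) &=
      \arg\min_{\alpha,\,\beta,\,\mathbf{h}}\;
      \mathbb{E}\,\bigl\lVert \hat{\mathbf{x}} - \mathbf{x} \bigr\rVert_2^2 .
\end{align}
```

Because the order enters only as a scalar multiplying a fixed matrix inside the exponential, the derivative exists in closed form. The loss can therefore be minimized over the order pair and the filter coefficients together by gradient descent, rather than by evaluating the filter on a grid of candidate orders.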
A novel two-dimensional Wigner distribution framework via the quadratic phase Fourier transform with a non-separable kernel
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-11 Volume 173, Article 105896 DOI: 10.1016/j.dsp.2026.105896
Mukul Chauhan, Waseem Z. Lone, Amit K. Verma
This paper introduces a novel time–frequency distribution, referred to as the two-dimensional non-separable quadratic-phase Wigner distribution (2D-NSQPWD), formulated within the framework of the two-dimensional non-separable quadratic-phase Fourier transform (2D-NSQPFT). The proposed distribution extends the classical two-dimensional Wigner distribution (2D-WD) through a convolution-based formulation that incorporates the structural characteristics of the 2D-NSQPFT, thereby enabling an effective representation of complex, non-separable signal structures. We rigorously establish several key properties of the 2D-NSQPWD, including time and frequency shift invariance, marginal behavior, conjugate symmetry, convolution relations, and Moyal’s identity. The effectiveness of the distribution is demonstrated through its application to single-, bi-, and tri-component two-dimensional linear frequency-modulated (2D-LFM) signals. Finally, simulations show that the proposed transform exhibits superior performance in cross-term suppression and signal localization compared to existing transforms.
Citations: 0
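The 2D-NSQPWD abstract above extends the classical two-dimensional Wigner distribution and stresses cross-term suppression. For reference, the classical definition and the origin of cross-terms for a two-component signal can be written as follows. This is the standard textbook form in notation assumed here; the paper's quadratic-phase kernel is not reproduced.

```latex
% Classical 2D Wigner distribution, with t, tau, omega in R^2 (assumed notation).
\begin{align}
  \mathcal{W}_f(\mathbf{t}, \boldsymbol{\omega})
    &= \int_{\mathbb{R}^2}
       f\!\left(\mathbf{t} + \tfrac{\boldsymbol{\tau}}{2}\right)
       f^{*}\!\left(\mathbf{t} - \tfrac{\boldsymbol{\tau}}{2}\right)
       e^{-\mathrm{i}\,\boldsymbol{\omega}\cdot\boldsymbol{\tau}}\,
       \mathrm{d}\boldsymbol{\tau}, \\
  \mathcal{W}_{f_1 + f_2}
    &= \mathcal{W}_{f_1} + \mathcal{W}_{f_2}
       + 2\,\operatorname{Re}\bigl\{\mathcal{W}_{f_1, f_2}\bigr\}.
\end{align}
```

The last term is the cross-term. It appears because the distribution is quadratic in the signal, which is why multi-component LFM signals are a natural test case for any variant that claims better cross-term suppression.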
Correct estimation of higher-order spectra: From theoretical challenges to practical multi-channel implementation in SignalSnap
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-10 Volume 173, Article 105893 DOI: 10.1016/j.dsp.2026.105893
Markus Sifft, Armin Ghorbanietemad, Fabian Wagner, Daniel Hägele
Higher-order spectra (Brillinger’s polyspectra) offer powerful methods for solving critical problems in signal processing and data analysis. Despite their significant potential, their practical use has remained limited due to unresolved mathematical issues in spectral estimation, including the absence of unbiased and consistent estimators and the high computational cost associated with evaluating multidimensional spectra. Consequently, existing tools frequently produce artifacts (no existing software library correctly implements Brillinger’s cumulant-based trispectrum) or fail to scale effectively to real-world data volumes, leaving crucial applications like multi-detector spectral analysis largely unexplored.
In this paper, we revisit higher-order spectra from a modern perspective, addressing the root causes of their historical underuse. We reformulate higher-order spectral estimation using recently derived multivariate k-statistics, yielding unbiased and consistent estimators that eliminate spurious artifacts and precisely align with Brillinger’s theoretical definitions. Our methodology covers single- and multi-channel spectral analysis up to the bispectrum (third order) and trispectrum (fourth order), enabling robust investigations of inter-frequency coupling, non-Gaussian behavior, and time-reversal symmetry breaking. Additionally, we introduce quasi-polyspectra to uncover non-stationary, time-dependent higher-order features. We implement these new estimators in SignalSnap, an open-source GPU-accelerated library capable of efficiently analyzing datasets exceeding hundreds of gigabytes within minutes.
In applications such as continuous quantum measurements, SignalSnap’s rigorous estimators enable precise quantitative matching between experimental data and theoretical models. With detailed derivations and illustrative examples, this work provides the theoretical and computational foundation necessary for establishing higher-order spectra as a reliable, standard tool in modern signal analysis.
Citations: 0
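The SignalSnap abstract above builds its estimators on k-statistics, the unbiased estimators of cumulants. The sketch below shows the textbook univariate k-statistics up to fourth order applied to plain samples. It is not SignalSnap's API and not the multivariate estimators the paper derives for Fourier coefficients; the function names are hypothetical.

```python
def central_moment(x, order):
    """Biased sample central moment of the given order."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** order for v in x) / n


def k_statistics(x):
    """Unbiased estimators k2, k3, k4 of the 2nd, 3rd and 4th cumulants (n >= 4)."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 samples for k4")
    m2, m3, m4 = (central_moment(x, r) for r in (2, 3, 4))
    k2 = n / (n - 1) * m2
    k3 = n ** 2 / ((n - 1) * (n - 2)) * m3
    k4 = (n ** 2 * ((n + 1) * m4 - 3 * (n - 1) * m2 ** 2)
          / ((n - 1) * (n - 2) * (n - 3)))
    return k2, k3, k4


k2, k3, k4 = k_statistics([1, 2, 3, 4, 10])
print(k2, k3, k4)
```

Here `k2` coincides with the usual sample variance. The correction factors matter most for small sample counts, where naive moment-based cumulant estimates are visibly biased.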
A multi-stage path aggregation module for small object detection on drone-captured scenarios
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-10 Volume 173, Article 105901 DOI: 10.1016/j.dsp.2026.105901
Wenyuan Fan , Xuemei Xu , Zhaohui Jiang , Zehan Zhu
Small object detection remains a critical challenge due to limited pixel representation and uneven spatial distribution. Without sufficient contextual information, it is difficult to extract discriminative and complete features for accurate detection. By analyzing multi-scale feature fusion within modern detectors, we propose a Multi-stage Path Aggregation module (MPAM) composed of the Parallel Residual Fusion Module (PRFM) and the Differential Path Channel Aggregation Module (DPCAM). By decomposing the path aggregation operation into multiple stages, MPAM significantly enhances the feature maps’ capacity to accommodate and process contextual information. PRFM captures texture and semantic information from the multi-scale feature maps through skip connections. Moreover, a channel branch is added to distribute attention weights dynamically across both the channel and spatial dimensions. DPCAM balances channel and spatial information from different feature maps through a channel expansion operation. Additionally, Deep-wise Partial Attention (DPA) is designed to improve feature representation for small objects within complex backgrounds by balancing weights between local and global information. Integrated into popular detectors, our method delivers consistent gains. Compared with YOLOv8s, our method improves mAP50:95 by 3.7% on VisDrone and by 3.2% on MS COCO. Experimental results validate the effectiveness of the proposed module in significantly enhancing small object detection accuracy.
Citations: 0
GPF-GAN: An unsupervised generative adversarial network for joint gradient and pixel-constrained fusion of infrared and visible images
IF 3 Region 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-09 Volume 173, Article 105902 DOI: 10.1016/j.dsp.2026.105902
Pengpeng Xie, Ziyang Ding, Qianfan Li, Cong Shi, Shibo Bin
Current image fusion algorithms often face modality preference issues: they either excessively depend on the thermal radiation features of infrared images, leading to the loss of visible light texture details, or they prioritize visible light images, which undermines infrared target detection. This makes it challenging to achieve a dynamic balance and collaborative optimization of information from both modalities in complex scenarios. This asymmetric fusion approach makes it difficult for the system to simultaneously preserve sensitivity to thermal radiation targets while maintaining the ability to resolve texture details under extreme lighting conditions. To address this, the paper proposes an infrared and visible light fusion model that incorporates a gradient-pixel joint constraint. Our approach eliminates the complexity and uncertainty associated with manual feature extraction, while effectively leveraging shallow features through multiple shortcut connections. Within the framework of Generative Adversarial Networks, we design a gradient-pixel joint loss function that strikes a balance between preserving significant targets in the infrared image and maintaining the texture structure in the visible light image, thereby enhancing image detail and retaining high-contrast information. To thoroughly evaluate the performance of the proposed method, we conducted systematic experiments using the TNO and RoadScene benchmark datasets, comparing it with eleven state-of-the-art fusion algorithms. The experimental results demonstrate that the proposed method offers significant advantages in both subjective visual quality and objective evaluation metrics. In terms of qualitative evaluation, the fusion results not only preserve natural lighting transitions but, more importantly, accentuate thermal radiation targets in the infrared image while fully retaining the texture details of the visible light image. 
Quantitative analysis reveals that the proposed method significantly improves metrics such as Mutual Information (MI) and Spatial Frequency (SF). This provides new insights in the field of multimodal image fusion and contributes to balancing the complementary advantages of different modality features.
引用次数: 0
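The joint gradient-pixel loss described in the GPF-GAN abstract above can be illustrated with a small sketch. The abstract does not give the formula, so the form below (a pixel term pulling the fused image toward the infrared input, plus a weighted gradient term pulling its edges toward the visible input) is an assumption, and the names `forward_gradients`, `mean_sq_diff`, `joint_loss`, and the weight `lam` are hypothetical.

```python
def forward_gradients(img):
    """Horizontal and vertical forward differences of a 2-D list image.

    The image must be at least 2x2 so that both difference maps are non-empty.
    """
    h, w = len(img), len(img[0])
    gx = [[img[r][c + 1] - img[r][c] for c in range(w - 1)] for r in range(h)]
    gy = [[img[r + 1][c] - img[r][c] for c in range(w)] for r in range(h - 1)]
    return gx, gy


def mean_sq_diff(a, b):
    """Mean squared difference between two equally sized 2-D lists."""
    vals = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(vals) / len(vals)


def joint_loss(fused, infrared, visible, lam=5.0):
    """Assumed loss: pixel fidelity to infrared + lam * gradient fidelity to visible."""
    pixel_term = mean_sq_diff(fused, infrared)
    fgx, fgy = forward_gradients(fused)
    vgx, vgy = forward_gradients(visible)
    gradient_term = mean_sq_diff(fgx, vgx) + mean_sq_diff(fgy, vgy)
    return pixel_term + lam * gradient_term
```

With `lam` large the fused image is pushed toward visible-light edges; with `lam` small it stays close to infrared intensities. In a GAN this term would be added to the generator's adversarial loss, which is omitted here.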
Logarithmic-sum function constrained set-membership FxNLMS algorithm for active noise control
IF 3 Zone 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-09 DOI: 10.1016/j.dsp.2026.105905
Weigang Chen, Zhiyong Chen
In active noise control (ANC), the traditional filtered-x normalized least mean square (FxNLMS) algorithm does not exploit the sparsity of the adaptive filter's weight vector, which limits its noise reduction performance. In addition, when the reverberation time is long, the FxNLMS algorithm suffers from an excessive computational load. To address these two shortcomings, this paper proposes a logarithmic-sum function constrained set-membership FxNLMS (LSF-SM-FxNLMS) algorithm, which adds a constraint and a logarithmic-sum function penalty to the FxNLMS cost function in order to reduce the computational load and exploit the sparsity of the weight vector. A hardware-in-the-loop test bench was constructed to measure the actual primary and secondary paths. The proposed algorithm is described and derived in detail, and its performance is analyzed through computer simulations based on the measured primary and secondary paths. Simulation results show that the proposed algorithm outperforms the traditional algorithms in terms of noise reduction.
Digital Signal Processing, vol. 173, Article 105905. Published 2026-01-09.
Citations: 0
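The two ideas named in the FxNLMS abstract above, a set-membership constraint that skips updates and a logarithmic-sum penalty that promotes sparsity, can be sketched as a single weight update. This is a generic illustration and not the paper's derivation: the function names, the error bound `gamma`, the penalty weight `rho`, the penalty scale `eps`, and the sign convention (error defined as desired minus anti-noise at the error microphone) are all assumptions.

```python
import math


def log_sum_gradient(w, eps):
    """Gradient of sum(log(1 + |w_i| / eps)) with respect to each w_i."""
    return [
        math.copysign(1.0, wi) / (eps + abs(wi)) if wi != 0.0 else 0.0
        for wi in w
    ]


def sm_fxnlms_step(w, x_f, e, gamma=0.1, rho=0.0, eps=0.05, reg=1e-8):
    """One hypothetical set-membership FxNLMS-style update.

    w     : current filter weights
    x_f   : reference vector filtered through the secondary-path estimate
    e     : residual error sample (assumed convention: desired minus output)
    Returns (new_weights, updated_flag).
    """
    # Set-membership test: no update while the error stays inside the bound,
    # which is where the computational saving comes from.
    if abs(e) <= gamma:
        return list(w), False
    mu = 1.0 - gamma / abs(e)                  # data-dependent step size
    power = sum(x * x for x in x_f) + reg      # normalization term
    grad = log_sum_gradient(w, eps)            # sparsity-promoting attraction to zero
    new_w = [
        wi + mu * e * xi / power - rho * gi
        for wi, xi, gi in zip(w, x_f, grad)
    ]
    return new_w, True
```

In a full ANC loop, `x_f` would be recomputed each sample from the reference signal and the measured secondary path, and the fraction of samples for which the flag is `False` indicates how many updates are skipped.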
Outage probability and ergodic capacity of RIS-assisted RSMA communication system
IF 3 Zone 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-08 DOI: 10.1016/j.dsp.2026.105900
Nguyen Hong Kiem, Bui Anh Duc, Nguyen Tuan Minh, Le T.T. Huyen, Tran Manh Hoang
This paper investigates outage probability (OP) and ergodic capacity (EC) of a reconfigurable intelligent surface (RIS) assisted two-user rate-splitting multiple access (RSMA) communication system. Closed-form expressions for OP and EC are derived over Rayleigh fading channels, and validated through extensive Monte Carlo simulations. A comprehensive performance comparison is conducted between the proposed RIS-assisted RSMA scheme and two benchmark systems: RIS-assisted non-orthogonal multiple access (NOMA) and relay-assisted RSMA. Simulation results demonstrate that the proposed scheme significantly outperforms both benchmarks in terms of OP and EC, regardless of fading conditions. The influence of the critical system parameters, including the number of RIS reflecting elements, transmit power, power allocation factors, and the required rate of the common stream, is thoroughly examined. The results reveal that optimal power allocation between streams is essential for minimizing OP. These findings confirm that integrating RSMA with RIS provides a robust and efficient solution for enhancing communication reliability and spectral efficiency in future 6G wireless networks, especially in challenging non-line-of-sight environments.
Digital Signal Processing, vol. 172, Article 105900. Published 2026-01-08.
Citations: 0
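The RSMA abstract above validates closed-form outage expressions with Monte Carlo simulation. The sketch below shows that validation pattern on the simplest possible case, a single Rayleigh-fading link, where the instantaneous SNR is exponentially distributed and the outage probability is 1 - exp(-(2^R - 1) / mean SNR). It does not model the RIS, the rate splitting, or the two-user system from the paper; the function names and parameter values are illustrative assumptions.

```python
import math
import random


def analytic_outage(rate, mean_snr):
    """Closed-form outage probability of one Rayleigh link.

    Outage occurs when log2(1 + snr) < rate, i.e. snr < 2**rate - 1.
    """
    threshold = 2.0 ** rate - 1.0
    return 1.0 - math.exp(-threshold / mean_snr)


def simulated_outage(rate, mean_snr, trials=200_000, seed=0):
    """Monte Carlo estimate of the same outage probability."""
    rng = random.Random(seed)
    threshold = 2.0 ** rate - 1.0
    outages = 0
    for _ in range(trials):
        # Under Rayleigh fading the channel power gain is exponential,
        # so the instantaneous SNR is exponential with mean mean_snr.
        snr = rng.expovariate(1.0 / mean_snr)
        if snr < threshold:
            outages += 1
    return outages / trials
```

Comparing `simulated_outage(1.0, 10.0)` with `analytic_outage(1.0, 10.0)` should show agreement to within the sampling error, which shrinks roughly as one over the square root of `trials`.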
Research on infrared small target detection technology based on DCS-YOLO algorithm
IF 3 Zone 3 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2026-01-08 DOI: 10.1016/j.dsp.2026.105898
Meng Yin, Binghe Sun, Rugang Wang, Yuanyuan Wang, Feng Zhou, Xuesheng Bian
To address the challenges of weak features, susceptibility to complex background interference in infrared small targets, and the high computational cost of existing specialized detection models, this paper proposes the Dual-Domain Fusion and Class-Aware Self-supervised YOLO (DCS-YOLO). This framework leverages dual-domain feature fusion and class-aware self-supervised learning for semantic enhancement. During feature extraction, a Class-aware Self-supervised Semantic Fusion Module (CSSFM) utilizes a class-aware self-supervised architecture as a deep semantic guide for generating discriminative semantic features, thereby enhancing the perception of faint target characteristics. Additionally, a Dual-domain Aware Enhancement Module (A2C2f_DDA) is designed, which analyzes the high-frequency components of small targets and employs a spatial-frequency domain feature complementary fusion strategy to sharpen feature capture while suppressing background clutter. For feature upsampling and fusion, a Multi-dimensional Selective Feature Pyramid Network (MSFPN) employs a frequency-domain, spatial, and channel three-dimensional cooperative selection mechanism, integrated with deep semantic information, to enhance feature integration across dimensions and improve detection performance in complex scenes. Furthermore, lightweight components including GSConv, VoVGSCSP, and LSCD-Detect are incorporated to reduce computational complexity and model parameters. Comprehensive evaluations on the IRSTD-1K, RealScene-ISTD, and SIRST-v2 datasets demonstrate the effectiveness of the proposed algorithm, achieving [email protected] scores of 80.7%, 90.2%, and 93.3%, respectively. The results indicate that the algorithm effectively utilizes frequency-domain analysis and semantic enhancement, providing a powerful and efficient solution for infrared small target detection in complex scenarios while maintaining a favorable balance between accuracy and computational cost.
Digital Signal Processing, vol. 173, Article 105898. Published 2026-01-08.
Citations: 0