Robust Watermarking for 3D Mesh Models Based on Geometrically Weighted Aggregation
Pub Date: 2026-01-13. DOI: 10.1109/LSP.2026.3653693
Fei Peng;Zhanhong Liu;Min Long
To address the limitations of current 3D mesh watermarking in robustness and imperceptibility, this paper proposes a deep watermarking method based on a geometrically weighted aggregation mechanism. The message encoder and decoder networks are first improved to enable effective embedding of a 16-bit binary watermark. An attack simulation module is then introduced to enhance the decoder's robustness against various distortions. Additionally, an adversarial discriminator is incorporated to guide the encoder in optimizing the embedding strategy, thereby minimizing geometric distortion. Furthermore, a cross-resolution strategy is developed to enable training on low-resolution meshes while performing watermark embedding and extraction on high-resolution meshes. Experimental results demonstrate that the proposed method outperforms existing mainstream approaches in terms of extraction accuracy, geometric fidelity, and imperceptibility.
{"title":"Robust Watermarking for 3D Mesh Models Based on Geometrically Weighted Aggregation","authors":"Fei Peng;Zhanhong Liu;Min Long","doi":"10.1109/LSP.2026.3653693","DOIUrl":"https://doi.org/10.1109/LSP.2026.3653693","url":null,"abstract":"To address the limitations of current 3D mesh watermarking in robustness and imperceptibility, this paper proposes a deep watermarking based on a geometric-weighted aggregation mechanism. The message encoder and decoder networks are first improved to enable the effective embedding of 16-bit binary watermark information. An attack simulation module is then introduced to enhance the decoder’s robustness against various distortions. Additionally, an adversarial discriminator is incorporated to guide the encoder in optimizing the embedding strategy, thereby minimizing geometric distortion. Furthermore, a cross-resolution strategy is developed to enable training on low-resolution meshes and perform watermark embedding and extraction on high-resolution meshes. Experimental results demonstrate that it outperforms the existing mainstream approaches in terms of extraction accuracy, geometric fidelity, and imperceptibility.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"648-652"},"PeriodicalIF":3.9,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FFE-DETR: Frequency-Aware Feature Enhancement for Object Detection in Low-Light Scenarios
Pub Date: 2026-01-13. DOI: 10.1109/LSP.2026.3653616
Yufeng Li;Chao Song;Chuanlong Xie
Object detection is a core task in computer vision, yet its performance is severely degraded in low-light environments, where foreground objects blend into the background, feature contrast is reduced, and object boundaries become blurred, ultimately impairing detection accuracy. To address this problem, we propose FFE-DETR, an end-to-end detection framework specifically designed for low-light scenes. The model incorporates a Frequency-Aware Feature Enhancer that applies Laplacian pyramid decomposition to separate low-frequency and high-frequency components. The low-frequency features are globally modeled to enhance foreground saliency and emphasize object boundaries, and the enhanced representation subsequently guides high-frequency detail restoration and noise suppression, yielding clearer and more discriminative features. In addition, a Multi-Scale Adaptive Feature Fusion module is introduced to efficiently integrate shallow texture information with deep semantic cues, enhancing the feature representation capability across different scales. Experimental results on widely used low-light benchmarks demonstrate that FFE-DETR consistently outperforms state-of-the-art methods and achieves significantly superior detection accuracy, highlighting its effectiveness and robustness.
{"title":"FFE-DETR: Frequency-Aware Feature Enhancement for Object Detection in Low-Light Scenarios","authors":"Yufeng Li;Chao Song;Chuanlong Xie","doi":"10.1109/LSP.2026.3653616","DOIUrl":"https://doi.org/10.1109/LSP.2026.3653616","url":null,"abstract":"Object detection is a core task in computer vision, yet its performance is severely degraded in low-light environments, where foreground objects blend into the background, feature contrast is reduced, and object boundaries become blurred, ultimately impairing detection accuracy. To address this problem, we propose FFE-DETR, an end-to-end detection framework specifically designed for low-light scenes. The model incorporates a Frequency-Aware Feature Enhancer that applies Laplacian pyramid decomposition to separate low-frequency and high-frequency components. The low-frequency features are globally modeled to enhance foreground saliency and emphasize object boundaries, and the enhanced representation subsequently guides high-frequency detail restoration and noise suppression, yielding clearer and more discriminative features. In addition, a Multi-Scale Adaptive Feature Fusion module is introduced to efficiently integrate shallow texture information with deep semantic cues, enhancing the feature representation capability across different scales. Experimental results on widely used low-light benchmarks demonstrate that FFE-DETR consistently outperforms state-of-the-art methods and achieves significantly superior detection accuracy, highlighting its effectiveness and robustness.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"678-682"},"PeriodicalIF":3.9,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bounded Mapping Frequency Estimation Algorithm for Low SNR Environments
Pub Date: 2026-01-13. DOI: 10.1109/LSP.2026.3653690
Qingke Ma;Jiale Wang;Jie Lian;Xinyi Li;Benben Li;Qi Wang;Guolei Zhu
Frequency estimation plays a vital role in various research fields, such as Doppler compensation in wireless communication. Traditional DFT-based methods for frequency estimation often suffer from reduced performance under low-SNR conditions. To overcome this limitation, we present a novel non-iterative estimation approach that employs a bounded mapping strategy. By concentrating on the real part of the spectrum and constraining the frequency correction within a defined range, our method effectively mitigates inaccuracies caused by noise. The proposed algorithm achieves accuracy comparable to iterative methods while significantly reducing computational complexity. Through simulations and experiments, we show that our approach improves estimation accuracy at lower SNR levels with a limited number of samples compared to existing techniques.
IEEE Signal Processing Letters, vol. 33, pp. 629-633.
Cross-View and Cross-Modal Contrastive Learning for Radar Object Detection
Pub Date: 2026-01-13. DOI: 10.1109/LSP.2026.3653684
Qiaolong Qian;Yi Shi;Ruichao Hou;Haoyu Qin;Gangshan Wu
Frequency-modulated continuous-wave radar is a cornerstone of advanced driver assistance systems thanks to its low cost and resilience to adverse weather. Yet the absence of explicit semantics makes radar annotation difficult, and the scarcity of large-scale labeled data limits the performance of radar perception models. To address this issue, we propose a self-supervised framework for object detection directly from Range–Azimuth–Doppler (RAD) cubes that learns transferable representations from unlabeled radar data. Specifically, we introduce cross-view contrastive learning to model correspondences among complementary views of the RAD cube, encouraging the network to capture spatial structure from multiple perspectives. In addition, an auxiliary cross-modal contrastive objective distills semantic knowledge from vision into radar. The joint objective integrates cross-view and cross-modal signals to strengthen radar feature representations. We further extend the framework to cross-domain pretraining using datasets from different sources. Experimental results demonstrate that the proposed method significantly improves radar object detection performance, especially with limited labeled data.
IEEE Signal Processing Letters, vol. 33, pp. 594-598.
Robust Exponential Hyperbolic Secant Algorithm for Active Control Against Impulsive Noise Environments
Pub Date: 2026-01-13. DOI: 10.1109/LSP.2026.3653694
Tanveer Alam Khan;Somanath Pradhan
The effectiveness of conventional active noise control (ANC) systems deteriorates significantly when operating in impulsive noise environments. Over the past few years, the hyperbolic family of adaptive filtering algorithms has been extensively applied to suppress impulsive noise. This work introduces a new exponential hyperbolic secant adaptive filter for active control, which is well suited to impulsive noise scenarios. Additionally, the stability condition on the learning rate, the steady-state behavior, and the computational complexity are studied. Simulation results based on measured acoustic paths demonstrate the efficiency of the proposed algorithm under strong and dynamic impulsive environments.
{"title":"Robust Exponential Hyperbolic Secant Algorithm for Active Control Against Impulsive Noise Environments","authors":"Tanveer Alam Khan;Somanath Pradhan","doi":"10.1109/LSP.2026.3653694","DOIUrl":"https://doi.org/10.1109/LSP.2026.3653694","url":null,"abstract":"The effectiveness of conventional active noise control (ANC) system deteriorates significantly when operating against impulsive noise environments. Over the past few years, the hyperbolic family of adaptive filtering algorithms have been extensively applied for suppressing impulsive noise. This work introduces a new exponential hyperbolic secant adaptive filter for active control operation, which is well suited for impulsive noise scenarios. Additionally, the stability condition in relation to the learning rate, steady-state analysis along with the computational complexity are also studied. Simulation outcomes based on measured acoustic paths demonstrate the efficiency of the proposed algorithm under strong and dynamic impulsive environment.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"663-667"},"PeriodicalIF":3.9,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Dual-Reward Guided 2D Mapping Generation Network for JPEG Reversible Data Hiding
Pub Date: 2026-01-12. DOI: 10.1109/LSP.2026.3652378
Rui Yan;Yao Zhao;Shaowei Weng;Lifang Yu
Recently, researchers have shifted their focus to reversible data hiding (RDH) schemes for JPEG images. Reinforcement learning (RL) offers a way for RDH schemes to automatically acquire the optimal two-dimensional (2D) mapping for 2D histograms of non-zero quantized alternating current coefficients. However, merely utilizing the payload-distortion reward mechanism (PDRM) in RL cannot inject payload guidance into the 2D mapping generation process. To tackle this issue, we propose a payload supplementary reward mechanism (PSRM) and incorporate PDRM and PSRM into RL to construct DR-2DNet, a dual-reward guided 2D mapping generation network that takes additional payload guidance into account. DR-2DNet generates two candidate 2D mappings: one with low distortion obtained by using PDRM alone, and the other with low distortion and high payload obtained by jointly using PDRM and PSRM. Finally, according to the required payload, the candidate with the lower distortion is selected for data embedding. To preferentially select frequency bands with low embedding costs, a frequency selection strategy combining the smoothness and embedding performance of each frequency band is designed to evaluate its cost, reducing image distortion and preserving the file size. Extensive experiments are conducted on the Kodak dataset and 100 images randomly chosen from the BOSSBase dataset, and the results demonstrate that the proposed method is superior to several related state-of-the-art RDH schemes for JPEG images.
{"title":"A Dual-Reward Guided 2D Mapping Generation Network for JPEG Reversible Data Hiding","authors":"Rui Yan;Yao Zhao;Shaowei Weng;Lifang Yu","doi":"10.1109/LSP.2026.3652378","DOIUrl":"https://doi.org/10.1109/LSP.2026.3652378","url":null,"abstract":"Recently, researchers have shifted focus to reversible data hiding (RDH) schemes for JPEG images. The reinforcement learning (RL) is a solution for RDH to automatically acquire the optimal two-dimensional (2D) mapping for 2D histograms of non-zero quantized alternating current coefficients. However, merely utilizing the payload-distortion reward mechanism (PDRM) in RL cannot inject the payload guidance to the 2D mapping generation process. To tackle this issue, we propose a payload supplementary reward mechanism (PSRM) and incorporate PDRM and PSRM into RL to construct DR-2DNet, a dual-reward guided 2D mapping generation network with considering additional payload guidance. DR-2DNet generates two candidate 2D mappings, one with low distortion generated by merely utilizing PDRM and the other with low distortion and high payload obtained by jointly using PDRM and PSRM. Finally, according to the required payload, the one with the lower distortion selected from two acquired 2D mappings is used for achieving data embedding. To priorly select the frequency bands with low costs for data embedding, a frequency selection strategy combining the smoothness and embedding performance of the frequency band is designed to evaluate the cost of each frequency band, reducing image distortion and preserving the file size. Extensive experiments are conducted on the Kodak dataset and 100 images randomly chosen from the BOSSBase dataset, and the results demonstrate that the proposed method is superior to several related state-of-the-art RDH schemes for JPEG images.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"526-530"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Covariance Tensor Decomposition for NLOS Direction Finding in RIS-Aided Bistatic MIMO Radar
Pub Date: 2026-01-12. DOI: 10.1109/LSP.2026.3652124
Qian-Peng Xie;Xiao-Peng Li;Ji-Yuan Chen;Ming-Xing Fang
This letter investigates the problem of direction-of-departure (DOD) and direction-of-arrival (DOA) estimation for non-line-of-sight (NLOS) targets in bistatic multiple-input multiple-output (MIMO) radar systems assisted by an intelligent reflecting surface (IRS). To tackle this issue, we propose a covariance tensor subspace-based algorithm. First, the received data are modeled within a tensor framework to preserve their inherent multi-dimensional spatiotemporal structure. Then, a fourth-order covariance tensor is constructed by computing correlations along the temporal dimension. Using the higher-order singular value decomposition (HOSVD), the signal subspace matrix is derived from this covariance tensor. The receive steering matrix is accurately reconstructed by exploiting the property of the Khatri–Rao product for full-column-rank matrices. Based on the estimated signal subspace and the reconstructed steering matrix, DOD and DOA estimation is efficiently performed via the rotational invariance technique combined with a one-dimensional correlation-based method, which provides automatic parameter pairing. Simulation results validate the superiority and effectiveness of the proposed algorithm in estimating angles.
IEEE Signal Processing Letters, vol. 33, pp. 574-578.
Modeling Localized PPG for Blood Pressure Forecasting With MoE and Quantile Regression
Pub Date: 2026-01-12. DOI: 10.1109/LSP.2026.3652975
Yunjia Zhang;Zhixiong Hu;Mei Li
Accurate blood pressure (BP) monitoring using photoplethysmography (PPG) remains challenging due to noise, individual variability, and nonlinear signal dynamics. In this letter, we present FBPP-Net, a lightweight temporal-mixing framework that integrates sparse and shared Mixture-of-Experts (MoE) modules with quantile regression. The model enables specialized subnetworks to capture diverse temporal dependencies while providing robust probabilistic estimation of systolic and diastolic BP. Without requiring GPU acceleration, FBPP-Net achieves efficient training and inference, making it suitable for real-time wearable applications. Experiments on the UQVS dataset show that FBPP-Net-MoE attains SBP/DBP errors of 2.83/3.54 mmHg, and FBPP-Net-TM achieves 3.08/3.18 mmHg, outperforming XGBoost, LSTM, MLP, Informer, and TSMixer baselines. Furthermore, the analysis of expert activations and temporal segments provides interpretable insights into the localized dynamics driving near-future BP variations, supporting intelligent and explainable physiological monitoring.
{"title":"Modeling Localized PPG for Blood Pressure Forecasting With MoE and Quantile Regression","authors":"Yunjia Zhang;Zhixiong Hu;Mei Li","doi":"10.1109/LSP.2026.3652975","DOIUrl":"https://doi.org/10.1109/LSP.2026.3652975","url":null,"abstract":"Accurate blood pressure (BP) monitoring using photoplethysmography (PPG) remains challenging due to noise, individual variability, and nonlinear signal dynamics. In this letter, we present FBPP-Net, a lightweight temporal-mixing framework that integrates sparse and shared Mixture-of-Experts (MoE) modules with quantile regression. The model enables specialized subnetworks to capture diverse temporal dependencies while providing robust probabilistic estimation of systolic and diastolic BP. Without requiring GPU acceleration, FBPP-Net achieves efficient training and inference, making it suitable for real-time wearable applications. Experiments on the UQVS dataset show that FBPP-Net-MoE attains SBP/DBP errors of 2.83/3.54 mmHg, and FBPP-Net-TM achieves 3.08/3.18 mmHg, outperforming XGBoost, LSTM, MLP, Informer, and TSMixer baselines. Furthermore, the analysis of expert activations and temporal segments provides interpretable insights into the localized dynamics driving near-future BP variations, supporting intelligent and explainable physiological monitoring.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"531-535"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Customized Dynamic Filter Augmentation
Pub Date: 2026-01-12. DOI: 10.1109/LSP.2026.3652120
Song-Kyoo Kim
Customized dynamic filter augmentation (CDFA) is a novel data augmentation technique for time-series forecasting that adapts convolutional principles from signal processing to emphasize historical patterns through localized correlations and amplitude adjustments. Built upon convolutional filters, local correlations between paired random variables, and statistical forecasting functions from compact data learning (CDL), CDFA generates plausible subsequences while preserving original data characteristics. Empirical evaluations on real-world datasets, including stock prices for Apple, Google, AMD, and oil, demonstrate superior root mean square error (RMSE) reductions, with CDFA achieving 81% to 82% improvements over baselines such as statistical forecasting from CDL and customized convolutional filters. This approach enhances model efficiency for large-scale sequences, outperforming traditional linear models in capturing shared patterns across diverse applications.
{"title":"Customized Dynamic Filter Augmentation","authors":"Song-Kyoo Kim","doi":"10.1109/LSP.2026.3652120","DOIUrl":"https://doi.org/10.1109/LSP.2026.3652120","url":null,"abstract":"Customized dynamic filter augmentation (CDFA) presents a novel data augmentation technique for time-series forecasting, adapting convolutional principles from signal processing to emphasize historical patterns through localized correlations and amplitude adjustments. Built upon convolutional filters, local correlations between paired random variables, and statistical forecasting functions from compact data learning, CDFA generates plausible subsequences while preserving original data characteristics. Empirical evaluations on real-world datasets, including stock prices for Apple, Google, AMD, and oil, demonstrate superior root mean square error (RMSE) reductions, with CDFA achieving 81% to 82% improvements over baselines like statistical forecasting from CDL and customized convolutional filters. This approach enhances model efficiency for large-scale sequences, outperforming traditional linear models in capturing shared patterns across diverse applications.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"639-642"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic-Spatial Guided Reasoning for Human-Object Interaction Detection
Pub Date: 2026-01-12. DOI: 10.1109/LSP.2026.3652953
Ping Cao;Chunjie Zhang;Xiaolong Zheng;Yao Zhao
Human-Object Interaction (HOI) detection requires not only recognizing what the interaction is but also understanding where it occurs. Although recent methods have achieved remarkable progress, they often lack effective joint modeling of spatial and semantic information, which is essential for accurate reasoning in complex scenes. In this paper, we propose a Semantic-Spatial Guided Reasoning (SSGR) framework that performs interaction reasoning by jointly modeling global semantic cues and fine-grained spatial priors. Specifically, SSGR constructs pair-specific spatial layouts to encode detailed spatial relationships and introduces a global semantic decoder to learn category-aware semantic representations. A semantic-spatial guided reasoning module further adaptively fuses these complementary cues, enabling unified reasoning and more discriminative interaction understanding. Extensive experiments on HICO-DET and V-COCO demonstrate that SSGR consistently outperforms prior methods under both standard and zero-shot settings, validating the effectiveness of our semantic-spatial reasoning paradigm.
{"title":"Semantic-Spatial Guided Reasoning for Human-Object Interaction Detection","authors":"Ping Cao;Chunjie Zhang;Xiaolong Zheng;Yao Zhao","doi":"10.1109/LSP.2026.3652953","DOIUrl":"https://doi.org/10.1109/LSP.2026.3652953","url":null,"abstract":"Human-Object Interaction (HOI) detection requires not only recognizing <italic>what</i> the interaction is but also understanding <italic>where</i> it occurs. Although recent methods have achieved remarkable progress, they often lack effective joint modeling of spatial and semantic information, which is essential for accurate reasoning in complex scenes. In this paper, we propose a Semantic-Spatial Guided Reasoning (SSGR) framework that performs interaction reasoning by jointly modeling global semantic cues and fine-grained spatial priors. Specifically, SSGR constructs pair-specific spatial layouts to encode detailed spatial relationships and introduces a global semantic decoder to learn category-aware semantic representations. A semantic-spatial guided reasoning module further adaptively fuses these complementary cues, enabling unified reasoning and more discriminative interaction understanding. Extensive experiments on HICO-DET and V-COCO demonstrate that SSGR consistently outperforms prior methods under both standard and zero-shot settings, validating the effectiveness of our semantic-spatial reasoning paradigm.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"551-555"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}