Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652378
Rui Yan;Yao Zhao;Shaowei Weng;Lifang Yu
Recently, researchers have shifted their focus to reversible data hiding (RDH) schemes for JPEG images. Reinforcement learning (RL) offers a way for RDH to automatically acquire the optimal two-dimensional (2D) mapping for 2D histograms of non-zero quantized alternating current coefficients. However, using only the payload-distortion reward mechanism (PDRM) in RL cannot inject payload guidance into the 2D mapping generation process. To tackle this issue, we propose a payload supplementary reward mechanism (PSRM) and incorporate both PDRM and PSRM into RL to construct DR-2DNet, a dual-reward guided 2D mapping generation network that takes additional payload guidance into account. DR-2DNet generates two candidate 2D mappings: one with low distortion, generated using PDRM alone, and the other with low distortion and high payload, obtained by jointly using PDRM and PSRM. Finally, according to the required payload, whichever of the two mappings yields the lower distortion is used for data embedding. To prioritize low-cost frequency bands for data embedding, a frequency selection strategy that combines the smoothness and embedding performance of each frequency band is designed to evaluate its cost, reducing image distortion and preserving the file size. Extensive experiments are conducted on the Kodak dataset and 100 images randomly chosen from the BOSSBase dataset, and the results demonstrate that the proposed method is superior to several related state-of-the-art RDH schemes for JPEG images.
Title: A Dual-Reward Guided 2D Mapping Generation Network for JPEG Reversible Data Hiding. IEEE Signal Processing Letters, vol. 33, pp. 526-530.
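The final selection step described in the DR-2DNet abstract above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `Candidate` record, its `capacity_bits` and `distortion` fields, and all numbers are hypothetical, and the sketch assumes a mapping is eligible only when its capacity covers the required payload.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one candidate 2D mapping."""
    name: str
    capacity_bits: int   # assumed: largest payload the mapping can embed
    distortion: float    # assumed: distortion cost of embedding with this mapping


def select_mapping(candidates: Sequence[Candidate],
                   required_bits: int) -> Optional[Candidate]:
    """Return the eligible candidate with the lowest distortion, or None."""
    eligible = [c for c in candidates if c.capacity_bits >= required_bits]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.distortion)


# Illustrative numbers only; they are not taken from the paper.
pdrm_only = Candidate("PDRM", capacity_bits=9_000, distortion=41.0)
pdrm_psrm = Candidate("PDRM+PSRM", capacity_bits=15_000, distortion=47.5)

small = select_mapping([pdrm_only, pdrm_psrm], required_bits=8_000)
large = select_mapping([pdrm_only, pdrm_psrm], required_bits=12_000)
print(small.name, large.name)
```

With these made-up figures, the low-distortion mapping is chosen for the small payload, and the higher-capacity mapping takes over once the payload exceeds what the first one can hold.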
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652124
Qian-Peng Xie;Xiao-Peng Li;Ji-Yuan Chen;Ming-Xing Fang
This letter investigates the problem of direction-of-departure (DOD) and direction-of-arrival (DOA) estimation for non-line-of-sight (NLOS) targets in bistatic multiple-input multiple-output (MIMO) radar systems assisted by an intelligent reflecting surface (IRS). To tackle this issue, we propose a covariance tensor subspace-based algorithm. First, the received data are modeled within a tensor framework to preserve their inherent multi-dimensional spatiotemporal structure. Then, a fourth-order covariance tensor is constructed by computing correlations along the temporal dimension. Using the higher-order singular value decomposition (HOSVD), the signal subspace matrix is derived from this covariance tensor. The receive steering matrix is accurately reconstructed by exploiting the property of the Khatri–Rao product for full-column-rank matrices. Based on the estimated signal subspace and the reconstructed steering matrix, DOD and DOA estimation is efficiently performed via the rotational invariance technique combined with a one-dimensional correlation-based method, which provides automatic parameter pairing. Simulation results validate the superiority and effectiveness of the proposed algorithm in estimating angles.
Title: Covariance Tensor Decomposition for NLOS Direction Finding in RIS-Aided Bistatic MIMO Radar. IEEE Signal Processing Letters, vol. 33, pp. 574-578.
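The Khatri–Rao (column-wise Kronecker) product named in the abstract above can be illustrated with a small sketch. The half-wavelength uniform linear arrays, the element counts, and the target angles are assumptions made for illustration; the abstract does not state the array geometry, and the IRS link is not modeled here.

```python
import cmath
import math


def ula_steering(num_elements, angle_deg):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    phase = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase * m) for m in range(num_elements)]


def khatri_rao(a_cols, b_cols):
    """Column-wise Kronecker product; each input is a list of columns."""
    if len(a_cols) != len(b_cols):
        raise ValueError("both matrices need the same number of columns")
    return [[a * b for a in col_a for b in col_b]
            for col_a, col_b in zip(a_cols, b_cols)]


# Two hypothetical targets, given as DOD and DOA angles in degrees.
dods, doas = [10.0, -25.0], [30.0, 5.0]
A_t = [ula_steering(3, d) for d in dods]   # 3 transmit elements
A_r = [ula_steering(4, a) for a in doas]   # 4 receive elements
A = khatri_rao(A_t, A_r)                   # each column has 3 * 4 entries
print(len(A), len(A[0]))
```

Each column of `A` pairs one transmit steering vector with the receive steering vector of the same target, which is why subspace methods built on this structure can keep DOD and DOA estimates paired.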
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652975
Yunjia Zhang;Zhixiong Hu;Mei Li
Accurate blood pressure (BP) monitoring using photoplethysmography (PPG) remains challenging due to noise, individual variability, and nonlinear signal dynamics. In this letter, we present FBPP-Net, a lightweight temporal-mixing framework that integrates sparse and shared Mixture-of-Experts (MoE) modules with quantile regression. The model enables specialized subnetworks to capture diverse temporal dependencies while providing robust probabilistic estimation of systolic and diastolic BP. Without requiring GPU acceleration, FBPP-Net achieves efficient training and inference, making it suitable for real-time wearable applications. Experiments on the UQVS dataset show that FBPP-Net-MoE attains SBP/DBP errors of 2.83/3.54 mmHg, and FBPP-Net-TM achieves 3.08/3.18 mmHg, outperforming XGBoost, LSTM, MLP, Informer, and TSMixer baselines. Furthermore, the analysis of expert activations and temporal segments provides interpretable insights into the localized dynamics driving near-future BP variations, supporting intelligent and explainable physiological monitoring.
Title: Modeling Localized PPG for Blood Pressure Forecasting With MoE and Quantile Regression. IEEE Signal Processing Letters, vol. 33, pp. 531-535.
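Quantile regression, which the FBPP-Net abstract above uses for probabilistic BP estimation, is usually trained with the pinball loss. The sketch below shows that standard loss only; whether FBPP-Net uses exactly this form, and which quantile levels it predicts, is not stated in the abstract, and the readings are made up.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss for one quantile level in (0, 1)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    total = 0.0
    for y, q_hat in zip(y_true, y_pred):
        diff = y - q_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


# Hypothetical systolic readings (mmHg) against a constant forecast of 120.
sbp = [118.0, 122.0, 130.0, 115.0]
forecast = [120.0] * len(sbp)
for q in (0.1, 0.5, 0.9):
    print(q, pinball_loss(sbp, forecast, q))
```

Minimizing this loss at several quantile levels yields a prediction interval rather than a single point estimate. At level 0.5 it reduces to half the mean absolute error.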
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652120
Song-Kyoo Kim
Customized dynamic filter augmentation (CDFA) is a novel data augmentation technique for time-series forecasting that adapts convolutional principles from signal processing to emphasize historical patterns through localized correlations and amplitude adjustments. Built upon convolutional filters, local correlations between paired random variables, and statistical forecasting functions from compact data learning (CDL), CDFA generates plausible subsequences while preserving original data characteristics. Empirical evaluations on real-world datasets, including stock prices for Apple, Google, AMD, and oil, demonstrate superior root mean square error (RMSE) reductions, with CDFA achieving 81% to 82% improvements over baselines such as statistical forecasting from CDL and customized convolutional filters. This approach enhances model efficiency for large-scale sequences, outperforming traditional linear models in capturing shared patterns across diverse applications.
Title: Customized Dynamic Filter Augmentation. IEEE Signal Processing Letters, vol. 33, pp. 639-642.
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652953
Ping Cao;Chunjie Zhang;Xiaolong Zheng;Yao Zhao
Human-Object Interaction (HOI) detection requires not only recognizing what the interaction is but also understanding where it occurs. Although recent methods have achieved remarkable progress, they often lack effective joint modeling of spatial and semantic information, which is essential for accurate reasoning in complex scenes. In this paper, we propose a Semantic-Spatial Guided Reasoning (SSGR) framework that performs interaction reasoning by jointly modeling global semantic cues and fine-grained spatial priors. Specifically, SSGR constructs pair-specific spatial layouts to encode detailed spatial relationships and introduces a global semantic decoder to learn category-aware semantic representations. A semantic-spatial guided reasoning module further adaptively fuses these complementary cues, enabling unified reasoning and more discriminative interaction understanding. Extensive experiments on HICO-DET and V-COCO demonstrate that SSGR consistently outperforms prior methods under both standard and zero-shot settings, validating the effectiveness of our semantic-spatial reasoning paradigm.
Title: Semantic-Spatial Guided Reasoning for Human-Object Interaction Detection. IEEE Signal Processing Letters, vol. 33, pp. 551-555.
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3653395
Luan Portella;R. J. Cintra;Aluísio Pinheiro
This letter introduces low-complexity approximations for the Wavelet Energy Correlation Screening (WECS) method, which aims at change detection in multitemporal SAR images. The WECS method relies on the non-decimated discrete wavelet transform (ND-DWT) to compute approximation coefficients employed in a feature screening process based on the Pearson correlation. Although effective, WECS presents a high computational cost due to its repeated wavelet filtering stage. To overcome this drawback, we propose two approximations for the wavelet filter coefficients, obtained by truncating their canonical signed digit (CSD) representation, which significantly reduces the number of arithmetic operations. Numerical experiments using both simulated and real-world datasets demonstrate that the proposed methods not only maintain the performance of the original WECS but also achieve computational gains, even outperforming it in certain scenarios.
Title: Low-Complexity Approximations of the WECS Method for SAR Change Detection. IEEE Signal Processing Letters, vol. 33, pp. 643-647.
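The idea of truncating a canonical signed digit (CSD) representation, mentioned in the WECS abstract above, can be sketched as follows. The coefficient (the Haar value 1/sqrt(2)), the 8 fractional bits, and the number of retained digits are assumptions chosen for illustration; the abstract does not say which wavelet filters or word lengths the authors use.

```python
def to_csd(n):
    """Canonical signed digit (non-adjacent form) of an integer, LSB first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)      # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def truncate_csd(coeff, frac_bits, max_terms):
    """Quantize coeff, then keep only the max_terms most significant nonzero digits."""
    digits = to_csd(round(coeff * (1 << frac_bits)))
    kept = 0
    value = 0
    for pos in range(len(digits) - 1, -1, -1):   # scan from the MSB down
        if digits[pos] != 0 and kept < max_terms:
            value += digits[pos] * (1 << pos)
            kept += 1
    return value / (1 << frac_bits)


h = 2 ** -0.5                     # hypothetical filter coefficient
for terms in (2, 3, 9):
    print(terms, truncate_csd(h, frac_bits=8, max_terms=terms))
```

Each retained nonzero digit corresponds to one shifted addition or subtraction, so keeping fewer digits trades coefficient accuracy for fewer arithmetic operations.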
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652955
Min Li;Zhaofei Hao;Gang Li;Jin Wan;Delong Han;Mingle Zhou
Small Object Detection (SOD) aims to accurately identify and locate small objects in images. However, existing methods usually focus on exploring spatial domain features, neglecting high-frequency features that preserve fine-grained details such as texture and edge information. To overcome this limitation, we propose a High-Frequency Feature-Oriented Network (HFFO-Net). First, we introduce the Channel-wise Frequency Modulation Module (CFMM), which leverages the 2D Discrete Cosine Transform (DCT) to accentuate salient frequency components while mitigating noise interference. Second, we design a High-Frequency Oriented Module (HFOM), which utilizes the Channel Selection Branch (CSB) and Spatial Selection Branch (SSB) to highlight small objects in the channel and spatial region. Third, we introduce a Dual-Query Attention Fusion Mechanism (DQAFM), which reduces the semantic gap between spatial and frequency features and achieves better feature fusion through bidirectional cross-attention. Extensive experiments are implemented, and the corresponding results demonstrate that HFFO-Net excels at detecting small objects.
Title: Boosting Small Object Detection via High-Frequency Feature Oriented Network. IEEE Signal Processing Letters, vol. 33, pp. 584-588.
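To make the role of the 2D DCT in the HFFO-Net abstract above more concrete, the sketch below computes an orthonormal 2D DCT-II of a small block and measures how much energy falls in higher-frequency coefficients. It covers only the transform; the learned modulation in CFMM is not reproduced, and the `high_freq_energy` split by index sum is a hypothetical choice.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a rectangular block given as a list of rows."""
    rows, cols = len(block), len(block[0])

    def alpha(k, n):
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0.0
            for i in range(rows):
                ci = math.cos((2 * i + 1) * u * math.pi / (2 * rows))
                for j in range(cols):
                    cj = math.cos((2 * j + 1) * v * math.pi / (2 * cols))
                    acc += block[i][j] * ci * cj
            out[u][v] = alpha(u, rows) * alpha(v, cols) * acc
    return out


def high_freq_energy(coeffs, cutoff):
    """Energy of coefficients whose index sum u + v is at least cutoff."""
    return sum(c * c for u, row in enumerate(coeffs)
               for v, c in enumerate(row) if u + v >= cutoff)


flat = [[1.0] * 4 for _ in range(4)]                                   # no detail
checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]   # fine texture
hf_flat = high_freq_energy(dct2(flat), 2)
hf_checker = high_freq_energy(dct2(checker), 2)
print(hf_flat, hf_checker)
```

The flat block puts all of its energy in the DC coefficient, while the checkerboard places half of its energy in the higher-frequency coefficients. This is the kind of fine-grained detail that the abstract says spatial-domain features tend to miss.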
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652127
Kun Lu;Hongli Zhang;Yuchen Yang;Chao Meng;Binxing Fang
Social bots constitute a substantial fraction of active accounts on digital platforms, fundamentally threatening information authenticity and democratic discourse. Contemporary detection methods confront critical limitations: information imbalance across heterogeneous relations, computational challenges in processing massive neighborhoods, and inadequate multi-scale representation learning. We propose HIER (Heterogeneous Information Bottleneck and Expert Routing), a pioneering framework that integrates variational information theory with mixture-of-experts paradigms for social network analysis. HIER introduces relation-aware variational information bottleneck for optimal compression across relationship types, dynamic sparse expert routing that extends mixture-of-experts to edge-level graph processing, and dual-scale mutual information maximization enhancing representation discriminability through neighborhood consistency and graph-level contrastive learning. Experimental validation demonstrates HIER’s superior performance across real-world datasets, establishing new benchmarks for heterogeneous social bot detection.
Title: HIER: Heterogeneous Information Bottleneck and Expert Routing for Social Bot Detection. IEEE Signal Processing Letters, vol. 33, pp. 521-525.
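Sparse expert routing of the kind the HIER abstract above mentions is commonly implemented with top-k gating. The sketch below shows that generic mechanism applied to a single edge; the router logits, the toy experts, and the choice k = 2 are hypothetical and are not taken from the paper.

```python
import math


def top_k_gate(logits, k):
    """Sparse gating: softmax over the k largest logits, zero weight elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}   # shifted for stability
    norm = sum(exps.values())
    return [exps[i] / norm if i in exps else 0.0 for i in range(len(logits))]


def route_edge(edge_feature, experts, router_logits, k):
    """Mix the outputs of the k selected experts for one edge feature."""
    gates = top_k_gate(router_logits, k)
    # Only experts with nonzero gate weight are evaluated.
    return sum(g * expert(edge_feature)
               for g, expert in zip(gates, experts) if g > 0.0)


# Toy scalar experts and made-up router logits for one edge.
experts = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: -x, lambda x: x * x]
logits = [2.0, 0.5, 1.0, -1.0]
gates = top_k_gate(logits, k=2)
print(gates, route_edge(3.0, experts, logits, k=2))
```

Because unselected experts are never evaluated, the cost per edge depends on k rather than on the total number of experts.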
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3652954
Kai-Wei Peng
We study sampling of smooth/bandlimited graph signals when (i) sensor noise is heterogeneous across vertices and (ii) the graph used to design the sampler can be mildly mismatched to the true topology. We propose a risk-aware variant of local low-pass importance sampling that scores each vertex via a Hutchinson estimator of the diagonal of a graph heat-kernel operator and reweights the score by the inverse noise variance. The sampler selects without replacement according to these risk-aware scores. Reconstruction is performed with standard decoders (Tikhonov, Bandlimited, and a Chebyshev data-consistent smoother), enabling fair comparisons to prior work. On grid, Erdős–Rényi (ER), and Barabási–Albert (BA) graphs, our approach consistently reduces the normalized root-mean-square error (NRMSE) compared to random sampling; the gain increases with the sampling rate and persists under selection-graph mismatch. The method is simple, eigendecomposition-free, and scales linearly in the number of edges per Hutchinson probe.
Title: Risk-Aware Low-Pass Importance Sampling for Graph Signals Under Heterogeneous Noise and Model Mismatch. IEEE Signal Processing Letters, vol. 33, pp. 556-558.
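The scoring and sampling steps in the abstract above can be sketched as follows, under several stated assumptions. A truncated Taylor series stands in for whatever polynomial approximation of the heat kernel the paper uses, and the sequential weighted draw is one plausible reading of "selects without replacement". The path graph, diffusion time `t`, and noise variances are made up.

```python
import random


def laplacian_matvec(edges, x):
    """Apply the combinatorial Laplacian using one pass over the edge list."""
    y = [0.0] * len(x)
    for i, j in edges:
        d = x[i] - x[j]
        y[i] += d
        y[j] -= d
    return y


def heat_matvec(edges, x, t, terms=20):
    """Approximate exp(-t L) x with a truncated Taylor series (assumed stand-in)."""
    result, term = list(x), list(x)
    for k in range(1, terms):
        term = [-t * v / k for v in laplacian_matvec(edges, term)]
        result = [r + v for r, v in zip(result, term)]
    return result


def hutchinson_diag(edges, n, t, probes, rng):
    """Estimate diag(exp(-t L)) by averaging z * (exp(-t L) z) over Rademacher z."""
    acc = [0.0] * n
    for _ in range(probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        hz = heat_matvec(edges, z, t)
        acc = [a + zi * hi for a, zi, hi in zip(acc, z, hz)]
    return [a / probes for a in acc]


def risk_aware_sample(scores, noise_var, m, rng):
    """Draw m distinct vertices with probability proportional to score / variance."""
    # Clip at a tiny positive value in case a noisy estimate dips below zero.
    weights = [max(s, 1e-12) / v for s, v in zip(scores, noise_var)]
    remaining, chosen = list(range(len(weights))), []
    for _ in range(m):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]        # 5-vertex path graph
noise_var = [0.1, 0.1, 1.0, 0.1, 0.1]           # vertex 2 has the noisiest sensor
scores = hutchinson_diag(edges, n=5, t=0.5, probes=400, rng=rng)
picks = risk_aware_sample(scores, noise_var, m=2, rng=rng)
print(scores, picks)
```

Every probe needs only Laplacian matrix-vector products, each of which is a single pass over the edge list. That is consistent with the abstract's claim of linear scaling in the number of edges per probe, and no eigendecomposition is involved.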
Pub Date : 2026-01-12 DOI: 10.1109/LSP.2026.3653238
Da-Hee Yang;Joon-Hyuk Chang
Noise-robust automatic speech recognition (ASR) has been commonly addressed by applying speech enhancement (SE) at the waveform level before recognition. However, speech-level enhancement does not always translate into consistent recognition improvements due to residual distortions and mismatches with the latent space of the ASR encoder. In this letter, we introduce a complementary strategy termed latent-level enhancement, where distorted representations are refined during ASR inference. Specifically, we propose a plug-and-play Flow Matching Refinement module (FM-Refiner) that operates on the output latents of a pretrained CTC-based ASR encoder. Trained to map imperfect latents—either directly from noisy inputs or from enhanced-but-imperfect speech—toward their clean counterparts, the FM-Refiner is applied only at inference, without fine-tuning ASR parameters. Experiments show that FM-Refiner consistently reduces word error rate, both when directly applied to noisy inputs and when combined with conventional SE front-ends. These results demonstrate that latent-level refinement via flow matching provides a lightweight and effective complement to existing SE approaches for robust ASR.
Title: Latent-Level Enhancement With Flow Matching for Robust Automatic Speech Recognition. IEEE Signal Processing Letters, vol. 33, pp. 589-593.
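The mechanics of flow matching, as used for latent refinement in the abstract above, can be illustrated with a toy sketch. It assumes the common linear interpolation path between an imperfect latent and its clean counterpart, which the abstract does not confirm. An oracle velocity replaces the trained network, so this shows only how training targets are formed and how inference integrates the flow, not the FM-Refiner itself.

```python
def fm_training_pair(x0, x1, t):
    """Linear-path flow matching: the point x_t and its velocity target at time t."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def euler_refine(x0, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy 4-dimensional latents; the values are made up.
clean = [0.2, -1.0, 0.5, 3.0]
noisy = [0.6, -0.4, 0.1, 2.5]

x_half, target = fm_training_pair(noisy, clean, t=0.5)

# Oracle stand-in for a trained velocity network (hypothetical).
oracle = lambda x, t: target
refined = euler_refine(noisy, oracle, steps=4)
print(x_half, refined)
```

In a real system a learned network would supply the velocity. Since it operates only on encoder outputs, the ASR parameters can stay frozen, as the abstract describes.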