Pub Date: 2026-01-01. Epub Date: 2025-12-05. DOI: 10.1109/lsp.2025.3640510
Christopher K Kovach, Stephen V Gliske, Erin M Radcliffe, Sam Shipley, John A Thompson, Aviva Abosch
The fourth-order time-invariant spectrum, or trispectrum, has a simple derivation as the cross-spectrum among frequency bands in the Wigner-Ville distribution (WVD). Viewed this way, the trispectrum gains intuitive meaning as a measure of the linear dependence of power across frequencies, which yields some insight into its structure and interpretation. We highlight, in particular, a two-dimensional subdomain as useful for identifying modulated oscillations when the modulating envelope is non-negative or lowpass. Spectral characteristics of the carrier and modulating signals are revealed along separate axes of a two-dimensional representation of this domain. The application of this framework, combined with a previously described additive decomposition technique for higher-order spectra, is demonstrated by the blind identification and separation of sleep spindles and beta bursts in EEG.
"Interpreting the Trispectrum as the Cross-Spectrum of the Wigner-Ville Distribution." IEEE Signal Processing Letters, vol. 33, pp. 221–225. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12829976/pdf/
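The abstract's central derivation, reading the trispectrum as a cross-spectrum among frequency bands of the Wigner-Ville distribution, can be illustrated numerically. The sketch below is ours, not the authors' implementation: `wigner_ville` is a crude discrete WVD built from the instantaneous autocorrelation, and `band_cross_spectrum` correlates the time courses of two of its frequency bands.

```python
import numpy as np

def wigner_ville(x):
    """Crude discrete Wigner-Ville distribution of x.
    Rows index frequency bands, columns index time."""
    n = len(x)
    W = np.zeros((n, n), dtype=complex)
    for t in range(n):
        tmax = min(t, n - 1 - t)
        tau = np.arange(-tmax, tmax + 1)
        acf = np.zeros(n, dtype=complex)
        acf[tau % n] = x[t + tau] * np.conj(x[t - tau])  # instantaneous autocorrelation
        W[:, t] = np.fft.fft(acf)                        # FFT over the lag variable
    return W

def band_cross_spectrum(W, i, j):
    """Cross-spectrum between the time courses of two WVD frequency bands,
    i.e. the linear dependence of power across those frequencies."""
    a = np.fft.fft(W[i] - W[i].mean())
    b = np.fft.fft(W[j] - W[j].mean())
    return a * np.conj(b)
```

For a real signal the WVD is real-valued, and structure in the band cross-spectrum along the "modulation" axis reflects the envelope spectrum, in the spirit of the two-dimensional subdomain the letter highlights.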
Pub Date: 2025-12-31. DOI: 10.1109/LSP.2025.3649590
Ying Zeng;Jialong Zhu
Medical image segmentation is fundamental to clinical diagnosis and treatment planning, yet existing models are constrained by the scarcity of annotated data, which are costly and labor-intensive to obtain. Semi-supervised learning (SSL) mitigates this issue by leveraging large volumes of unlabeled data, but most SSL methods rely solely on visual cues and often fail to capture subtle structures or low-contrast regions common in medical imaging. To address this limitation, we present LanDy, a Language-Prompted Dynamic Learning framework for semi-supervised medical image segmentation. LanDy introduces textual semantics from medical descriptions to enrich visual representations and reduce the ambiguity of pseudo-labels. Concretely, textual embeddings dynamically modulate convolutional filters to provide context-aware feature extraction, while a text-guided refinement mechanism improves the reliability of pseudo-labels on unlabeled data. Extensive experiments on benchmark datasets demonstrate that LanDy consistently outperforms state-of-the-art SSL methods, delivering more accurate and robust segmentation under annotation-efficient settings.
"Language-Prompted Dynamic Learning for Semi-Supervised Medical Image Segmentation." IEEE Signal Processing Letters, vol. 33, pp. 668–672.
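The idea that "textual embeddings dynamically modulate convolutional filters" can be sketched as a FiLM-style gated 1x1 convolution. All names and the gating form below are our assumptions for illustration, not LanDy's actual design:

```python
import numpy as np

def text_modulated_conv(feat, base_w, text_emb, proj):
    """feat: (C, H, W) visual features; base_w: (C_out, C) 1x1 conv weights;
    text_emb: (D,) embedding of a medical description; proj: (C_out, D).
    The text embedding gates each output filter, making extraction context-aware."""
    gamma = 1.0 + np.tanh(proj @ text_emb)      # per-filter modulation in (0, 2)
    w = base_w * gamma[:, None]                 # dynamically modulated filters
    return np.einsum('oc,chw->ohw', w, feat)    # 1x1 convolution
```

With a zero text embedding the gate is the identity and the layer reduces to a plain 1x1 convolution, so the text prompt only perturbs, never replaces, the visual pathway.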
Pub Date: 2025-12-31. DOI: 10.1109/LSP.2025.3649602
Changyong Xu;Bo Chen;Rusheng Wang;Zheming Wang
This letter investigates the distributed fusion estimation problem for uncertain systems, where noise statistics are unavailable. A scenario optimization framework is employed to handle model uncertainties, in which sampled uncertainty realizations are transformed into linear matrix inequality (LMI) constraints. By solving the resulting convex problems, local estimator gains are obtained, ensuring bounded mean-square error. Furthermore, an explicit upper bound for the fusion error is derived, and optimal fusion weights are determined through an LMI-based criterion. Finally, target tracking examples are provided to demonstrate the advantages and effectiveness of the proposed methods. The influence of the violation and confidence parameters on estimation accuracy and computational complexity is further analyzed.
"Scenario-Based Distributed Fusion Estimation for Uncertain Systems With Bounded Noise." IEEE Signal Processing Letters, vol. 33, pp. 569–573.
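The scenario idea, replacing unknown uncertainty statistics with sampled realizations and requiring a performance bound to hold on every sample, can be sketched on a scalar toy system. The grid search below stands in for the letter's LMI-based convex programs, and the model, names, and variance bound are our illustrative assumptions:

```python
import numpy as np

def scenario_gain(scenarios, k_grid):
    """Pick an estimator gain k minimizing the worst-case steady-state error
    variance over sampled realizations (a, c) of x+ = a*x + w, y = c*x + v
    with unit-variance noises. Each scenario contributes one constraint."""
    best_k, best_val = None, np.inf
    for k in k_grid:
        worst = 0.0
        for a, c in scenarios:
            f = a * (1.0 - k * c)           # error dynamics under this scenario
            if abs(f) >= 1.0:               # estimator unstable here: reject k
                worst = np.inf
                break
            # steady-state error variance: P = f^2 P + 1 + k^2
            worst = max(worst, (1.0 + k * k) / (1.0 - f * f))
        if worst < best_val:
            best_k, best_val = k, worst
    return best_k, best_val
```

Adding more scenarios tightens the guarantee at the cost of computation, which mirrors the violation/confidence trade-off the letter analyzes.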
Pub Date: 2025-12-26. DOI: 10.1109/LSP.2025.3648967
Haoyi Zhao;Zeyu Xiao;Zihan Qi;Yang Zhao;Wei Jia
Atmospheric turbulence induces coupled spatio-temporal distortions, including blur, geometric deformation, and temporal jitter, which severely degrade image quality. We propose EvTurM, a practical framework that exploits event camera data for dynamic turbulence mitigation, providing precise motion cues and stable temporal modeling. Leveraging the high temporal resolution and dynamic range of events, EvTurM achieves robust restoration under diverse turbulence conditions. EvTurM comprises two key modules: (1) the event-aware modality enhancement module, which uses event-derived motion to enrich RGB features and recover structural details, and (2) the bidirectional modality calibration module, which jointly aligns RGB and event features in forward and backward propagation to reduce misalignment and enhance temporal consistency. Extensive experiments show EvTurM consistently surpasses existing methods and achieves superior performance.
"Event-Based Dynamic Turbulence Mitigation." IEEE Signal Processing Letters, vol. 33, pp. 564–568.
Pub Date: 2025-12-26. DOI: 10.1109/LSP.2025.3648910
Xiaoqiang Long;Haiquan Zhao;Xinyan Hou
Traditional single-kernel or fixed-center multi-kernel collaborative correntropies fundamentally assume that errors primarily cluster around a central point (typically zero). However, in real-world complex noise environments—such as those generated by mixed interference sources with diverse mechanisms—errors may exhibit multi-modal or highly asymmetric statistical characteristics. In such cases, a single central point, or multiple kernels fixed at the origin, cannot effectively capture the true shape of the error distribution. To address these problems, this letter proposes a novel robust learning algorithm by introducing variable-center multi-kernel correntropy into an asymmetric correntropy framework, where the kernel centers can be positioned at arbitrary locations. Compared with the maximum asymmetric correntropy criterion (MACC) algorithm, the proposed approach offers a more generalized formulation that enhances its capability to handle more complex error distributions, thereby improving algorithm performance. Notably, existing literature has not yet provided theoretical analysis for such variable-center multi-kernel asymmetric correntropy robust algorithms. Therefore, the main contributions of this work are conducting the first theoretical analysis of the proposed algorithm and validating the effectiveness of the analytical methodology.
"Multi-Kernel Maximum Asymmetric Correntropy Criterion: Foundation and Analysis." IEEE Signal Processing Letters, vol. 33, pp. 411–415.
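The cost such a criterion maximizes can be written down directly: a weighted mixture of Gaussian kernels with arbitrary centers, with sign-dependent widths supplying the asymmetry. The specific parameterization below is an illustrative assumption, not the paper's exact criterion:

```python
import numpy as np

def mk_asym_correntropy(errors, centers, sig_pos, sig_neg, weights):
    """Variable-center multi-kernel asymmetric correntropy of an error sample.
    Kernel m sits at centers[m]; errors above/below a center use different
    widths (sig_pos / sig_neg), so the cost can fit skewed, multi-modal errors."""
    d = np.asarray(errors, float)[:, None] - np.asarray(centers, float)
    sig = np.where(d >= 0, sig_pos, sig_neg)    # asymmetric kernel widths
    k = np.exp(-d**2 / (2.0 * sig**2))          # Gaussian kernel per center
    return float(np.mean(k @ np.asarray(weights, float)))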
Pub Date : 2025-12-26DOI: 10.1109/LSP.2025.3648640
Zhen Gao;Yiping Jiang;Rong Yang;Xingqun Zhan
Recovering signals from noisy observations remains challenging due to the ill-posedness of inverse problems. While non-convex regularization methods like the standard Cauchy penalty improve estimation accuracy, it lacks adaptability across diverse scenarios. In response, this letter proposes a fractional-order Cauchy (q-Cauchy) penalty inspired by the Lq maximum likelihood estimation. By introducing the parameter $q$, the q-Cauchy penalty achieves greater adaptability in diverse scenarios. Specifically, we also derive sufficient convexity conditions for its proximal operator and propose a forward-backward solver. Simulation results demonstrate that the q-Cauchy with the appropriate $q$ outperforms the baseline methods in both 1D signal denoising and 2D image deblurring tasks.
{"title":"A Fractional-Order Cauchy Penalty With Enhanced Adaptability for Signal Recovery","authors":"Zhen Gao;Yiping Jiang;Rong Yang;Xingqun Zhan","doi":"10.1109/LSP.2025.3648640","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648640","url":null,"abstract":"Recovering signals from noisy observations remains challenging due to the ill-posedness of inverse problems. While non-convex regularization methods like the standard Cauchy penalty improve estimation accuracy, it lacks adaptability across diverse scenarios. In response, this letter proposes a fractional-order Cauchy (q-Cauchy) penalty inspired by the Lq maximum likelihood estimation. By introducing the parameter <inline-formula><tex-math>$q$</tex-math></inline-formula>, the q-Cauchy penalty achieves greater adaptability in diverse scenarios. Specifically, we also derive sufficient convexity conditions for its proximal operator and propose a forward-backward solver. Simulation results demonstrate that the q-Cauchy with the appropriate <inline-formula><tex-math>$q$</tex-math></inline-formula> outperforms the baseline methods in both 1D signal denoising and 2D image deblurring tasks.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"456-460"},"PeriodicalIF":3.9,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-25DOI: 10.1109/LSP.2025.3648249
Menghui Lei;Xiangyang Zeng;Mingmin Zeng;Anqi Jin
The attention mechanism improves underwater acoustic target recognition (UATR) by suppressing irrelevant features. However, due to the uncertainty and scarcity of underwater acoustic target (UWAT) signals, complicated deterministic attention modules increase the risk of model overfitting, resulting in limited improvement or even degradation in the performance of UATR. This letter proposes a Bayesian Hybrid Attention Module (BHAM) that enhances UATR based on time–frequency (T–F) features. BHAM models attention weights as random variables following Beta and Dirichlet distributions to capture uncertainty of UWAT signals and mitigate overfitting, while strengthening T–F feature representation via Bayesian channel attention and Bayesian T–F attention. By learning attention distributions in a Bayesian manner, BHAM effectively models complex dependencies in UWAT signals. Experiments on the DeepShip dataset demonstrate that BHAM alleviates overfitting and generalizes well across different network backbones.
{"title":"A Bayesian Hybrid Attention Module for Underwater Acoustic Target Recognition","authors":"Menghui Lei;Xiangyang Zeng;Mingmin Zeng;Anqi Jin","doi":"10.1109/LSP.2025.3648249","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648249","url":null,"abstract":"The attention mechanism improves underwater acoustic target recognition (UATR) by suppressing irrelevant features. However, due to the uncertainty and scarcity of underwater acoustic target (UWAT) signals, complicated deterministic attention modules increase the risk of model overfitting, resulting in limited improvement or even degradation in the performance of UATR. This letter proposes a Bayesian Hybrid Attention Module (BHAM) that enhances UATR based on time–frequency (T–F) features. BHAM models attention weights as random variables following Beta and Dirichlet distributions to capture uncertainty of UWAT signals and mitigate overfitting, while strengthening T–F feature representation via Bayesian channel attention and Bayesian T–F attention. By learning attention distributions in a Bayesian manner, BHAM effectively models complex dependencies in UWAT signals. Experiments on the DeepShip dataset demonstrate that BHAM alleviates overfitting and generalizes well across different network backbones.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"441-445"},"PeriodicalIF":3.9,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-25DOI: 10.1109/LSP.2025.3648638
Fangqing Xiao;Dirk Slock
The Cramér–Rao bound (CRB) quantifies the variance lower bound for unbiased estimators, but it is intractable to evaluate in linear hierarchical Bayesian models with non-Gaussian priors due to the intractable marginal likelihood. Existing methods, including variational Bayes and Markov chain Monte Carlo (MCMC)-based approaches, often have high computational cost and slow convergence. We propose an efficient framework to approximate the Fisher information matrix (FIM) and the CRB by expressing the gradient of the log marginal likelihood as a posterior expectation. Expectation propagation (EP) is used to approximate the posterior as a Gaussian, enabling accurate moment estimation compared to pure sampling-based methods. Numerical experiments on small-scale sparse models show that the EP-based CRB approximation achieves lower average normalized mean squared error (NMSE) and faster convergence than classical baselines in non-Gaussian settings.
{"title":"Efficient CRB Estimation for Linear Models via Expectation Propagation and Monte Carlo Sampling","authors":"Fangqing Xiao;Dirk Slock","doi":"10.1109/LSP.2025.3648638","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648638","url":null,"abstract":"The Cramér–Rao bound (CRB) quantifies the variance lower bound for unbiased estimators, but it is intractable to evaluate in linear hierarchical Bayesian models with non-Gaussian priors due to the intractable marginal likelihood. Existing methods, including variational Bayes and Markov chain Monte Carlo (MCMC)-based approaches, often have high computational cost and slow convergence. We propose an efficient framework to approximate the Fisher information matrix (FIM) and the CRB by expressing the gradient of the log marginal likelihood as a posterior expectation. Expectation propagation (EP) is used to approximate the posterior as a Gaussian, enabling accurate moment estimation compared to pure sampling-based methods. Numerical experiments on small-scale sparse models show that the EP-based CRB approximation achieves lower average normalized mean squared error (NMSE) and faster convergence than classical baselines in non-Gaussian settings.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"451-455"},"PeriodicalIF":3.9,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an emerging technology, Video Face Super-Resolution (VFSR) aims to reconstruct high-resolution facial images from low-quality video sequences while maintaining identity consistency, which makes it applicable to scenarios such as surveillance, video conferencing, and film restoration. Compared with image-based face restoration and general video super-resolution, VFSR is more challenging because it requires accurate facial detail reconstruction, strict identity preservation, and computational efficiency under varying poses and expressions. To address these challenges, we propose a High-Precision Identity Preserving VFSR framework (HPIP), which integrates a Multi-Scale Prediction Module (MPM) and an Identity Preservation Module (IPM). The MPM focuses on identity-critical facial regions (e.g., eyes, nose, and mouth) and leverages multi-scale feature prediction to improve reconstruction accuracy and robustness while maintaining computational efficiency. The IPM further projects features into a latent representation space, generating temporally consistent dictionary features and enhancing temporal coherence. Extensive experiments demonstrate that HPIP achieves superior performance in both qualitative and quantitative evaluations, producing visually pleasing facial details while maintaining an efficient and lightweight design.
视频人脸超分辨率(Video Face Super-Resolution, VFSR)是一项新兴技术,旨在从低质量的视频序列中重建高分辨率的人脸图像,同时保持身份一致性,适用于监控、视频会议和电影修复等场景。与基于图像的人脸恢复和一般的视频超分辨率相比,VFSR需要精确的人脸细节重建,严格的身份保持,以及不同姿态和表情下的计算效率,更具挑战性。为了解决这些问题,我们提出了一个高精度身份保持VFSR框架(HPIP),该框架集成了一个多尺度预测模块(MPM)和一个身份保持模块(IPM)。MPM专注于身份关键面部区域(例如,眼睛,鼻子和嘴巴),并利用多尺度特征预测来提高重建精度和鲁棒性,同时保持计算效率。IPM进一步将特征投射到潜在的表示空间中,生成时间一致的字典特征并增强时间一致性。大量的实验表明,HPIP在定性和定量评估中都取得了卓越的性能,在保持高效和轻量化设计的同时,产生了视觉上令人愉悦的面部细节。
{"title":"Video Face Super-Resolution With High-Precision Identity Preservation","authors":"Chaoliang Wu;Ting Zhang;Xianbin Zhang;Nian He;Yiwen Xu","doi":"10.1109/LSP.2025.3648639","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648639","url":null,"abstract":"As an emerging technology, Video Face Super-Resolution (VFSR) aims to reconstruct high-resolution facial images from low-quality video sequences while maintaining identity consistency, which makes it applicable to scenarios such as surveillance, video conferencing, and film restoration. Compared with image-based face restoration and general video super-resolution, VFSR is more challenging because it requires accurate facial detail reconstruction, strict identity preservation, and computational efficiency under varying poses and expressions. To address these challenges, we propose a High-Precision Identity Preserving VFSR framework (HPIP), which integrates a Multi-Scale Prediction Module (MPM) and an Identity Preservation Module (IPM). The MPM focuses on identity-critical facial regions (<italic>e.g.</i>, eyes, nose, and mouth) and leverages multi-scale feature prediction to improve reconstruction accuracy and robustness while maintaining computational efficiency. The IPM further projects features into a latent representation space, generating temporally consistent dictionary features and enhancing temporal coherence. 
Extensive experiments demonstrate that HPIP achieves superior performance in both qualitative and quantitative evaluations, producing visually pleasing facial details while maintaining an efficient and lightweight design.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"406-410"},"PeriodicalIF":3.9,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generative steganography has recently attracted considerable attention due to its superior security properties. However, most existing approaches suffer from limited hiding capacity. To address this issue, this paper proposes a high-capacity image steganography framework that integrates an encoder–decoder architecture with a latent diffusion model. Specifically, a message encoder is designed to transform binary secret messages into latent-space representations through a series of ResDense modules, enabling efficient hiding of large-scale information. The encoded latent features are then guided by the latent diffusion model to synthesize visually realistic stego images. During message extraction, the stego image undergoes iterative noise addition within the diffusion process to reconstruct the latent representation, from which a message decoder accurately recovers the hidden message. Extensive experimental results demonstrate that the proposed method achieves a high hiding capacity of over 30,000 bits, outperforming state-of-the-art methods while ensuring reliable message recovery under common image storage formats such as JPEG and PNG.
{"title":"High-Capacity Image Steganography via Latent Diffusion Models","authors":"Ruijie Du;Na Wang;Cheng Xiong;Chuan Qin;Xinpeng Zhang","doi":"10.1109/LSP.2025.3647567","DOIUrl":"https://doi.org/10.1109/LSP.2025.3647567","url":null,"abstract":"Generative steganography has recently attracted considerable attention due to its superior security properties. However, most existing approaches suffer from limited hiding capacity. To address this issue, this paper proposes a high-capacity image steganography framework that integrates an encoder–decoder architecture with a latent diffusion model. Specifically, a message encoder is designed to transform binary secret messages into latent-space representations through a series of ResDense modules, enabling efficient hiding of large-scale information. The encoded latent features are then guided by the latent diffusion model to synthesize visually realistic stego images. During message extraction, the stego image undergoes iterative noise addition within the diffusion process to reconstruct the latent representation, from which a message decoder accurately recovers the hidden message. Extensive experimental results demonstrate that the proposed method achieves a high hiding capacity of over 30,000 bits, outperforming state-of-the-art methods while ensuring reliable message recovery under common image storage formats such as JPEG and PNG.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"401-405"},"PeriodicalIF":3.9,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}