Low-complexity channel estimation techniques are key to enabling efficient, reliable, and real-time communication in modern wireless devices operating under resource and energy constraints. This paper presents, for the first time, a low-complexity peak-power-assisted data-aided channel estimation (DACE) scheme for both single-input single-output (SISO) and multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) wireless systems. In OFDM, high peak-power levels occur when the subcarriers align in phase and constructively interfere with each other. The research proposes a peak-power-assisted channel estimation scheme that accurately selects peak-power carriers at the transmitter of an OFDM system and uses them as reliable carriers for the DACE scheme. By incorporating these reliable carriers with known pilot symbols as additional pilot signals, channel estimation accuracy improves significantly in MIMO-OFDM systems. This eliminates the need to identify reliable data symbols at the receiver, thereby substantially reducing the computational complexity of the system. However, high peak power is considered a major drawback of OFDM. In this work, we incorporate a companding technique to mitigate this issue and provide sufficient margin for the DACE scheme. The performance of the proposed DACE scheme is evaluated using both least squares (LS) and linear minimum mean square error (LMMSE) channel estimators. The proposed technique not only improves channel estimation accuracy but also enhances the spectral efficiency of the wireless system. It outperforms traditional channel estimators in terms of system mean square error (MSE) and bit-error-rate (BER) performance. It also reduces the pilot overhead by 50% compared to traditional channel estimators and provides bandwidth optimization for MIMO-OFDM systems.
This makes it a promising solution for enhancing the performance and efficiency of next-generation wireless communication systems across diverse applications.
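As context for the two baseline estimators named in the abstract, here is a minimal per-subcarrier sketch of LS estimation and of a simplified LMMSE shrinkage. It assumes unit-variance, uncorrelated channel taps and a known SNR (so the LMMSE correlation matrix reduces to scalar Wiener shrinkage); it is a generic illustration, not the proposed DACE scheme itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: N pilot subcarriers, known BPSK pilots X, random Rayleigh channel H.
N = 64
X = rng.choice([-1.0, 1.0], size=N)              # known pilot symbols
H = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
snr = 100.0                                       # linear SNR (20 dB), assumed known
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * snr)
Y = H * X + noise                                 # received pilot observations

# Least-squares estimate: divide out the known pilot on each subcarrier.
H_ls = Y / X

# Simplified LMMSE: with uncorrelated unit-variance taps (R_hh = I), the LMMSE
# filter reduces to scalar Wiener shrinkage of the LS estimate toward zero.
H_lmmse = (snr / (snr + 1.0)) * H_ls

mse_ls = np.mean(np.abs(H_ls - H) ** 2)
mse_lmmse = np.mean(np.abs(H_lmmse - H) ** 2)
```

The DACE idea in the abstract then amounts to enlarging the set of subcarriers on which such estimates are formed, by treating reliably detected peak-power carriers as extra pilots.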
Title: A Novel Low-Complexity Peak-Power-Assisted Data-Aided Channel Estimation Scheme for MIMO-OFDM Wireless Systems
Authors: Inaamullah Khan; Mohammad Mahmudul Hasan; Michael Cheffena
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 992–1003. DOI: 10.1109/OJSP.2025.3595039
Pub Date: 2025-08-01. DOI: 10.1109/OJSP.2025.3595046
Exequiel Oliva;Nelson Díaz;Samuel Pinilla;Esteban Vera
Extended depth-of-field (EDoF) is a desirable attribute for imaging systems in which all features in the scene are in focus regardless of their relative distance. Traditional imaging systems can achieve EDoF by reducing the aperture size at the expense of signal-to-noise ratio, which is particularly relevant in spectral imaging systems where the incoming light is further divided. By designing and integrating diffractive optical elements (DOEs) placed at the aperture plane of the imaging system, wavefront coding has enabled EDoF while maintaining a larger aperture size, at the expense of post-processing. Nevertheless, chromatic aberrations may appear and can often be confused with defocus, jeopardizing the fidelity of the reconstructions. This work presents a novel design approach for a multispectral-aware DOE for EDoF. By considering and modeling a refractive-diffractive optical setup, our proposed system uses a stochastic optimization framework to optimize DOE patterns that preserve spectral fidelity while simultaneously extending the depth-of-field. The optimization process exploits the covariance matrix adaptation evolution strategy (CMA-ES), efficiently exploring complex, high-dimensional phase configurations without the need for explicit gradient information. The optimized DOE is continually evaluated in a simulated imaging pipeline in which the EDoF multispectral datacube is deblurred using Richardson-Lucy deconvolution. Both qualitative and quantitative results demonstrate that the proposed DOE significantly improves the depth invariance and spectral fidelity of the reconstructed datacubes compared to conventional and state-of-the-art DOE designs, making it a cost-effective solution for real-world multispectral EDoF applications.
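The Richardson-Lucy deconvolution used in the evaluation pipeline above is a classic multiplicative algorithm. A 1-D sketch under simplifying assumptions (the paper deblurs multispectral datacubes, not toy spike trains; the kernel and iteration count here are arbitrary) might look like:

```python
import numpy as np

def richardson_lucy(blurred, psf, iters=30):
    """Richardson-Lucy deconvolution: nonnegative multiplicative updates."""
    psf = psf / psf.sum()
    psf_flip = psf[::-1]                      # adjoint of the blur operator
    est = np.full_like(blurred, max(blurred.mean(), 1e-6))
    for _ in range(iters):
        conv = np.convolve(est, psf, mode="same")
        ratio = blurred / np.maximum(conv, 1e-12)
        est = est * np.convolve(ratio, psf_flip, mode="same")
    return est

# Two point sources blurred by a small defocus-like kernel.
signal = np.zeros(64)
signal[20], signal[40] = 1.0, 0.5
psf = np.array([0.25, 0.5, 0.25])
blurred = np.convolve(signal, psf, mode="same")
restored = richardson_lucy(blurred, psf)
```

Because the updates are multiplicative on a nonnegative estimate, the restored signal stays nonnegative and mass re-concentrates at the source locations.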
Title: Multispectral Extended Depth-of-Field Imaging via Stochastic Wavefront Optimization
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 965–974. DOI: 10.1109/OJSP.2025.3595046
Pub Date: 2025-07-23. DOI: 10.1109/OJSP.2025.3589747
Yatao Liu;Mingjie Shao;Wing-Kin Ma
In massive multiple-input multiple-output (MIMO) downlink systems, the physical implementation of the base stations (BSs) requires the use of cheap and power-efficient power amplifiers (PAs) to avoid high hardware cost and high power consumption. However, such PAs usually have limited linear amplification ranges. Nonlinear distortions arising from operation beyond the linear amplification ranges can significantly degrade system performance. Existing approaches to handling the nonlinear distortions, such as digital predistortion (DPD), typically require accurate knowledge, or acquisition, of the PA transfer function. In this paper, we present a new concept for mitigation of the PA distortions. Assuming a uniform linear array (ULA) at the BS, the idea is to apply a Sigma-Delta ($\Sigma\Delta$) modulator to spatially shape the PA distortions toward the high-angle region. By having the system operate in the low-angle region, the received signals are less affected by the PA distortions. To demonstrate the potential of this spatial $\Sigma\Delta$ approach, we study its application to the multi-user MIMO-orthogonal frequency division multiplexing (OFDM) downlink scenario. A symbol-level precoding (SLP) scheme and a zero-forcing (ZF) precoding scheme, taking into account the new design requirement imposed by the spatial $\Sigma\Delta$ approach, are developed. Numerical simulations are performed to show the effectiveness of the developed $\Sigma\Delta$ precoding schemes.
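The noise-shaping idea behind Sigma-Delta modulation can be seen in a first-order 1-bit temporal modulator: quantization error is fed back and ends up spectrally shaped by $(1 - z^{-1})$, i.e. pushed to high frequencies. This sketch is the standard temporal analogue only; the paper applies the same principle spatially across a ULA, pushing distortion to high angles.

```python
import numpy as np

def sigma_delta_1bit(x):
    """First-order Sigma-Delta: 1-bit quantizer with error feedback."""
    y = np.empty_like(x)
    err = 0.0
    for n, xn in enumerate(x):
        u = xn + err                 # add back the previous quantization error
        y[n] = 1.0 if u >= 0 else -1.0
        err = u - y[n]               # error to be shaped out of band
    return y

# Slow (low-frequency) input: in-band error should be much smaller than
# for direct 1-bit quantization of the same signal.
n = np.arange(1024)
x = 0.5 * np.sin(2 * np.pi * 3 * n / n.size)
e_sd = np.abs(np.fft.rfft(sigma_delta_1bit(x) - x))
e_direct = np.abs(np.fft.rfft(np.sign(x) - x))
low = slice(0, 20)                   # low-frequency band of interest
```

Comparing the error spectra over the low bins shows the shaping effect: the Sigma-Delta error is concentrated at high frequencies, whereas direct sign-quantization leaves large in-band error.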
Title: A Spatial Sigma-Delta Approach to Mitigation of Power Amplifier Distortions in Massive MIMO Downlink
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 900–916. DOI: 10.1109/OJSP.2025.3589747
Pub Date: 2025-07-17. DOI: 10.1109/OJSP.2025.3590248
Jiale Zhao;Dingding Yao;Junfeng Li
Head-Related Transfer Functions (HRTFs) play a vital role in binaural spatial audio rendering. With the release of numerous HRTF datasets in recent years, abundant data has become available to support HRTF-related research based on deep learning. However, measurement discrepancies across different datasets introduce significant variations in the data and directly merging these datasets may lead to systematic biases. The recent Listener Acoustic Personalization Challenge 2024 (European Signal Processing Conference) dealt with this issue, with the task of harmonizing different datasets to achieve lower classification accuracy while meeting thresholds over various localization metrics. To mitigate cross-dataset differences, this paper proposes a neural network-based HRTF harmonization approach aimed at eliminating dataset-specific properties embedded in the original measurements. The proposed method utilizes a perceptually relevant loss function, which jointly constrains multiple objectives, including interaural level differences, auditory-filter excitation patterns, and classification accuracy. Experimental results based on eight datasets demonstrate that the proposed approach can effectively minimize distributional disparities between datasets while mostly preserving localization performance. The classification accuracy for harmonized HRTFs between different datasets is reduced to as low as 31%, indicating a significant reduction in cross-dataset discrepancies. The proposed method ranked first in this challenge, which validates its effectiveness.
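One ingredient of the perceptually relevant loss described above, the interaural level difference (ILD), can be sketched as follows. The HRIR length, the plain-MSE combination, and the function names are illustrative assumptions, not the paper's exact loss, which also includes auditory-filter excitation patterns and a classification term.

```python
import numpy as np

def ild_db(h_left, h_right, eps=1e-12):
    """Interaural level difference per frequency bin, in dB."""
    mag_l = np.abs(np.fft.rfft(h_left)) + eps
    mag_r = np.abs(np.fft.rfft(h_right)) + eps
    return 20.0 * np.log10(mag_l / mag_r)

def ild_loss(pred_left, pred_right, ref_left, ref_right):
    """MSE between ILD curves of a predicted and a reference HRIR pair."""
    return float(np.mean((ild_db(pred_left, pred_right)
                          - ild_db(ref_left, ref_right)) ** 2))

rng = np.random.default_rng(0)
left = rng.standard_normal(256)      # toy head-related impulse responses
right = rng.standard_normal(256)
zero_loss = ild_loss(left, right, left, right)
attenuated = ild_loss(0.5 * left, right, left, right)  # 6 dB left attenuation
```

Attenuating one ear shifts every ILD bin by about 20·log10(0.5) ≈ −6 dB, which this loss penalizes while remaining invariant to gains applied to both ears at once.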
Title: Cross-Dataset Head-Related Transfer Function Harmonization Based on Perceptually Relevant Loss Function
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 865–875. DOI: 10.1109/OJSP.2025.3590248
Pub Date: 2025-07-14. DOI: 10.1109/OJSP.2025.3588715
Erez Yosef;Raja Giryes
Image reconstruction from noisy sensor measurements is challenging and many methods have been proposed for it. Yet, most approaches focus on learning robust natural image priors while modeling the scene’s noise statistics. In extremely low-light conditions, these methods often remain insufficient. Additional information is needed, such as multiple captures or, as suggested here, scene description. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image caption information significantly improves image denoising and reconstruction for both synthetic and real-world images. All code and data will be made publicly available upon publication.
Title: Tell Me What You See: Text-Guided Real-World Image Denoising
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 890–899. DOI: 10.1109/OJSP.2025.3588715
Pub Date: 2025-07-11. DOI: 10.1109/OJSP.2025.3588447
Kyle Donoghue;Ashkan Ashrafi
This paper presents a novel approach to graph learning, GL-AR, which leverages estimated autoregressive coefficients to recover undirected graph structures from time-series graph signals with propagation delay. GL-AR can discern graph structures where propagation between vertices is delayed, mirroring the dynamics of many real-world systems. This is achieved by utilizing the autoregressive coefficients of time-series graph signals in GL-AR’s learning algorithm. Existing graph learning techniques typically minimize the smoothness of a graph signal on a recovered graph structure to learn instantaneous relationships. GL-AR extends this approach by showing that minimizing smoothness with autoregressive coefficients can additionally recover relationships with propagation delay. The efficacy of GL-AR is demonstrated through applications to both synthetic and real-world datasets. Specifically, this work introduces the Graph-Tensor Method, a novel technique for generating synthetic time-series graph signals that represent edges as transfer functions. This method, along with real-world data from the National Climatic Data Center, is used to evaluate GL-AR’s performance in recovering undirected graph structures. Results indicate that GL-AR’s use of autoregressive coefficients enables it to outperform state-of-the-art graph learning techniques in scenarios with nonzero propagation delays. Furthermore, GL-AR’s performance is optimized by a new automated parameter selection algorithm, which eliminates the need for computationally intensive trial-and-error methods.
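The smoothness criterion mentioned above is the standard graph Laplacian quadratic form: a signal is smooth when connected vertices carry similar values. A minimal sketch (generic graph signal processing, not GL-AR's autoregressive extension):

```python
import numpy as np

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W from an adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(x, W):
    """Quadratic form x^T L x = (1/2) * sum_ij w_ij (x_i - x_j)^2."""
    return float(x @ laplacian(W) @ x)

# 4-vertex path graph: 0 - 1 - 2 - 3
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

flat = smoothness(np.ones(4), W)                      # constant signal
rough = smoothness(np.array([1.0, -1.0, 1.0, -1.0]), W)  # alternating signal
```

GL-AR's contribution, per the abstract, is to evaluate this kind of criterion on estimated autoregressive coefficients rather than on instantaneous samples, so that delayed propagation between vertices is also captured.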
Title: Learning Graph Structures With Autoregressive Graph Signal Models
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 838–855. DOI: 10.1109/OJSP.2025.3588447
Pub Date: 2025-07-02. DOI: 10.1109/OJSP.2025.3585440
Hoang Minh Huu Nguyen;İsmaıl Şenöz;Bert De Vries
A Variational Sparse Gaussian Process (VSGP) is a sophisticated nonparametric probabilistic model that has gained significant popularity since its inception. The VSGP model is often employed as a component of larger models or in a modified form across numerous applications. However, re-deriving the update equations for inference in these variations is technically challenging, which hinders broader adoption. In a separate line of research, message passing-based inference in factor graphs has emerged as an efficient framework for automated Bayesian inference. Despite its advantages, message passing techniques have not yet been applied to VSGP-based models due to the lack of a suitable representation for VSGP models in factor graphs. To address this limitation, we introduce a Sparse Gaussian Process (SGP) node within a Forney-style factor graph (FFG). We derive variational message passing update rules for the SGP node, enabling automated and efficient inference for VSGP-based models. We validate the update rules and illustrate the benefits of the SGP node through experiments in various Gaussian Process applications.
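The low-rank structure that sparse GP methods exploit can be illustrated with the inducing-point (Nyström) approximation of a kernel matrix, $K \approx K_{nm} K_{mm}^{-1} K_{mn}$. This is a generic sketch of that approximation, not the paper's factor-graph message passing; the inducing inputs `z`, the lengthscale, and the jitter are arbitrary choices here.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between two 1-D input sets."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

x = np.linspace(0.0, 5.0, 50)        # training inputs (n = 50)
z = np.linspace(0.0, 5.0, 10)        # inducing inputs (m = 10 << n)

K = rbf(x, x)
K_nm = rbf(x, z)
K_mm = rbf(z, z) + 1e-10 * np.eye(z.size)   # jitter for numerical stability
K_nystrom = K_nm @ np.linalg.solve(K_mm, K_nm.T)

rel_err = np.linalg.norm(K - K_nystrom) / np.linalg.norm(K)
```

Because the smooth RBF kernel has rapidly decaying spectrum, a handful of well-spread inducing points reproduces the full n-by-n kernel matrix closely, which is what makes the SGP node's local computations cheap.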
Title: A Factor Graph Approach to Variational Sparse Gaussian Processes
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 815–837. DOI: 10.1109/OJSP.2025.3585440
Emerging applications such as remote car driving, drone control, or distant mobile robot operation impose a very tight constraint on the delay between the acquisition of a video frame by a camera embedded in the operated device and its display at the remote controller. This paper introduces a new frame-level video encoder rate control technique for ultra-low-latency video coding and delivery. A Model Predictive Control approach, exploiting the buffer level at the transmitter and an estimate of the transmission rate, is used to determine the target encoding rate of each video frame to adapt with minimum delay to sudden variations of the transmission channel characteristics. Then, an $R-(QP,D)$ model of the rate $R$ of the current frame to be encoded as a function of its quantization parameter (QP) and of the distortion $D$ of the reference frame is used to get the QP matching the target rate. This QP is then fed to the video coder. The proposed approach is compared to reference algorithms, namely PANDA, FESTIVE, BBA, and BOLA, some of which have been adapted to the considered server-driven low-latency coding and transmission scenario. Simulation results based on 4G bandwidth traces show that the proposed algorithm outperforms the others at different glass-to-glass delay constraints, considering several video quality metrics.
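The buffer-and-rate feedback loop described above can be made concrete with a deliberately simplified rule: start from the estimated channel rate and correct toward a reference transmit-buffer level. This is a hypothetical proportional controller for illustration only; the paper's actual method is a Model Predictive Control formulation over a horizon, and all names and gains below are invented.

```python
def target_rate(rate_est_bps, buffer_bits, buffer_ref_bits, frame_dt_s, gain=0.5):
    """Illustrative proportional buffer regulation (NOT the paper's MPC):
    encode below the channel rate when the buffer is above its reference
    (drain the backlog), and above it when there is headroom."""
    correction = gain * (buffer_bits - buffer_ref_bits) / frame_dt_s
    return max(rate_est_bps - correction, 0.0)

# Buffer above reference -> target below the estimated channel rate.
drain = target_rate(2_000_000, buffer_bits=60_000, buffer_ref_bits=20_000,
                    frame_dt_s=1 / 30)
# Buffer below reference -> spend the headroom on extra quality.
fill = target_rate(2_000_000, buffer_bits=5_000, buffer_ref_bits=20_000,
                   frame_dt_s=1 / 30)
```

The frame-level target rate produced by such a loop is then what the $R-(QP,D)$ model maps to a quantization parameter for the encoder.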
Title: Model Predictive Control Algorithm for Video Coding and Uplink Delivery in Delay-Critical Applications
Authors: Mourad Aklouf; Frédéric Dufaux; Michel Kieffer; Marc Lény
Published in IEEE Open Journal of Signal Processing, vol. 6, pp. 876–889. DOI: 10.1109/OJSP.2025.3584672
Pub Date: 2025-06-30
Pub Date: 2025-06-27. DOI: 10.1109/OJSP.2025.3583963
Helena Montenegro;Jaime S. Cardoso
With the growing adoption of Deep Learning for imaging tasks in biometrics and healthcare, it becomes increasingly important to ensure privacy when using and sharing images of people. Several works enable privacy-preserving image sharing by anonymizing the images so that the corresponding individuals are no longer recognizable. Most works average images or their embeddings as an anonymization technique, relying on the assumption that the average operation is irreversible. Recently, cold diffusion models, based on the popular denoising diffusion probabilistic models, have succeeded in reversing deterministic transformations on images. In this work, we leverage cold diffusion to decompose superimposed images, empirically demonstrating that it is possible to obtain two or more identically-distributed images given their average. We propose novel sampling strategies for this task and show their efficacy on three datasets. Our findings highlight the risks of averaging images as an anonymization technique and argue for the use of alternative anonymization strategies.
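The irreversibility assumption the paper challenges is easy to state numerically: an average has infinitely many preimage pairs, so inverting it is ill-posed without a statistical prior over the image distribution. A minimal sketch with random arrays standing in for images:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 8))               # two toy "images"
b = rng.random((8, 8))
avg = 0.5 * (a + b)

# Shifting the pair by any perturbation d yields a *different* pair with
# exactly the same average -- the ambiguity that makes averaging look safe.
d = 0.1 * rng.standard_normal((8, 8))
a_alt, b_alt = a + d, b - d
same_avg = 0.5 * (a_alt + b_alt)
```

The paper's point is that among all these algebraic preimages, only a few are plausible under the learned image distribution, and cold diffusion can recover those, so the ambiguity alone does not guarantee anonymity.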
Title: Leveraging Cold Diffusion for the Decomposition of Identically Distributed Superimposed Images
IEEE Open Journal of Signal Processing, vol. 6, pp. 784-794.
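The paper's core observation is that image averaging, widely assumed to be an irreversible anonymization step, can be undone by cold diffusion. A minimal NumPy sketch of the averaging operation itself (the `superimpose` helper is an assumed name; the cold diffusion decomposition is not reproduced here) illustrates why plain algebraic inversion is underdetermined, and hence why a learned prior is needed to reverse it:

```python
import numpy as np

def superimpose(images):
    """Average a list of equally-shaped images into one composite,
    as in image/embedding averaging used for anonymization."""
    stack = np.stack(images).astype(np.float64)
    return stack.mean(axis=0)

# Two toy 2x2 "images": their average discards which value came from which source.
a = np.array([[0.0, 1.0], [0.2, 0.8]])
b = np.array([[1.0, 0.0], [0.6, 0.4]])
avg = superimpose([a, b])

# Infinitely many pairs produce the same average: (a + d, b - d) for any offset d,
# so recovery from the average alone requires a prior over natural images.
d = np.full_like(a, 0.1)
assert np.allclose(superimpose([a + d, b - d]), avg)
```

This non-identifiability is exactly what the cold diffusion model resolves: it learns which of the many consistent decompositions look like real, identically distributed images.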
Pub Date: 2025-06-20 | DOI: 10.1109/OJSP.2025.3581840
Qingfeng Liu;Mostafa El-Khamy;Kee-Bong Song
Video Panoptic Segmentation (VPS) is the most challenging video segmentation task, as it requires accurate labeling of every pixel in each frame, as well as identifying the multiple instances and tracking them across frames. In this paper, we explore state-of-the-art solutions for VPS at both the giant model regime for offline or server processing and the tiny model regime for online or edge computing. We designed Giant-VPS which achieved the first place solution in the 2024 Pixel Level Video Understanding in the Wild (PVUW) challenge. Our Giant-VPS builds on top of MinVIS and deploys the DINOv2-giant vision foundation model with a carefully designed ViT (Vision Transformer) adapter. For mobile and edge devices, we designed the Tiny-VPS model and show that our novel ViT-adapter distillation from the Giant-VPS model can further improve the accuracy of Tiny-VPS. Our Tiny-VPS is the first, in the sub-20 GFLOPS regime, to achieve competitive accuracy on VPS and VSS (Video Semantic Segmentation) benchmarks.
Title: Tiny-VPS: Tiny Video Panoptic Segmentation Standing on the Shoulder of Giant-VPS
IEEE Open Journal of Signal Processing, vol. 6, pp. 803-814.
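The abstract mentions distilling the ViT adapter of Giant-VPS into Tiny-VPS. As an illustration only, the sketch below shows a generic feature-plus-logit distillation objective in NumPy; the function name, the L2 feature term, the temperature `T`, and the mixing weight `alpha` are assumptions for exposition, not the paper's actual loss:

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-softened softmax, numerically stabilized.
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_feats, teacher_feats,
                      student_logits, teacher_logits, T=2.0, alpha=0.5):
    # Feature term: mean squared error between adapter features.
    feat_term = np.mean((student_feats - teacher_feats) ** 2)
    # Logit term: KL(teacher || student) on softened distributions,
    # scaled by T^2 to keep gradients comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd_term = np.mean(
        np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ) * T * T
    return alpha * feat_term + (1 - alpha) * kd_term

# Toy tensors standing in for adapter features and per-pixel class logits.
rng = np.random.default_rng(0)
s_f, t_f = rng.standard_normal((4, 16)), rng.standard_normal((4, 16))
s_l, t_l = rng.standard_normal((4, 10)), rng.standard_normal((4, 10))
loss = distillation_loss(s_f, t_f, s_l, t_l)
```

The loss is zero when the student exactly matches the teacher, which is the sense in which the tiny model "stands on the shoulder" of the giant one during training.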