Anderson Accelerated Operator Splitting Methods for Convex-Nonconvex Regularized Problems
Qiang Heng; Xiaoqian Liu; Eric C. Chi
Pub Date: 2025-10-06. DOI: 10.1109/OJSP.2025.3618583. IEEE Open Journal of Signal Processing, vol. 6, pp. 1094-1108.
Convex–nonconvex (CNC) regularization is a novel paradigm that employs a nonconvex penalty function while preserving the convexity of the overall objective function. It has found successful applications in signal processing, statistics, and machine learning. Despite its wide applicability, the computation of CNC-regularized problems is still dominated by the forward–backward splitting method, which can be computationally slow in practice and is restricted to handling a single regularizer. To address these limitations, we develop a unified Anderson acceleration framework that encompasses multiple prevalent operator-splitting schemes, thereby enabling the efficient solution of a broad class of CNC-regularized problems with a quadratic data-fidelity term. We establish global convergence of the proposed algorithm to an optimal point and demonstrate its substantial speed-ups across diverse applications.
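To make the acceleration idea concrete, here is a minimal numpy sketch of type-II Anderson acceleration applied to a generic fixed-point map, demonstrated on forward-backward splitting for a quadratic data-fidelity term. The soft-thresholding prox (a plain $l_{1}$ penalty) and the memory size `m` are illustrative stand-ins, not the paper's CNC penalty or its specific safeguarded scheme.

```python
import numpy as np

def anderson(g, x0, m=5, tol=1e-9, max_iter=500):
    """Anderson acceleration (type II) of the fixed-point iteration x <- g(x)."""
    x = x0.copy()
    G_hist, F_hist = [], []          # g(x_k) values and residuals f_k = g(x_k) - x_k
    for _ in range(max_iter):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            break
        G_hist.append(gx); F_hist.append(f)
        if len(F_hist) > m + 1:      # keep a sliding window of m differences
            G_hist.pop(0); F_hist.pop(0)
        if len(F_hist) == 1:
            x = gx                   # plain fixed-point step to build history
        else:
            dF = np.diff(np.stack(F_hist, axis=1))   # residual differences
            dG = np.diff(np.stack(G_hist, axis=1))   # map-value differences
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - dG @ gamma      # extrapolated iterate
    return x

# Demo: forward-backward splitting for min 0.5||Ax - b||^2 + lam*||x||_1
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((60, 100)), rng.standard_normal(60), 0.1
t = 1.0 / np.linalg.norm(A, 2) ** 2                 # step size 1/L
soft = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s, 0.0)
g = lambda x: soft(x - t * A.T @ (A @ x - b), t * lam)
x_star = anderson(g, np.zeros(100))
```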
{"title":"Anderson Accelerated Operator Splitting Methods for Convex-Nonconvex Regularized Problems","authors":"Qiang Heng;Xiaoqian Liu;Eric C. Chi","doi":"10.1109/OJSP.2025.3618583","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3618583","url":null,"abstract":"Convex–nonconvex (CNC) regularization is a novel paradigm that employs a nonconvex penalty function while preserving the convexity of the overall objective function. It has found successful applications in signal processing, statistics, and machine learning. Despite its wide applicability, the computation of CNC-regularized problems is still dominated by the forward–backward splitting method, which can be computationally slow in practice and is restricted to handling a single regularizer. To address these limitations, we develop a unified Anderson acceleration framework that encompasses multiple prevalent operator-splitting schemes, thereby enabling the efficient solution of a broad class of CNC-regularized problems with a quadratic data-fidelity term. We establish global convergence of the proposed algorithm to an optimal point and demonstrate its substantial speed-ups across diverse applications.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1094-1108"},"PeriodicalIF":2.7,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11194222","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parameter-Efficient Multi-Task and Multi-Domain Learning Using Factorized Tensor Networks
Yash Garg; Nebiyou Yismaw; Rakib Hyder; Ashley Prater-Bennette; Amit Roy-Chowdhury; M. Salman Asif
Pub Date: 2025-09-22. DOI: 10.1109/OJSP.2025.3613142. IEEE Open Journal of Signal Processing, vol. 6, pp. 1077-1085.
Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The primary challenge and opportunity lie in leveraging shared information across these tasks and domains to enhance the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we introduce a factorized tensor network (FTN) designed to achieve accuracy comparable to that of independent single-task or single-domain networks while introducing a minimal number of additional parameters. The FTN approach incorporates task- or domain-specific low-rank tensor factors into a shared frozen network derived from a source model. This strategy allows adaptation to numerous target domains and tasks without catastrophic forgetting. Furthermore, FTN requires a significantly smaller number of task-specific parameters than existing methods. We performed experiments on widely used multi-domain and multi-task datasets, covering convolution-based architectures with different backbones as well as a transformer-based architecture. Our findings indicate that FTN attains accuracy similar to single-task or single-domain methods while using only a fraction of additional parameters per task.
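The parameter-saving mechanism can be illustrated with a minimal sketch: a frozen shared weight is adapted per task by adding a trainable low-rank update, so only the small factors are stored per task. The rank `r`, the plain matrix (rather than tensor) factorization, and all names below are illustrative assumptions; the paper's FTN factorizes network weights as tensors.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 256, 512, 4

W_frozen = rng.standard_normal((d_out, d_in))   # shared source-model weights (never updated)

# Task-specific low-rank factors (the only trainable parameters for this task).
U = np.zeros((d_out, r))
V = rng.standard_normal((r, d_in)) * 0.01

def forward(x, U, V):
    """Adapted layer: frozen shared weights plus a low-rank task-specific update."""
    return (W_frozen + U @ V) @ x

full = d_out * d_in                  # parameters of an independent per-task layer
factored = r * (d_out + d_in)        # parameters the low-rank factors add per task
print(f"additional parameters per task: {factored} vs {full} ({100*factored/full:.1f}%)")
```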
{"title":"Parameter-Efficient Multi-Task and Multi-Domain Learning Using Factorized Tensor Networks","authors":"Yash Garg;Nebiyou Yismaw;Rakib Hyder;Ashley Prater-Bennette;Amit Roy-Chowdhury;M. Salman Asif","doi":"10.1109/OJSP.2025.3613142","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3613142","url":null,"abstract":"Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The primary challenge and opportunity lie in leveraging shared information across these tasks and domains to enhance the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we introduce a factorized tensor network (FTN) designed to achieve accuracy comparable to that of independent single-task or single-domain networks, while introducing a minimal number of additional parameters. The FTN approach entails incorporating task- or domain-specific low-rank tensor factors into a shared frozen network derived from a source model. This strategy allows for adaptation to numerous target domains and tasks without encountering catastrophic forgetting. Furthermore, FTN requires a significantly smaller number of task-specific parameters compared to existing methods. We performed experiments on widely used multi-domain and multi-task datasets. We show the experiments on convolutional-based architecture with different backbones and on transformer-based architecture. Our findings indicate that FTN attains similar accuracy as single-task or single-domain methods while using only a fraction of additional parameters per task.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1077-1085"},"PeriodicalIF":2.7,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175489","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial Upsampling of Head-Related Impulse Responses via Elevation-Wise Encoder-Decoder Networks
Camilo Arevalo; Julián Villegas
Pub Date: 2025-09-22. DOI: 10.1109/OJSP.2025.3613209. IEEE Open Journal of Signal Processing, vol. 6, pp. 1086-1093.
A method for performing spatial upsampling of Head-Related Impulse Responses (HRIRs) from sparse measurements is introduced. Based on a supervised elevation-wise encoder-decoder network design, we present two variants: one that performs progressive reconstructions with feed-forward connections from higher to lower elevations, and another that excludes these connections. The variants were evaluated in terms of errors in interaural time and level differences, as well as spectral distortion at the ipsilateral and contralateral ears. The additional complexity introduced by the variant with feed-forward connections does not always translate into accuracy gains, making the simpler variant preferable for efficiency. Performance generally improved as the number of available measurements increased; however, accuracy was also found to depend strongly on the spatial distribution of those measurements. Compared to average non-personalized HRIRs, interaural time differences remain similar, while the proposed method achieves higher spectral and level accuracy, highlighting its practical use for HRIR upsampling.
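As a structural sketch only, the two variants differ in whether each elevation's reconstruction is conditioned on the one above it. Everything here (the `models` interface, data layout, top-down ordering) is a hypothetical reading of the description, not the paper's trained architecture.

```python
def upsample_hrirs(sparse_by_elev, models, use_feedforward=True):
    """Reconstruct dense HRIRs elevation by elevation, from highest to lowest.

    sparse_by_elev: dict elev -> array (n_sparse, hrir_len) of measurements
    models: dict elev -> callable(sparse, context) -> array (n_dense, hrir_len)
    """
    dense, context = {}, None
    for elev in sorted(sparse_by_elev, reverse=True):   # start at the top elevation
        if not use_feedforward:
            context = None                              # variant without connections
        dense[elev] = models[elev](sparse_by_elev[elev], context)
        context = dense[elev]                           # feeds the next (lower) elevation
    return dense
```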
{"title":"Spatial Upsampling of Head-Related Impulse Responses via Elevation-Wise Encoder-Decoder Networks","authors":"Camilo Arevalo;Julián Villegas","doi":"10.1109/OJSP.2025.3613209","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3613209","url":null,"abstract":"A method for performing spatial upsampling of Head-Related Impulse Responses (HRIRs) from sparse measurements is introduced. Based on a supervised elevation-wise encoder-decoder network design, we present two variants: one that performs progressive reconstructions with feed-forward connections from higher to lower elevations, and another that excludes these connections. The variants were evaluated in terms of the errors in interaural time and level differences, as well as the spectral distortion in the ipsilateral and contralateral ears. The additional complexity introduced by the variant with feed-forward connections does not always translate into accuracy gains, making the simpler variant preferable for efficiency. Performance generally improved as the number of available measurements increased. However, accuracy was also found to strongly depend on the spatial distribution of those measurements. Compared to an average non-personalized HRIRs, interaural time differences remain similar, while the proposed method achieves higher spectral and level accuracy, highlighting its practical use for HRIR upsampling.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1086-1093"},"PeriodicalIF":2.7,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175513","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145351900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial Upsampling of Head-Related Transfer Function Using Neural Network Conditioned on Source Position and Frequency
Yuki Ito; Tomohiko Nakamura; Shoichi Koyama; Shuichi Sakamoto; Hiroshi Saruwatari
Pub Date: 2025-09-22. DOI: 10.1109/OJSP.2025.3613132. IEEE Open Journal of Signal Processing, vol. 6, pp. 1109-1123.
A spatial upsampling method for the head-related transfer function (HRTF) using deep neural networks (DNNs), consisting of an autoencoder conditioned on the source position and frequency, is proposed. On the basis of our finding that the conventional regularized linear regression (RLR)-based upsampling method can be reinterpreted as a linear autoencoder, we design our network architecture as a nonlinear extension of the RLR-based method, whose key features are encoder and decoder weights that depend on the source position and latent variables that are independent of it. We also extend this architecture to upsample HRTFs and interaural time differences (ITDs) in a single network, which allows us to efficiently obtain head-related impulse responses (HRIRs). Experimental results on upsampling accuracy and perceptual quality indicate that the proposed method can upsample HRTFs from sparse measurements with sufficient quality.
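The reinterpretation hinges on the closed form of regularized linear regression: a basis evaluated at the sparse directions maps measurements to coefficients (the "encoder"), and the basis at dense directions maps coefficients back (the "decoder"). A minimal sketch, assuming a generic basis such as spherical harmonics (the paper's exact basis and conditioning are richer):

```python
import numpy as np

def rlr_upsample(H_sparse, Phi_sparse, Phi_dense, lam=1e-3):
    """Ridge-regression HRTF upsampling, viewed as a linear autoencoder.

    H_sparse:   (n_sparse,) measured HRTF values at one frequency
    Phi_sparse: (n_sparse, n_basis) basis functions at measured directions
    Phi_dense:  (n_dense, n_basis) basis functions at target directions
    """
    # "Encoder": measurements -> latent coefficients (source-position-dependent weights)
    w = np.linalg.solve(Phi_sparse.T @ Phi_sparse + lam * np.eye(Phi_sparse.shape[1]),
                        Phi_sparse.T @ H_sparse)
    # "Decoder": latent coefficients -> dense reconstruction
    return Phi_dense @ w
```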
{"title":"Spatial Upsampling of Head-Related Transfer Function Using Neural Network Conditioned on Source Position and Frequency","authors":"Yuki Ito;Tomohiko Nakamura;Shoichi Koyama;Shuichi Sakamoto;Hiroshi Saruwatari","doi":"10.1109/OJSP.2025.3613132","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3613132","url":null,"abstract":"A spatial upsampling method for the head-related transfer function (HRTF) using deep neural networks (DNNs), consisting of an autoencoder conditioned on the source position and frequency, is proposed. On the basis of our finding that the conventional regularized linear regression (RLR)-based upsampling method can be reinterpreted as a linear autoencoder, we designed our network architecture as a nonlinear extension of the RLR-based method, whose key features are the encoder and decoder weights depending on the source positions and the latent variables independent of the source positions. We also extend this architecture to upsample HRTFs and interaural time differences (ITDs) in a single network, which allows us to efficiently obtain head-related impulse responses (HRIRs). Experimental results on upsampling accuracy and perceptual quality indicated that our proposed method can upsample HRTFs from sparse measurements with sufficient quality.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1109-1123"},"PeriodicalIF":2.7,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175492","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145405295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Constrained Cramér-Rao Bound for Higher-Order Singular Value Decomposition
Metin Calis; Massimo Mischi; Alle-Jan van der Veen; Raj Thilak Rajan; Borbàla Hunyadi
Pub Date: 2025-09-08. DOI: 10.1109/OJSP.2025.3607278. IEEE Open Journal of Signal Processing, vol. 6, pp. 1048-1055.
Tensor decomposition methods for signal processing applications are an active area of research. Real data are often low-rank, noisy, and come in a higher-order format. As such, low-rank tensor approximation methods that account for the high-order structure of the data are often used for denoising. One way to represent a tensor in a low-rank form is to decompose the tensor into a set of orthonormal factor matrices and an all-orthogonal core tensor using a higher-order singular value decomposition. Under noisy measurements, the lower bound for recovering the factor matrices and the core tensor is unknown. In this paper, we exploit the well-studied constrained Cramér-Rao bound to calculate a lower bound on the mean squared error of the unbiased estimates of the components of the multilinear singular value decomposition under additive white Gaussian noise, and we validate our approach through simulations.
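For reference, the decomposition whose estimates are being bounded can be computed in a few lines: each orthonormal factor matrix comes from the SVD of a mode unfolding, and the all-orthogonal core follows by projection. This is the standard (optionally truncated) HOSVD; the variable names are ours.

```python
import numpy as np

def hosvd(T, ranks):
    """Higher-order SVD: orthonormal factor matrices and an all-orthogonal core."""
    factors = []
    for mode, r in enumerate(ranks):
        # Mode-n unfolding: bring `mode` to the front, flatten the rest.
        unfold = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfold, full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):   # project onto the factor subspaces
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 7, 8))
core, factors = hosvd(X, (6, 7, 8))      # full ranks: exact reconstruction
recon = core
for mode, U in enumerate(factors):
    recon = np.moveaxis(np.tensordot(U, np.moveaxis(recon, mode, 0), axes=1), 0, mode)
print(np.allclose(recon, X))             # True
```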
{"title":"Constrained Cramér-Rao Bound for Higher-Order Singular Value Decomposition","authors":"Metin Calis;Massimo Mischi;Alle-Jan van der Veen;Raj Thilak Rajan;Borbàla Hunyadi","doi":"10.1109/OJSP.2025.3607278","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3607278","url":null,"abstract":"Tensor decomposition methods for signal processing applications are an active area of research. Real data are often low-rank, noisy, and come in a higher-order format. As such, low-rank tensor approximation methods that account for the high-order structure of the data are often used for denoising. One way to represent a tensor in a low-rank form is to decompose the tensor into a set of orthonormal factor matrices and an all-orthogonal core tensor using a higher-order singular value decomposition. Under noisy measurements, the lower bound for recovering the factor matrices and the core tensor is unknown. In this paper, we exploit the well-studied constrained Cramér-Rao bound to calculate a lower bound on the mean squared error of the unbiased estimates of the components of the multilinear singular value decomposition under additive white Gaussian noise, and we validate our approach through simulations.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1048-1055"},"PeriodicalIF":2.7,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11153050","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaussian Filtering Using a Spherical-Radial Double Exponential Cubature
Quade Butler; Youssef Ziada; S. Andrew Gadsden
Pub Date: 2025-08-28. DOI: 10.1109/OJSP.2025.3604381. IEEE Open Journal of Signal Processing, vol. 6, pp. 1056-1076.
Gaussian filters use quadrature or cubature rules to recursively solve Gaussian-weighted integrals. Classical and contemporary methods use stable rules with a minimal number of cubature points to achieve the highest accuracy. Gaussian quadrature is widely believed to be optimal due to its polynomial degree of exactness, and higher-degree cubature methods often require complex optimization to solve moment equations. In this paper, Gaussian-weighted integrals and Gaussian filtering are approached using a double exponential (DE) transformation and the trapezoidal rule. The DE rule rests on the high rates of convergence the trapezoidal rule achieves for certain integrands, and the DE transform reshapes the integrand so that the trapezoidal rule attains this performance. A novel spherical-radial cubature rule is derived for Gaussian-weighted integrals and is shown to be perfectly stable and highly efficient. A new Gaussian filter is then built on top of this cubature rule. The filter is shown to be stable with bounded estimation error. The effect of varying the number of cubature points on filter stability and convergence is also examined. The advantages of the DE method over comparable Gaussian filters and their cubature methods are outlined. These advantages are realized in two numerical examples: a challenging non-polynomial integral and a benchmark filtering problem. The results show that simple, fundamental cubature methods can yield substantial performance improvements when applied correctly.
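The core numerical fact is easy to check: for smooth integrands with Gaussian decay, the trapezoidal rule on a uniform grid converges at a near-exponential rate, which is exactly what DE-type constructions exploit (for slowly decaying integrands, a sinh-type change of variables first induces double-exponential decay). A minimal sketch with our own truncation limits and test integrand; the paper's spherical-radial rule is considerably more elaborate.

```python
import numpy as np

def gauss_weighted_trapezoid(f, half_width=6.0, n=41):
    """Approximate the integral of f(x)*exp(-x^2) over R with the trapezoidal rule."""
    t, h = np.linspace(-half_width, half_width, n, retstep=True)
    y = f(t) * np.exp(-t ** 2)
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

exact = np.sqrt(np.pi) * np.exp(-0.25)   # closed form of integral of exp(-x^2)*cos(x)
print(abs(gauss_weighted_trapezoid(np.cos) - exact))   # near machine precision with 41 points
```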
{"title":"Gaussian Filtering Using a Spherical-Radial Double Exponential Cubature","authors":"Quade Butler;Youssef Ziada;S. Andrew Gadsden","doi":"10.1109/OJSP.2025.3604381","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3604381","url":null,"abstract":"Gaussian filters use quadrature rules or cubature rules to recursively solve Gaussian-weighted integrals. Classical and contemporary methods use stable rules with a minimal number of cubature points to achieve the highest accuracy. Gaussian quadrature is widely believed to be optimal due to its polynomial degree of exactness and higher degree cubature methods often require complex optimization to solve moment equations. In this paper, Gaussian-weighted integrals and Gaussian filtering are approached using a double exponential (DE) transformation and the <italic>trapezoidal rule</i>. The DE rule is principled in high rates of convergence for certain integrands and the DE transform ensures that the trapezoidal rule maximizes its performance. A novel spherical-radial cubature rule is derived for Gaussian-weighted integrals where it is shown to be perfectly stable and highly efficient. A new Gaussian filter is then built on top of this cubature rule. The filter is shown to be stable with bounded estimation error. The effect of varying the number of cubature points on filter stability and convergence is also examined. The advantages of the DE method over comparable Gaussian filters and their cubature methods are outlined. These advantages are realized in two numerical examples: a challenging non-polynomial integral and a benchmark filtering problem. The results show that simple and fundamental cubature methods can lead to great improvements in performance when applied correctly.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1056-1076"},"PeriodicalIF":2.7,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11144509","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145141718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test-Time Cost-and-Quality Controllable Arbitrary-Scale Super-Resolution With Variable Fourier Components
Kazutoshi Akita; Norimichi Ukita
Pub Date: 2025-08-25. DOI: 10.1109/OJSP.2025.3602742. IEEE Open Journal of Signal Processing, vol. 6, pp. 1017-1030.
Super-resolution (SR) with an arbitrary scale factor and cost-and-quality controllability at test time is essential for various applications. While several arbitrary-scale SR methods have been proposed, they require modifying the model structure and retraining it to control the computational cost and SR quality. To address this limitation, we propose a novel SR method using a Recurrent Neural Network (RNN) with the Fourier representation. In our method, the RNN sequentially estimates Fourier components, each consisting of a frequency and an amplitude, and aggregates these components to reconstruct an SR image. Since the RNN can adjust the number of recurrences at test time, we can control the computational cost and SR quality in a single model: fewer recurrences (i.e., fewer Fourier components) lead to lower cost but lower quality, while more recurrences (i.e., more Fourier components) lead to better quality at more cost. Experimental results show that more Fourier components improve the PSNR score. Furthermore, even with fewer Fourier components, our method achieves a lower PSNR drop than other state-of-the-art arbitrary-scale SR methods.
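The cost-quality knob can be illustrated without the RNN: reconstruct a signal from its K strongest Fourier components and watch PSNR grow with K. In the paper an RNN predicts such components sequentially; here we simply take them from an FFT as an illustrative stand-in.

```python
import numpy as np

def psnr(ref, est):
    mse = np.mean((ref - est) ** 2)
    return 10 * np.log10(ref.max() ** 2 / mse)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2*np.pi*3*x) + 0.5*np.sin(2*np.pi*17*x) + 0.1*rng.standard_normal(256)

spec = np.fft.rfft(signal)
order = np.argsort(np.abs(spec))[::-1]         # components sorted by amplitude
for K in (2, 8, 32):
    kept = np.zeros_like(spec)
    kept[order[:K]] = spec[order[:K]]          # keep only the K strongest components
    recon = np.fft.irfft(kept, n=256)
    print(K, round(psnr(signal, recon), 1))    # PSNR improves as K grows
```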
{"title":"Test-Time Cost-and-Quality Controllable Arbitrary-Scale Super-Resolution With Variable Fourier Components","authors":"Kazutoshi Akita;Norimichi Ukita","doi":"10.1109/OJSP.2025.3602742","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3602742","url":null,"abstract":"Super-resolution (SR) with arbitrary scale factor and cost-and-quality controllability at test time is essential for various applications. While several arbitrary-scale SR methods have been proposed, these methods require us to modify the model structure and retrain it to control the computational cost and SR quality. To address this limitation, we propose a novel SR method using a Recurrent Neural Network (RNN) with the Fourier representation. In our method, the RNN sequentially estimates Fourier components, each consisting of frequency and amplitude, and aggregates these components to reconstruct an SR image. Since the RNN can adjust the number of recurrences at test time, we can control the computational cost and SR quality in a single model: fewer recurrences (i.e., fewer Fourier components) lead to lower cost but lower quality, while more recurrences (i.e., more Fourier components) lead to better quality but more cost. Experimental results prove that more Fourier components improve the PSNR score. Furthermore, even with fewer Fourier components, our method achieves a lower PSNR drop than other state-of-the-art arbitrary-scale SR methods.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1017-1030"},"PeriodicalIF":2.7,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11141341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145036198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparsity Apprised Logarithmic Hyperbolic Tan Adaptive Filters for Nonlinear System Identification and Acoustic Feedback Cancellation
Neetu Chikyal; Vasundhara; Chayan Bhar; Asutosh Kar; Mads Græsbøll Christensen
Pub Date: 2025-08-20. DOI: 10.1109/OJSP.2025.3600904. IEEE Open Journal of Signal Processing, vol. 6, pp. 1031-1047.
Recently, various robust algorithms based on hyperbolic cosine and sine functions, such as the hyperbolic cosine adaptive filter (HCAF), the exponential hyperbolic cosine adaptive filter, and the joint logarithmic hyperbolic cosine adaptive filter, have been widely employed in adaptive filtering, including nonlinear system identification. This paper aims to improve nonlinear system identification under impulsive noise interference while also accounting for sparse environments. To this end, it introduces a new sparsity-apprised logarithmic hyperbolic tan adaptive filter (SA-LHTAF) that handles impulsive noise while dealing with sparse systems. The method incorporates an $l_{1}$-norm-related sparsity penalty into a robust cost function constructed with a logarithmic hyperbolic tangent function. Further, an improved SA-LHTAF (ISA-LHTAF) is introduced for systems with varying or moderate sparsity by employing a log-sum penalty instead. The weight update for each proposed technique is derived from the modified cost function, and conditions for an upper bound on the convergence factor are established. The efficacy of the developed robust techniques is demonstrated in identifying nonlinear systems and the feedback paths of behind-the-ear (BTE) hearing aids. In addition, the proposed techniques are evaluated for training an acoustic feedback canceller for hearing aids.
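The general shape of such updates — a bounded error nonlinearity for impulsive-noise robustness plus a zero-attracting term contributed by the sparsity penalty — can be sketched as follows. The tanh error nonlinearity and sign-based zero attractor here are generic stand-ins, not the exact SA-LHTAF update derived in the paper.

```python
import numpy as np

def robust_sparse_lms(x, d, n_taps=16, mu=0.01, rho=1e-4):
    """Adaptive filter with a bounded error nonlinearity and an l1 zero attractor."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]    # regressor, most recent sample first
        e = d[n] - w @ u                     # a priori error
        # tanh bounds the update, so isolated impulsive errors cannot blow up w;
        # the sign term pulls inactive taps toward zero (sparsity penalty gradient)
        w += mu * np.tanh(e) * u - rho * np.sign(w)
    return w

# Demo: identify a sparse system from data corrupted by impulsive outliers.
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
h_true = np.zeros(16); h_true[[0, 5]] = 1.0, -0.5
d = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
d[rng.random(len(x)) < 0.005] += 20.0        # impulsive noise
w = robust_sparse_lms(x, d)
```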
{"title":"Sparsity Apprised Logarithmic Hyperbolic Tan Adaptive Filters for Nonlinear System Identification and Acoustic Feedback Cancellation","authors":"Neetu Chikyal;Vasundhara;Chayan Bhar;Asutosh Kar;Mads Græsbøll Christensen","doi":"10.1109/OJSP.2025.3600904","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3600904","url":null,"abstract":"Recently, various robust algorithms based on hyperbolic cosine and sine functions, such as hyperbolic cosine (HCAF), exponential hyperbolic cosine, joint logarithmic hyperbolic cosine adaptive filter, etc., have been predominantly employed for different aspects of adaptive filtering, including nonlinear-system-identification. Further, in this manuscript, an attempt is made to elevate the performance of nonlinear system identification in the wake of impulsive noise interference along with consideration of a sparse environment. Henceforth, in lieu of this, the present paper introduces a new sparsity-apprised logarithmic hyperbolic tan adaptive filter (SA-LHTAF) to handle impulsive noise while dealing with sparse systems. It utilizes a <inline-formula><tex-math>$l_{1}$</tex-math></inline-formula> norm-related sparsity penalty factor in the robust cost function constructed with a logarithmic hyperbolic tangent function. Further, an improved SA-LHTAF (ISA-LHTAF) is introduced for varying sparsity or moderately sparse systems employing the log sum penalty factor in the proposed technique. The weight update for the proposed technique has been derived from the modified cost function. In addition, the conditions for the upper bound on the convergence factor have been derived. The efficacy of the developed robust techniques is demonstrated for identifying nonlinear systems along with feedback paths of behind-the-ear (BTE) hearing aid. In addition, the proposed techniques are evaluated for training an acoustic feedback canceller for hearing aids.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1031-1047"},"PeriodicalIF":2.7,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11130900","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ULDepth: Transform Self-Supervised Depth Estimation to Unpaired Multi-Domain Learning
Phan Thi Huyen Thanh; Trung Thai Tran; The Hiep Nguyen; Minh Huy Vu Nguyen; Tran Vu Pham; Truong Vinh Truong Duy; Duc Dung Nguyen
Pub Date: 2025-08-11. DOI: 10.1109/OJSP.2025.3597873. IEEE Open Journal of Signal Processing, vol. 6, pp. 1004-1016.
This paper introduces a general plug-in framework designed to enhance the robustness and cross-domain generalization of self-supervised depth estimation models. Current models often struggle with real-world deployment due to their limited ability to generalize across diverse domains, such as varying lighting and weather conditions. Single-domain models are optimized for specific scenarios, while existing multi-domain approaches typically rely on paired images, which are rarely available in real-world datasets. Our framework addresses these limitations by training directly on unpaired real images from multiple domains. Daytime images serve as a reference that guides the model toward consistent depth distributions across these diverse domains through adversarial training, eliminating the need for paired images. To refine regions prone to artifacts, we augment the discriminator with positional encoding, which is combined with the predicted depth maps. We also incorporate a dynamic normalization mechanism to capture shared depth features across domains, removing the requirement for separate domain-specific encoders. Furthermore, we introduce a new benchmark designed for a more comprehensive evaluation, encompassing previously unaddressed real-world scenarios. By focusing on unpaired real data, our framework significantly improves the generalization capabilities of existing models, enabling them to better adapt to the complex, authentic data encountered in real-world environments.
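One concrete piece — combining the predicted depth map with a positional encoding before it enters the discriminator — can be sketched as below. The sinusoidal encoding and the channel-concatenation choice are our assumptions for illustration; the paper does not specify this exact form.

```python
import numpy as np

def positional_channels(h, w, n_freqs=4):
    """Sinusoidal positional encoding as extra image channels, shape (4*n_freqs, h, w)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    chans = []
    for k in range(n_freqs):
        for grid in (ys, xs):                 # encode both spatial axes
            chans += [np.sin(2 ** k * np.pi * grid), np.cos(2 ** k * np.pi * grid)]
    return np.stack(chans)

rng = np.random.default_rng(0)
depth = rng.random((1, 192, 640))             # stand-in for a predicted depth map (1, H, W)
disc_input = np.concatenate([depth, positional_channels(192, 640)], axis=0)
print(disc_input.shape)                       # (17, 192, 640), fed to the discriminator
```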
{"title":"ULDepth: Transform Self-Supervised Depth Estimation to Unpaired Multi-Domain Learning","authors":"Phan Thi Huyen Thanh;Trung Thai Tran;The Hiep Nguyen;Minh Huy Vu Nguyen;Tran Vu Pham;Truong Vinh Truong Duy;Duc Dung Nguyen","doi":"10.1109/OJSP.2025.3597873","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3597873","url":null,"abstract":"This paper introduces a general plug-in framework designed to enhance the robustness and cross-domain generalization of self-supervised depth estimation models. Current models often struggle with real-world deployment due to their limited ability to generalize across diverse domains, such as varying lighting and weather conditions. Single-domain models are optimized for specific scenarios while existing multi-domain approaches typically rely on paired images, which are rarely available in real-world datasets. Our framework addresses these limitations by training directly on unpaired real images from multiple domains. Daytime images serve as a reference to guide the model in learning consistent depth distributions across these diverse domains through adversarial training, eliminating the need for paired images. To refine regions prone to artifacts, we augment the discriminator with positional encoding, which is combined with the predicted depth maps. We also incorporate a dynamic normalization mechanism to capture shared depth features across domains, removing the requirement for separate domain-specific encoders. Furthermore, we introduce a new benchmark designed for a more comprehensive evaluation, encompassing previously unaddressed real-world scenarios. By focusing on unpaired real data, our framework significantly improves the generalization capabilities of existing models, enabling them to better adapt to the complexities and authentic data encountered in real-world environments.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"1004-1016"},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11122640","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144928877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking Diffusion Annealing-Based Bayesian Inverse Problem Solvers
Evan Scope Crafts; Umberto Villa
Pub Date: 2025-08-11. DOI: 10.1109/OJSP.2025.3597867. IEEE Open Journal of Signal Processing, vol. 6, pp. 975-991.
In recent years, the ascendance of diffusion models as a state-of-the-art generative modeling approach has spurred significant interest in their use as priors in Bayesian inverse problems. However, it is unclear how to optimally integrate a diffusion model trained on the prior distribution with a given likelihood function to obtain posterior samples. While algorithms developed for this purpose can produce high-quality, diverse point estimates of the unknown parameters of interest, they are often tested on problems where the prior distribution is not analytically known, making it difficult to assess their performance in providing rigorous uncertainty quantification. Motivated by this challenge, this work introduces three benchmark problems for evaluating the performance of diffusion-model-based samplers. The benchmark problems, which are inspired by problems in image inpainting, x-ray tomography, and phase retrieval, have posterior densities that are analytically known. In this setting, approximate ground-truth posterior samples can be obtained, enabling principled evaluation of posterior sampling algorithms. This work also introduces a general framework for diffusion-model-based posterior sampling, Bayesian Inverse Problem Solvers through Diffusion Annealing (BIPSDA). This framework unifies several recently proposed diffusion-model-based posterior sampling algorithms and contains novel algorithms that can be realized through flexible combinations of design choices. We tested the performance of a set of BIPSDA algorithms, including previously proposed state-of-the-art approaches, on the proposed benchmark problems. The results provide insight into the strengths and limitations of existing diffusion-model-based posterior samplers, while the benchmark problems provide a testing ground for future algorithmic developments.
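The benchmarking premise is that when the prior and the linear-Gaussian likelihood combine into a closed-form posterior, samples from any approximate algorithm can be scored against the truth. A minimal sketch of that idea with a plain Gaussian prior (the paper's benchmarks use richer, but still analytically tractable, constructions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma = 8, 4, 0.1

A = rng.standard_normal((m, d))                    # forward operator
x_true = rng.standard_normal(d)
y = A @ x_true + sigma * rng.standard_normal(m)    # noisy measurement

# Prior N(0, I) with Gaussian likelihood: the posterior is Gaussian in closed form.
post_cov = np.linalg.inv(np.eye(d) + A.T @ A / sigma ** 2)
post_mean = post_cov @ (A.T @ y / sigma ** 2)

# Ground-truth posterior samples that any approximate sampler can be checked against.
samples = rng.multivariate_normal(post_mean, post_cov, size=1000)
```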
{"title":"Benchmarking Diffusion Annealing-Based Bayesian Inverse Problem Solvers","authors":"Evan Scope Crafts;Umberto Villa","doi":"10.1109/OJSP.2025.3597867","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3597867","url":null,"abstract":"In recent years, the ascendance of diffusion modeling as a state-of-the-art generative modeling approach has spurred significant interest in their use as priors in Bayesian inverse problems. However, it is unclear how to optimally integrate a diffusion model trained on the prior distribution with a given likelihood function to obtain posterior samples. While algorithms developed for this purpose can produce high-quality, diverse point estimates of the unknown parameters of interest, they are often tested on problems where the prior distribution is analytically unknown, making it difficult to assess their performance in providing rigorous uncertainty quantification. Motivated by this challenge, this work introduces three benchmark problems for evaluating the performance of diffusion model based samplers. The benchmark problems, which are inspired by problems in image inpainting, x-ray tomography, and phase retrieval, have a posterior density that is analytically known. In this setting, approximate ground-truth posterior samples can be obtained, enabling principled evaluation of the performance of posterior sampling algorithms. This work also introduces a general framework for diffusion model based posterior sampling, Bayesian Inverse Problem Solvers through Diffusion Annealing (BIPSDA). This framework unifies several recently proposed diffusion-model-based posterior sampling algorithms and contains novel algorithms that can be realized through flexible combinations of design choices. We tested the performance of a set of BIPSDA algorithms, including previously proposed state-of-the-art approaches, on the proposed benchmark problems. The results provide insight into the strengths and limitations of existing diffusion-model based posterior samplers, while the benchmark problems provide a testing ground for future algorithmic developments.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"975-991"},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11122619","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144891005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}