Pub Date: 2026-01-16 | DOI: 10.1109/OJSP.2026.3654783
Yujie Zhang;Juan Vidal Alegria;Jose Flordelis;Erik L. Bengtsson;Ove Edfors
The physical placement of antennas is a key design factor for Distributed Multiple-Input Multiple-Output (D-MIMO) systems, but finding the optimal layout is a computationally intensive, non-convex problem. Prior research often addresses this by directly optimizing the coordinates of each distributed panel using complex numerical techniques, such as convex relaxation or iterative algorithms. While viable, these methods can be computationally demanding and offer limited insight into the structural properties of optimal deployments. In contrast, this paper introduces a structured, parametric optimization framework. We constrain the panel deployment to a lattice, reducing the high-dimensional problem to optimizing a few parameters that define the lattice's overall scale and shape. Through numerical simulations, our method is shown to perform nearly indistinguishably from a highly complex benchmark, while outperforming standard approaches such as the Majorization-Minimization Lloyd's algorithm (MM-Lloyd). Furthermore, we identify that a simple, non-optimized, evenly spaced grid can achieve 96% of the benchmark's performance, offering a highly efficient and practical heuristic.
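The lattice parameterization can be illustrated with a toy sketch: panel positions are generated from a single spacing parameter, so a one-dimensional search replaces per-panel coordinate optimization. The room size, 3x3 lattice, and worst-case-distance objective below are illustrative assumptions, not the paper's actual utility function.

```python
import math
import random

def lattice_panels(scale, rows=3, cols=3, room=(10.0, 10.0)):
    """Place a rows x cols panel lattice centered in the room; spacing = scale."""
    cx, cy = room[0] / 2, room[1] / 2
    xs = [cx + scale * (i - (cols - 1) / 2) for i in range(cols)]
    ys = [cy + scale * (j - (rows - 1) / 2) for j in range(rows)]
    return [(x, y) for x in xs for y in ys]

def worst_user_distance(panels, users):
    """Toy coverage objective: worst-case distance from any user to its nearest panel."""
    return max(min(math.dist(u, p) for p in panels) for u in users)

random.seed(0)
users = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
# The deployment is now a one-parameter search over the lattice spacing,
# instead of a 2 * (rows * cols)-dimensional search over raw panel coordinates.
best_scale = min((s / 10 for s in range(1, 40)),
                 key=lambda s: worst_user_distance(lattice_panels(s), users))
```

A spread-out lattice beats a collapsed one here because the worst-served user sits far from any panel when all panels cluster at the room center.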
Title: "Deployment Strategy for Indoor Distributed MIMO System" (IEEE Open Journal of Signal Processing, vol. 7, pp. 305-313)
Pub Date: 2026-01-16 | DOI: 10.1109/OJSP.2026.3654886
Karim T. Abou–Moustafa
Tyler's $M$-estimator (TME) is an accurate and efficient robust estimator for the scatter matrix when the data are samples from an elliptical distribution with heavy tails and the number of samples $n$ is larger than the number of variables $p$. Unfortunately, TME is not defined when $p > n$, and various works have proposed regularized versions of TME in the spirit of the Ledoit-Wolf estimator, whose performance depends on a carefully chosen shrinkage coefficient $\alpha \in (0,1)$. In this paper, we consider the problem of estimating an optimal shrinkage coefficient $\alpha$ for the Regularized TME (RTME). In particular, we propose to set $\alpha$ as the minimizer of a suitably chosen objective function, namely the leave-one-out cross-validated (LOOCV) log-likelihood loss. LOOCV, however, is computationally prohibitive even for moderate values of $n$. To this end, we propose a computationally efficient approximation of the LOOCV log-likelihood loss that eliminates the need to invoke the RTME procedure $n$ times, once for each sample left out during the LOOCV procedure. This approximation yields an $O(n)$ reduction in the running-time complexity of the LOOCV procedure, resulting in a significant speedup when computing the LOOCV estimate. We demonstrate the efficacy of the proposed approach on synthetic high-dimensional data sampled from heavy-tailed elliptical distributions, as well as on real high-dimensional datasets for object and face recognition. Our experiments show that the proposed method is efficient and consistently more accurate than other shrinkage-coefficient estimation methods in the literature.
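For reference, the regularized Tyler fixed-point iteration that such shrinkage schemes build on can be sketched as follows; the trace normalization and stopping rule are common conventions, not necessarily the paper's exact formulation.

```python
import numpy as np

def rtme(X, alpha, n_iter=200, tol=1e-8):
    """Regularized Tyler's M-estimator via fixed-point iteration.

    X: (n, p) data matrix; alpha in (0, 1) shrinks the Tyler scatter term
    toward the identity, keeping the estimate well-defined even when p > n.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # Per-sample weights x_i^T Sigma^{-1} x_i.
        w = np.einsum('ij,jk,ik->i', X, inv, X)
        scatter = (p / n) * (X.T * (1.0 / w)) @ X
        new = (1 - alpha) * scatter + alpha * np.eye(p)
        new *= p / np.trace(new)  # fix the trace: Tyler's estimator has a scale ambiguity
        if np.linalg.norm(new - sigma) < tol:
            sigma = new
            break
        sigma = new
    return sigma
```

Selecting $\alpha$ by LOOCV would wrap this routine in a loop over left-out samples; the paper's contribution is avoiding those $n$ repeated invocations.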
Title: "Efficient Learning of Regularized Tyler's M-Estimator" (IEEE Open Journal of Signal Processing, vol. 7, pp. 91-100)
Pub Date: 2026-01-16 | DOI: 10.1109/OJSP.2026.3654888
Martin Andersson;Anubhab Chowdhury;Erik G. Larsson
In dynamic time-division duplexing (TDD) systems, half-duplex access points (APs), each scheduled in either uplink (UL) or downlink (DL), simultaneously serve users operating in UL and DL on the same time-frequency resources. This incurs cross-link interference from APs operating in DL to APs in UL, and similarly from users operating in UL to users in DL. In this paper, we develop a scalable method for UL-user-to-DL-user interference (UUI) mitigation in dynamic TDD networks. To this end, we note that the UUI observed at each DL user is predominantly caused by the UL users in its close vicinity. Hence, we propose to form local clusters of UL users for each DL user, including only the UL users expected to cause significant UUI to that DL user. Then, we present a graph-coloring-based pilot reuse algorithm that ensures orthogonal pilots among the UL users within the local clusters of each DL user, while maximizing pilot reuse (i.e., minimizing the pilot length) for scalability. Further, we introduce beamformed DL pilots to estimate effective DL channels (the physical DL channel multiplied by the DL precoding matrix), which, together with estimated user-to-user channels, enable the multi-antenna DL users to design combining vectors that maximize their signal-to-interference-and-noise ratios by taking both desired-signal amplification and UUI suppression into consideration. We derive an achievable DL spectral efficiency with our proposed pilot schemes and combining vectors. Numerical results show the effectiveness of our proposed UUI mitigation scheme, both in terms of pilot overhead reduction and UUI suppression with the designed combining vectors.
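The cluster-aware pilot assignment can be sketched as a greedy coloring of a conflict graph in which two UL users are connected whenever they share some DL user's local cluster. This is a generic greedy heuristic, not necessarily the paper's exact algorithm.

```python
def assign_pilots(clusters):
    """Greedy graph-coloring pilot assignment.

    clusters: one set of UL-user ids per DL user (its local interference cluster).
    UL users sharing a cluster must get distinct (orthogonal) pilots; the number
    of distinct colors used is the required pilot length.
    """
    # Conflict graph: edge between UL users that co-occur in any cluster.
    neighbors = {}
    for cluster in clusters:
        for u in cluster:
            neighbors.setdefault(u, set()).update(cluster - {u})
    pilots = {}
    for u in sorted(neighbors, key=lambda v: -len(neighbors[v])):  # high degree first
        used = {pilots[v] for v in neighbors[u] if v in pilots}
        pilots[u] = next(c for c in range(len(neighbors)) if c not in used)
    return pilots

# Four UL users on a chain of overlapping clusters reuse just two pilots.
demo = assign_pilots([{1, 2}, {2, 3}, {3, 4}])
```

Users that never share a cluster may reuse the same pilot, which is what keeps the pilot length short as the network grows.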
Title: "User-to-User Clustering, Channel Estimation & Cross-Link Interference Mitigation for Dynamic TDD Systems" (IEEE Open Journal of Signal Processing, vol. 7, pp. 238-246)
Pub Date: 2026-01-13 | DOI: 10.1109/OJSP.2026.3653668
Angelo Coluccia;Emanuele Mele;Alessio Fascista
The design of detectors with the constant false alarm rate (CFAR) property is a cornerstone challenge in radar signal processing. To this end, a forward-thinking strategy is to construct decision statistics as functions of suitable maximal invariants associated with invariant tests, which guarantee the CFAR property by construction through a suitable transformation group. However, the distribution of such maximal invariants is often difficult to handle, especially for complicated non-Gaussian models, making the derivation of the generalized likelihood ratio test (GLRT) challenging. In this work, we introduce a novel learning-based CFAR detection framework, in which a trained probabilistic encoder maps maximal invariant statistics (or functions thereof) to a convenient low-dimensional latent space where a latent GLRT-based detector (L-GLRT) is easy to derive. A cross-entropy loss with Kullback-Leibler (KL) divergence regularization is adopted to encourage the latent distributions under both the $H_{0}$ (target-free) and $H_{1}$ (target-present) hypotheses to be as close as possible, in an information-theoretic sense, to Gaussian densities. Mismatched data incorporated under either hypothesis are introduced to promote robustness or selectivity. The approach unifies Gaussian and non-Gaussian settings, spanning from point-like to extended targets under the complex multivariate elliptically contoured matrix (CMECM) family, and is benchmarked against state-of-the-art classical and data-driven detectors. Moreover, since the latent space is low-dimensional, insightful visualizations of the behavior of the designed L-GLRT detectors can be obtained. Numerical results show that the proposed method achieves superior robustness/selectivity trade-offs while preserving CFAR guarantees by design and containing $P_{d}$ losses under matched conditions.
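Since the encoder pushes both latent hypothesis distributions toward Gaussians, the latent detector reduces to a Gaussian likelihood-ratio statistic. A minimal numeric sketch, with synthetic 2-D latents standing in for encoder outputs (the data, dimensions, and empirical thresholding are illustrative, not the paper's pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_gaussian(z):
    """Sample mean and (slightly regularized) covariance of latent vectors z (n, d)."""
    return z.mean(axis=0), np.cov(z.T) + 1e-6 * np.eye(z.shape[1])

def log_lik(z, mean, cov):
    """Row-wise Gaussian log-likelihood."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + z.shape[1] * np.log(2 * np.pi))

# Toy 2-D latent space: H0 latents at the origin, H1 latents shifted.
z0 = rng.normal(0.0, 1.0, (2000, 2))
z1 = rng.normal(1.5, 1.0, (2000, 2))
m0, c0 = fit_gaussian(z0)
m1, c1 = fit_gaussian(z1)
stat_h0 = log_lik(z0, m1, c1) - log_lik(z0, m0, c0)
stat_h1 = log_lik(z1, m1, c1) - log_lik(z1, m0, c0)
# Set the threshold on H0 scores for a 5% false-alarm rate, then read off P_d.
thr = np.quantile(stat_h0, 0.95)
p_d = float((stat_h1 > thr).mean())
```

The empirical threshold here stands in for the CFAR guarantee that, in the paper, comes structurally from the maximal-invariant construction.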
Title: "GLRT-Based CFAR Detection in the Latent Space for Extended Targets in Gaussian and Non-Gaussian Disturbance" (IEEE Open Journal of Signal Processing, vol. 7, pp. 285-295)
Objective evaluation of synthesized speech is critical for advancing speech generation systems, yet existing metrics for intelligibility and prosody remain limited in scope and weakly correlated with human perception. Word Error Rate (WER) provides only a coarse text-based measure of intelligibility, while F0-RMSE and related pitch-based metrics offer a narrow, reference-dependent view of prosody. To address these limitations, we propose TTScore, a targeted and reference-free evaluation framework based on conditional prediction of discrete speech tokens. TTScore employs two sequence-to-sequence predictors conditioned on input text: TTScore-int, which measures intelligibility through content tokens, and TTScore-pro, which evaluates pitch-based prosody through prosody tokens. For each synthesized utterance, the predictors compute the likelihood of the corresponding token sequences, yielding interpretable scores that capture alignment with the intended linguistic content and prosodic structure. Experiments on the SOMOS, VoiceMOS, and TTSArena benchmarks demonstrate that TTScore-int and TTScore-pro provide reliable, aspect-specific evaluation and achieve stronger correlations with human judgments of overall quality than existing intelligibility- and prosody-focused metrics.
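The likelihood-based scoring idea can be sketched as follows: a text-conditioned predictor assigns a probability to each observed discrete token, and the score is the average log-likelihood of the sequence. The per-step distributions below are hypothetical stand-ins for such a predictor's outputs, not TTScore's actual models.

```python
import math

def sequence_score(token_probs, tokens):
    """Average log-likelihood of an observed discrete-token sequence.

    token_probs: per-step distributions from a (hypothetical) text-conditioned
    predictor, as a list of {token: prob} dicts; tokens: the observed tokens.
    """
    logp = [math.log(dist.get(tok, 1e-12)) for dist, tok in zip(token_probs, tokens)]
    return sum(logp) / len(logp)

# A well-predicted token sequence scores higher than a poorly predicted one.
dists = [{'a': 0.9, 'b': 0.1}, {'a': 0.2, 'b': 0.8}]
good = sequence_score(dists, ['a', 'b'])
bad = sequence_score(dists, ['b', 'a'])
```

Averaging (rather than summing) log-probabilities keeps scores comparable across utterances of different lengths, which matters when ranking systems.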
Pub Date: 2026-01-13 | DOI: 10.1109/OJSP.2026.3653666
Ismail Rasim Ulgen;Zongyang Du;Junchen Lu;Philipp Koehn;Berrak Sisman
Title: "Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens" (IEEE Open Journal of Signal Processing, vol. 7, pp. 247-256)
Parallel to the development of advanced deepfake audio generation, audio deepfake detection has also seen significant progress. However, a standardized and comprehensive benchmark is still missing. To address this, we introduce Speech DeepFake (DF) Arena, the first comprehensive benchmark for audio deepfake detection. Speech DF Arena provides a toolkit for uniformly evaluating detection systems, currently across 14 diverse datasets and attack scenarios, together with standardized evaluation metrics and protocols for reproducibility and transparency. It also includes a leaderboard for comparing and ranking the systems, helping researchers and developers enhance their reliability and robustness. The leaderboard comprises 14 evaluation sets and 18 detection systems: 14 state-of-the-art open-source and 4 proprietary. Our study shows that many systems exhibit high equal error rates (EER) in out-of-domain scenarios, highlighting the need for extensive cross-domain evaluation. The leaderboard is hosted on HuggingFace and a toolkit for reproducing results across the listed datasets is available on GitHub.
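EER, the headline metric in such leaderboards, is the operating point where the false-acceptance and false-rejection rates coincide. A minimal threshold-sweep implementation (a simple discrete approximation, not the benchmark's own scoring code):

```python
def equal_error_rate(bona, spoof):
    """EER via a sweep over candidate thresholds (higher score = more bona fide).

    Returns (FAR + FRR) / 2 at the threshold where |FAR - FRR| is smallest.
    """
    rates = []
    for thr in sorted(set(bona) | set(spoof)):
        far = sum(s >= thr for s in spoof) / len(spoof)  # spoofs accepted
        frr = sum(s < thr for s in bona) / len(bona)     # bona fide rejected
        rates.append((abs(far - frr), (far + frr) / 2))
    return min(rates)[1]
```

With finite score lists FAR and FRR rarely cross exactly, so averaging the two rates at the closest crossing is a common convention.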
Pub Date: 2026-01-12 | DOI: 10.1109/OJSP.2026.3652496
Sandipana Dowerah;Atharva Kulkarni;Ajinkya Kulkarni;Hoan My Tran;Joonas Kalda;Artem Fedorchenko;Benoit Fauve;Damien Lolive;Tanel Alumäe;Mathew Magimai.-Doss
Title: "Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models" (IEEE Open Journal of Signal Processing, vol. 7, pp. 73-81)
Pub Date: 2026-01-05 | DOI: 10.1109/OJSP.2025.3650619
Miguel Ferrer;María de Diego;Alberto Gonzalez
Active Noise Control (ANC) systems are typically based on adaptive filters. However, electroacoustic transducers and their associated electronic components often exhibit nonlinear behaviors that linear controllers cannot accurately model, resulting in suboptimal performance. While neural networks and other advanced models have been proposed to address these limitations, their high computational demands and inherent latency frequently restrict real-time deployment. This work investigates the use of lightweight, computationally efficient machine learning (ML) models that operate on a sample-by-sample basis using simple digital operators. The proposed models are applied to ANC under nonlinear conditions, including distortions in the primary path, the secondary path, and the reference signal. The approach enhances noise attenuation while preserving low computational complexity, thereby enabling real-time implementation on embedded systems. Simulation results confirm the effectiveness of the method across a variety of nonlinear scenarios, demonstrating superior noise reduction and control accuracy compared to conventional linear ANC schemes, and achieving this at a significantly lower cost than alternative nonlinear approaches.
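The linear adaptive-filter baseline that such nonlinear ML models extend is a sample-by-sample LMS update; a minimal sketch canceling a tonal disturbance (the signal parameters and tap count are illustrative, and the secondary-path modeling of a full FxLMS controller is omitted):

```python
import math

def lms_anc(reference, disturbance, taps=8, mu=0.01):
    """Sample-by-sample LMS noise canceller (linear ANC baseline).

    reference: noise-correlated input; disturbance: noise at the error sensor.
    Returns the residual-error sequence e[k] = d[k] - y[k].
    """
    w = [0.0] * taps
    buf = [0.0] * taps
    errors = []
    for x, d in zip(reference, disturbance):
        buf = [x] + buf[:-1]                              # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, buf))        # anti-noise estimate
        e = d - y
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # LMS weight update
        errors.append(e)
    return errors

n = 4000
ref = [math.sin(0.1 * k) for k in range(n)]
dist = [0.8 * math.sin(0.1 * k + 0.3) for k in range(n)]
err = lms_anc(ref, dist)
```

Each iteration uses only multiplies and adds on scalars, which is the kind of per-sample operator budget the proposed lightweight ML models also target.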
Title: "Low-Complexity Machine Learning Models for Active Noise Control in Nonlinear Systems" (IEEE Open Journal of Signal Processing, vol. 7, pp. 165-172)
Pub Date: 2026-01-01 | DOI: 10.1109/OJSP.2025.3650439
Oksana Moryakova;Håkan Johansson
This paper introduces a closed-form least-squares (LS) design approach for fast-convolution (FC) based variable-bandwidth (VBW) finite-impulse-response (FIR) filters. The proposed LS design utilizes frequency sampling and a frequency-domain implementation of the VBW filter based on the overlap-save (OLS) method, which together offer significant savings in both implementation and online bandwidth-reconfiguration complexity. Since combining frequency-domain design and OLS implementation leads to a linear periodic time-varying (LPTV) behavior of the VBW filter, the proposed design considers the set of corresponding time-invariant impulse responses. Through numerical examples, it is demonstrated that the proposed approach not only enables closed-form design of FC-based VBW filters with substantial complexity reductions compared to existing solutions for a given performance, but also allows the variable bandwidth range to be extended without any increase in complexity. Moreover, we show how to reduce the maximum approximation-error energy over the whole set of time-invariant filters of the LPTV system by introducing appropriate weighting functions in the design.
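The overlap-save block convolution underlying FC filtering can be sketched and checked against direct convolution; the FFT length and filter length below are arbitrary choices, not the paper's design parameters.

```python
import numpy as np

def overlap_save(x, h, fft_len=64):
    """Overlap-save FFT filtering: block-wise circular convolution,
    discarding the first len(h)-1 (circularly wrapped) samples per block."""
    m = len(h)
    hop = fft_len - m + 1                      # valid output samples per block
    H = np.fft.rfft(h, fft_len)
    x_pad = np.concatenate([np.zeros(m - 1), x, np.zeros(hop)])
    out = []
    for start in range(0, len(x), hop):
        block = x_pad[start:start + fft_len]
        if len(block) < fft_len:
            block = np.pad(block, (0, fft_len - len(block)))
        y = np.fft.irfft(np.fft.rfft(block) * H, fft_len)
        out.append(y[m - 1:m - 1 + hop])       # keep only the valid samples
    return np.concatenate(out)[:len(x)]

rng = np.random.default_rng(0)
x = rng.normal(size=300)
h = rng.normal(size=16)
y_ols = overlap_save(x, h)
y_ref = np.convolve(x, h)[:len(x)]
```

Changing the bandwidth of an FC filter amounts to swapping the frequency response `H` between blocks, which is what makes online reconfiguration cheap in this structure.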
Title: "Closed-Form Least-Squares Design of Fast-Convolution Based Variable-Bandwidth FIR Filters" (IEEE Open Journal of Signal Processing, vol. 7, pp. 54-63)
Pub Date: 2026-01-01 | DOI: 10.1109/OJSP.2025.3650437
Suhyun Ahn;Donggyu Lee;Jinah Park
Vision Foundation Models (VFMs), such as DINOv2 and SAM, have demonstrated unprecedented generalizability in natural imaging and show strong promise in medical imaging due to their semantically rich representations. However, their effective application to 3D volumetric segmentation remains largely underexplored, especially concerning optimal adaptation strategies for transferring 2D pre-trained knowledge to the structurally disparate 3D domain. To address this, we present a comprehensive investigation into the transferability and task-specific adaptability of six diverse 2D VFMs (including Self-Supervised, Vision-Language, and Segmentation Generalists) for 3D medical image segmentation. We systematically evaluate four distinct transfer learning paradigms, including advanced Fine-Tuning methods, across four heterogeneous 3D medical datasets. Our results establish VFMs as a powerful and cost-effective generalist baseline, consistently outperforming non-pretrained and standard 3D ViT architectures despite the substantial domain shift. Crucially, our systematic exploration reveals that parameter-efficient fine-tuning achieves the highest segmentation accuracy across all datasets. Feature-level analyses using PCA and CKA provide key insights, confirming that optimal performance stems from successfully balancing the preservation of generalizable low-level visual features with the adaptation of high-level, task-specific semantics.
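Parameter-efficient fine-tuning of the kind evaluated here is often realized with low-rank adapters on frozen weights; a minimal LoRA-style sketch in NumPy (shapes, initialization, and the class itself are illustrative assumptions, not the paper's specific method):

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Low-rank adapter on a frozen weight: W_eff = W + alpha * A @ B."""

    def __init__(self, W, rank=4, alpha=1.0):
        self.W = W                                    # frozen pre-trained weight
        d_out, d_in = W.shape
        self.A = rng.normal(0, 0.01, (d_out, rank))   # trainable
        self.B = np.zeros((rank, d_in))               # trainable; zero init => no-op start
        self.alpha = alpha

    def forward(self, x):
        return x @ (self.W + self.alpha * self.A @ self.B).T

W = rng.normal(size=(8, 16))
layer = LoRALinear(W)
x = rng.normal(size=(4, 16))
# Zero-initialized B keeps the adapted layer identical to the frozen backbone at start,
# while only rank * (d_in + d_out) parameters are trainable.
baseline = x @ W.T
adapted = layer.forward(x)
```

Training then updates only `A` and `B`, which is why such adapters can adapt high-level semantics while leaving the backbone's generalizable low-level features intact.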
{"title":"Adaptability of Vision Foundation Models for 3D Medical Image Segmentation","authors":"Suhyun Ahn;Donggyu Lee;Jinah Park","doi":"10.1109/OJSP.2025.3650437","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3650437","url":null,"abstract":"Vision Foundation Models (VFMs), such as DINOv2 and SAM, have demonstrated unprecedented generalizability in natural imaging and show strong promise in medical imaging due to their semantically rich representations. However, their effective application to 3D volumetric segmentation remains largely underexplored, especially concerning optimal adaptation strategies for transferring 2D pre-trained knowledge to the structurally disparate 3D domain. To address this, we present a comprehensive investigation into the transferability and task-specific adaptability of six diverse 2D VFMs (including Self-Supervised, Vision-Language, and Segmentation Generalists) for 3D medical image segmentation. We systematically evaluate four distinct transfer learning paradigms, including advanced Fine-Tuning methods, across four heterogeneous 3D medical datasets. Our results establish VFMs as a powerful and cost-effective generalist baseline, consistently outperforming non-pretrained and standard 3D ViT architectures despite the substantial domain shift. Crucially, our systematic exploration reveals that parameter-efficient fine-tuning achieves the highest segmentation accuracy across all datasets. 
Feature-level analyses using PCA and CKA provide key insights, confirming that optimal performance stems from successfully balancing the preservation of generalizable low-level visual features with the adaptation of high-level, task-specific semantics.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"7 ","pages":"82-90"},"PeriodicalIF":2.7,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11321196","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
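The abstract above mentions CKA (Centered Kernel Alignment) for comparing feature representations. As an illustrative sketch only (not code from the paper), the standard linear form of CKA between two activation matrices computed on the same inputs looks like this; the function name and shapes are assumptions for illustration:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n_samples, d1), Y: (n_samples, d2) -- activations of two layers or
    models evaluated on the same inputs. Returns a similarity in [0, 1];
    1 means the representations are identical up to rotation/scale.
    """
    # Center each feature dimension
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-based linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    x_norm = np.linalg.norm(X.T @ X, "fro")
    y_norm = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (x_norm * y_norm)
```

In analyses like the one described, this score is typically computed layer-by-layer between a pre-trained and a fine-tuned model to see which layers drift during adaptation.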
Pub Date : 2025-12-26DOI: 10.1109/OJSP.2025.3648710
Lucas Goncalves;Huang-Cheng Chou;Ali N. Salman;Chi-Chun Lee;Carlos Busso
Audio-visual emotion recognition (AVER) often performs well under ideal conditions but faces significant challenges in scenarios with missing modalities (e.g., missing frames of audio and/or video). Addressing these challenges is crucial for the effective deployment of AVER systems in human-computer interaction (HCI) applications, where robustness can significantly impact user experience. This study introduces a novel approach that enhances AVER robustness by leveraging a decoder-like summarizer structure. This structure processes audio and visual content and generates contextual summaries that effectively capture emotional cues even when modalities are degraded. To enhance system resilience against missing modalities, we integrate modality dropout during training, enabling the summarizer to adaptively handle these scenarios. We define the context summary length as the number of learnable query tokens used in the summarizer, a fixed hyperparameter in our model. We analyze how varying context summary lengths affect performance, identifying an optimal balance between compression and expressiveness. In addition to improving robustness, we systematically evaluate model calibration across emotions in current state-of-the-art (SOTA) AVER methods. Our experiments on the MSP-IMPROV and CREMA-D databases demonstrate that our model achieves superior performance across macro-, micro-, and weighted-F1 scores, both under ideal conditions and in scenarios with modality losses. Additionally, we conduct ablation studies to assess the impact of different context lengths on our summarizer structure in terms of overall AVER performance.
{"title":"Contextual Attention for Robust Audio-Visual Emotion Recognition","authors":"Lucas Goncalves;Huang-Cheng Chou;Ali N. Salman;Chi-Chun Lee;Carlos Busso","doi":"10.1109/OJSP.2025.3648710","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3648710","url":null,"abstract":"<italic>Audio-visual emotion recognition</i> (AVER) often performs well under ideal conditions but faces significant challenges in scenarios with missing modalities (e.g., missing frames of audio and/or video). Addressing these challenges is crucial for the effective deployment of AVER systems in <italic>human-computer interaction</i> (HCI) applications, where robustness can significantly impact user experience. This study introduces a novel approach that enhances AVER robustness by leveraging a decoder-like summarizer structure. This structure processes audio and visual content and generates contextual summaries that effectively capture emotional cues even when modalities are degraded. To enhance system resilience against missing modalities, we integrate modality dropout during training, enabling the summarizer to adaptively handle these scenarios. We define the context summary length as the number of learnable query tokens used in the summarizer, a fixed hyperparameter in our model. We analyze how varying context summary lengths affect performance, identifying an optimal balance between compression and expressiveness. In addition to improving robustness, we systematically evaluate model calibration across emotions in current <italic>state-of-the-art</i> (SOTA) AVER methods. Our experiments on the MSP-IMPROV and CREMA-D databases demonstrate that our model achieves superior performance across macro-, micro-, and weighted-F1 scores, both under ideal conditions and in scenarios with modality losses. 
Additionally, we conduct ablation studies to assess the impact of different context lengths on our summarizer structure in terms of overall AVER performance.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"7 ","pages":"42-53"},"PeriodicalIF":2.7,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11316221","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
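The modality-dropout training strategy described in the abstract above is straightforward to sketch. The following is a generic illustration, not the authors' implementation: the function name, tensor shapes, and the rule of dropping at most one modality per sample are assumptions made for this sketch.

```python
import numpy as np

def modality_dropout(audio, video, p=0.3, rng=None, training=True):
    """Randomly zero out one modality per sample during training so the
    model learns to handle missing audio or video at inference time.

    audio: (B, T_a, D_a), video: (B, T_v, D_v). At most one modality is
    dropped per sample (assumption for this sketch); p is the probability
    that a given sample loses a modality at all.
    """
    if not training or p == 0.0:
        return audio, video
    rng = rng if rng is not None else np.random.default_rng()
    B = audio.shape[0]
    drop = rng.random(B) < p          # which samples lose a modality
    which = rng.random(B) < 0.5       # True -> drop audio, False -> drop video
    a_mask = (~(drop & which)).astype(audio.dtype).reshape(B, 1, 1)
    v_mask = (~(drop & ~which)).astype(video.dtype).reshape(B, 1, 1)
    return audio * a_mask, video * v_mask
```

Applied inside the training loop, this exposes the downstream summarizer to the same missing-modality conditions it will face at deployment; at evaluation time (`training=False`) both streams pass through untouched.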