Pub Date: 2025-12-19 | DOI: 10.1109/OJSP.2025.3646127
Ali Emre Balcı; Raj Thilak Rajan
Accurate tracking of targets is vital for safe and reliable operations, particularly in complex and dynamic environments such as urban areas. Traditional tracking methods, including Kalman and particle filters, often perform poorly in real-world scenarios due to inaccurate models and sparse or noisy measurements. Gaussian process (GP) based methods offer a flexible, data-driven alternative with uncertainty quantification that does not depend on predefined dynamical equations. However, state-of-the-art GP tracking approaches require expensive hyperparameter optimization, which limits their practicality for real-time applications. In this work, we introduce a novel, computationally efficient GP mixture-based tracking method that is capable of modeling complex system behavior and adapting to changing dynamics. Our proposed solution, named Multiple Model Recursive Gaussian Process (MM-RGP), adapts continuously to changing dynamics, models complex behavior, and is robust against sparse observations. In addition, the proposed method avoids hyperparameter optimization and adapts to incoming data. We demonstrate the effectiveness of our solution using the example of uncrewed aerial vehicle (UAV) tracking, with both simulated and real datasets, and propose directions for extending our work.
Multiple Model Recursive Gaussian Process for Robust Target Tracking. IEEE Open Journal of Signal Processing, vol. 7, pp. 23-31.
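As a rough illustration of the multiple-model idea behind MM-RGP (this is not the authors' algorithm: the candidate models, noise level, and fusion rule below are all assumptions), here is a minimal sketch in which two candidate motion models are re-weighted by their likelihoods on the newest measurement and then fused:

```python
import numpy as np

def gauss_lik(residual, sigma):
    """Gaussian likelihood of a one-step prediction residual."""
    return np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fuse(predictions, weights, measurement, sigma=1.0):
    """Update model weights from the measurement and fuse the predictions."""
    liks = np.array([gauss_lik(measurement - p, sigma) for p in predictions])
    w = weights * liks
    w = w / w.sum()                     # normalised posterior model weights
    return float(w @ np.asarray(predictions)), w

# Target moving at speed 1; model A assumes speed 1, model B assumes speed 0.
preds = [11.0, 10.0]                    # each model's predicted position at t+1
weights = np.array([0.5, 0.5])          # uniform prior over models
est, w = fuse(preds, weights, measurement=11.05)
```

The measurement agrees with model A, so A's posterior weight grows and the fused estimate moves toward its prediction; over time this lets the mixture track changing dynamics without refitting hyperparameters.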
Pub Date: 2025-12-15 | DOI: 10.1109/OJSP.2025.3635745
List of Reviewers. IEEE Open Journal of Signal Processing, vol. 6, pp. 1203-1206.
Pub Date: 2025-12-15 | DOI: 10.1109/OJSP.2025.3644786
Peng Luo;Boyu Pang;Defeng Wu;W. Zeng
This paper presents a novel direction of arrival (DOA) estimation method based on a rotation spatial differencing technique that offers high resolution, robustness, and stable performance. To suppress external environmental noise and improve estimation accuracy, a new modified covariance matrix is constructed using a rotation matrix technique. Additionally, a spatial differencing matrix is built from neighboring subarrays to achieve signal decoherence. Owing to the new modified covariance matrix, the differencing matrix gains a new property without compromising the array flow pattern, leading to improved spatial differencing. Finally, a multiple signal classification (MUSIC) spectral search algorithm based on the singular value decomposition (SVD) is applied to accurately localize both uncorrelated and coherent signals at once, which greatly simplifies the DOA estimation process. Experimental results demonstrate that the proposed method delivers superior DOA estimation performance, providing accurate and stable signal direction estimates.
Direction of Arrival Estimation for the Coexistence of Uncorrelated and Coherent Signals via Rotation Spatial Differencing Method. IEEE Open Journal of Signal Processing, vol. 7, pp. 11-22.
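The SVD-based MUSIC spectral search mentioned in the abstract can be sketched for a plain uniform linear array; this omits the paper's rotation and spatial differencing construction, and the array size, snapshot count, and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 8, 200, 0.5          # sensors, snapshots, spacing in wavelengths
theta_true = 20.0              # source direction in degrees

def steering(theta_deg):
    """ULA steering vector for a given direction."""
    k = 2 * np.pi * d * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * k * np.arange(M))

s = rng.standard_normal(N) + 1j * rng.standard_normal(N)     # source signal
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = np.outer(steering(theta_true), s) + noise                # array snapshots

R = X @ X.conj().T / N                                       # sample covariance
U, _, _ = np.linalg.svd(R)                                   # singular values descending
En = U[:, 1:]                                                # noise subspace (1 source)

grid = np.arange(-90, 90.5, 0.5)
P = np.array([1.0 / np.linalg.norm(En.conj().T @ steering(t)) ** 2 for t in grid])
theta_hat = grid[np.argmax(P)]                               # pseudospectrum peak
```

The pseudospectrum peaks where the steering vector is orthogonal to the noise subspace, recovering the source direction.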
Pub Date: 2025-12-04 | DOI: 10.1109/OJSP.2025.3640517
Yoshiki Masuyama;Gordon Wichern;François G. Germain;Christopher Ick;Jonathan Le Roux
This paper gives an in-depth description of our submission to Task 2 of the Listener Acoustic Personalization (LAP) challenge 2024, which aims to reconstruct head-related transfer functions (HRTFs) on dense spatial grids from sparse measurements. Neural fields (NFs) with parameter-efficient fine-tuning (PEFT) have led to dramatic performance improvements in HRTF spatial upsampling and personalization. Despite these advances, spatial upsampling performance remains limited in scenarios with very sparse measurements. Our proposed system, named retrieval-augmented NF (RANF), incorporates HRTFs retrieved from a dataset as auxiliary inputs. We leverage multiple retrievals via transform-average-concatenate and adopt a PEFT technique tailored for retrieval augmentation. Furthermore, we capitalize on the results of a signal-processing-based spatial upsampling method as optional inputs.
RANF: Neural Field-Based HRTF Spatial Upsampling With Retrieval Augmentation and Parameter Efficient Fine-Tuning. IEEE Open Journal of Signal Processing, vol. 7, pp. 32-41.
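A toy version of the retrieval step can clarify the idea: given a subject's sparse HRTF measurements, find the K most similar dataset subjects and use their dense HRTFs as auxiliary inputs. The dataset shapes, distance metric, and K are assumptions, and the paper fuses the retrievals inside a neural field rather than averaging them:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_dir, n_freq = 20, 50, 16
dataset = rng.standard_normal((n_subj, n_dir, n_freq))   # dense HRTF magnitudes
sparse_idx = np.array([0, 10, 25, 40])                   # measured directions only

# The "new listener" resembles dataset subject 3 plus measurement noise.
target = dataset[3] + 0.05 * rng.standard_normal((n_dir, n_freq))
query = target[sparse_idx]                               # sparse measurement

# Distance computed on the shared sparse directions only.
dists = np.linalg.norm(dataset[:, sparse_idx] - query, axis=(1, 2))
K = 3
nearest = np.argsort(dists)[:K]
aux = dataset[nearest].mean(axis=0)                      # averaged retrieved HRTFs
```

The averaged retrieval already has the dense grid shape, which is why it is useful as a conditioning input when the subject's own measurements are very sparse.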
Pub Date: 2025-12-03 | DOI: 10.1109/OJSP.2025.3639934
Arnout Roebben;Toon van Waterschoot;Jan Wouters;Marc Moonen
In audio signal processing applications with a microphone and a loudspeaker in the same acoustic environment, the loudspeaker signals can feed back into the microphone, creating a closed-loop system that potentially leads to instability. To remove this acoustic coupling, prediction error method (PEM) feedback cancellation algorithms aim to identify the feedback path between the loudspeaker and the microphone by assuming that the input signal can be modelled by an autoregressive (AR) model. It has previously been shown that this PEM framework and the resulting algorithms can identify the feedback path correctly when the forward path from microphone to loudspeaker is sufficiently time-varying or non-linear, or when the forward path delay equals or exceeds the order of the AR model. In this paper, this delay-based condition is generalised, for one particular PEM-based algorithm, the so-called two-channel adaptive feedback canceller (2ch-AFC), to an invertibility-based condition: identifiability can be achieved when the order of the forward path feedforward filter exceeds the order of the AR model. Additionally, the condition number of the correlation matrix inverted in the 2ch-AFC algorithm can serve as a measure for monitoring identifiability.
Identifiability Conditions for Acoustic Feedback Cancellation With the Two-Channel Adaptive Feedback Canceller Algorithm. IEEE Open Journal of Signal Processing, vol. 7, pp. 1-10.
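The AR-model assumption at the heart of PEM can be illustrated with a minimal prewhitening sketch, assuming an AR(1) input (the order and coefficient are assumptions): the prediction-error filter removes the input's correlation, which is what lets the feedback path be identified without bias from the coloured source signal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, a = 5000, 0.9
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):                 # AR(1) process: x[t] = a*x[t-1] + e[t]
    x[t] = a * x[t - 1] + e[t]

# Least-squares AR(1) coefficient estimate from the data itself.
a_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
resid = x[1:] - a_hat * x[:-1]        # prediction error (whitened) signal

def lag1_corr(v):
    """Sample lag-1 autocorrelation coefficient."""
    v = v - v.mean()
    return float((v[:-1] @ v[1:]) / (v @ v))

rho_x, rho_r = lag1_corr(x), lag1_corr(resid)   # strongly correlated vs near-white
```

The input has lag-1 correlation near 0.9, while the prediction error is close to white noise.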
Pub Date: 2025-11-17 | DOI: 10.1109/OJSP.2025.3633577
Stefan Thaleiser;Gerald Enzner;Rainer Martin;Aleksej Chinaev
Binaural processing is becoming an important feature of high-end commercial headsets and hearing aids. Speech enhancement with binaural output requires adequate treatment of spatial cues in addition to the desired noise reduction and simultaneous speech preservation. Binaural speech enhancement was traditionally approached with model-based statistical signal processing, where the principle of common-gain filtering, with identical treatment of the left- and right-ear signals, was designed to achieve enhancement constrained by strict binaural cue preservation. However, model-based approaches may also be instructive for the design of modern deep learning architectures. In this article, the common-gain paradigm is therefore embedded into an artificial neural network approach. In order to maintain the desired common-gain property end-to-end, we derive the requirements for compressed feature formation and data normalization. Binaural experiments with moderate-sized artificial neural networks demonstrate the superiority of the proposed common-gain autoencoder network over model-based processing and related unconstrained network architectures for anechoic and reverberant noisy speech, in terms of segmental SNR, the binaural perception-based metrics MBSTOI and better-ear HASQI, and a listening experiment.
Common-Gain Autoencoder Network for Binaural Speech Enhancement. IEEE Open Journal of Signal Processing, vol. 6, pp. 1193-1202.
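The common-gain principle itself is easy to demonstrate: applying one real-valued gain to both ear signals leaves interaural level and phase differences untouched. A toy sketch with random spectra (the shapes and the gain mask are assumptions, not the proposed network):

```python
import numpy as np

rng = np.random.default_rng(3)
F, T = 129, 40
left = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
right = 0.5 * left * np.exp(1j * 0.3)     # fixed ILD and IPD w.r.t. left

gain = rng.uniform(0.1, 1.0, size=(F, T)) # any common real-valued [0,1] gain mask

left_out, right_out = gain * left, gain * right

# Interaural level and phase differences before and after filtering.
ild_in = np.abs(left) / np.abs(right)
ild_out = np.abs(left_out) / np.abs(right_out)
ipd_in = np.angle(left / right)
ipd_out = np.angle(left_out / right_out)
```

Because the gain cancels in the left/right ratio, both cues are preserved exactly, which is the constraint the autoencoder architecture is built to maintain end-to-end.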
Pub Date: 2025-11-17 | DOI: 10.1109/OJSP.2025.3633567
Chang-Bin Jeon;Gordon Wichern;François G. Germain;Jonathan Le Roux
In music source separation, a standard data augmentation technique involves creating new training examples by randomly combining instrument stems from different songs. However, these randomly mixed samples lack the natural coherence of real music, as their stems do not share a consistent beat or tonality, often resulting in a cacophony. Despite this apparent distribution shift, random mixing has been widely adopted due to its effectiveness. In this work, we investigate why random mixing improves performance when training a state-of-the-art music source separation model and analyze the factors that cause performance gains to plateau despite the theoretically limitless number of possible combinations. We further explore the impact of beat and tonality mismatches on separation performance. Beyond analyzing random mixing, we introduce ways to further enhance its effectiveness. First, we explore a multi-segment sampling strategy that increases the diversity of training examples by selecting multiple segments for the target source. Second, we incorporate a digital parametric equalizer, a fundamental tool in music production, to maximize the timbral diversity of random mixes. Our experiments demonstrate that a model trained with only 100 songs from the MUSDB18-HQ dataset, combined with our proposed methods, achieves performance competitive with a BS-RNN model trained with 1,750 additional songs.
Embracing Cacophony: Explaining and Improving Random Mixing in Music Source Separation. IEEE Open Journal of Signal Processing, vol. 6, pp. 1179-1192.
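The random-mixing augmentation described above can be sketched as follows; the stem count, gain range, and array shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n_songs, n_stems, n_samples = 4, 3, 16000   # e.g. vocals/drums/bass stems
stems = rng.standard_normal((n_songs, n_stems, n_samples))

def random_mix(stems, rng):
    """Pick each stem from a random song, scale it, and sum into a mixture."""
    song_idx = rng.integers(0, stems.shape[0], size=stems.shape[1])
    gains = rng.uniform(0.25, 1.25, size=stems.shape[1])
    picked = np.stack([g * stems[s, i]
                       for i, (s, g) in enumerate(zip(song_idx, gains))])
    return picked.sum(axis=0), picked        # mixture, per-stem targets

mix, targets = random_mix(stems, rng)
```

Each call produces a fresh (mixture, targets) training pair whose stems come from different songs, which is exactly the distribution-shifted but effective data the paper analyzes.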
Pub Date: 2025-10-30 | DOI: 10.1109/OJSP.2025.3627123
Furkan Mert Algan;Umut Yazgan;Driton Salihu;Cem Eteke;Eckehard Steinbach
We present LEMON, a mesh editing pipeline that integrates neural deferred shading with localized mesh optimization to enable fast and precise editing of polygonal meshes guided by text prompts. Existing solutions for this problem tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and the rendered views. Our approach starts by identifying the most important vertices in the mesh for editing, using a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. Using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. This process results in a polygonal mesh that is edited according to the given text instruction, preserving the geometric characteristics of the initial mesh while focusing on the most significant areas. We evaluate our pipeline on the DTU dataset, demonstrating that it generates finely edited meshes more rapidly than current state-of-the-art methods. We include our code and additional results in the supplementary material.
LEMON: Localized Editing With Mesh Optimization and Neural Shaders. IEEE Open Journal of Signal Processing, vol. 6, pp. 1161-1168.
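A heavily simplified sketch of the localized-optimization idea: only vertices selected by a segmentation mask receive updates, so the rest of the mesh is preserved exactly. Here a dummy quadratic objective stands in for the neural-shading loss, and the mask and target are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
V = rng.standard_normal((100, 3))           # mesh vertex positions
mask = np.zeros(100, dtype=bool)
mask[:20] = True                            # "important" region chosen for editing
target = V + 1.0                            # dummy edit target for the region

frozen = V[~mask].copy()                    # geometry that must not move
for _ in range(200):                        # masked gradient descent
    grad = 2 * (V - target)                 # gradient of ||V - target||^2
    V[mask] -= 0.1 * grad[mask]             # update ONLY the masked vertices
```

Restricting updates to the masked vertices is what keeps the rest of the mesh's geometric characteristics intact while the edited region converges to the target.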
Pub Date: 2025-10-27 | DOI: 10.1109/OJSP.2025.3625863
Ping Hu;Mert Kayaalp;Ali H. Sayed
Distributed decision-making over graphs involves a group of agents that collaboratively work toward a common objective. In the social learning framework, the agents are tasked with inferring an unknown state from a finite set using a stream of local observations. The probability of decision error for each agent asymptotically converges to zero at an exponential rate, characterized by the error exponent, which depends on the combination policy employed by the network. This work addresses the challenge of identifying optimal combination policies to maximize the error exponent for the true state while ensuring the errors for all other states converge to zero as well. We derive an upper bound on the achievable error exponent under the social learning rule, and then establish conditions for the combination policy to reach this upper bound. Moreover, we examine the performance loss when the combination policy is chosen inappropriately. From a geometric perspective, each combination policy induces a weighted nearest neighbor classifier whose weights correspond to the agents’ Perron centralities. By implementing an optimized combination policy, we enhance the error exponent, leading to improved accuracy and efficiency in the distributed decision-making process.
Minimizing the Probability of Error for Decision Making Over Graphs. IEEE Open Journal of Signal Processing, vol. 6, pp. 1139-1160.
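The Perron centralities mentioned above are the entries of the combination matrix's Perron eigenvector, which can be computed by power iteration; the 3-agent left-stochastic matrix below is an assumed toy example, not one from the paper:

```python
import numpy as np

# Combination policy: columns sum to 1 (left-stochastic).
A = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

p = np.ones(3) / 3                   # start from the uniform vector
for _ in range(200):                 # power iteration: p <- A p
    p = A @ p
    p = p / p.sum()                  # keep it a probability vector
```

At convergence `p` satisfies `A p = p`; its entries weight each agent's contribution in the induced nearest-neighbor classifier, so choosing `A` shapes the achievable error exponent.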
Pub Date: 2025-10-13 | DOI: 10.1109/OJSP.2025.3620713
Christos Korgialas;Constantine Kotropoulos
An approach to Source Device Identification (SDI) is proposed, leveraging a Residual Network (ResNet) architecture enhanced with the Convolutional Block Attention Module (CBAM). The approach employs log-Mel spectrograms of audio content from videos in the VISION dataset captured by 35 different devices. A content-disjoint evaluation protocol is applied at the recording level to eliminate content bias across splits, supported by fixed-length segmentation and structured patch extraction for input generation. Moreover, Gradient-weighted Class Activation Mapping (Grad-CAM) is exploited to highlight the spectrogram regions that contribute most to the identification process, thus enabling interpretability. Quantitatively, the CBAM ResNet model is compared with existing methods, demonstrating increased SDI accuracy across scenarios, including flat, indoor, and outdoor environments. A statistical significance test is conducted to assess the SDI accuracies, while an ablation study is performed to analyze the effect of attention mechanisms on the proposed model’s performance.
Attention Source Device Identification Using Audio Content From Videos and Grad-CAM Explanations. IEEE Open Journal of Signal Processing, vol. 6, pp. 1124-1138.
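The fixed-length segmentation and structured patch extraction mentioned in the abstract can be sketched as follows; the window, hop, and patch sizes are assumptions, and mel filtering, CBAM, and Grad-CAM are omitted:

```python
import numpy as np

rng = np.random.default_rng(6)
audio = rng.standard_normal(16000)          # 1 s of toy audio at 16 kHz
win, hop = 512, 256

# Frame the audio with a Hann window and take a log-magnitude spectrogram.
frames = np.stack([audio[i:i + win] * np.hanning(win)
                   for i in range(0, len(audio) - win + 1, hop)])
spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T   # (freq, time)

# Cut non-overlapping fixed-width patches along the time axis.
patch_t = 16                                # patch width in frames
patches = [spec[:, i:i + patch_t]
           for i in range(0, spec.shape[1] - patch_t + 1, patch_t)]
```

Each patch becomes one classifier input, so a single recording yields several training examples while the content-disjoint protocol keeps its patches in one split.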