IEEE Open Journal of Signal Processing: Latest Articles

Multiple Model Recursive Gaussian Process for Robust Target Tracking
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-12-19 DOI: 10.1109/OJSP.2025.3646127
Ali Emre Balcı;Raj Thilak Rajan
Accurate tracking of targets is vital for safe and reliable operations, particularly in complex and dynamic environments such as urban areas. Traditional tracking methods, including Kalman and particle filters, often perform poorly in real-world scenarios because of inaccurate models and sparse or noisy measurements. Gaussian process (GP) based methods offer a flexible, data-driven alternative with uncertainty quantification that does not depend on predefined dynamical equations. However, state-of-the-art GP tracking approaches require expensive hyperparameter optimization, which limits their practicality for real-time applications. In this work, we introduce a novel, computationally efficient tracking method based on a GP mixture. Our proposed solution, named Multiple Model Recursive Gaussian Process (MM-RGP), adapts continuously to changing dynamics, is capable of modeling complex behavior, and is robust against sparse observations. In addition, the proposed method avoids hyperparameter optimization and adapts to incoming data. We demonstrate the effectiveness of our solution using the example of uncrewed aerial vehicle (UAV) tracking, with both simulated and real datasets, and propose directions for extending our work.
Volume: 7 Pages: 23-31 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11304544
Citations: 0
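For readers unfamiliar with the GP building block mentioned in the entry above, the following is a minimal sketch of plain GP regression with fixed hyperparameters. It is not the MM-RGP algorithm from the paper: the squared-exponential kernel, the function names `rbf_kernel` and `gp_predict`, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(t_train, y_train, t_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at t_test.

    Hyperparameters are fixed (no optimization); values are illustrative.
    """
    K = rbf_kernel(t_train, t_train) + noise_var * np.eye(len(t_train))
    K_s = rbf_kernel(t_train, t_test)
    K_ss = rbf_kernel(t_test, t_test)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v * v, axis=0)
    return mean, var

# Hypothetical 1-D track: positions sampled from a sine curve.
t = np.linspace(0.0, 5.0, 20)
y = np.sin(t)
mean, var = gp_predict(t, y, np.array([2.5, 10.0]))
```

The predictive variance is small inside the observed interval and returns to the prior variance far from the data, which is the uncertainty quantification the abstract refers to.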
List of Reviewers
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-12-15 DOI: 10.1109/OJSP.2025.3635745
Volume: 6 Pages: 1203-1206 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11300295
Citations: 0
Direction of Arrival Estimation for the Coexistence of Uncorrelated and Coherent Signals via Rotation Spatial Differencing Method
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-12-15 DOI: 10.1109/OJSP.2025.3644786
Peng Luo;Boyu Pang;Defeng Wu;W. Zeng
This paper presents a novel direction of arrival (DOA) estimation method based on a rotation spatial differencing technique that offers high resolution, robustness, and stable performance. To suppress external environmental noise and improve estimation accuracy, a new modified covariance matrix is constructed using a rotation matrix technique. Additionally, a spatial differencing matrix is built from neighboring subarrays to decorrelate the coherent signals. Because of the new modified covariance matrix, the differencing matrix gains a new feature without compromising the array manifold, which improves the spatial differencing. Finally, a multiple signal classification (MUSIC) spectral search based on the singular value decomposition (SVD) is applied to localize both uncorrelated and coherent signals simultaneously, which simplifies the DOA estimation process. Experimental results demonstrate that the proposed method delivers superior DOA estimation performance, providing accurate and stable signal direction estimation.
Volume: 7 Pages: 11-22 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11300944
Citations: 0
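The MUSIC spectral search named in the entry above can be sketched in its textbook form as follows. This is only the standard algorithm for uncorrelated sources on a uniform linear array, not the paper's rotation spatial differencing method; the array geometry, noise level, and helper names `steering` and `music_spectrum` are assumptions. With coherent sources the source covariance loses rank and this plain version would typically fail, which is the case the decorrelation step in the abstract addresses.

```python
import numpy as np

def steering(theta_deg, n_sensors, spacing=0.5):
    """ULA steering vectors (as columns); spacing is in wavelengths."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    k = np.arange(n_sensors)[:, None]
    return np.exp(-2j * np.pi * spacing * k * np.sin(theta)[None, :])

def music_spectrum(X, n_sources, grid_deg):
    """MUSIC pseudospectrum from snapshots X of shape (n_sensors, n_snapshots)."""
    R = X @ X.conj().T / X.shape[1]           # sample covariance
    U, _, _ = np.linalg.svd(R)                # singular values sorted descending
    Un = U[:, n_sources:]                     # noise subspace
    A = steering(grid_deg, X.shape[0])
    denom = np.sum(np.abs(Un.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom

# Synthetic example: two uncorrelated sources, 8 sensors (all values hypothetical).
rng = np.random.default_rng(0)
n_sensors, n_snap = 8, 500
true_doas = np.array([-20.0, 30.0])
A_true = steering(true_doas, n_sensors)
S = (rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))) / np.sqrt(2)
N = 0.1 * (rng.standard_normal((n_sensors, n_snap))
           + 1j * rng.standard_normal((n_sensors, n_snap)))
X = A_true @ S + N

grid = np.arange(-90.0, 90.5, 0.5)
P = music_spectrum(X, 2, grid)

# Keep the two largest local maxima of the pseudospectrum.
peaks = [i for i in range(1, len(grid) - 1) if P[i] > P[i - 1] and P[i] > P[i + 1]]
top = sorted(peaks, key=lambda i: P[i], reverse=True)[:2]
est_doas = np.sort(grid[top])
```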
RANF: Neural Field-Based HRTF Spatial Upsampling With Retrieval Augmentation and Parameter Efficient Fine-Tuning
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-12-04 DOI: 10.1109/OJSP.2025.3640517
Yoshiki Masuyama;Gordon Wichern;François G. Germain;Christopher Ick;Jonathan Le Roux
This paper gives an in-depth description of our submission to Task 2 of the Listener Acoustic Personalization (LAP) challenge 2024, which aims to reconstruct head-related transfer functions (HRTFs) with dense spatial grids from sparse measurements. Neural fields (NFs) with parameter-efficient fine-tuning (PEFT) have led to dramatic performance improvements in HRTF spatial upsampling and personalization. Despite these advances, spatial upsampling performance remains limited in scenarios with very sparse measurements. Our proposed system, named retrieval-augmented NF (RANF), incorporates HRTFs retrieved from a dataset as auxiliary inputs. We leverage multiple retrievals via transform-average-concatenate and adopt a PEFT technique tailored for retrieval augmentation. Furthermore, we capitalize on the results of a signal-processing-based spatial upsampling method as optional inputs. By incorporating these auxiliary inputs, our system demonstrated state-of-the-art performance on the SONICOM dataset and placed first in Task 2 of the LAP challenge 2024.
Volume: 7 Pages: 32-41 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11277386
Citations: 0
Identifiability Conditions for Acoustic Feedback Cancellation With the Two-Channel Adaptive Feedback Canceller Algorithm
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-12-03 DOI: 10.1109/OJSP.2025.3639934
Arnout Roebben;Toon van Waterschoot;Jan Wouters;Marc Moonen
In audio signal processing applications with a microphone and a loudspeaker within the same acoustic environment, the loudspeaker signals can feed back into the microphone, thereby creating a closed-loop system that potentially leads to system instability. To remove this acoustic coupling, prediction error method (PEM) feedback cancellation algorithms aim to identify the feedback path between the loudspeaker and the microphone by assuming that the input signal can be modelled by means of an autoregressive (AR) model. It has previously been shown that this PEM framework and resulting algorithms can identify the feedback path correctly in cases where the forward path from microphone to loudspeaker is sufficiently time-varying or non-linear, or when the forward path delay equals or exceeds the order of the AR model. In this paper, it is shown that, for one particular PEM-based algorithm, the so-called two-channel adaptive feedback canceller (2ch-AFC), this delay-based condition can be generalised to an invertibility-based condition: identifiability can be achieved when the order of the forward path feedforward filter exceeds the order of the AR model. Additionally, the condition number of the correlation matrix that is inverted in the 2ch-AFC algorithm can serve as a measure for monitoring the identifiability.
Volume: 7 Pages: 1-10 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11275691
Citations: 0
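The AR signal model assumed by PEM-based algorithms in the entry above can be illustrated with a small least-squares fit. This sketch only shows AR modelling and the resulting prediction error (prewhitening) on a synthetic signal; it does not implement the 2ch-AFC algorithm or any feedback path estimation, and the helper names `lagged_matrix`, `fit_ar`, and `prediction_error` as well as the AR(2) coefficients are assumptions.

```python
import numpy as np

def lagged_matrix(x, order):
    """Rows [x[n-1], ..., x[n-order]] for n = order, ..., len(x) - 1."""
    return np.column_stack(
        [x[order - 1 - k: len(x) - 1 - k] for k in range(order)]
    )

def fit_ar(x, order):
    """Least-squares AR fit: x[n] ~ sum_k a[k] * x[n-1-k]."""
    a, *_ = np.linalg.lstsq(lagged_matrix(x, order), x[order:], rcond=None)
    return a

def prediction_error(x, a):
    """Residual after removing the AR prediction (prewhitened signal)."""
    return x[len(a):] - lagged_matrix(x, len(a)) @ a

# Synthetic stable AR(2) process driven by unit-variance white noise.
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.7])
e = rng.standard_normal(5000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + e[n]

a_hat = fit_ar(x, order=2)
residual = prediction_error(x, a_hat)
```

If the AR model fits, the residual is approximately white with the variance of the driving noise, which is what makes the prediction error a useful decorrelated signal.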
Common-Gain Autoencoder Network for Binaural Speech Enhancement
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-11-17 DOI: 10.1109/OJSP.2025.3633577
Stefan Thaleiser;Gerald Enzner;Rainer Martin;Aleksej Chinaev
Binaural processing is becoming an important feature of high-end commercial headsets and hearing aids. Speech enhancement with binaural output requires adequate treatment of spatial cues in addition to desirable noise reduction and simultaneous speech preservation. Binaural speech enhancement was traditionally approached with model-based statistical signal processing, where the principle of common-gain filtering with identical treatment of left- and right-ear signals has been designed to achieve enhancement constrained by strict binaural cue preservation. However, model-based approaches may also be instructive for the design of modern deep learning architectures. In this article, the common-gain paradigm is therefore embedded into an artificial neural network approach. In order to maintain the desired common-gain property end-to-end, we derive the requirements for compressed feature formation and data normalization. Binaural experiments with moderate-sized artificial neural networks demonstrate the superiority of the proposed common-gain autoencoder network over model-based processing and related unconstrained network architectures for anechoic and reverberant noisy speech in terms of segmental SNR, binaural perception-based metrics MBSTOI, better-ear HASQI, and a listening experiment.
Volume: 6 Pages: 1193-1202 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250640
Citations: 0
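The common-gain principle described in the entry above (identical treatment of left- and right-ear signals) can be checked numerically with a few lines. In this sketch the gain is random and merely stands in for whatever a statistical estimator or network would produce; the array sizes and the helper names `apply_common_gain`, `ild_db`, and `ipd` are assumptions, not part of the paper.

```python
import numpy as np

def apply_common_gain(left, right, gain):
    """Scale left and right STFT bins with the same real-valued gain."""
    return gain * left, gain * right

def ild_db(l, r):
    """Interaural level difference per time-frequency bin, in dB."""
    return 20.0 * np.log10(np.abs(l) / np.abs(r))

def ipd(l, r):
    """Interaural phase difference per time-frequency bin, in radians."""
    return np.angle(l * np.conj(r))

rng = np.random.default_rng(0)
shape = (4, 6)  # (frequency bins, frames), hypothetical sizes
left = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
right = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
gain = rng.uniform(0.1, 1.0, size=shape)  # stand-in for an estimated mask

out_l, out_r = apply_common_gain(left, right, gain)
```

Because the same positive real factor multiplies both channels, the level ratio and the phase difference between the ears are unchanged in every bin, so the binaural cues are preserved by construction.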
Embracing Cacophony: Explaining and Improving Random Mixing in Music Source Separation
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-11-17 DOI: 10.1109/OJSP.2025.3633567
Chang-Bin Jeon;Gordon Wichern;François G. Germain;Jonathan Le Roux
In music source separation, a standard data augmentation technique involves creating new training examples by randomly combining instrument stems from different songs. However, these randomly mixed samples lack the natural coherence of real music, as their stems do not share a consistent beat or tonality, often resulting in a cacophony. Despite this apparent distribution shift, random mixing has been widely adopted due to its effectiveness. In this work, we investigate why random mixing improves performance when training a state-of-the-art music source separation model and analyze the factors that cause performance gains to plateau despite the theoretically limitless number of possible combinations. We further explore the impact of beat and tonality mismatches on separation performance. Beyond analyzing random mixing, we introduce ways to further enhance its effectiveness. First, we explore a multi-segment sampling strategy that increases the diversity of training examples by selecting multiple segments for the target source. Second, we incorporate a digital parametric equalizer, a fundamental tool in music production, to maximize the timbral diversity of random mixes. Our experiments demonstrate that a model trained with only 100 songs from the MUSDB18-HQ dataset, combined with our proposed methods, achieves performance competitive with a BS-RNN model trained with 1,750 additional songs.
Volume: 6 Pages: 1179-1192 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250641
Citations: 0
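The basic random mixing augmentation discussed in the entry above can be sketched as follows. This shows only the baseline idea of drawing each source from an independently chosen song; it does not include the multi-segment sampling or the parametric equalizer proposed in the paper. The function name `random_mix`, the gain range, and the stand-in noise "stems" are assumptions.

```python
import numpy as np

def random_mix(stems_by_source, rng, segment_len):
    """Build one training example by drawing each source from a random song.

    stems_by_source maps a source name to a list of 1-D arrays (one per song).
    Returns (mixture, targets), where targets maps each source name to the
    segment that was mixed in.
    """
    targets = {}
    for name, tracks in stems_by_source.items():
        track = tracks[rng.integers(len(tracks))]          # random song
        start = rng.integers(0, len(track) - segment_len + 1)
        gain = rng.uniform(0.5, 1.0)                       # hypothetical gain range
        targets[name] = gain * track[start:start + segment_len]
    mixture = np.sum(list(targets.values()), axis=0)
    return mixture, targets

# Stand-in data: 3 "songs" of white noise per source.
rng = np.random.default_rng(0)
names = ("vocals", "drums", "bass", "other")
stems = {n: [rng.standard_normal(1000) for _ in range(3)] for n in names}
mixture, targets = random_mix(stems, rng, segment_len=256)
```

Since each stem is drawn independently, the stems in one mixture generally come from different songs and do not share beat or key, which is the source of the "cacophony" the abstract describes.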
LEMON: Localized Editing With Mesh Optimization and Neural Shaders
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-30 DOI: 10.1109/OJSP.2025.3627123
Furkan Mert Algan;Umut Yazgan;Driton Salihu;Cem Eteke;Eckehard Steinbach
We present LEMON, a mesh editing pipeline that integrates neural deferred shading with localized mesh optimization to enable fast and precise editing of polygonal meshes guided by text prompts. Existing solutions for this problem tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and view. Our approach starts by identifying the most important vertices in the mesh for editing, using a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. Using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. This process results in a polygonal mesh that is edited according to the given text instruction, preserving the geometric characteristics of the initial mesh while focusing on the most significant areas. We evaluate our pipeline on the DTU dataset, demonstrating that it generates finely-edited meshes more rapidly than the current state-of-the-art methods. We include our code and additional results in the supplementary material.
Volume: 6 Pages: 1161-1168 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11222920
Citations: 0
Minimizing the Probability of Error for Decision Making Over Graphs
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-27 DOI: 10.1109/OJSP.2025.3625863
Ping Hu;Mert Kayaalp;Ali H. Sayed
Distributed decision-making over graphs involves a group of agents that collaboratively work toward a common objective. In the social learning framework, the agents are tasked to infer an unknown state from a finite set by using a stream of local observations. The probability of decision errors for each agent asymptotically converges to zero at an exponential rate, characterized by the error exponent, which depends on the combination policy employed by the network. This work addresses the challenge of identifying optimal combination policies to maximize the error exponent for the true state while ensuring the errors for all other states converge to zero as well. We derive an upper bound on the achievable error exponent under the social learning rule, and then establish conditions for the combination policy to reach this upper bound. Moreover, we examine the performance loss scenarios when the combination policy is chosen inappropriately. From a geometric perspective, each combination policy induces a weighted nearest neighbor classifier where the weights correspond to the agents’ Perron centralities. By implementing an optimized combination policy, we enhance the error exponent, leading to improved accuracy and efficiency in the distributed decision-making process.
Volume: 6 Pages: 1139-1160 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11217991
Citations: 0
Attention Source Device Identification Using Audio Content From Videos and Grad-CAM Explanations
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-13 DOI: 10.1109/OJSP.2025.3620713
Christos Korgialas;Constantine Kotropoulos
An approach to Source Device Identification (SDI) is proposed, leveraging a Residual Network (ResNet) architecture enhanced with the Convolutional Block Attention Module (CBAM). The approach employs log-Mel spectrograms of audio content from videos in the VISION dataset captured by 35 different devices. A content-disjoint evaluation protocol is applied at the recording level to eliminate content bias across splits, supported by fixed-length segmentation and structured patch extraction for input generation. Moreover, Gradient-weighted Class Activation Mapping (Grad-CAM) is exploited to highlight the spectrogram regions that contribute most to the identification process, thus enabling interpretability. Quantitatively, the CBAM ResNet model is compared with existing methods, demonstrating an increased SDI accuracy across scenarios, including flat, indoor, and outdoor environments. A statistical significance test is conducted to assess the SDI accuracies, while an ablation study is performed to analyze the effect of attention mechanisms on the proposed model’s performance. Additional evaluations are performed using the FloreView and POLIPHONE datasets to validate the model’s generalization capabilities across unseen devices via transfer learning, assessing robustness under various conditions.
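The abstract does not spell out the content-disjoint protocol, so the following is only a generic sketch of the idea: recordings, not segments, are assigned to splits, so no recording contributes segments to both training and testing. The function names, the 20% test fraction, the per-device grouping, and the toy recording IDs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def segment(num_samples, seg_len):
    """Return (start, end) index pairs of non-overlapping fixed-length
    segments; a trailing remainder shorter than seg_len is dropped."""
    return [(s, s + seg_len) for s in range(0, num_samples - seg_len + 1, seg_len)]


def recording_level_split(recordings, test_fraction=0.2, seed=0):
    """Split (recording_id, device_id) pairs into train/test sets of IDs.

    Whole recordings are assigned to one split, device by device, so that
    segments cut from the same recording never cross the split boundary.
    """
    rng = random.Random(seed)
    by_device = defaultdict(list)
    for rec_id, device in recordings:
        by_device[device].append(rec_id)

    train, test = set(), set()
    for device in sorted(by_device):
        recs = sorted(by_device[device])
        rng.shuffle(recs)
        n_test = max(1, round(len(recs) * test_fraction))  # at least one test recording per device
        test.update(recs[:n_test])
        train.update(recs[n_test:])
    return train, test


# Hypothetical data: 3 devices with 5 recordings each.
recordings = [(f"dev{d}_rec{r}", f"dev{d}") for d in range(3) for r in range(5)]
train_ids, test_ids = recording_level_split(recordings)
segments = segment(num_samples=10, seg_len=4)
print(len(train_ids), len(test_ids), segments)
```

Segmentation would then be applied separately to the recordings in each split. Segmenting first and splitting afterwards is the pattern this kind of protocol is meant to avoid.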
IEEE Open Journal of Signal Processing, vol. 6, pp. 1124-1138. Publication date: 2025-10-13. DOI: https://doi.org/10.1109/OJSP.2025.3620713. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11202249
Citations: 0