IEEE Open Journal of Signal Processing: Latest Articles

List of Reviewers
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-12-15 DOI: 10.1109/OJSP.2025.3635745
IEEE Open Journal of Signal Processing, vol. 6, pp. 1203-1206. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11300295
Citations: 0
Common-Gain Autoencoder Network for Binaural Speech Enhancement
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-11-17 DOI: 10.1109/OJSP.2025.3633577
Stefan Thaleiser;Gerald Enzner;Rainer Martin;Aleksej Chinaev
Binaural processing is becoming an important feature of high-end commercial headsets and hearing aids. Speech enhancement with binaural output requires adequate treatment of spatial cues in addition to desirable noise reduction and simultaneous speech preservation. Binaural speech enhancement was traditionally approached with model-based statistical signal processing, where the principle of common-gain filtering with identical treatment of left- and right-ear signals has been designed to achieve enhancement constrained by strict binaural cue preservation. However, model-based approaches may also be instructive for the design of modern deep learning architectures. In this article, the common-gain paradigm is therefore embedded into an artificial neural network approach. In order to maintain the desired common-gain property end-to-end, we derive the requirements for compressed feature formation and data normalization. Binaural experiments with moderate-sized artificial neural networks demonstrate the superiority of the proposed common-gain autoencoder network over model-based processing and related unconstrained network architectures for anechoic and reverberant noisy speech in terms of segmental SNR, binaural perception-based metrics MBSTOI, better-ear HASQI, and a listening experiment.
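The common-gain principle mentioned in this abstract can be illustrated with a minimal sketch. It assumes the enhancement is applied as one real, positive gain per frequency bin to both ear spectra; the spectra and gain values are made up for illustration and are not taken from the paper. Under that assumption, the interaural level difference (ILD) and interaural phase difference (IPD) are unchanged by the enhancement.

```python
import cmath
import math


def apply_common_gain(left, right, gain):
    """Scale the left- and right-ear spectra with the same real gain per bin."""
    enhanced_left = [g * l for g, l in zip(gain, left)]
    enhanced_right = [g * r for g, r in zip(gain, right)]
    return enhanced_left, enhanced_right


def interaural_cues(left, right):
    """Return (ILD in dB, IPD in radians) for every frequency bin."""
    cues = []
    for l, r in zip(left, right):
        ild_db = 20.0 * math.log10(abs(l) / abs(r))
        ipd_rad = cmath.phase(l * r.conjugate())
        cues.append((ild_db, ipd_rad))
    return cues


# Toy single-frame spectra (illustrative values, not data from the paper).
left = [1.0 + 2.0j, -0.5 + 0.3j, 0.8 - 1.1j]
right = [0.4 + 1.0j, -0.2 - 0.6j, 1.5 + 0.2j]
gain = [0.9, 0.3, 0.6]  # hypothetical network output, one gain per bin

enhanced_left, enhanced_right = apply_common_gain(left, right, gain)
cues_before = interaural_cues(left, right)
cues_after = interaural_cues(enhanced_left, enhanced_right)
```

Because the same gain multiplies both channels, it cancels in the level ratio and does not touch the phase, which is why `cues_before` and `cues_after` agree up to rounding.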
IEEE Open Journal of Signal Processing, vol. 6, pp. 1193-1202. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250640
Citations: 0
Embracing Cacophony: Explaining and Improving Random Mixing in Music Source Separation
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-11-17 DOI: 10.1109/OJSP.2025.3633567
Chang-Bin Jeon;Gordon Wichern;François G. Germain;Jonathan Le Roux
In music source separation, a standard data augmentation technique involves creating new training examples by randomly combining instrument stems from different songs. However, these randomly mixed samples lack the natural coherence of real music, as their stems do not share a consistent beat or tonality, often resulting in a cacophony. Despite this apparent distribution shift, random mixing has been widely adopted due to its effectiveness. In this work, we investigate why random mixing improves performance when training a state-of-the-art music source separation model and analyze the factors that cause performance gains to plateau despite the theoretically limitless number of possible combinations. We further explore the impact of beat and tonality mismatches on separation performance. Beyond analyzing random mixing, we introduce ways to further enhance its effectiveness. First, we explore a multi-segment sampling strategy that increases the diversity of training examples by selecting multiple segments for the target source. Second, we incorporate a digital parametric equalizer, a fundamental tool in music production, to maximize the timbral diversity of random mixes. Our experiments demonstrate that a model trained with only 100 songs from the MUSDB18-HQ dataset, combined with our proposed methods, achieves performance competitive with a BS-RNN model trained with 1,750 additional songs.
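The random-mixing augmentation described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function name `random_mix`, the gain range, and the toy stems are assumptions, and the multi-segment sampling and parametric equalizer proposed in the paper are not modelled.

```python
import random


def random_mix(stems_by_source, segment_len, rng):
    """Build one training example from stems drawn independently per source.

    stems_by_source maps a source name (e.g. "vocals") to a list of tracks,
    each track being a list of float samples taken from a different song.
    """
    targets = {}
    for source, tracks in stems_by_source.items():
        track = rng.choice(tracks)  # the song is picked separately per source
        start = rng.randrange(len(track) - segment_len + 1)
        gain = rng.uniform(0.25, 1.25)  # hypothetical gain range
        targets[source] = [gain * x for x in track[start:start + segment_len]]
    # The mixture is the sample-wise sum of the independently chosen stems.
    mixture = [sum(samples) for samples in zip(*targets.values())]
    return mixture, targets


rng = random.Random(0)
# Toy "dataset": two songs per source, 8 samples each (illustrative values).
stems = {
    "vocals": [[0.1 * n for n in range(8)], [0.2] * 8],
    "drums": [[(-1.0) ** n for n in range(8)], [0.05 * n for n in range(8)]],
    "bass": [[0.3] * 8, [0.01 * n * n for n in range(8)]],
}
mixture, targets = random_mix(stems, segment_len=4, rng=rng)
```

Since each source is drawn from a different song and offset, the stems in `targets` share neither beat nor key, which is the mismatch the abstract refers to.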
IEEE Open Journal of Signal Processing, vol. 6, pp. 1179-1192. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250641
Citations: 0
LEMON: Localized Editing With Mesh Optimization and Neural Shaders
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-30 DOI: 10.1109/OJSP.2025.3627123
Furkan Mert Algan;Umut Yazgan;Driton Salihu;Cem Eteke;Eckehard Steinbach
We present LEMON, a mesh editing pipeline that integrates neural deferred shading with localized mesh optimization to enable fast and precise editing of polygonal meshes guided by text prompts. Existing solutions for this problem tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and view. Our approach starts by identifying the most important vertices in the mesh for editing, using a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. Using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. This process results in a polygonal mesh that is edited according to the given text instruction, preserving the geometric characteristics of the initial mesh while focusing on the most significant areas. We evaluate our pipeline on the DTU dataset, demonstrating that it generates finely-edited meshes more rapidly than the current state-of-the-art methods. We include our code and additional results in the supplementary material.
IEEE Open Journal of Signal Processing, vol. 6, pp. 1161-1168. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11222920
Citations: 0
Minimizing the Probability of Error for Decision Making Over Graphs
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-27 DOI: 10.1109/OJSP.2025.3625863
Ping Hu;Mert Kayaalp;Ali H. Sayed
Distributed decision-making over graphs involves a group of agents that collaboratively work toward a common objective. In the social learning framework, the agents are tasked to infer an unknown state from a finite set by using a stream of local observations. The probability of decision errors for each agent asymptotically converges to zero at an exponential rate, characterized by the error exponent, which depends on the combination policy employed by the network. This work addresses the challenge of identifying optimal combination policies to maximize the error exponent for the true state while ensuring the errors for all other states converge to zero as well. We derive an upper bound on the achievable error exponent under the social learning rule, and then establish conditions for the combination policy to reach this upper bound. Moreover, we examine the performance loss scenarios when the combination policy is chosen inappropriately. From a geometric perspective, each combination policy induces a weighted nearest neighbor classifier where the weights correspond to the agents’ Perron centralities. By implementing an optimized combination policy, we enhance the error exponent, leading to improved accuracy and efficiency in the distributed decision-making process.
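The Perron centralities mentioned in this abstract can be computed with a short power iteration. The sketch below assumes a left-stochastic combination matrix (each column sums to one) on a strongly connected, aperiodic network, which is a common convention in the social learning literature. The three-agent matrix is a made-up example, not one from the paper.

```python
def perron_vector(A, iterations=200):
    """Power iteration for the Perron vector pi of a left-stochastic matrix A.

    A[l][k] is the weight agent k assigns to neighbor l, so every column sums
    to one and pi satisfies A @ pi = pi with entries summing to one.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(A[l][k] * pi[k] for k in range(n)) for l in range(n)]
        total = sum(pi)
        pi = [p / total for p in pi]
    return pi


# Hypothetical combination policy for a three-agent line graph.
A = [
    [0.50, 0.25, 0.00],
    [0.50, 0.50, 0.50],
    [0.00, 0.25, 0.50],
]
centrality = perron_vector(A)
```

For this matrix the middle agent ends up with centrality 0.5 and the two outer agents with 0.25 each, so the middle agent's observations weigh twice as much in the network's aggregate decision.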
IEEE Open Journal of Signal Processing, vol. 6, pp. 1139-1160. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11217991
Citations: 0
Attention Source Device Identification Using Audio Content From Videos and Grad-CAM Explanations
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-13 DOI: 10.1109/OJSP.2025.3620713
Christos Korgialas;Constantine Kotropoulos
An approach to Source Device Identification (SDI) is proposed, leveraging a Residual Network (ResNet) architecture enhanced with the Convolutional Block Attention Module (CBAM). The approach employs log-Mel spectrograms of audio content from videos in the VISION dataset captured by 35 different devices. A content-disjoint evaluation protocol is applied at the recording level to eliminate content bias across splits, supported by fixed-length segmentation and structured patch extraction for input generation. Moreover, Gradient-weighted Class Activation Mapping (Grad-CAM) is exploited to highlight the spectrogram regions that contribute most to the identification process, thus enabling interpretability. Quantitatively, the CBAM ResNet model is compared with existing methods, demonstrating an increased SDI accuracy across scenarios, including flat, indoor, and outdoor environments. A statistical significance test is conducted to assess the SDI accuracies, while an ablation study is performed to analyze the effect of attention mechanisms on the proposed model’s performance. Additional evaluations are performed using the FloreView and POLIPHONE datasets to validate the model’s generalization capabilities across unseen devices via transfer learning, assessing robustness under various conditions.
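One way to read the content-disjoint protocol in this abstract is that recordings are assigned to splits first and only then cut into fixed-length segments, so no recording contributes segments to both training and test data. The sketch below follows that reading. The function name, the split fraction, and the toy recordings are assumptions, and the paper's exact procedure may differ.

```python
import random


def content_disjoint_split(recordings, test_fraction, segment_len, rng):
    """Split at the recording level, then cut fixed-length segments.

    recordings maps a recording ID to (device_label, samples).
    """
    ids = sorted(recordings)
    rng.shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    split_ids = {"test": ids[:n_test], "train": ids[n_test:]}

    segments = {"train": [], "test": []}
    for split, members in split_ids.items():
        for rec_id in members:
            device, samples = recordings[rec_id]
            # Non-overlapping fixed-length segments; a trailing remainder is dropped.
            for start in range(0, len(samples) - segment_len + 1, segment_len):
                segments[split].append((rec_id, device, samples[start:start + segment_len]))
    return segments


rng = random.Random(0)
# Six toy recordings from three hypothetical devices, 10 samples each.
recordings = {
    f"rec{i}": (f"device{i % 3}", [float(i)] * 10) for i in range(6)
}
segments = content_disjoint_split(recordings, test_fraction=0.33, segment_len=4, rng=rng)
```

Splitting before segmenting is the design choice that matters here: segmenting first and shuffling afterwards would let near-identical audio content appear on both sides of the split.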
IEEE Open Journal of Signal Processing, vol. 6, pp. 1124-1138. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11202249
Citations: 0
Anderson Accelerated Operator Splitting Methods for Convex-Nonconvex Regularized Problems
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-10-06 DOI: 10.1109/OJSP.2025.3618583
Qiang Heng;Xiaoqian Liu;Eric C. Chi
Convex–nonconvex (CNC) regularization is a novel paradigm that employs a nonconvex penalty function while preserving the convexity of the overall objective function. It has found successful applications in signal processing, statistics, and machine learning. Despite its wide applicability, the computation of CNC-regularized problems is still dominated by the forward–backward splitting method, which can be computationally slow in practice and is restricted to handling a single regularizer. To address these limitations, we develop a unified Anderson acceleration framework that encompasses multiple prevalent operator-splitting schemes, thereby enabling the efficient solution of a broad class of CNC-regularized problems with a quadratic data-fidelity term. We establish global convergence of the proposed algorithm to an optimal point and demonstrate its substantial speed-ups across diverse applications.
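Anderson acceleration, the generic technique named in this abstract, can be sketched for any fixed-point map `g`. The version below uses a memory depth of one and a toy affine contraction as the map. It is not the authors' unified framework: in their setting `g` would be one sweep of an operator-splitting scheme such as forward-backward splitting, and all names and values here are illustrative assumptions.

```python
import math


def anderson_depth_one(g, x0, tol=1e-10, max_iter=500):
    """Accelerate the fixed-point iteration x <- g(x) with depth-1 Anderson mixing."""
    x_prev, g_prev = None, None
    x = list(x0)
    for iteration in range(max_iter):
        gx = g(x)
        f = [gi - xi for gi, xi in zip(gx, x)]  # residual g(x) - x
        if math.sqrt(sum(fi * fi for fi in f)) < tol:
            return x, iteration
        if x_prev is None:
            x_next = gx  # plain fixed-point step to get started
        else:
            f_prev = [gi - xi for gi, xi in zip(g_prev, x_prev)]
            df = [a - b for a, b in zip(f, f_prev)]
            denom = sum(d * d for d in df)
            # Least-squares mixing weight; fall back to a plain step if degenerate.
            gamma = sum(a * d for a, d in zip(f, df)) / denom if denom > 1e-30 else 0.0
            x_next = [a - gamma * (a - b) for a, b in zip(gx, g_prev)]
        x_prev, g_prev, x = x, gx, x_next
    return x, max_iter


def affine_map(x):
    """Toy contraction x -> M x + b with a symmetric M of norm below one."""
    return [0.8 * x[0] + 0.1 * x[1] + 1.0, 0.1 * x[0] + 0.6 * x[1] + 2.0]


solution, iterations = anderson_depth_one(affine_map, [0.0, 0.0])
```

The fixed point of this toy map is (60/7, 50/7). The mixing weight `gamma` is chosen so that the combined residual of the last two iterates is as small as possible in the least-squares sense.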
IEEE Open Journal of Signal Processing, vol. 6, pp. 1094-1108. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11194222
Citations: 0
Parameter-Efficient Multi-Task and Multi-Domain Learning Using Factorized Tensor Networks
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-09-22 DOI: 10.1109/OJSP.2025.3613142
Yash Garg;Nebiyou Yismaw;Rakib Hyder;Ashley Prater-Bennette;Amit Roy-Chowdhury;M. Salman Asif
Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The primary challenge and opportunity lie in leveraging shared information across these tasks and domains to enhance the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we introduce a factorized tensor network (FTN) designed to achieve accuracy comparable to that of independent single-task or single-domain networks, while introducing a minimal number of additional parameters. The FTN approach entails incorporating task- or domain-specific low-rank tensor factors into a shared frozen network derived from a source model. This strategy allows for adaptation to numerous target domains and tasks without encountering catastrophic forgetting. Furthermore, FTN requires a significantly smaller number of task-specific parameters compared to existing methods. We performed experiments on widely used multi-domain and multi-task datasets. We report experiments on convolution-based architectures with different backbones and on a transformer-based architecture. Our findings indicate that FTN attains accuracy similar to that of single-task or single-domain methods while using only a fraction of additional parameters per task.
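The parameter saving behind low-rank task-specific factors can be made concrete with a small sketch. It simplifies the tensor case to a single weight matrix and assumes the factors enter as an additive correction to the frozen weight. Both points are assumptions for illustration, and all function names and values are hypothetical.

```python
def low_rank_update(U, V):
    """Return delta_W = U @ V^T for U of shape (m, r) and V of shape (n, r)."""
    rank = len(U[0])
    return [[sum(u_row[k] * v_row[k] for k in range(rank)) for v_row in V] for u_row in U]


def adapted_weight(W_frozen, U, V):
    """Task-specific weight: shared frozen weight plus a low-rank correction."""
    delta = low_rank_update(U, V)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W_frozen, delta)]


def parameter_counts(m, n, rank):
    """Parameters of a full m x n weight versus its rank-r factors."""
    return m * n, rank * (m + n)


# Toy 2 x 3 frozen weight with a rank-1 task-specific correction.
W_frozen = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
U = [[1.0], [2.0]]
V = [[0.5], [0.0], [-1.0]]
W_task = adapted_weight(W_frozen, U, V)

full_params, factor_params = parameter_counts(64, 128, 4)
```

For a 64 x 128 layer with rank 4, the factors hold 768 parameters against 8192 for a full task-specific copy. The frozen weight is shared by all tasks, so only the factors are stored per task.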
IEEE Open Journal of Signal Processing, vol. 6, pp. 1077-1085. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175489
Citations: 0
Spatial Upsampling of Head-Related Impulse Responses via Elevation-Wise Encoder-Decoder Networks
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-09-22 DOI: 10.1109/OJSP.2025.3613209
Camilo Arevalo;Julián Villegas
A method for performing spatial upsampling of Head-Related Impulse Responses (HRIRs) from sparse measurements is introduced. Based on a supervised elevation-wise encoder-decoder network design, we present two variants: one that performs progressive reconstructions with feed-forward connections from higher to lower elevations, and another that excludes these connections. The variants were evaluated in terms of the errors in interaural time and level differences, as well as the spectral distortion in the ipsilateral and contralateral ears. The additional complexity introduced by the variant with feed-forward connections does not always translate into accuracy gains, making the simpler variant preferable for efficiency. Performance generally improved as the number of available measurements increased. However, accuracy was also found to strongly depend on the spatial distribution of those measurements. Compared to average non-personalized HRIRs, interaural time differences remain similar, while the proposed method achieves higher spectral and level accuracy, highlighting its practical use for HRIR upsampling.
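The interaural cues used as evaluation criteria in this abstract can be estimated from an HRIR pair as sketched below. The abstract does not state which estimators the authors used, so the cross-correlation peak for the time difference and the broadband energy ratio for the level difference are assumptions, as are the sample rate and the toy responses.

```python
import math


def itd_samples(left, right, max_lag):
    """Lag (in samples) by which the right-ear response trails the left one,
    taken as the peak of the cross-correlation over [-max_lag, max_lag]."""
    best_lag, best_value = 0, -math.inf
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        value = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def ild_db(left, right):
    """Interaural level difference as the left/right energy ratio in dB."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)


# Toy HRIR pair: the right ear receives a delayed, attenuated copy.
left_hrir = [0.0, 0.0, 1.0, 0.5, -0.25] + [0.0] * 11
right_hrir = [0.0] * 3 + [0.5 * x for x in left_hrir[:-3]]

sample_rate = 48000  # Hz, hypothetical
lag = itd_samples(left_hrir, right_hrir, max_lag=8)
itd_seconds = lag / sample_rate
level_difference = ild_db(left_hrir, right_hrir)
```

An upsampling error in these cues would then be the difference between the values obtained from a reconstructed HRIR pair and those from the measured reference pair at the same direction.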
介绍了一种利用稀疏测量对头部相关脉冲响应进行空间上采样的方法。基于监督海拔方向的编码器-解码器网络设计,我们提出了两种变体:一种是通过从高海拔到低海拔的前馈连接执行渐进式重建,另一种是排除这些连接。根据耳间时间误差和水平差,以及同侧和对侧耳的频谱失真来评估变异。带有前馈连接的变体所带来的额外复杂性并不总是转化为准确性的提高,这使得更简单的变体更有利于效率。性能通常随着可用度量的增加而提高。然而,人们还发现,准确性在很大程度上取决于这些测量的空间分布。与非个性化的平均HRIR相比,该方法保持了相似的时间差异,同时获得了更高的光谱和水平精度,突出了其在HRIR上采样中的实用性。
Camilo Arevalo; Julián Villegas. Spatial Upsampling of Head-Related Impulse Responses via Elevation-Wise Encoder-Decoder Networks. IEEE Open Journal of Signal Processing, vol. 6, pp. 1086-1093. Published 2025-09-22. DOI: https://doi.org/10.1109/OJSP.2025.3613209 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175513
Citations: 0
Spatial Upsampling of Head-Related Transfer Function Using Neural Network Conditioned on Source Position and Frequency
IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-09-22 DOI: 10.1109/OJSP.2025.3613132
Yuki Ito;Tomohiko Nakamura;Shoichi Koyama;Shuichi Sakamoto;Hiroshi Saruwatari
A spatial upsampling method for the head-related transfer function (HRTF) using deep neural networks (DNNs), consisting of an autoencoder conditioned on the source position and frequency, is proposed. On the basis of our finding that the conventional regularized linear regression (RLR)-based upsampling method can be reinterpreted as a linear autoencoder, we designed our network architecture as a nonlinear extension of the RLR-based method. Its key features are encoder and decoder weights that depend on the source positions and latent variables that are independent of the source positions. We also extend this architecture to upsample HRTFs and interaural time differences (ITDs) in a single network, which allows us to efficiently obtain head-related impulse responses (HRIRs). Experimental results on upsampling accuracy and perceptual quality indicated that our proposed method can upsample HRTFs from sparse measurements with sufficient quality.
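The reading of RLR-based upsampling as a linear autoencoder can be made concrete with a small sketch. It assumes a single frequency bin, sources on the horizontal plane only, and a real Fourier basis in azimuth; the abstract does not specify the basis, so that choice, the function names, and the toy data are illustrative assumptions rather than the authors' setup. The encoder is a ridge-regression fit whose weights depend on the measured positions, the latent vector holds basis coefficients that are not tied to any position, and the decoder evaluates the basis at the target positions. The proposed nonlinear DNN extension is not shown.

```python
import math


def basis_row(azimuth, order):
    """Real Fourier basis on the circle: [1, cos(a), sin(a), ..., cos(order*a), sin(order*a)]."""
    row = [1.0]
    for n in range(1, order + 1):
        row += [math.cos(n * azimuth), math.sin(n * azimuth)]
    return row


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def encode(measured_azimuths, measured_values, order, lam):
    """Encoder: ridge-regression fit; its weights depend on the measured positions."""
    rows = [basis_row(a, order) for a in measured_azimuths]
    k = 2 * order + 1
    gram = [
        [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(r[i] * v for r, v in zip(rows, measured_values)) for i in range(k)]
    return solve(gram, rhs)  # latent coefficients, not tied to any single position


def decode(latent, target_azimuths, order):
    """Decoder: evaluate the basis at the target positions."""
    return [sum(w * c for w, c in zip(basis_row(a, order), latent)) for a in target_azimuths]


def toy_response(a):
    """Hypothetical smooth function of azimuth standing in for one HRTF bin."""
    return 1.0 + 0.5 * math.cos(a) - 0.3 * math.sin(2 * a)


sparse = [2 * math.pi * m / 8 for m in range(8)]    # 8 measured directions
dense = [2 * math.pi * m / 72 for m in range(72)]   # 72 target directions

latent = encode(sparse, [toy_response(a) for a in sparse], order=2, lam=1e-9)
upsampled = decode(latent, dense, order=2)
max_err = max(abs(u - toy_response(a)) for u, a in zip(upsampled, dense))
```

In this toy case the target function lies inside the span of the basis, so `max_err` is tiny. Measured HRTFs would not be reproduced exactly, and the regularization weight `lam` then trades fit against smoothness.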
Yuki Ito; Tomohiko Nakamura; Shoichi Koyama; Shuichi Sakamoto; Hiroshi Saruwatari. Spatial Upsampling of Head-Related Transfer Function Using Neural Network Conditioned on Source Position and Frequency. IEEE Open Journal of Signal Processing, vol. 6, pp. 1109-1123. Published 2025-09-22. DOI: https://doi.org/10.1109/OJSP.2025.3613132 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11175492
Citations: 0