
Latest publications: IEEE Open Journal of Signal Processing

P-TAME: Explain Any Image Classifier With Trained Perturbations
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-03-09 · DOI: 10.1109/OJSP.2025.3568756
Mariano V. Ntrougkas;Vasileios Mezaris;Ioannis Patras
The adoption of Deep Neural Networks (DNNs) in critical fields where predictions need to be accompanied by justifications is hindered by their inherent black-box nature. This paper introduces P-TAME (Perturbation-based Trainable Attention Mechanism for Explanations), a model-agnostic method for explaining DNN-based image classifiers. P-TAME employs an auxiliary image classifier to extract features from the input image, bypassing the need to tailor the explanation method to the internal architecture of the backbone classifier being explained. Unlike traditional perturbation-based methods, which have high computational requirements, P-TAME offers an efficient alternative by generating high-resolution explanations in a single forward pass during inference. We apply P-TAME to explain the decisions of VGG-16, ResNet-50, and ViT-B-16, three distinct and widely used image classifiers. Quantitative and qualitative results show that P-TAME matches or outperforms previous explainability methods, including model-specific ones.
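The single-forward-pass idea can be illustrated with a minimal sketch: an auxiliary feature extractor produces feature maps, a learned attention head collapses them into a mask in (0, 1), and upsampling that mask to image resolution yields the explanation. Everything below (the shapes, the random stand-in for the auxiliary classifier, the 1x1-convolution head) is a hypothetical illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def aux_features(image):
    # Stand-in for the auxiliary classifier's feature maps: (C, H/8, W/8).
    return rng.standard_normal((16, 28, 28))

def attention_head(feats, w):
    # 1x1 convolution collapsing the C channels into one map, then a sigmoid
    # so the mask lies in (0, 1).
    logits = np.tensordot(w, feats, axes=([0], [0]))  # -> (28, 28)
    return 1.0 / (1.0 + np.exp(-logits))

image = rng.random((3, 224, 224))
w = rng.standard_normal(16)
mask = attention_head(aux_features(image), w)

# Nearest-neighbour upsampling of the low-resolution mask to image size
# gives the final high-resolution explanation map.
explanation = np.kron(mask, np.ones((8, 8)))
```

The whole explanation is produced by one pass through the feature extractor and the attention head, with no iterative perturbation loop.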
Vol. 6, pp. 536-545.
Citations: 0
Generalized Metaplectic Convolution-Based Cohen's Class Time-Frequency Distribution: Theory and Application
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-25 · DOI: 10.1109/OJSP.2025.3545337
Manjun Cui;Zhichao Zhang;Jie Han;Yunjie Chen;Chunzheng Cao
The convolution type of the Cohen's class time-frequency distribution (CCTFD) is a useful and effective time-frequency analysis tool for signals corrupted by additive noise. However, it cannot meet the requirement of high-performance denoising under low signal-to-noise ratio conditions. In this paper, we define the generalized metaplectic convolution-based Cohen's class time-frequency distribution (GMC-CCTFD) by replacing the traditional convolution operator in the CCTFD with the generalized convolution operator of the metaplectic transform (MT). This new definition leverages the high degrees of freedom and flexibility of the MT, improving performance in non-stationary signal analysis. We then establish a fundamental theory of the GMC-CCTFD's essential properties. By integrating the Wiener filter principle with the time-frequency filtering mechanism of the GMC-CCTFD, we design a least-squares adaptive filter in the Wigner distribution-MT domain. This allows us to achieve adaptive filtering denoising based on the GMC-CCTFD, yielding the least-squares adaptive filter-based GMC-CCTFD. Furthermore, we present several examples and apply the proposed filtering method to real-world datasets, demonstrating its superior noise suppression compared to several state-of-the-art methods.
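As background for the convolution-type construction, the sketch below computes a discrete (pseudo) Wigner-Ville distribution and applies a separable box smoothing in time and frequency; any such 2D convolution of the WVD yields a member of Cohen's class. This only illustrates the classical CCTFD the paper starts from, not its metaplectic generalization; the signal length and kernel size are arbitrary choices.

```python
import numpy as np

def wigner_ville(x):
    """Discrete pseudo Wigner-Ville distribution of a complex (analytic) signal."""
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        mmax = min(n, N - 1 - n)               # symmetric lags inside the signal
        acf = np.zeros(N, dtype=complex)
        for m in range(-mmax, mmax + 1):
            acf[m % N] = x[n + m] * np.conj(x[n - m])
        W[n] = np.fft.fft(acf).real            # FFT over the lag variable
    return W

def cohen_smooth(W, k=5):
    """Cohen's class member: separable box smoothing of the WVD in time and frequency."""
    ker = np.ones(k) / k
    Ws = np.apply_along_axis(lambda r: np.convolve(r, ker, mode="same"), 1, W)
    return np.apply_along_axis(lambda c: np.convolve(c, ker, mode="same"), 0, Ws)

# A pure tone at bin f0 concentrates at frequency index 2*f0 (the lag step doubles frequency).
N, f0 = 64, 8
x = np.exp(2j * np.pi * f0 * np.arange(N) / N)
W = wigner_ville(x)
```

Smoothing the WVD this way trades frequency concentration for suppression of cross-terms, which is exactly the tension the paper's adaptive design addresses.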
Vol. 6, pp. 348-368.
Citations: 0
Unsupervised Angularly Consistent 4D Light Field Segmentation Using Hyperpixels and a Graph Neural Network
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-25 · DOI: 10.1109/OJSP.2025.3545356
Maryam Hamad;Caroline Conti;Paulo Nunes;Luís Ducla Soares
Image segmentation is an essential initial stage in several computer vision applications. However, unsupervised image segmentation remains challenging in some cases, such as when objects with similar visual appearance overlap. Unlike 2D images, 4D Light Fields (LFs) convey both spatial and angular scene information, facilitating depth/disparity estimation, which can be further used to guide the segmentation. Existing 4D LF segmentation methods that target object-level (i.e., mid-level and high-level) segmentation are typically semi-supervised or supervised with ground-truth labels and mostly support only densely sampled 4D LFs. This paper proposes a novel unsupervised mid-level 4D LF Segmentation method using Graph Neural Networks (LFSGNN), which segments all LF views consistently. To achieve that, the 4D LF is represented as a hypergraph, whose hypernodes are obtained based on hyperpixel over-segmentation. Then, a graph neural network is used to extract deep features from the LF and assign segmentation labels to all hypernodes. Afterwards, the network parameters are updated iteratively via backpropagation to achieve better object separation. The proposed segmentation method supports both densely and sparsely sampled 4D LFs. Experimental results on synthetic and real 4D LF datasets show that the proposed method outperforms benchmark methods both in terms of segmentation spatial accuracy and angular consistency.
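The aggregate-then-cluster core of such a pipeline can be caricatured in a few lines: hyperpixel nodes with features, one degree-normalised message-passing step over the node graph, and a nearest-seed assignment. The graph, the features, and the seed choice below are synthetic stand-ins, not the paper's trained GNN.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim = 12, 4
# Two groups of "hyperpixel" nodes with well-separated mean features.
feats = np.vstack([rng.normal(0.0, 0.1, (6, dim)),
                   rng.normal(3.0, 0.1, (6, dim))])

# Chain adjacency inside each group, plus self-loops; the groups stay disconnected.
A = np.eye(n_nodes)
for i in range(n_nodes - 1):
    if i != 5:
        A[i, i + 1] = A[i + 1, i] = 1.0

# One GNN-style message-passing step: degree-normalised neighbourhood averaging.
smoothed = (A / A.sum(1, keepdims=True)) @ feats

# Assign every node to the nearest of two seed nodes (one per group).
seeds = smoothed[[0, 6]]
labels = np.argmin(((smoothed[:, None, :] - seeds[None, :, :]) ** 2).sum(-1), axis=1)
```

Because every hyperpixel spans all angular views, one label per hypernode automatically gives an angularly consistent segmentation, which is the structural point of the hypergraph representation.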
Vol. 6, pp. 333-347.
Citations: 0
Non-Stationary Delayed Combinatorial Semi-Bandit With Causally Related Rewards
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-24 · DOI: 10.1109/OJSP.2025.3545379
Saeed Ghoorchian;Steven Bilaj;Setareh Maghsudi
Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. Besides, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
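As background, a minimal delayed combinatorial semi-bandit loop looks as follows: each round selects a super-arm of m base arms by a UCB1-style index, and per-arm (semi-bandit) feedback arrives only d rounds later. This toy is stationary Bernoulli with a fixed delay and no causal graph, so it deliberately omits the paper's actual setting; the arm means, delay, and horizon are arbitrary illustrative values.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
K, m, d, T = 5, 2, 3, 3000
p = np.array([0.2, 0.3, 0.5, 0.7, 0.9])   # true (unknown) Bernoulli means

counts, sums = np.zeros(K), np.zeros(K)
pending = deque()                          # rounds whose feedback is still delayed

for t in range(1, T + 1):
    # UCB1-style index; unexplored arms get +inf so they are tried first.
    ucb = np.where(counts > 0,
                   sums / np.maximum(counts, 1)
                   + np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)
    arms = np.argsort(-ucb)[:m]            # super-arm: the m largest indices
    pending.append((arms, rng.random(m) < p[arms]))
    if len(pending) > d:                   # semi-bandit feedback arrives d rounds late
        past_arms, past_r = pending.popleft()
        counts[past_arms] += 1
        sums[past_arms] += past_r
```

Even in this simplified form, the agent concentrates its plays on the two best base arms; the paper's contribution is handling non-stationarity and exploiting the causal structure among arm rewards on top of such a loop.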
Vol. 6, pp. 369-384.
Citations: 0
Extending Guided Filters Through Effective Utilization of Multi-Channel Guide Images Based on Singular Value Decomposition
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-24 · DOI: 10.1109/OJSP.2025.3545304
Kazu Mishiba
This paper proposes the SVD-based Guided Filter, designed to address key limitations of the original guided filter and its improved variants, enabling better use of multi-channel guide images. First, we analyzed the guided filter framework, reinterpreting it from a patch-based perspective using singular value decomposition (SVD). This revealed that the original guided filter suppresses oscillatory components based on their eigenvalues. Building on this insight, we proposed a new filtering method that selectively suppresses or enhances these components through functions that respond to their eigenvalues. The proposed SVD-based Guided Filter offers improved control over edge preservation and noise reduction compared to the original guided filter and its improved variants, which often struggle to balance these tasks. We validated the proposed method across various image processing applications, including denoising, edge-preserving smoothing, detail enhancement, and edge-enhancing smoothing. The results demonstrated that the SVD-based Guided Filter consistently outperforms the original guided filter and its improved variants by making more effective use of color guide images. While the computational cost is slightly higher than that of the original guided filter, the method remains efficient and highly effective. Overall, the proposed SVD-based Guided Filter delivers notable improvements, offering a solid foundation for further advancements in guided filtering techniques.
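For reference, here is a compact NumPy version of the original (single-channel) guided filter that this work analyzes and extends: per-window linear coefficients a and b are computed from local statistics and then averaged. The box-filter implementation and the parameter values are illustrative choices, not the paper's code.

```python
import numpy as np

def box(x, r):
    """Mean over a (2r+1)x(2r+1) window, edge-padded, via an integral image."""
    n = 2 * r + 1
    pad = np.pad(x, r, mode="edge")
    c = np.pad(np.cumsum(np.cumsum(pad, 0), 1), ((1, 0), (1, 0)))
    return (c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]) / (n * n)

def guided_filter(I, p, r=2, eps=1e-4):
    """Original guided filter: q = mean(a)*I + mean(b), with a,b fit per window."""
    mI, mp = box(I, r), box(p, r)
    a = (box(I * p, r) - mI * mp) / (box(I * I, r) - mI * mI + eps)
    b = mp - a * mI
    return box(a, r) * I + box(b, r)

flat = guided_filter(np.ones((10, 10)), np.ones((10, 10)))  # constants pass through
step = np.zeros((16, 16)); step[:, 8:] = 1.0
edge = guided_filter(step, step, r=2, eps=1e-6)             # small eps preserves edges
```

The eps term is what implicitly suppresses low-variance (oscillatory) components; the paper's SVD reinterpretation makes that suppression explicit and tunable per eigenvalue.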
Vol. 6, pp. 385-397.
Citations: 0
VITMST++: Efficient Hyperspectral Reconstruction Through Vision Transformer-Based Spatial Compression
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-24 · DOI: 10.1109/OJSP.2025.3544891
Ana C. Caznok Silveira;Diedre S. do Carmo;Lucas H. Ueda;Denis G. Fantinato;Paula D. P. Costa;Leticia Rittner
Hyperspectral channel reconstruction transforms a subsampled multispectral image into a hyperspectral image, providing higher spectral resolution without dedicated acquisition hardware. The Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (MST++) is a state-of-the-art channel reconstruction technique, but it faces memory limitations for high spatial-resolution images. In this context, we introduced VITMST++, a novel architecture incorporating Vision Transformer embeddings for spatial compression, multi-resolution image context, and a custom channel-weighted loss. Developed for the ICASSP 2024 HyperSkin Challenge, VITMST++ outperforms the state-of-the-art MST++ in both performance and computational efficiency in channel reconstruction. In this work, we perform a deeper analysis of the main aspects of VITMST++: efficiency, quantitative performance, and generalization to other datasets. Results show that VITMST++ achieves SAM and SSIM hyperspectral reconstruction metrics similar to those of state-of-the-art methods, while consuming up to threefold less memory and requiring up to 10 times fewer multiply-add operations.
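The SAM metric reported above has a standard definition: the angle between corresponding ground-truth and reconstructed spectra, averaged over pixels. A small sketch (the cube shapes here are hypothetical, not the challenge's):

```python
import numpy as np

def sam(x, y, eps=1e-8):
    """Spectral Angle Mapper: mean angle (radians) between spectra along the last axis."""
    num = (x * y).sum(-1)
    den = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps
    return float(np.arccos(np.clip(num / den, -1.0, 1.0)).mean())

cube_a = np.random.default_rng(0).random((4, 4, 31))  # H x W x spectral bands
cube_b = 2.0 * cube_a                                  # same spectra, different gain
```

Because SAM measures only the angle, it is invariant to per-pixel intensity scaling, which is why it is paired with SSIM when evaluating reconstructions.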
Vol. 6, pp. 398-404.
Citations: 0
Task Nuisance Filtration for Unsupervised Domain Adaptation
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-01-30 · DOI: 10.1109/OJSP.2025.3536850
David Uliel;Raja Giryes
In unsupervised domain adaptation (UDA), labeled data is available for one domain (the Source Domain), generated according to some distribution, and unlabeled data is available for a second domain (the Target Domain), generated from a possibly different distribution but sharing the same task. The goal is to learn a model that performs well on the target domain although labels are available only for the source data. Many recent works attempt to align the source and target domains by matching their marginal distributions in a learned feature space. In this paper, we treat the domain difference as a nuisance and enable better adaptability of the domains by encouraging minimality of the target-domain representation, disentanglement of the features, and a smoother feature space that clusters the target data better. To this end, we use information bottleneck theory and a classical technique from the blind source separation framework, namely ICA (independent component analysis). We show that these concepts can improve the performance of leading domain adaptation methods on various domain adaptation benchmarks.
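A generic ICA-flavoured demonstration (not the paper's pipeline): two independent non-Gaussian sources are mixed, the mixtures are whitened, and a grid search over rotation angle maximises |excess kurtosis|, recovering an independent direction. The source distributions, mixing matrix, and grid-search estimator are all illustrative simplifications of ICA.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
s = np.vstack([rng.laplace(size=n),        # heavy-tailed source (excess kurtosis 3)
               rng.uniform(-1, 1, n)])     # light-tailed source (excess kurtosis -1.2)
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s                                  # observed mixtures

# Whitening: rotate/scale the mixtures to (approximately) identity covariance.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)).T @ x

# Grid-search the residual rotation that maximises non-Gaussianity.
theta = np.linspace(0.0, np.pi, 360, endpoint=False)
kurts = np.array([np.mean((np.cos(t) * z[0] + np.sin(t) * z[1]) ** 4) - 3.0
                  for t in theta])
t_best = theta[np.argmax(np.abs(kurts))]
u = np.cos(t_best) * z[0] + np.sin(t_best) * z[1]
corr = abs(np.corrcoef(u, s[0])[0, 1])     # recovered vs. true heavy-tailed source
```

After whitening, the independent directions differ from the data axes only by a rotation, which is why a one-parameter search suffices in 2D; practical ICA (e.g. FastICA) replaces the grid with fixed-point iterations.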
Vol. 6, pp. 303-311.
Citations: 0
Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-01-30 · DOI: 10.1109/OJSP.2025.3536853
Masahiro Kada;Ryota Yoshihashi;Satoshi Ikehata;Rei Kawakami;Ikuro Sato
Mixture-of-experts models with a sparse expert-selection rule have recently gained much attention because of their scalability without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations, leading to degraded model performance on tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), which effectively penalizes the discontinuities occurring under natural deformations of input images. Combined with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. We also confirmed that models trained with our method experience discontinuous changes less frequently under input perturbations.
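One plausible reading of a pairwise router-consistency penalty (a hedged sketch, not the paper's exact loss) is a symmetric KL divergence between the router's expert distributions for an input and its slightly deformed copy; the token/expert counts below are made up.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def router_consistency(logits_a, logits_b):
    """Mean symmetric KL divergence between per-token expert distributions."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl = lambda a, b: (a * (np.log(a + 1e-12) - np.log(b + 1e-12))).sum(-1)
    return 0.5 * (kl(p, q) + kl(q, p)).mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 4))                 # 8 tokens routed over 4 experts
near = logits + 0.01 * rng.standard_normal((8, 4))   # router output for a mild deformation
far = rng.standard_normal((8, 4))                    # an unrelated routing
```

Minimising such a penalty pushes the router toward the same (soft) expert choice for perturbed views, smoothing the discontinuities that hard top-k selection would otherwise expose.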
Vol. 6, pp. 276-283.
Citations: 0
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-01-28 DOI: 10.1109/OJSP.2025.3534686
Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux
We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. Audio samples of the proposed intervention approach are available on our demo page.
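The probe-and-steer idea above can be sketched in a few lines: train a logistic-regression probe on head outputs from clips with and without a trait, then nudge a head's output along the probe's weight direction at inference. The numpy implementation below is a toy illustration under synthetic features; the feature dimension, training loop, and steering strength `alpha` are all assumptions, not the paper's settings.

```python
import numpy as np

def train_probe(X, y, lr=0.1, steps=500):
    """Tiny logistic-regression probe fit by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def probe_prob(h, w, b):
    """Probe's probability that the trait is present in head output h."""
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

# Synthetic stand-ins for one attention head's outputs on clips that
# exhibit (label 1) or lack (label 0) a musical trait, e.g. drums.
rng = np.random.default_rng(42)
d = 16
trait_dir = rng.normal(size=d)
X = np.vstack([rng.normal(size=(100, d)) + trait_dir,
               rng.normal(size=(100, d)) - trait_dir])
y = np.concatenate([np.ones(100), np.zeros(100)])

w, b = train_probe(X, y)
steer = w / np.linalg.norm(w)   # unit steering direction for this head

# Inference-time intervention: push a head output toward the trait.
alpha = 0.5                      # assumed intervention strength
h = rng.normal(size=d)           # a head output during generation
print(probe_prob(h, w, b), probe_prob(h + alpha * steer, w, b))
```

Monitoring the probe's probability on the steered output, as in the second printed value, is what lets the strength of the intervention be capped before the generation becomes incoherent.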
{"title":"SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers","authors":"Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux","doi":"10.1109/OJSP.2025.3534686","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534686","url":null,"abstract":"We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. 
Audio samples of the proposed intervention approach are available on our <underline>demo page</u>.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"266-275"},"PeriodicalIF":2.9,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10856829","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Auditory EEG Decoding Challenge for ICASSP 2024
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-01-27 DOI: 10.1109/OJSP.2025.3534122
Lies Bollens;Corentin Puffay;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart
This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2024. The challenge provides electroencephalogram (EEG) recordings of 105 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. The challenge consists of two tasks that relate EEG signals to the presented speech stimulus. The first task, called match-mismatch, is to determine which of five speech segments induced a given EEG segment. The second task, called regression, is to reconstruct the Mel spectrogram from the EEG. EEG recordings of 85 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 20 subjects were used as held-out subjects for the evaluation step of the challenge.
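The match-mismatch task described above — deciding which of five speech segments induced a given EEG segment — can be illustrated with a deliberately simple correlation baseline: score each candidate speech envelope by its Pearson correlation with the channel-averaged EEG and pick the best. This numpy sketch uses synthetic data and is not the challenge's reference model; channel count, segment length, and the 0.5 mixing weight are all assumptions for illustration.

```python
import numpy as np

def match_mismatch(eeg, candidates):
    """Pick which candidate speech envelope best matches an EEG segment.

    eeg: array of shape (channels, time); candidates: list of (time,)
    envelopes. Scores each candidate by Pearson correlation with the
    channel-averaged EEG and returns (best index, all scores).
    """
    sig = eeg.mean(axis=0)  # collapse channels to one time series
    def corr(a, b):
        a, b = a - a.mean(), b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [corr(sig, c) for c in candidates]
    return int(np.argmax(scores)), scores

# Toy data: 64-channel EEG that weakly tracks candidate envelope #2.
rng = np.random.default_rng(1)
T = 320
true_env = rng.random(T)
eeg = 0.5 * true_env + rng.normal(size=(64, T))  # signal + channel noise
envelopes = [rng.random(T) for _ in range(5)]
envelopes[2] = true_env

picked, scores = match_mismatch(eeg, envelopes)
print(picked)  # recovers the matching segment's index (2 here)
```

Challenge submissions replace the raw correlation with learned encoders, but the decision rule — score five candidates against one EEG segment and take the argmax — has the same shape.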
{"title":"Auditory EEG Decoding Challenge for ICASSP 2024","authors":"Lies Bollens;Corentin Puffay;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart","doi":"10.1109/OJSP.2025.3534122","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534122","url":null,"abstract":"This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2024. The challenge provides electroencephalogram (EEG) recordings of 105 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. The challenge consists of two tasks that relate EEG signals to the presented speech stimulus. The first task, called match-mismatch, is to determine which of five speech segments induced a given EEG segment. The second task, called regression, is to reconstruct the Mel spectrogram from the EEG. EEG recordings of 85 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 20 subjects were used as held-out subjects for the evaluation step of the challenge.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"478-488"},"PeriodicalIF":2.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854651","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0