2022 30th European Signal Processing Conference (EUSIPCO): Latest Papers

Online QoS estimation for vehicular radio environments
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909612
Rodrigo Hernangómez, Alexandros Palaios, Gayathri Guruvayoorappan, Martin Kasparick, N. Ain, Sławomir Stańczak
Quality of service (QoS) estimation is a key enabler in wireless networks, and it has been facilitated by the increasing capabilities of machine learning (ML). However, ML algorithms often underperform when presented with non-stationary data, which is typically the case in radio environments. In such environments, ML schemes might require extra signaling for retraining. In this paper, we propose an approach to online QoS estimation in which a trained model is taken as a base estimator and fine-tuned with information from the user equipment (UE) and the cell itself. The proposed approach is based on the Adaptive Random Forest (ARF) algorithm, which uses streaming data and reacts to concept drift, i.e., to changes in the data's statistical properties. This effectively allows parts of the ML model to be retrained as vehicular UEs visit diverse radio environments. We evaluate this method with real data from an extensive measurement campaign in a cellular test network that covered diverse radio environments.
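As a rough illustration of reacting to concept drift on streaming data, the sketch below pairs a running-mean estimator with a Page-Hinkley change test on its absolute error and rebuilds the estimator when the test fires. This is not the Adaptive Random Forest used in the paper; the names `PageHinkley`, `OnlineMean`, and `run_stream`, and the `delta` and `threshold` values, are assumptions chosen for the example.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm level (assumed value)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, value):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


class OnlineMean:
    """Toy base estimator: predicts the running mean of the targets seen."""

    def __init__(self):
        self.n = 0
        self.value = 0.0

    def predict(self):
        return self.value

    def learn(self, y):
        self.n += 1
        self.value += (y - self.value) / self.n


def run_stream(targets):
    """Predict, test the error for drift, then learn, one sample at a time."""
    model = OnlineMean()
    detector = PageHinkley()
    drift_points = []
    for t, y in enumerate(targets):
        error = abs(y - model.predict())
        if detector.update(error):
            # Drift detected: discard the stale estimator and start over.
            drift_points.append(t)
            model = OnlineMean()
            detector.reset()
        model.learn(y)
    return model.predict(), drift_points
```

On a stream that jumps from 1.0 to 10.0 at index 50, `run_stream` flags drift at that index and the rebuilt estimator settles on the new level.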
Citations: 2
sEMG feature extraction using Generalized Discrete Orthonormal Stockwell Transform and Modified Multi-Dimensional Scaling
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909783
Somar Karheily, A. Moukadem, Jean-Baptiste Courbot, D. Abdeslam
This paper proposes a method based on a generalized version of the Discrete Orthonormal Stockwell Transform (GDOST) with a Gaussian window to extract features from surface electromyography (sEMG) signals in order to identify hand movements. The feature space derived from the GDOST is then reduced by applying a modified Multi-Dimensional Scaling (MDS) method. The proposed modification of MDS consists of using a translation in kernel building instead of the direct distance calculation. The results are compared with those of another study on the same dataset that applied the usual DOST and MDS. We achieved significant improvements in classification accuracy, attaining 97.56% for 17 hand movements.
Citations: 0
Time-varying Normalizing Flow for Generative Modeling of Dynamical Signals
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909640
Anubhab Ghosh, Aleix Espuña Fontcuberta, M. Abdalmoaty, S. Chatterjee
We develop a time-varying normalizing flow (TVNF) for explicit generative modeling of dynamical signals. Being explicit, it can generate samples of dynamical signals and compute the likelihood of a given dynamical signal sample. In the proposed model, the signal flow in the layers of the normalizing flow is a function of time, which is realized using an encoded representation that is the output of a recurrent neural network (RNN). Given a set of dynamical signals, the parameters of the TVNF are learned according to a maximum-likelihood approach in conjunction with gradient descent (backpropagation). Use of the proposed model is illustrated for a toy application scenario: a maximum-likelihood-based speech-phone classification task.
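The explicit likelihood mentioned above follows from the change-of-variables rule for invertible maps. The sketch below is a single-layer, scalar toy and not the TVNF architecture: each sample passes through an affine flow whose shift and log-scale are read from a small recurrent state. The function names and the weights `w_h` and `w_x` are hypothetical.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def sequence_log_likelihood(signal, w_h=0.5, w_x=0.8):
    """Log-likelihood of a scalar sequence under a time-varying affine flow.

    At each step the recurrent state h (a stand-in for an RNN encoding of
    the past) sets the flow parameters, so the map x -> z changes with time.
    """
    h = 0.0
    total = 0.0
    for x in signal:
        shift = h
        log_scale = 0.5 * h
        # Forward map to the latent variable: z = (x - shift) / exp(log_scale).
        z = (x - shift) * math.exp(-log_scale)
        # Change of variables: log p(x) = log p_z(z) + log|dz/dx|,
        # and here log|dz/dx| = -log_scale.
        total += standard_normal_logpdf(z) - log_scale
        # Update the recurrent state with the sample just observed.
        h = math.tanh(w_h * h + w_x * x)
    return total
```

Because the flow is affine, each step is equivalent to a Gaussian with mean `shift` and standard deviation `exp(log_scale)`, which gives a simple way to sanity-check the result.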
Citations: 0
Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909636
Zicheng Feng, Yu Tsao, Fei Chen
Although deep learning-based algorithms have achieved great success in single-channel and multi-channel speech separation tasks, few studies have focused on binaural output and the preservation of spatial cues. Existing methods preserve spatial cues indirectly by enhancing signal-to-noise ratios (SNRs), and the accuracy of spatial cue preservation remains unsatisfactory. A framework has previously been proposed to directly restore the spatial cues of the separated speech by applying relative transfer function (RTF) estimation and correction after speech separation. To further improve this framework, this study proposes a new RTF estimator based on a recurrent neural network, which directly estimates the RTF from the separated speech and the noisy mixture. The upgraded framework was evaluated on a spatialized WSJ0-2mix dataset with diffuse noise. Experimental results showed that the interaural time difference and interaural level difference errors of the separated speech were significantly reduced after RTF correction, and its SNR was not sacrificed. The new RTF estimator further improved the performance of the system, with a model about 5 times smaller than the previous one. As the proposed framework does not rely on any specific type of model structure, it could be combined with both multi-channel and single-channel speech separation models.
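For reference, the two spatial cues named above can be computed from a binaural pair roughly as follows. This is a broadband, time-domain sketch with an assumed sign convention (a positive ITD means the right channel lags the left); the paper's evaluation may use a different definition, for example a per-frequency-band one. The function names are hypothetical.

```python
import math


def interaural_level_difference(left, right):
    """ILD in dB: energy of the left channel relative to the right."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)


def interaural_time_difference(left, right, sample_rate, max_lag):
    """ITD in seconds from the lag that maximises the cross-correlation."""
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += x * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Toy example: the right channel is a delayed, attenuated copy of the left.
left = [0.0] * 16
right = [0.0] * 16
left[5] = 1.0
right[8] = 0.5
ild_db = interaural_level_difference(left, right)
itd_s = interaural_time_difference(left, right, sample_rate=16000, max_lag=8)
```

In the toy example the right channel is 3 samples late and half the amplitude, so `itd_s` is 3/16000 s and `ild_db` is about 6.02 dB.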
Citations: 1
Efficient Robust Graph Learning Based on Minimax Concave Penalty and $\gamma$-Cross Entropy
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909870
Tatsuya Koyakumaru, M. Yukawa
This paper presents an efficient robust method to learn sparse graphs from contaminated data. Specifically, the convex-analytic approach using the minimax concave penalty is formulated using the so-called $\gamma$-lasso, which exploits the $\gamma$-cross entropy. We devise a weighting technique which designs the data weights based on the $\ell_{1}$ distance in addition to the Mahalanobis distance for avoiding possible failures of outlier rejection due to the combinatorial graph Laplacian structure. Numerical examples show that the proposed method significantly outperforms $\gamma$-lasso and tlasso as well as the existing non-robust graph learning methods in contaminated situations.
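For readers unfamiliar with the minimax concave penalty, its standard scalar form can be written in a few lines. This is the textbook definition only, not the paper's graph learning formulation; the parameter names `lam` and `a` are chosen here to avoid a clash with the $\gamma$ of the $\gamma$-cross entropy.

```python
def minimax_concave_penalty(x, lam, a):
    """Scalar MCP with regularisation weight lam > 0 and concavity a > 0.

    Equals lam*|x| - x^2 / (2a) while |x| <= a*lam, and stays constant at
    a*lam^2 / 2 beyond that point, so large entries are not penalised further.
    """
    if lam <= 0 or a <= 0:
        raise ValueError("lam and a must be positive")
    t = abs(x)
    if t <= a * lam:
        return lam * t - t * t / (2.0 * a)
    return 0.5 * a * lam * lam
```

Near zero the penalty behaves like the $\ell_{1}$ norm scaled by `lam`, and it never exceeds it, which is why it induces sparsity with less bias on large coefficients.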
Citations: 1
A Feature-based Approach on Contact-less Blood Pressure Estimation from Video Data
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909563
Carolin Wuerich, Eva-Maria Humm, C. Wiede, Gregor Schiele
Conventional blood pressure monitors and sensors have several limitations in terms of accuracy, measurement time, comfort or safety. To address these limitations, we realized and tested a surrogate-based contact-less blood pressure estimation method which relies on a single remote photoplethysmogram (rPPG) captured by a camera. From this rPPG signal, we compute 120 features and perform a sequential forward feature selection to obtain the best subset of features. With a multilayer perceptron model, we obtain a mean absolute error (MAE) ± standard deviation of $5.50 \pm 4.52$ mmHg for systolic pressure and $3.73 \pm 2.86$ mmHg for diastolic pressure. In contrast to previous studies, our model is trained and tested on a data set including normotensive, pre-hypertensive and hypertensive values.
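Sequential forward selection, as mentioned above, is a greedy loop: start from an empty set and repeatedly add the single feature that most improves a score, stopping when nothing helps. The sketch below is generic and uses a toy scoring function; the feature names and usefulness values are made up, and in practice the score would typically be a validation metric of the downstream model.

```python
def sequential_forward_selection(features, score, max_features):
    """Greedily grow a feature subset while the score keeps improving.

    features     -- iterable of candidate feature names
    score        -- callable mapping a list of features to a number (higher is better)
    max_features -- upper bound on the subset size
    """
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy score: summed usefulness minus a small penalty per selected feature.
USEFULNESS = {"a": 0.5, "b": 0.3, "c": 0.05}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.1 * len(subset)


chosen, chosen_score = sequential_forward_selection(["a", "b", "c"], toy_score, 3)
```

With the toy score, feature `c` is left out because its usefulness does not cover the per-feature penalty.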
Citations: 2
Fault-tolerant Radar Signal Processing using Selective Observation Windows and Peak Detection
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909550
Michael Beyer, A. Guntoro, Holger Blume
Soft errors, such as bit flips, pose a serious threat to the functional safety of systems. Thus, ensuring the correct operation even in case of errors is particularly relevant for safety-critical applications. In this paper, we present a novel error detection and mitigation method for parallel FFTs in radar signal processing. We systematically define small observation windows in the 2D spectrum to detect peaks caused by soft errors. This enables protecting FFTs with several orders of magnitude lower computational overhead compared to related work. We conduct fault injection experiments to validate our method. Our experiments show that targets can be reliably detected even at higher error rates where more than 500 bit flips are present.
Citations: 1
Auto-weighted Sequential Wasserstein Distance and Application to Sequence Matching
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909780
Mitsuhiko Horie, Hiroyuki Kasai
Sequence matching problems have been central to the field of data analysis for decades. Such problems arise in widely diverse areas including computer vision, speech processing, bioinformatics, and natural language processing. However, solving such problems efficiently is difficult because one must consider temporal consistency, neighborhood structure similarity, robustness to noise and outliers, and flexibility in the start and end matching points. This paper proposes a shape-aware Wasserstein distance between sequences, building upon the optimal transport (OT) framework. The proposed distance considers similarity measures of the elements, their neighborhood structures, and temporal positions. We incorporate these similarity measures into three ground cost matrices of the OT formulation. The noteworthy contribution is that we formulate these measures as independent OT distances with a single shared optimal transport matrix, and adjust their weights automatically according to their effects on the total OT distance. Numerical evaluations suggest that the sequence matching method using our proposed Wasserstein distance robustly outperforms state-of-the-art methods across different real-world datasets.
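To make the idea of several ground costs sharing one transport matrix concrete, the sketch below combines three cost matrices with fixed weights and solves the resulting problem with entropic regularisation (Sinkhorn iterations). It is an illustration under stated assumptions, not the paper's method: the weights are set by hand rather than adjusted automatically, the three cost definitions are hypothetical stand-ins, and Sinkhorn is swapped in as a simple solver.

```python
import math


def cost_matrix(n, m, fn):
    return [[fn(i, j) for j in range(m)] for i in range(n)]


def local_slope(seq, i):
    """Forward difference, reusing the last slope at the sequence end."""
    if len(seq) < 2:
        return 0.0
    i = min(i, len(seq) - 2)
    return seq[i + 1] - seq[i]


def combined_cost(x, y, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of element, neighborhood, and temporal costs (all assumed forms)."""
    n, m = len(x), len(y)
    element = cost_matrix(n, m, lambda i, j: (x[i] - y[j]) ** 2)
    neighbor = cost_matrix(
        n, m, lambda i, j: (local_slope(x, i) - local_slope(y, j)) ** 2
    )
    temporal = cost_matrix(n, m, lambda i, j: (i / n - j / m) ** 2)
    w_e, w_n, w_t = weights
    return [
        [w_e * element[i][j] + w_n * neighbor[i][j] + w_t * temporal[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def sinkhorn_plan(cost, epsilon=0.1, iterations=200):
    """Entropy-regularised transport plan between uniform marginals."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a = [1.0 / n] * n
    b = [1.0 / m] * m
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(cost, plan):
    """Total cost of moving mass according to the plan."""
    return sum(
        c * p
        for cost_row, plan_row in zip(cost, plan)
        for c, p in zip(cost_row, plan_row)
    )


sequence = [0.0, 2.0, 4.0]
cost = combined_cost(sequence, sequence)
plan = sinkhorn_plan(cost)
distance = transport_cost(cost, plan)
```

Matching a sequence against itself puts almost all mass on the diagonal of `plan`, so `distance` is close to zero.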
Citations: 1
Location-invariant representations for acoustic scene classification
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909672
Akansha Tyagi, Padmanabhan Rajan
High intra-class variance is one of the significant challenges in solving the problem of acoustic scene classification. This work identifies the recording location (or city) of an audio sample as a source of intra-class variation. We overcome this variation by utilising multi-view learning, where each recording location is considered as a view. Canonical correlation analysis (CCA) based multi-view algorithms learn a subspace where samples from the same class are brought together, and samples from different classes are moved apart, irrespective of the views. By considering cities as views, and by using several variants of CCA algorithms, we show that intra-class variation can be reduced, and location-invariant representations can be learnt. The proposed method demonstrates an improvement of more than 8% on the DCASE 2018 and 2019 datasets, when compared to not using the view information.
Citations: 0
Speech Enhancement Using Augmented SSL CycleGAN
Pub Date : 2022-08-29 DOI: 10.23919/eusipco55093.2022.9909754
B. Popović, Lidija Krstanović, M. Janev, S. Suzic, Tijana V. Nosek, J. Galic
The purpose of single-channel speech enhancement is to attenuate the noise component of noisy speech in order to increase the intelligibility and perceived quality of the speech component. One such approach uses deep neural networks to transform noisy speech features into clean speech by minimizing the mean squared error between the degraded and the clean features using paired datasets. Most recently, an approach based on unpaired datasets, CycleGAN speech enhancement, was proposed, obtaining state-of-the-art results even though there was no supervision during the actual training. Also, only a small amount of noisy speech data is usually accessible in comparison to clean speech. Therefore, in this paper, an augmented semi-supervised CycleGAN speech enhancement algorithm is proposed, where only a small percentage of the training database contains actual paired data. As a consequence, this prevents overfitting of the discriminator corresponding to the scarce noisy speech domain during the initial training stages, and it also augments the discriminator by periodically adding clean speech samples transformed by the inverse network into the pool of the discriminator of the scarce noisy speech domain. Significantly better results in terms of several standard measures are obtained using the proposed augmented semi-supervised method in comparison to the baseline CycleGAN speech enhancement approach operating on a reduced noisy speech domain.
Citations: 1