
Latest publications: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder
S. Inoue, H. Kameoka, Li Li, Shogo Seki, S. Makino
In this paper, we deal with a multichannel source separation problem under a highly reverberant condition. The multichannel variational autoencoder (MVAE) is a recently proposed source separation method that employs the decoder distribution of a conditional VAE (CVAE) as the generative model for the complex spectrograms of the underlying source signals. Although MVAE is notable in that it can significantly improve the source separation performance compared with conventional methods, its capability to separate highly reverberant mixtures is still limited since MVAE uses an instantaneous mixture model. To overcome this limitation, in this paper we propose extending MVAE to simultaneously solve source separation and dereverberation problems by formulating the separation system as a frequency-domain convolutive mixture model. A convergence-guaranteed algorithm based on the coordinate descent method is derived for the optimization. Experimental results revealed that the proposed method outperformed the conventional methods in terms of all the source separation criteria in highly reverberant environments.
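To make the modelling step concrete, here is a minimal NumPy sketch of a frequency-domain convolutive mixture, the observation model the abstract contrasts with MVAE's instantaneous model. The function name and tensor layout are our own assumptions, not the paper's notation.

```python
import numpy as np

def convolutive_mix(S, A):
    """Frequency-domain convolutive mixture.

    S: (F, T, N) complex source spectrograms
    A: (F, L, M, N) per-frequency mixing filters with L taps
    Returns X: (F, T, M) observed mixtures, where
    X[f, t] = sum_tau A[f, tau] @ S[f, t - tau].
    """
    F, T, N = S.shape
    _, L, M, _ = A.shape
    X = np.zeros((F, T, M), dtype=complex)
    for f in range(F):
        for tau in range(L):
            # shift the sources by tau frames and apply the tau-th filter tap
            shifted = np.zeros((T, N), dtype=complex)
            shifted[tau:] = S[f, : T - tau]
            X[f] += shifted @ A[f, tau].T
    return X
```

With L = 1 this collapses to X[f, t] = A[f] S[f, t], the instantaneous mixture model whose limitation in long-reverberation conditions motivates the paper.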
DOI: 10.1109/ICASSP.2019.8683497 · pp. 96-100
Citations: 15
Single-channel Speech Extraction Using Speaker Inventory and Attention Network
Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, D. Dimitriadis, J. Droppo, Y. Gong
Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods are either speaker independent or extract a target speaker’s voice by using his or her voice snippet. In applications such as home devices or office meeting transcription, a list of possible speakers is available and can be leveraged for speech separation. This paper proposes a novel speech extraction method that utilizes an inventory of voice snippets of possible interfering speakers, or speaker enrollment data, in addition to that of the target speaker. Furthermore, an attention-based network architecture is proposed to form time-varying masks for both the target and other speakers during the separation process. This architecture does not reduce the enrollment audio of each speaker to a single vector, thereby allowing each short time frame of the input mixture signal to be aligned and accurately compared with the enrollment signals. We evaluate the proposed system on a speaker extraction task derived from the Libri corpus and show the effectiveness of the method.
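The key architectural point is that each mixture frame attends over all enrollment frames rather than over a single pooled speaker vector. A minimal sketch of that cross-attention step, with hypothetical embedding shapes of our own choosing:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_attention(mix_emb, enroll_emb):
    """Attend each mixture frame over all enrollment frames.

    mix_emb: (T, D) embeddings of the mixture frames
    enroll_emb: (Te, D) embeddings of one speaker's enrollment audio
    Returns (T, D): a per-frame speaker context, computed without
    pooling the enrollment down to a single vector.
    """
    scores = mix_emb @ enroll_emb.T / np.sqrt(mix_emb.shape[1])  # (T, Te)
    weights = softmax(scores, axis=1)  # each mixture frame picks its best matches
    return weights @ enroll_emb
```

In the paper's setting such a context would feed a mask-estimation head, one per speaker in the inventory; this sketch only shows the frame-level alignment idea.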
DOI: 10.1109/ICASSP.2019.8682245 · pp. 86-90
Citations: 59
Non-local Self-attention Structure for Function Approximation in Deep Reinforcement Learning
Z. Wang, Xi Xiao, Guangwu Hu, Yao Yao, Dianyan Zhang, Zhendong Peng, Qing Li, Shutao Xia
Reinforcement learning is a framework for making sequential decisions. Combining it with deep neural networks further improves its capability. Convolutional neural networks make it possible to make sequential decisions directly from raw pixel information, allowing reinforcement learning to achieve satisfying performance in a series of tasks. However, convolutional neural networks still have limitations in representing the geometric patterns and long-term dependencies that occur consistently in state inputs. To tackle this limitation, we propose a self-attention architecture to augment the original network. It provides a better balance between the ability to model long-range dependencies and computational efficiency. Experiments on Atari games illustrate that the self-attention structure is significantly effective for function approximation in deep reinforcement learning.
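A non-local self-attention block lets every feature position aggregate information from all others in a single step, which is how long-range dependencies are captured without extra depth. A minimal NumPy sketch under our own assumptions (random projections standing in for learned weights, flattened spatial positions):

```python
import numpy as np

def nonlocal_block(x, W_theta, W_phi, W_g):
    """Non-local self-attention over feature positions.

    x: (N, C) features at N (flattened) spatial positions
    W_theta, W_phi, W_g: (C, C) projection matrices (learned in practice)
    Each position is updated with a softmax-weighted sum over ALL
    positions, so a long-range dependency costs one matrix product.
    """
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
    attn = theta @ phi.T / np.sqrt(x.shape[1])       # (N, N) pairwise affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # row-wise softmax
    return x + attn @ g                              # residual connection
```

The residual form means the block can be dropped into an existing convolutional value or policy network without disturbing its initial behaviour when W_g starts near zero.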
DOI: 10.1109/ICASSP.2019.8682832 · pp. 3042-3046
Citations: 0
A Spiking Neural Network Approach to Auditory Source Lateralisation
R. Luke, D. McAlpine
A novel approach to multi-microphone acoustic source localisation based on spiking neural networks is presented. We demonstrate that a two-microphone system connected to a spiking neural network can be used to localise acoustic sources based purely on inter-microphone timing differences, with no need for manually configured delay lines. A two-sensor example is provided which includes 1) a front end which converts the acoustic signal into a series of spikes, 2) a hidden layer of spiking neurons, and 3) an output layer of spiking neurons which represents the location of the acoustic source. We present details on training the network and an evaluation of its performance in quiet and noisy conditions. The system is trained on two locations, and we show that the lateralisation accuracy is 100% when presented with previously unseen data in quiet conditions. We also demonstrate that the network generalises to modulation rates and background noise on which it was not trained.
DOI: 10.1109/ICASSP.2019.8683767 · pp. 1488-1492
Citations: 3
Neural Variational Identification and Filtering for Stochastic Non-linear Dynamical Systems with Application to Non-intrusive Load Monitoring
Henning Lange, M. Berges, J. Z. Kolter
In this paper, an algorithm for performing System Identification and inference of the filtering recursion for stochastic non-linear dynamical systems is introduced. Additionally, the algorithm allows for enforcing domain constraints on the state variable. The algorithm makes use of an approximate inference technique called Variational Inference in conjunction with Deep Neural Networks as the optimization engine. Although general in nature, the algorithm is evaluated in the context of Non-Intrusive Load Monitoring: the problem of inferring the operational state of individual electrical appliances given aggregate measurements of electrical power collected in a home.
DOI: 10.1109/ICASSP.2019.8683552 · pp. 8340-8344
Citations: 6
The Geometry of Equality-constrained Global Consensus Problems
Qiuwei Li, Zhihui Zhu, Gongguo Tang, M. Wakin
A variety of unconstrained nonconvex optimization problems have been shown to have benign geometric landscapes that satisfy the strict saddle property and have no spurious local minima. We present a general result relating the geometry of an unconstrained centralized problem to its equality-constrained distributed extension. It follows that many global consensus problems inherit the benign geometry of their original centralized counterpart. Taking advantage of this fact, we demonstrate the favorable performance of the Gradient ADMM algorithm on a distributed low-rank matrix approximation problem.
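The equality-constrained distributed extension the abstract refers to is the global consensus form: minimize the sum of local objectives f_i(x_i) subject to x_i = z for all i. A toy sketch of that splitting, solved with plain scaled-form consensus ADMM on quadratics (this is not the paper's Gradient ADMM variant, and the objective is a stand-in of our own choosing):

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=500):
    """Global consensus for f_i(x) = 0.5 * (x - a_i)^2.

    Solves min_x sum_i f_i(x) by giving each agent a local copy x_i
    with the consensus constraints x_i = z (scaled-form ADMM updates).
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)   # local variables, one per agent
    u = np.zeros_like(a)   # scaled dual variables for x_i = z
    z = 0.0                # global (consensus) variable
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1 + rho)   # local proximal steps
        z = np.mean(x + u)                    # consensus averaging
        u = u + x - z                         # dual update on x_i = z
    return z
```

For this quadratic instance the consensus solution is the average of the a_i, matching the centralized minimizer, which is exactly the kind of "inherited" behaviour the geometric result in the paper makes precise for nonconvex objectives.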
DOI: 10.1109/ICASSP.2019.8682568 · pp. 7928-7932
Citations: 6
Information Theoretic Lower Bound of Restricted Isometry Property Constant
Gen Li, Jingkai Yan, Yuantao Gu
Compressed sensing seeks to recover an unknown sparse vector from undersampled measurements. Since its introduction, a large body of work on compressed sensing has developed efficient algorithms for sparse signal recovery. The restricted isometry property (RIP) has become the dominant tool used for the analysis of exact reconstruction from seemingly undersampled measurements. Although the upper bound of the RIP constant has been studied extensively, as far as we know, a corresponding result for the lower bound is missing. In this work, we first present a tight lower bound for the RIP constant, filling that gap. The lower bound is of the same order as the upper bound for the RIP constant. Moreover, numerical simulations show that our lower bound is close to the upper bound. Our bound on the RIP constant provides, for the first time, an information-theoretic lower bound on the sampling rate, which is the essential question for practitioners.
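For readers unfamiliar with the quantity being bounded: the s-th RIP constant of a matrix A is the smallest delta such that (1 - delta)||x||² ≤ ||Ax||² ≤ (1 + delta)||x||² for all s-sparse x. For tiny problems it can be computed exactly by enumerating supports, as this sketch (our own helper, feasible only for small dimensions) shows:

```python
import numpy as np
from itertools import combinations

def rip_constant(A, s):
    """Exact s-th RIP constant of A by enumerating all size-s supports.

    delta_s = max over supports S of the largest deviation of the
    eigenvalues of A_S^T A_S from 1.
    """
    n = A.shape[1]
    delta = 0.0
    for S in combinations(range(n), s):
        G = A[:, S].T @ A[:, S]          # Gram matrix of the selected columns
        w = np.linalg.eigvalsh(G)        # sorted eigenvalues
        delta = max(delta, max(1 - w[0], w[-1] - 1))
    return delta
```

The combinatorial blow-up of this exact computation is precisely why the field relies on upper and, as in this paper, lower bounds instead.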
DOI: 10.1109/ICASSP.2019.8683742 · pp. 5297-5301
Citations: 1
Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives
Lukas Stappen, N. Cummins, Eva-Maria Messner, H. Baumeister, J. Dineley, Björn Schuller
Automatic detection of sentiment and affect in personal narratives through word usage has the potential to assist in the automated detection of change in psychotherapy. Such a tool could, for instance, provide an efficient, objective measure of the time a person has been in a positive or negative state-of-mind. Towards this goal, we propose and develop a hierarchical attention model for the tasks of sentiment (positive and negative) and self-assessed affect detection in transcripts of personal narratives. We also perform a qualitative analysis of the word attentions learnt by our sentiment analysis model. In a key result, our attention model achieved an unweighted average recall (UAR) of 91.0% in a binary sentiment detection task on the test partition of the Ulm State-of-Mind in Speech (USoMS) corpus. We also achieved UARs of 73.7% and 68.6% in the 3-class tasks of arousal and valence detection, respectively. Finally, our qualitative analysis associates colloquial reinforcements with positive sentiments, and uncertain phrasing with negative sentiments.
DOI: 10.1109/ICASSP.2019.8683801 · pp. 6680-6684
Citations: 12
Speech Augmentation Using Wavenet in Speech Recognition
Jisung Wang, Sangki Kim, Yeha Lee
Data augmentation is crucial to improving the performance of deep neural networks by helping the model avoid overfitting and improve its generalization. In automatic speech recognition, previous work proposed several approaches to augment data by performing speed perturbation or spectral transformation. Since data augmented in this manner has similar acoustic representations as the original data, it has limited advantage in improving generalization of the acoustic model. In order to avoid generating data with limited diversity, we propose a voice conversion approach using a generative model (WaveNet), which generates a new utterance by transforming an utterance to a given target voice. Our method synthesizes speech with diverse pitch patterns by minimizing the use of acoustic features. With the Wall Street Journal dataset, we verify that our method led to better generalization compared to other data augmentation techniques such as speed perturbation and WORLD-based voice conversion. In addition, when combined with the speed perturbation technique, the two methods complement each other to further improve performance of the acoustic model.
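For context, the speed-perturbation baseline the paper compares against can be sketched in a few lines: resampling the waveform changes its duration (and pitch), yielding acoustically similar variants of the original data, which is exactly the limited-diversity issue the WaveNet-based conversion is meant to avoid. This is a generic illustration, not the paper's implementation:

```python
import numpy as np

def speed_perturb(x, factor):
    """Speed-perturb a waveform by `factor` via linear-interpolation resampling.

    factor > 1 speeds the audio up (shorter output); factor < 1 slows it down.
    Pitch shifts along with speed, as in plain resampling-based perturbation.
    """
    n_out = int(round(len(x) / factor))
    t_out = np.arange(n_out) * factor        # fractional read positions
    return np.interp(t_out, np.arange(len(x)), x)
```

Typical ASR recipes apply factors such as 0.9, 1.0 and 1.1 to triple the training data; the paper's point is that such variants stay close to the original acoustics.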
DOI: 10.1109/ICASSP.2019.8683388 · pp. 6770-6774
Citations: 10
A Hybrid Method for Blind Estimation of Frequency Dependent Reverberation Time Using Speech Signals
Song Li, Roman Schlieper, J. Peissig
Reverberation time is an important room acoustical parameter that can be used to identify the acoustic environment, predict speech intelligibility and model the late reverberation for binaural rendering, etc. Several blind estimation algorithms of reverberation time have been proposed by analyzing recorded speech signals. Unfortunately, the estimation accuracy for the frequency dependent reverberation time is lower than for the full-band reverberation time due to the lower signal energy in sub-band filters. This study presents a novel approach for the blind estimation of reverberation time in the full frequency range. The maximum likelihood method is applied for the estimation of the reverberation time from low- to mid-frequencies, and the reverberation time from mid- to high-frequencies is predicted by our proposed model based on the analysis of the reverberation time calculated from room impulse responses in different rooms. The proposed method is validated by two experiments and shows a good performance.
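As a reference point for what is being estimated blindly: given a measured room impulse response, reverberation time (T60) is conventionally obtained from the Schroeder backward-integrated energy decay curve. A minimal sketch of that non-blind computation (fit limits and names are our own; the paper itself works without access to the impulse response):

```python
import numpy as np

def t60_from_rir(h, fs, db_lo=-5.0, db_hi=-35.0):
    """T60 from a room impulse response via Schroeder backward integration.

    Builds the energy decay curve EDC(t) = sum_{s>=t} h(s)^2, fits a line
    to it (in dB) between db_lo and db_hi, and extrapolates to -60 dB.
    """
    edc = np.cumsum(h[::-1] ** 2)[::-1]            # backward integration
    edc_db = 10 * np.log10(edc / edc[0])           # normalised decay in dB
    t = np.arange(len(h)) / fs
    mask = (edc_db <= db_lo) & (edc_db >= db_hi)   # linear-decay region
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                           # time to decay by 60 dB
```

Blind methods like the one in this paper aim to recover the same quantity per frequency band from reverberant speech recordings alone.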
DOI: 10.1109/ICASSP.2019.8682661 · pp. 211-215
Citations: 4