
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Intrasystem Entanglement Generator and Unambiguos Bell States Discriminator on Chip
F. Bovino
Bell measurements, which jointly project two qubits onto the so-called Bell basis, constitute a crucial step in many quantum computation and communication protocols, including dense coding, quantum repeaters, and teleportation-based quantum computation. One problem is that deterministic unambiguous Bell measurements are impossible using passive linear optics, even when arbitrarily many auxiliary photons, photon-number-resolving detectors, and dynamical (conditionally changing) networks are available. Current proposals for exceeding the 50% upper bound without experimentally challenging nonlinearities rely on entangled photon ancilla states and a sufficiently large interferometer to combine the signal and ancilla modes. We demonstrate that the novel Multiple Rail architecture, based on the propagation of a single photon in a complex multipath optical circuit (or multiwaveguide optical circuit), makes it possible to perform deterministic Bell measurements so as to unambiguously discriminate all four Bell states.
DOI: 10.1109/ICASSP.2019.8683820 (https://doi.org/10.1109/ICASSP.2019.8683820). Pages 7993-7997. Published 2019-05-12.
Citations: 1
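For reference, the Bell basis mentioned in the abstract above is the standard set of four maximally entangled two-qubit states. The computational-basis labels $|0\rangle$ and $|1\rangle$ are a notational assumption here and are not taken from the paper:

```latex
\begin{align}
|\Phi^{\pm}\rangle &= \tfrac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \\
|\Psi^{\pm}\rangle &= \tfrac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right).
\end{align}
```

An unambiguous Bell measurement identifies which of these four states was prepared. The 50% bound cited in the abstract corresponds to passive linear optics distinguishing only two of the four without ancillas.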
Graph Spectral Clustering of Convolution Artefacts in Radio Interferometric Images
Matthieu Simeoni, P. Hurley
The starting point for deconvolution methods in radio astronomy is an estimate of the sky intensity called a dirty image. These methods rely on the telescope point-spread function to remove the artefacts that pollute it. In this work, we show that the intensity field is only a partial summary statistic of the matched filtered interferometric data, which we prove is spatially correlated on the celestial sphere. This allows us to define a sky covariance function. This previously unexplored quantity provides additional information that can be leveraged when removing dirty image artefacts. We demonstrate this using a novel unsupervised learning method. The problem is formulated on a graph: each pixel is interpreted as a node, and nodes are linked by edges weighted according to their spatial correlation. We then use spectral clustering to separate the artefacts into groups and identify physical sources within them.
DOI: 10.1109/ICASSP.2019.8683841 (https://doi.org/10.1109/ICASSP.2019.8683841). Pages 4260-4264. Published 2019-05-12.
Citations: 4
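The graph formulation in the abstract above (pixels as nodes, correlation-weighted edges, spectral clustering) can be illustrated with a minimal sketch. This is not the authors' method: the function name `spectral_bipartition`, the use of the unnormalised Laplacian, the two-way split, and the toy weight matrix are all assumptions chosen for brevity.

```python
import numpy as np

def spectral_bipartition(weights):
    """Split graph nodes into two groups using the Fiedler vector.

    weights: symmetric (n, n) array of non-negative edge weights, standing in
    for pairwise pixel correlations (hypothetical input, not from the paper).
    """
    w = np.asarray(weights, dtype=float)
    degree = w.sum(axis=1)
    laplacian = np.diag(degree) - w  # unnormalised graph Laplacian
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy graph: two tightly linked groups {0, 1, 2} and {3, 4, 5} joined by weak edges
w = np.full((6, 6), 0.01)
w[:3, :3] = 1.0
w[3:, 3:] = 1.0
np.fill_diagonal(w, 0.0)
labels = spectral_bipartition(w)
```

The sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which group receives label 0 or 1.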
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration
Seokhwan Kim
Punctuation restoration is a post-processing task in automatic speech recognition that generates punctuation marks for unpunctuated transcripts. This paper proposes a deep recurrent neural network architecture with layer-wise multi-head attention to better model the context from the variety of perspectives that human writers use when placing punctuation. The experimental results show that the proposed model significantly outperforms previous state-of-the-art methods for punctuation restoration on the IWSLT dataset.
DOI: 10.1109/ICASSP.2019.8682418 (https://doi.org/10.1109/ICASSP.2019.8682418). Pages 7280-7284. Published 2019-05-12.
Citations: 30
Community Detection in Sparse Realistic Graphs: Improving the Bethe Hessian
Lorenzo Dall'Amico, Romain Couillet
This article improves over the recently proposed Bethe Hessian matrix for community detection on sparse graphs, assuming here a more realistic setting where node degrees are inhomogeneous. We notably show that the parametrization proposed in the seminal work on the Bethe Hessian clustering can be ameliorated with positive consequences on correct classification rates. Extensive simulations support our claims.
DOI: 10.1109/ICASSP.2019.8683594 (https://doi.org/10.1109/ICASSP.2019.8683594). Pages 2942-2946. Published 2019-05-12.
Citations: 6
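As background for the abstract above, the Bethe Hessian of a graph with adjacency matrix A and degree matrix D is commonly defined as H(r) = (r^2 - 1) I + D - r A. The sketch below builds that matrix and returns the eigenvectors with negative eigenvalues, which are the ones usually fed to a clustering step. The default r (square root of the mean degree) is an assumption for illustration; the paper's contribution is a different parametrization, which is not reproduced here.

```python
import numpy as np

def bethe_hessian(adjacency, r=None):
    """Return H(r) = (r^2 - 1) I + D - r A for a symmetric adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    degrees = a.sum(axis=1)
    if r is None:
        # Assumed default for illustration only: sqrt of the mean degree
        r = np.sqrt(degrees.mean())
    n = a.shape[0]
    return (r ** 2 - 1.0) * np.eye(n) + np.diag(degrees) - r * a

def informative_eigenvectors(adjacency, r=None):
    """Eigenvectors of H(r) whose eigenvalues are negative."""
    eigvals, eigvecs = np.linalg.eigh(bethe_hessian(adjacency, r))
    return eigvecs[:, eigvals < 0]

# Three-node path graph, with r fixed to 2 so the entries are easy to check by hand
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
h = bethe_hessian(path, r=2.0)
```

For community detection, the rows of the matrix returned by `informative_eigenvectors` would typically be passed to k-means or a similar clustering routine.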
Parameter Uncertainty for End-to-end Speech Recognition
Stefan Braun, Shih-Chii Liu
Recent work on neural networks with probabilistic parameters has shown that parameter uncertainty improves network regularization. Parameter-specific signal-to-noise ratio (SNR) levels derived from parameter distributions were further found to have high correlations with task importance. However, most of these studies focus on tasks other than automatic speech recognition (ASR). This work investigates end-to-end models with probabilistic parameters for ASR. We demonstrate that probabilistic networks outperform conventional deterministic networks in pruning and domain adaptation experiments carried out on the Wall Street Journal and CHiME-4 datasets. We use parameter-specific SNR information to select parameters for pruning and to condition the parameter updates during adaptation. Experimental results further show that networks with lower SNR parameters (1) tolerate increased sparsity levels during parameter pruning and (2) reduce catastrophic forgetting during domain adaptation.
DOI: 10.1109/ICASSP.2019.8683066 (https://doi.org/10.1109/ICASSP.2019.8683066). Pages 5636-5640. Published 2019-05-12.
Citations: 12
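The SNR-based parameter selection mentioned in the abstract above might look roughly like the following sketch. It assumes each parameter has a Gaussian posterior with mean `mu` and standard deviation `sigma`, defines SNR as |mu| / sigma, and prunes the lowest-SNR parameters first. These choices are common in the Bayesian neural network literature but are assumptions here; the paper's exact criterion may differ, and `prune_by_snr` is a hypothetical name.

```python
import numpy as np

def prune_by_snr(mu, sigma, sparsity):
    """Zero out the fraction `sparsity` of parameters with the lowest SNR.

    mu, sigma: arrays of per-parameter posterior means and standard deviations.
    Returns the pruned means and the boolean keep-mask.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    snr = np.abs(mu) / sigma  # assumed SNR definition
    n_prune = int(round(sparsity * mu.size))
    order = np.argsort(snr, axis=None)  # flat indices, ascending SNR
    keep = np.ones(mu.size, dtype=bool)
    keep[order[:n_prune]] = False
    keep = keep.reshape(mu.shape)
    return mu * keep, keep

mu = np.array([1.0, -2.0, 0.1, 3.0])
sigma = np.array([1.0, 0.5, 1.0, 1.0])
pruned, keep = prune_by_snr(mu, sigma, sparsity=0.5)
```

In this toy example the SNR values are 1, 4, 0.1, and 3, so the first and third parameters are the ones removed at 50% sparsity.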
Convexity-edge-preserving Signal Recovery with Linearly Involved Generalized Minimax Concave Penalty Function
Jiro Abe, M. Yamagishi, I. Yamada
In this paper, we propose a new linearly involved convexity-preserving model for signal recovery by extending the idea of the generalized minimax concave (GMC) penalty [Selesnick '17]. The proposed model can use nonconvex penalties while maintaining overall convexity, and it is applicable to much more general signal recovery scenarios than the original GMC model. We also propose a new iterative algorithm with a theoretical guarantee of convergence to a global minimizer of the proposed model. A numerical experiment on noise suppression shows the excellent edge-preserving performance of the proposed smoother in comparison with the standard convex TV smoother.
DOI: 10.1109/ICASSP.2019.8682318 (https://doi.org/10.1109/ICASSP.2019.8682318). Pages 4918-4922. Published 2019-05-12.
Citations: 7
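For context, the original GMC penalty that the abstract above extends is usually written as the difference between the l1 norm and a generalized Huber function. The symbols $A$ (observation matrix), $B$ (penalty-shaping matrix), $y$ (observation), and $\lambda$ (regularization weight) follow the common presentation of the GMC model and are assumptions here, not notation taken from this paper:

```latex
\begin{align}
\psi_B(x) &= \|x\|_1 - \min_{v}\left\{ \|v\|_1 + \tfrac{1}{2}\|B(x - v)\|_2^2 \right\}, \\
F(x) &= \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\,\psi_B(x), \qquad
F \text{ is convex if } B^{\mathsf{T}}B \preceq \tfrac{1}{\lambda} A^{\mathsf{T}}A .
\end{align}
```

The penalty $\psi_B$ is itself nonconvex, yet the condition on $B$ keeps the total cost $F$ convex. The "linearly involved" model of the paper applies such a penalty to a linear transform of the signal; its precise form is not reproduced here.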
A Recursive Least-squares Algorithm Based on the Nearest Kronecker Product Decomposition
Camelia Elisei-Iliescu, C. Paleologu, J. Benesty, S. Ciochină
The recursive least-squares (RLS) adaptive filter is an appealing choice in system identification problems, mainly due to its fast convergence rate. However, this algorithm is computationally very complex, which may make it impractical for identifying long impulse responses, as in echo cancellation. In this paper, we focus on a new approach to improve the efficiency of the RLS algorithm. The basic idea is to exploit the impulse response decomposition based on the nearest Kronecker product and low-rank approximation. Thus, a high-dimension system identification problem is reformulated in terms of low-dimension problems, which are tensorized together. Simulations performed in the context of echo cancellation indicate the good performance of the RLS algorithm based on this approach.
DOI: 10.1109/ICASSP.2019.8682498 (https://doi.org/10.1109/ICASSP.2019.8682498). Pages 4843-4847. Published 2019-05-12.
Citations: 10
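The decomposition idea in the abstract above (nearest Kronecker product plus low-rank approximation) can be sketched with a plain SVD. The example only covers the decomposition of a known impulse response, not the RLS recursion itself; the function names and the factor lengths `len1` and `len2` are hypothetical.

```python
import numpy as np

def nearest_kronecker_terms(h, len1, len2, rank):
    """Approximate h (length len1 * len2) by a sum of `rank` Kronecker products.

    Returns a list of (a_p, b_p) pairs with len(a_p) == len1 and
    len(b_p) == len2 such that h ~= sum_p kron(a_p, b_p).
    """
    h = np.asarray(h, dtype=float)
    if h.size != len1 * len2:
        raise ValueError("len(h) must equal len1 * len2")
    # With a row-major reshape, kron(a, b) becomes the outer product a b^T,
    # so the best rank-P matrix approximation gives the nearest Kronecker sum.
    mat = h.reshape(len1, len2)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return [(s[p] * u[:, p], vt[p, :]) for p in range(rank)]

def reconstruct(terms):
    """Rebuild the long impulse response from its Kronecker factors."""
    return sum(np.kron(a, b) for a, b in terms)

# An impulse response that is exactly one Kronecker product is recovered at rank 1
a = np.array([1.0, 2.0])
b = np.array([1.0, 0.0, -1.0])
h = np.kron(a, b)
h_hat = reconstruct(nearest_kronecker_terms(h, len1=2, len2=3, rank=1))
```

The appeal for adaptive filtering is that the factors have lengths `len1` and `len2` rather than `len1 * len2`, so each sub-problem is much smaller than the original one.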
A Penalized Autoencoder Approach for Nonlinear Independent Component Analysis
Tianwen Wei, S. Chrétien
We propose the Independent Component Autoencoder (ICAE), a deep neural network-based framework for nonlinear Independent Component Analysis (ICA). The proposed method consists of a penalized autoencoder and a training objective that minimizes a combination of the reconstruction loss and an ICA contrast. Unlike many previous ICA methods, which are usually tailored to separate a specific mixture, our method can recover sources from various mixtures without prior knowledge of the nature of the mixture.
DOI: 10.1109/ICASSP.2019.8682469 (https://doi.org/10.1109/ICASSP.2019.8682469). Pages 2797-2801. Published 2019-05-12.
Citations: 1
Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis
Yanfeng Lu, M. Dong, Ying Chen
Text-to-Speech (TTS) systems have been evolving rapidly in recent years. With the great modelling power of deep neural networks, researchers have achieved end-to-end conversion from raw text to speech. Various research projects have shown that end-to-end TTS systems are able to generate speech that sounds akin to the human voice for English and other languages. However, for languages like Chinese, there are two problems to deal with. Firstly, due to the large character set, a small input set comparable to the English character set is needed for the end-to-end solution. Secondly, there are serious prosodic phrasing mistakes when the end-to-end method is applied to Chinese. In this paper, we propose a solution for an end-to-end Chinese TTS system on the basis of Tacotron 2 and the WaveNet vocoder. We then add extra contextual information to improve the performance of prosodic phrasing. Our experiments have demonstrated the effectiveness of this proposal.
DOI: 10.1109/ICASSP.2019.8682368 (https://doi.org/10.1109/ICASSP.2019.8682368). Pages 7050-7054. Published 2019-05-12.
Citations: 25
A Bayesian Framework for Intent Prediction in Object Tracking
B. I. Ahmad, P. Langdon, S. Godsill
Engineering Department, University of Cambridge, Trumpington Street, Cambridge, UK, CB2 1PZ
In this paper, we introduce a generic Bayesian framework for inferring the intent of a tracked object, as early as possible, based on the available partial sensory observations. It treats the prediction problem, i.e. not estimating the object state such as position, within an object tracking formulation. This leads to a low-complexity implementation of the inference routine with minimal training requirements. The proposed approach utilises suitable stochastic, namely linear Gaussian, models to capture long term dependencies in the object trajectory as dictated by intent. Numerical examples are shown to demonstrate the efficacy of this framework.
DOI: 10.1109/ICASSP.2019.8682603 (https://doi.org/10.1109/ICASSP.2019.8682603). Pages 8439-8443. Published 2019-05-12.
Citations: 5