Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2017-12-01 DOI:10.1109/APSIPA.2017.8282219

Patrick Lumban Tobing, H. Kameoka, T. Toda


Citations: 15

Abstract

This paper presents a novel implementation of latent trajectory modeling in a deep acoustic-to-articulatory inversion mapping framework. Conventional methods, i.e., inversion mappings based on the Gaussian mixture model (GMM) and the deep neural network (DNN), can account for frame interdependency when generating articulatory parameter trajectories by imposing an explicit constraint between static and dynamic features. However, this constraint is not considered when training these models, so the trained model is not optimal for the mapping procedure. We address this problem by introducing latent trajectory modeling into the DNN-based inversion mapping. In the latent trajectory model, frame interdependency is taken into account in both training and mapping through a soft constraint between static and dynamic features. The experimental results demonstrate that the proposed latent trajectory DNN (LTDNN)-based inversion mapping outperforms both conventional and state-of-the-art inversion mapping systems.
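The explicit constraint between static and dynamic features mentioned in the abstract can be illustrated with a minimal sketch of conventional maximum-likelihood trajectory generation for a single articulatory dimension. This is not the authors' LTDNN, whose details the abstract does not give. The two-point delta window, the diagonal precisions, and the function names `delta_matrix` and `generate_trajectory` are all assumptions made for illustration.

```python
import numpy as np


def delta_matrix(T):
    """Build the (2T x T) matrix W that maps a static trajectory c to [static; delta]."""
    I = np.eye(T)
    D = np.zeros((T, T))
    for t in range(T):
        # Assumed delta window: 0.5 * (c[t+1] - c[t-1]), with indices clamped at the edges.
        D[t, min(t + 1, T - 1)] += 0.5
        D[t, max(t - 1, 0)] -= 0.5
    return np.vstack([I, D])


def generate_trajectory(mean, precision):
    """Return the static trajectory c that maximizes the Gaussian likelihood of W @ c.

    mean      : (2T,) predicted means, static part first, then delta part
    precision : (2T,) diagonal precisions (inverse variances), same ordering
    """
    T = mean.shape[0] // 2
    W = delta_matrix(T)
    WtP = W.T * precision  # equals W^T @ diag(precision)
    # Solve the normal equations (W^T P W) c = W^T P mean.
    return np.linalg.solve(WtP @ W, WtP @ mean)


if __name__ == "__main__":
    # Hypothetical per-frame predictions: a spiky static mean and zero delta mean.
    static_mean = np.array([0.0, 0.0, 4.0, 0.0, 0.0])
    mean = np.concatenate([static_mean, np.zeros(5)])
    precision = np.concatenate([np.ones(5), 100.0 * np.ones(5)])
    print(generate_trajectory(mean, precision))
```

In this sketch the constraint is applied only at generation time, through `W`. That is the mismatch the abstract describes: the per-frame predictor is trained without it.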
