DNN acoustic modeling with modular multi-lingual feature extraction networks

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI:10.1109/ASRU.2013.6707754

Jonas Gehring, Quoc Bao Nguyen, Florian Metze, A. Waibel

{"title":"DNN acoustic modeling with modular multi-lingual feature extraction networks","authors":"Jonas Gehring, Quoc Bao Nguyen, Florian Metze, A. Waibel","doi":"10.1109/ASRU.2013.6707754","DOIUrl":null,"url":null,"abstract":"In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate in extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.","PeriodicalId":265258,"journal":{"name":"2013 IEEE Workshop on Automatic Speech Recognition and Understanding","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Workshop on Automatic Speech Recognition and Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2013.6707754","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

Abstract

In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate in extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于模块化多语言特征提取网络的深度神经网络声学建模

在这项工作中，我们提出了几种能够利用多种语言数据的深度神经网络架构。模块化是通过训练网络分别提取高级特征和估计音素状态后验来实现的，然后在混合DNN/HMM设置中将它们组合在一起进行解码。这种方法已经被证明可以在单语言系统中获得卓越的性能，在这里我们证明了特征提取器从被训练成具有共享隐藏表示的多语言网络中受益匪浅。我们还表明，现有的单语言网络可以以模块化的方式重用，以达到类似的性能水平，而无需在多语言数据上训练新的网络。此外，我们还研究了如何扩展这些架构以利用特定语言的声学特征。对越南语的低资源会话电话语音转录任务进行了评估，同时提供了普什图语、他加禄语、土耳其语和粤语的声学模型训练的附加数据。与单语GMMs和dnn相比，分别提高了17.4%和13.8%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2013 IEEE Workshop on Automatic Speech Recognition and Understanding

自引率

0.00%

发文量