Prosodic Information-Assisted DNN-based Mandarin Spontaneous-Speech Recognition

Yu-Chih Deng, Cheng-Hsin Lin, Y. Liao, Yih-Ru Wang, Sin-Horng Chen
DOI: 10.1109/O-COCOSDA50338.2020.9295010 (https://doi.org/10.1109/O-COCOSDA50338.2020.9295010)
Published in: 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
Publication date: 2020-11-05
Citations: 1

Abstract

This paper extends the method proposed in [1], replacing its traditional HMM-based ASR with a state-of-the-art DNN-based ASR. Prosodic information is used to assist DNN-based Mandarin spontaneous-speech recognition, in particular to alleviate the serious interference that disfluencies and paralinguistic phenomena cause during decoding. The approach adopts a hierarchical prosodic model (HPM), composed of several break-syntax, break-acoustic, syllable prosodic, and prosodic state models, to rescore and improve the TDNN-f+RNNLM-based first-pass decoding output and, at the same time, to generate word, part-of-speech (POS), punctuation mark (PM), tone, break type, and prosodic state tags for further use. Experimental results show that the HPM-based system not only reduces the word error rate from the previous best value of 41.8% [1] to 21.2%, but also detects the underlying POS, PMs, and tones well, with error rates of 10.9%, 12.6%, and 2.3%, respectively. These results suggest that the proposed method is promising for Mandarin spontaneous-speech recognition.
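The abstract does not say how the HPM scores are combined with the first-pass scores, so the following is only a minimal sketch of generic N-best rescoring, assuming a log-linear combination. The `Hypothesis` class, the `prosody_weight` parameter, the toy tokens, and all score values are hypothetical and are not taken from the paper. The last helper works out the relative word error rate reduction implied by the two reported figures (41.8% and 21.2%).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One first-pass candidate (hypothetical structure, not from the paper)."""
    words: List[str]
    first_pass_logp: float  # stand-in for a combined TDNN-f + RNNLM log score
    prosody_logp: float     # stand-in for a prosodic-model log-likelihood


def rescore(nbest: List[Hypothesis], prosody_weight: float = 0.5) -> Hypothesis:
    """Pick the candidate with the highest log-linear combined score.

    The weighted sum is an assumption made for illustration; the paper's
    actual combination rule is not given in the abstract.
    """
    return max(
        nbest,
        key=lambda h: h.first_pass_logp + prosody_weight * h.prosody_logp,
    )


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction, e.g. for two word error rates in percent."""
    return (baseline - new) / baseline


# Toy N-best list with made-up tokens and scores.
nbest = [
    Hypothesis(["w1", "w2"], first_pass_logp=-10.0, prosody_logp=-8.0),
    Hypothesis(["w1", "w3"], first_pass_logp=-10.5, prosody_logp=-4.0),
]

best = rescore(nbest)
print(best.words)  # ['w1', 'w3']: -10.5 + 0.5 * -4.0 = -12.5 beats -14.0

# Reported WERs: 41.8% (previous best [1]) and 21.2% (HPM-based system).
print(f"{relative_reduction(41.8, 21.2):.1%}")  # 49.3%
```

With `prosody_weight=0.0` the function falls back to the first-pass ranking, which makes it easy to see how much of a decision is due to the prosodic score. The 49.3% figure is simple arithmetic on the two reported error rates, i.e. the error rate was roughly halved.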