An open speech resource for Tibetan multi-dialect and multitask recognition

Int. J. Comput. Sci. Eng. · Pub Date: 2020-05-08 · DOI: 10.1504/ijcse.2020.10029389

Yue Zhao, Xiaona Xu, Jianjian Yue, Wei Song, Xiali Li, Licheng Wu, Q. Ji

Citations: 3

Abstract

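This paper introduces a Tibetan multi-dialect data resource for multitask speech research. It can be used for four tasks:

- Tibetan multi-dialect speech recognition
- Tibetan speaker recognition
- Tibetan dialect identification
- Tibetan speech synthesis

The resource contains 30 hours of Lhasa-U-Tsang dialect speech and 10 hours of Amdo pastoral dialect speech. It also contains 8.7 hours of Kham dialect speech, made up of 3.4 hours of Yushu dialect, 3.3 hours of Dege dialect and 2 hours of Changdu dialect.

For the Lhasa-U-Tsang dialect, the release also includes a phoneme set, a pronunciation dictionary, and the code for building a Lhasa-U-Tsang speech recognition baseline system. For Tibetan multi-dialect and multitask speech recognition, it provides code and recognition results based on WaveNet with connectionist temporal classification (WaveNet-CTC).

All resources are publicly available and free for researchers. They help address the shortage of public Tibetan multi-dialect speech resources, with the aim of advancing Tibetan multi-dialect speech processing technology.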
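The duration figures in the abstract can be cross-checked with a few lines of Python. This is an illustrative sketch only and not part of the released resource. Only the hour values come from the abstract. The dictionary layout `CORPUS_HOURS` and the helpers `dialect_totals` and `shares` are hypothetical names.

```python
# Hours per (sub-)dialect as reported in the abstract. The grouping and the
# names used here are hypothetical; only the numbers come from the source.
CORPUS_HOURS = {
    "Lhasa-U-Tsang": {"Lhasa-U-Tsang": 30.0},
    "Kham": {"Yushu": 3.4, "Dege": 3.3, "Changdu": 2.0},
    "Amdo pastoral": {"Amdo pastoral": 10.0},
}


def dialect_totals(corpus):
    """Sum sub-dialect hours per major dialect, rounded to 0.1 h to hide float noise."""
    return {name: round(sum(parts.values()), 1) for name, parts in corpus.items()}


def shares(totals):
    """Fraction of the whole corpus contributed by each major dialect."""
    grand = sum(totals.values())
    return {name: hours / grand for name, hours in totals.items()}


if __name__ == "__main__":
    totals = dialect_totals(CORPUS_HOURS)
    grand_total = round(sum(totals.values()), 1)
    for name, hours in totals.items():
        print(f"{name:15s} {hours:5.1f} h")
    print(f"{'Total':15s} {grand_total:5.1f} h")
    for name, frac in shares(totals).items():
        print(f"{name:15s} {frac:6.1%}")
```

Running the script confirms that the three Kham sub-dialects sum to the stated 8.7 hours. It also shows that the whole corpus totals 48.7 hours, of which roughly 62% is Lhasa-U-Tsang speech.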

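The abstract names WaveNet-CTC as the model behind the multi-dialect, multitask results. The released code is not reproduced here, and the WaveNet encoder is omitted.

The sketch below only illustrates greedy (best-path) decoding, a standard inference step for CTC models. It works in three stages:

- take the most likely label at each frame;
- merge consecutive repeats;
- drop the blank symbol.

All of the following are assumptions for illustration and are not taken from the paper: the blank index 0, the toy probabilities, the toy labels, and the function names.

```python
from itertools import groupby
from typing import Dict, List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def best_path(frame_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable label index at every frame (per-frame argmax)."""
    return [max(range(len(p)), key=p.__getitem__) for p in frame_probs]


def ctc_collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC many-to-one mapping: merge consecutive repeats, then remove blanks."""
    return [label for label, _ in groupby(path) if label != blank]


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank: int = BLANK,
) -> List[str]:
    """Best-path decoding: argmax per frame, collapse, then map ids to tokens."""
    return [id_to_token[i] for i in ctc_collapse(best_path(frame_probs), blank)]


if __name__ == "__main__":
    # Hypothetical 4-frame output over {blank, "a", "b"}.
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print(greedy_ctc_decode(toy_probs, {1: "a", 2: "b"}))  # → ['a', 'b']
```

Repeats are merged before blanks are removed. As a result, only a blank between two identical labels (as in `[1, 1, 0, 1]`, which collapses to `[1, 1]`) lets CTC emit the same symbol twice in a row. Beam search, optionally with a language model, is a common alternative to this greedy decoder. The abstract does not say which decoder the released code uses.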
