An open speech resource for Tibetan multi-dialect and multitask recognition

Int. J. Comput. Sci. Eng. Pub Date: 2020-05-08 DOI: 10.1504/ijcse.2020.10029389

Yue Zhao, Xiaona Xu, Jianjian Yue, Wei Song, Xiali Li, Licheng Wu, Q. Ji

Citations: 3

Abstract

This paper introduces a Tibetan multi-dialect data resource for multitask speech research. It can be used for Tibetan multi-dialect speech recognition, Tibetan speaker recognition, Tibetan dialect identification, and Tibetan speech synthesis. The resource consists of 30 hours of Lhasa-U-Tsang dialect speech; 8.7 hours of Kham dialect speech, comprising 3.4 hours of Yushu dialect, 3.3 hours of Dege dialect, and 2 hours of Changdu dialect; and 10 hours of Amdo pastoral dialect speech. Additional resources are provided for the Lhasa-U-Tsang dialect, including a phoneme set, a pronunciation dictionary, and the code for building a Lhasa-U-Tsang speech recognition baseline system. For Tibetan multi-dialect and multitask speech recognition, the code and recognition results based on WaveNet with connectionist temporal classification (WaveNet-CTC) are also provided. All resources are free and publicly available to researchers, which helps compensate for the shortage of public Tibetan multi-dialect speech resources and is intended to promote the development of Tibetan multi-dialect speech processing technology.
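The composition described in the abstract can be checked with a short tally. The following is a minimal sketch in Python: the hour figures come from the abstract, while the dictionary layout and the function names `hours_per_dialect` and `total_hours` are illustrative assumptions and do not reflect how the released resource is actually organized.

```python
from decimal import Decimal

# Hours per dialect as stated in the abstract. The nesting and key names are
# hypothetical, chosen for illustration; they are not the resource's real layout.
CORPUS_HOURS = {
    "Lhasa-U-Tsang": {"Lhasa-U-Tsang": Decimal("30")},
    "Kham": {
        "Yushu": Decimal("3.4"),
        "Dege": Decimal("3.3"),
        "Changdu": Decimal("2"),
    },
    "Amdo pastoral": {"Amdo pastoral": Decimal("10")},
}


def hours_per_dialect(corpus):
    """Sum the sub-dialect hours within each top-level dialect group."""
    return {
        dialect: sum(subdialects.values(), Decimal("0"))
        for dialect, subdialects in corpus.items()
    }


def total_hours(corpus):
    """Sum the hours across all dialect groups."""
    return sum(hours_per_dialect(corpus).values(), Decimal("0"))


if __name__ == "__main__":
    for dialect, hours in hours_per_dialect(CORPUS_HOURS).items():
        print(f"{dialect}: {hours} h")
    print(f"Total: {total_hours(CORPUS_HOURS)} h")
```

Under these figures, the three Kham sub-dialects sum to the stated 8.7 hours, and the resource as a whole comes to 48.7 hours. `Decimal` is used instead of floats so that the one-decimal figures add up exactly.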


