{"title":"An open speech resource for Tibetan multi-dialect and multitask recognition","authors":"Yue Zhao, Xiaona Xu, Jianjian Yue, Wei Song, Xiali Li, Licheng Wu, Q. Ji","doi":"10.1504/ijcse.2020.10029389","DOIUrl":null,"url":null,"abstract":"This paper introduces a Tibetan multi-dialect data resource for multitask speech research. It can be used for Tibetan multi-dialect speech recognition, Tibetan speaker recognition, Tibetan dialect identification, and Tibetan speech synthesis. The resource consists of 30 hours Lhasa-U-Tsang dialect; 8.7 hours Kham dialect, including 3.4 hours Yushu dialect, 3.3 hours Dege dialect and 2 hours Changdu dialect; 10 hours Amdo pastoral dialect. Other resources are also provided for Lhasa-U-Tsang dialect including phoneme set, pronunciation dictionary and the codes for constructing the Lhasa-U-Tsang speech recognition baseline system. Meanwhile, for Tibetan multi-dialect and multitask speech recognition, the codes and recognition results based on WaveNet-connectionist temporal classification (WaveNet-CTC) are provided. All the resources are free for researchers and publicly available, which effectively compensates for the shortage of public Tibetan multi-dialect speech resources in order to promote the development of Tibetan multi-dialect speech processing technology.","PeriodicalId":340410,"journal":{"name":"Int. J. Comput. Sci. Eng.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Sci. Eng.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/ijcse.2020.10029389","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
This paper introduces a Tibetan multi-dialect data resource for multitask speech research. It can be used for Tibetan multi-dialect speech recognition, Tibetan speaker recognition, Tibetan dialect identification, and Tibetan speech synthesis. The resource consists of 30 hours Lhasa-U-Tsang dialect; 8.7 hours Kham dialect, including 3.4 hours Yushu dialect, 3.3 hours Dege dialect and 2 hours Changdu dialect; 10 hours Amdo pastoral dialect. Other resources are also provided for Lhasa-U-Tsang dialect including phoneme set, pronunciation dictionary and the codes for constructing the Lhasa-U-Tsang speech recognition baseline system. Meanwhile, for Tibetan multi-dialect and multitask speech recognition, the codes and recognition results based on WaveNet-connectionist temporal classification (WaveNet-CTC) are provided. All the resources are free for researchers and publicly available, which effectively compensates for the shortage of public Tibetan multi-dialect speech resources in order to promote the development of Tibetan multi-dialect speech processing technology.