Formosa Speech Recognition Challenge 2020 and Taiwanese Across Taiwan Corpus

Y. Liao, Chia-Yu Chang, Hak-Khiam Tiun, Huang-Lan Su, Hui-Lu Khoo, Jane S. Tsay, Le-Kun Tan, Peter Kang, Tsun-guan Thiann, Un-Gian Iunn, Jyh-Her Yang, Chih-Neng Liang
Published in: 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
Publication date: 2020-11-05
DOI: 10.1109/O-COCOSDA50338.2020.9295019 (https://doi.org/10.1109/O-COCOSDA50338.2020.9295019)
Citations: 13

Abstract

Taiwanese (a.k.a. Taiwanese Hokkien, Hoklo, Taigi, Southern Min or Min-Nan) is an endangered language: because of the dominance of Mandarin, the number of Taiwanese speakers continues to drop, especially among the younger generations. To address this problem, a Taiwanese speech-enabled human-computer interface that supports people's daily lives is essential. Therefore, the Formosa Speech in the Wild (FSW) project was established to collect a large-scale Taiwanese Across Taiwan (TAT) speech corpus to boost the development of Taiwanese speech recognition (TSR). The Formosa Speech Recognition Challenge 2020 (FSR-2020) was also hosted to promote the corpus and to evaluate the performance of state-of-the-art TSR systems. This paper briefly introduces the TAT corpus and the FSR-2020 challenge, presents the provided data profile and evaluation plan, and reports experimental baseline results. A subset of the TAT corpus, TAT-Vol1, is available free of charge to all participants under a non-commercial license, and its corresponding Kaldi baseline recipes have been published online. Experimental results show that the combination of the TAT corpus and the baseline recipes is a good resource pack for TSR research and development.