A progress report of the Taiwan Mandarin radio speech corpus project

2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA). Pub Date: 2017-11-01. DOI: 10.1109/ICSDA.2017.8384450

Y. Liao, Y. Chang, Sing-Yue Wang, Jhih-wei Chen, Sheng-Ming Wang, Jenq-Haur Wang

Citations: 6

Abstract

The Taiwan Mandarin Radio Speech Corpus contains 300 hours (and growing) of high-quality recordings selected from the archive of Taiwan's National Education Radio (NER). The corpus comprises speech in various speaking styles, produced by hundreds of speakers, together with the corresponding transcriptions (automatically generated and manually corrected) and annotations, making it suitable for speech and language research. In this paper, we report the progress of the corpus development and, in particular, present experimental results on audio event detection/segmentation and semi-supervised acoustic model training using this corpus.




