{"title":"Selection of Supplementary Acoustic Data for Meta-Learning in Under-Resourced Speech Recognition","authors":"I-Ting Hsieh, Chung-Hsien Wu, Zhenqiang Zhao","doi":"10.23919/APSIPAASC55919.2022.9979997","DOIUrl":null,"url":null,"abstract":"Automatic speech recognition (ASR) for under-resourced languages has been a challenging task during the past decade. In this paper, regarding Taiwanese as the under resourced language, the speech data of the high-resourced languages which have most phonemes in common with Taiwanese are selected as the supplementary resources for meta-training the acoustic models for Taiwanese ASR. Mandarin, English, Japanese, Cantonese and Thai as the high-resourced languages are selected as the supplementary languages based on the designed selection criteria. Model-agnostic meta-learning (MAML) is then used as the meta-training strategy. For evaluation, when 4000 utterances were selected from each supplementary language, we obtained the WER of 20.89% and the SER of 8.86% for Taiwanese ASR. The results were better than the baseline model (26.18% and 13.99%) using only the Taiwanese corpus and traditional method.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9979997","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Automatic speech recognition (ASR) for under-resourced languages has been a challenging task over the past decade. In this paper, treating Taiwanese as the under-resourced language, speech data from the high-resourced languages that share the most phonemes with Taiwanese are selected as supplementary resources for meta-training the acoustic models for Taiwanese ASR. Based on the designed selection criteria, Mandarin, English, Japanese, Cantonese, and Thai are chosen as the supplementary high-resourced languages. Model-agnostic meta-learning (MAML) is then used as the meta-training strategy. In the evaluation, when 4,000 utterances were selected from each supplementary language, the proposed approach achieved a WER of 20.89% and an SER of 8.86% for Taiwanese ASR, outperforming the baseline model (26.18% WER and 13.99% SER) trained only on the Taiwanese corpus with the traditional method.
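To make the phoneme-based selection criterion concrete, the sketch below ranks candidate high-resourced languages by how many phonemes they share with Taiwanese. The tiny phoneme inventories and the simple overlap score are illustrative assumptions only; the paper's actual inventories and exact selection criteria are not reproduced here.

```python
# Hedged sketch: rank candidate supplementary languages by phoneme overlap
# with Taiwanese. Inventories below are illustrative stand-ins, not the
# paper's actual phoneme sets or its exact scoring formula.
TAIWANESE = {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ng", "s"}
CANDIDATES = {
    "Mandarin": {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ng", "sh"},
    "English":  {"a", "i", "u", "e", "p", "t", "k", "m", "n", "s", "th"},
    "Japanese": {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "s"},
}

def overlap_score(inventory, target=TAIWANESE):
    """Number of shared phonemes; a simple proxy for acoustic similarity."""
    return len(inventory & target)

ranked = sorted(CANDIDATES, key=lambda lang: overlap_score(CANDIDATES[lang]),
                reverse=True)
print(ranked)  # languages sharing the most phonemes with Taiwanese come first
```

The following is a minimal first-order MAML sketch of the meta-training strategy: the meta-model is adapted on each supplementary-language task, and the query-set gradients of the adapted models update the shared initialization, which is finally adapted to Taiwanese. The toy feed-forward model, feature dimensions, random batches, and the first-order approximation are all assumptions for illustration; the paper's actual acoustic model and training setup may differ.

```python
# First-order MAML sketch (PyTorch) for meta-training an acoustic model on
# supplementary-language tasks. All names, shapes, and the toy model are
# illustrative assumptions, not the paper's architecture or data pipeline.
import copy
import torch
import torch.nn as nn

FEATURE_DIM, NUM_PHONES = 40, 64  # assumed feature/label sizes

def make_model():
    # Stand-in acoustic model; a real ASR model would be far larger.
    return nn.Sequential(nn.Linear(FEATURE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, NUM_PHONES))

def inner_adapt(model, support, inner_lr=1e-2, steps=1):
    """Clone the meta-model and take gradient steps on one language task."""
    task_model = copy.deepcopy(model)
    opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = support
        opt.zero_grad()
        loss_fn(task_model(x), y).backward()
        opt.step()
    return task_model

def meta_train(model, tasks, meta_lr=1e-3, epochs=3):
    """First-order MAML outer loop over supplementary-language tasks."""
    meta_opt = torch.optim.Adam(model.parameters(), lr=meta_lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        meta_opt.zero_grad(set_to_none=True)
        for support, query in tasks:
            adapted = inner_adapt(model, support)
            xq, yq = query
            loss = loss_fn(adapted(xq), yq)
            # First-order approximation: take the query-set gradients at the
            # adapted parameters and accumulate them onto the meta-model,
            # instead of differentiating through the inner update.
            grads = torch.autograd.grad(loss, adapted.parameters())
            for p, g in zip(model.parameters(), grads):
                p.grad = g if p.grad is None else p.grad + g
        meta_opt.step()

if __name__ == "__main__":
    # Toy "tasks": one (support, query) batch per supplementary language.
    def toy_batch(n=32):
        return torch.randn(n, FEATURE_DIM), torch.randint(0, NUM_PHONES, (n,))
    tasks = [(toy_batch(), toy_batch()) for _ in range(5)]  # 5 languages
    model = make_model()
    meta_train(model, tasks)
    # Final adaptation on (toy) Taiwanese data reuses the inner loop:
    taiwanese_model = inner_adapt(model, toy_batch(), steps=5)
```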