Selection of Supplementary Acoustic Data for Meta-Learning in Under-Resourced Speech Recognition

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Pub Date: 2022-11-07. DOI: 10.23919/APSIPAASC55919.2022.9979997

I-Ting Hsieh, Chung-Hsien Wu, Zhenqiang Zhao


Citations: 1

Abstract

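Automatic speech recognition (ASR) for under-resourced languages has remained a challenging task over the past decade. This paper treats Taiwanese as the under-resourced language and selects speech data from the high-resourced languages that share the most phonemes with Taiwanese as supplementary resources for meta-training the acoustic models for Taiwanese ASR. Mandarin, English, Japanese, Cantonese, and Thai are chosen as the supplementary languages according to the designed selection criteria. Model-agnostic meta-learning (MAML) is then used as the meta-training strategy. In the evaluation, with 4000 utterances selected from each supplementary language, the system achieved a WER of 20.89% and an SER of 8.86% for Taiwanese ASR, improving on a baseline model trained with only the Taiwanese corpus and a traditional method (26.18% and 13.99%, respectively).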
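To put the reported gains in perspective, the sketch below converts the four error rates quoted in the abstract into absolute and relative reductions. Only those four percentages come from the source; the function names `relative_reduction` and `summarize` and the `REPORTED` dictionary are illustrative choices, not anything the authors define.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Return the error-rate reduction as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - system) / baseline


# Error rates (%) as quoted in the abstract: baseline vs. meta-trained model.
REPORTED = {
    "WER": {"baseline": 26.18, "meta": 20.89},
    "SER": {"baseline": 13.99, "meta": 8.86},
}


def summarize(results):
    """Map each metric to (absolute reduction in points, relative reduction in %)."""
    rows = {}
    for metric, rates in results.items():
        absolute = rates["baseline"] - rates["meta"]
        relative = 100.0 * relative_reduction(rates["baseline"], rates["meta"])
        rows[metric] = (round(absolute, 2), round(relative, 1))
    return rows


if __name__ == "__main__":
    for metric, (absolute, relative) in summarize(REPORTED).items():
        print(f"{metric}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

Under these figures the WER drops by 5.29 points (about 20.2% relative) and the SER by 5.13 points (about 36.7% relative), so the relative gain is larger on SER than on WER.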


