Ant Multilingual Recognition System for OLR 2021 Challenge

Interspeech, published 2022-09-18, DOI: 10.21437/interspeech.2022-355

Anqi Lyu, Zhiming Wang, Huijia Zhu

Interspeech, pp. 3684-3688

Citations: 3

Abstract

This paper presents a comprehensive description of the Ant multilingual recognition system for the 6th Oriental Language Recognition (OLR 2021) Challenge. Inspired by transfer learning, the encoder of the language identification (LID) model is initialized from a pretrained automatic speech recognition (ASR) network in order to integrate lexical-phonetic information into language identification. The ASR model is an encoder-decoder network based on the U2++ architecture [1]. The LID model inherits the shared Conformer encoder [2] of the pretrained ASR model, which is effective at capturing global information and modeling local invariance. An attentive statistical pooling layer followed by a linear projection layer is added on top of this encoder, and the resulting LID model is further fine-tuned until it reaches its optimum. Furthermore, data augmentation, score normalization and model ensembling are effective strategies for improving the performance metrics, and we investigate and analyse them in detail in this paper. In the OLR 2021 Challenge, our submitted systems ranked first in both Task 1 and Task 2, with primary metrics of 0.0025 and 0.0039 respectively, less than 1/3 of those of the second-place system¹. These results demonstrate that our methods for multilingual identification are effective and competitive in real-life scenarios.
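The abstract states only that an attentive statistical pooling layer and a linear projection layer are stacked on the Conformer encoder. Below is a minimal sketch of one common formulation of attentive statistics pooling (a scalar attention score per frame, softmax weights over frames, then the weighted mean and weighted standard deviation concatenated), followed by a linear layer that outputs per-language scores. The helper names (`attentive_stats_pooling`, `lid_head`), the toy dimensions, the tanh scoring function and the random weights are all assumptions for illustration; the paper's exact layer may differ.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: each inner list is one row


def matvec(M: Matrix, x: Vector) -> Vector:
    """Matrix-vector product M @ x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(H: Matrix, W: Matrix, b: Vector, v: Vector) -> Vector:
    """Pool T frame vectors (each of dim D) into one 2*D-dim utterance vector.

    Assumed scoring: e_t = v . tanh(W h_t + b); weights alpha = softmax(e).
    Output: [weighted mean ; weighted standard deviation].
    """
    scores = []
    for h in H:
        hidden = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)

    dim = len(H[0])
    mean = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(dim)]
    second = [sum(a * h[d] ** 2 for a, h in zip(alpha, H)) for d in range(dim)]
    # Floor the variance so rounding error never makes sqrt see a negative value.
    std = [math.sqrt(max(s - m ** 2, 1e-10)) for s, m in zip(second, mean)]
    return mean + std


def lid_head(H: Matrix, W: Matrix, b: Vector, v: Vector, P: Matrix, c: Vector) -> Vector:
    """Hypothetical LID head: attentive stats pooling, then linear projection P x + c."""
    pooled = attentive_stats_pooling(H, W, b, v)
    return [z + ci for z, ci in zip(matvec(P, pooled), c)]


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (hypothetical): T frames, D encoder dim, A attention dim, L languages.
random.seed(0)
T, D, A, L = 20, 8, 4, 3
frames = rand_matrix(T, D)  # stands in for the Conformer encoder output
W, b = rand_matrix(A, D), [0.0] * A
v = [random.gauss(0.0, 0.1) for _ in range(A)]
P, c = rand_matrix(L, 2 * D), [0.0] * L
logits = lid_head(frames, W, b, v, P, c)
print(len(logits), max(range(L), key=lambda i: logits[i]))
```

In a real system the frames would come from the fine-tuned encoder and every weight would be learned. Weighting frames before computing the statistics lets the head emphasize language-discriminative segments instead of averaging all frames equally.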
