Ant Multilingual Recognition System for OLR 2021 Challenge

Anqi Lyu, Zhiming Wang, Huijia Zhu
Published in Interspeech (2022-09-18). DOI: 10.21437/interspeech.2022-355
Cited by: 3

Abstract

This paper presents a comprehensive description of the Ant multilingual recognition system for the 6th Oriental Language Recognition (OLR 2021) Challenge. Inspired by transfer learning, we initialize the encoder of the language identification (LID) model from a pretrained automatic speech recognition (ASR) network in order to integrate lexical-phonetic information into language identification. The ASR model is an encoder-decoder network based on the U2++ architecture [1]. The LID model inherits the shared Conformer encoder [2], which is effective at capturing global information and modeling local invariance, from the pretrained ASR model; an attentive statistical pooling layer followed by a linear projection layer is added on top of the encoder, and the resulting LID model is further fine-tuned until it reaches its best performance. In addition, we investigate and analyze in detail data augmentation, score normalization, and model ensembling, all of which improve the evaluation metrics. In the OLR 2021 Challenge, our submitted systems ranked first in both Task 1 and Task 2, with primary metrics of 0.0025 and 0.0039, respectively, less than one third of the second-place scores¹. These results indicate that our multilingual identification methods are effective and competitive in real-life scenarios.
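The LID back end described above (an attentive statistical pooling layer and a linear projection on top of the encoder) can be sketched as a forward pass. This is a minimal, framework-free illustration that assumes the common attentive statistics pooling formulation: a single tanh hidden layer scores each frame, the scores are softmax-normalized over time, and the attention-weighted mean and standard deviation are concatenated. The function names (`attentive_stats_pool`, `lid_logits`, `random_params`), the weight shapes, the toy random weights, and the number of languages are all hypothetical, since the abstract does not specify the exact configuration; random frames stand in for the Conformer encoder output.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Affine map w @ x + b on plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pool(frames: List[Vector], w_att: Matrix,
                         b_att: Vector, v_att: Vector) -> Vector:
    """Pool T encoder frames of size D into one 2*D utterance-level vector.

    Frame scores:  e_t = v^T tanh(W h_t + b)
    Weights:       alpha = softmax(e)
    Output:        [weighted mean ; weighted standard deviation]
    """
    scores = []
    for h in frames:
        hidden = [math.tanh(z) for z in matvec(w_att, h, b_att)]
        scores.append(sum(vi * zi for vi, zi in zip(v_att, hidden)))
    alpha = softmax(scores)

    dim = len(frames[0])
    mean = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    # Weighted variance E[h^2] - mu^2; clamp so rounding never yields sqrt(<0).
    var = [sum(a * h[d] ** 2 for a, h in zip(alpha, frames)) - mean[d] ** 2
           for d in range(dim)]
    std = [math.sqrt(max(v, 1e-10)) for v in var]
    return mean + std


def lid_logits(frames: List[Vector], params: Dict[str, list]) -> Vector:
    """Attentive statistics pooling, then a linear projection to language logits."""
    pooled = attentive_stats_pool(frames, params["w_att"], params["b_att"], params["v_att"])
    return matvec(params["w_out"], pooled, params["b_out"])


def random_params(dim: int, att_hidden: int, num_langs: int,
                  seed: int = 0) -> Dict[str, list]:
    """Hypothetical toy weights; a real system would learn these during fine-tuning."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_att": mat(att_hidden, dim),
        "b_att": [0.0] * att_hidden,
        "v_att": [rng.gauss(0.0, 0.1) for _ in range(att_hidden)],
        "w_out": mat(num_langs, 2 * dim),
        "b_out": [0.0] * num_langs,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for encoder output: 50 frames of 8-dim features (hypothetical sizes).
    frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
    params = random_params(dim=8, att_hidden=4, num_langs=5)
    probs = softmax(lid_logits(frames, params))
    best = max(range(len(probs)), key=probs.__getitem__)
    print(f"posteriors: {[round(p, 3) for p in probs]}, predicted index: {best}")
```

Concatenating the weighted standard deviation with the weighted mean lets the utterance vector reflect how the encoder features vary over time, not only their average. In practice the pooling and projection weights would be trained jointly with the inherited encoder, typically in a deep learning framework rather than with plain lists.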