Gender domain adaptation for automatic speech recognition

Artem Sokolov, A. Savchenko
2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), published 2021-01-21. DOI: 10.1109/SAMI50585.2021.9378626 (https://doi.org/10.1109/SAMI50585.2021.9378626)
Citations: 0

Abstract

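This paper focuses on fine-tuning acoustic models to adapt them to speakers of a given gender. We pretrained a baseline Transformer model on Librispeech-960 and ran fine-tuning experiments on the gender-specific test subsets. When the encoder and decoder layers are not frozen and tuning starts from the last checkpoints, the word error rate (WER) relative to the baseline is up to 5% lower on the male subset and up to 3% lower on the female subset. We also adapted the base model on the complete L2 Arctic dataset of accented speech and then fine-tuned it separately for individual speakers and for male and female speakers. The models trained on the gender subsets achieved 1–2% lower WER than the model tuned on the whole L2 Arctic dataset. Finally, we confirmed experimentally that concatenating pretrained voice embeddings (x-vectors) with the embeddings of a conventional encoder cannot significantly improve speech recognition accuracy.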
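The abstract reports WER reductions relative to a baseline and an experiment that concatenates x-vectors with encoder embeddings. The Python sketch below illustrates both ideas under stated assumptions: WER is computed in the standard way as the word-level Levenshtein distance divided by the number of reference words, the reported percentages are read as relative reductions, and the x-vector is assumed to be appended to every encoder output frame (the abstract does not say where the fusion happens). All function names are hypothetical; this is not the authors' code.

```python
from typing import List, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed so far
    # and the first j hypothesis words (rolling single-row DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of a reference word
                          curr[j - 1] + 1,      # insertion of a hypothesis word
                          prev[j - 1] + cost)   # substitution or exact match
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction of an adapted model with respect to a baseline.
    Assumption: the paper's percentages are relative, not absolute, differences."""
    return (baseline_wer - adapted_wer) / baseline_wer


def concat_speaker_embedding(encoder_frames: Sequence[Sequence[float]],
                             xvector: Sequence[float]) -> List[List[float]]:
    """Append one utterance-level x-vector to every encoder output frame.
    Assumption: per-frame concatenation along the feature axis; the abstract
    does not specify the exact fusion point used in the paper."""
    return [list(frame) + list(xvector) for frame in encoder_frames]
```

For example, hypothetical baseline and adapted WERs of 0.20 and 0.19 give `relative_reduction(0.20, 0.19)` of about 0.05, i.e. a 5% relative reduction; these values are illustrative only and are not taken from the paper. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.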