Automatic Speaker Recognition with Limited Data

Ruirui Li, Jyun-Yu Jiang, Jiahao Liu, Chu-Cheng Hsieh, Wei Wang
{"title":"Automatic Speaker Recognition with Limited Data","authors":"Ruirui Li, Jyun-Yu Jiang, Jiahao Liu, Chu-Cheng Hsieh, Wei Wang","doi":"10.1145/3336191.3371802","DOIUrl":null,"url":null,"abstract":"Automatic speaker recognition (ASR) is a stepping-stone technology towards semantic multimedia understanding and benefits versatile downstream applications. In recent years, neural network-based ASR methods have demonstrated remarkable power to achieve excellent recognition performance with sufficient training data. However, it is impractical to collect sufficient training data for every user, especially for fresh users. Therefore, a large portion of users usually has a very limited number of training instances. As a consequence, the lack of training data prevents ASR systems from accurately learning users acoustic biometrics, jeopardizes the downstream applications, and eventually impairs user experience. In this work, we propose an adversarial few-shot learning-based speaker identification framework (AFEASI) to develop robust speaker identification models with only a limited number of training instances. We first employ metric learning-based few-shot learning to learn speaker acoustic representations, where the limited instances are comprehensively utilized to improve the identification performance. In addition, adversarial learning is applied to further enhance the generalization and robustness for speaker identification with adversarial examples. Experiments conducted on a publicly available large-scale dataset demonstrate that \\model significantly outperforms eleven baseline methods. An in-depth analysis further indicates both effectiveness and robustness of the proposed method.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 13th International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3336191.3371802","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26

Abstract

Automatic speaker recognition (ASR) is a stepping-stone technology towards semantic multimedia understanding and benefits versatile downstream applications. In recent years, neural network-based ASR methods have demonstrated remarkable power to achieve excellent recognition performance with sufficient training data. However, it is impractical to collect sufficient training data for every user, especially for fresh users. Therefore, a large portion of users usually has a very limited number of training instances. As a consequence, the lack of training data prevents ASR systems from accurately learning users acoustic biometrics, jeopardizes the downstream applications, and eventually impairs user experience. In this work, we propose an adversarial few-shot learning-based speaker identification framework (AFEASI) to develop robust speaker identification models with only a limited number of training instances. We first employ metric learning-based few-shot learning to learn speaker acoustic representations, where the limited instances are comprehensively utilized to improve the identification performance. In addition, adversarial learning is applied to further enhance the generalization and robustness for speaker identification with adversarial examples. Experiments conducted on a publicly available large-scale dataset demonstrate that \model significantly outperforms eleven baseline methods. An in-depth analysis further indicates both effectiveness and robustness of the proposed method.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
有限数据的自动说话人识别
自动说话人识别(ASR)是实现语义多媒体理解的基石技术,对多种下游应用都有好处。近年来,基于神经网络的ASR方法在训练数据充足的情况下取得了优异的识别性能。然而,为每个用户收集足够的训练数据是不切实际的,特别是对于新用户。因此,很大一部分用户通常只有非常有限的训练实例。因此,训练数据的缺乏阻碍了ASR系统准确地学习用户的声学生物特征,危及下游应用,并最终损害用户体验。在这项工作中,我们提出了一个对抗性的基于少量学习的说话人识别框架(AFEASI),以开发仅使用有限数量的训练实例的鲁棒说话人识别模型。我们首先采用基于度量学习的少镜头学习来学习说话人的声学表征,其中综合利用有限的实例来提高识别性能。此外,利用对抗学习进一步增强了对抗性样本说话人识别的泛化性和鲁棒性。在公开可用的大规模数据集上进行的实验表明,\模型显著优于11种基线方法。进一步的分析表明了该方法的有效性和鲁棒性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Recurrent Memory Reasoning Network for Expert Finding in Community Question Answering Joint Recognition of Names and Publications in Academic Homepages LouvainNE Enhancing Re-finding Behavior with External Memories for Personalized Search Temporal Pattern of Retweet(s) Help to Maximize Information Diffusion in Twitter
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1