Generalised Discriminative Transform via Curriculum Learning for Speaker Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Pub Date: 2018-04-23 | DOI: 10.1109/ICASSP.2018.8461296

E. Marchi, Stephen Shum, Kyuyeon Hwang, S. Kajarekar, Siddharth Sigtia, H. Richards, R. Haynes, Yoon Kim, J. Bridle

Pages: 5324-5328

Citations: 19

Abstract



In this paper we introduce a speaker verification system deployed on mobile devices that can be used to personalise a keyword spotter. We describe a baseline DNN system that maps an utterance to a speaker embedding, which is used to measure speaker differences via cosine similarity. We then introduce an architectural modification which uses an LSTM system where the parameters are optimised via a curriculum learning procedure to reduce the detection error and improve its generalisability across various conditions. Experiments on our internal datasets show that the proposed approach outperforms the DNN baseline system and yields a relative EER reduction of 30-70% on both text-dependent and text-independent tasks under a variety of acoustic conditions.
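The scoring and evaluation terms used in the abstract can be illustrated with a minimal Python sketch. It assumes speaker embeddings are plain lists of floats, approximates the equal error rate (EER) by sweeping thresholds over the observed scores, and computes a relative EER reduction. The function names, the threshold sweep, and all numbers are hypothetical illustrations and are not taken from the paper; the DNN and LSTM models and the curriculum learning procedure are not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: try each observed score as a threshold and return
    the mean of the false-accept and false-reject rates at the threshold
    where the two rates are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.5 means the EER was halved."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores (made up for illustration only).
targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trials
impostors = [0.6, 0.3, 0.2, 0.1]    # different-speaker trials
toy_eer = equal_error_rate(targets, impostors)
```

Under this definition, a relative EER reduction of 30-70% means the new system's EER is between 0.7 and 0.3 times the baseline EER.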
