Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition
Authors: Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2018-11-01
DOI: 10.23919/APSIPA.2018.8659722 (https://doi.org/10.23919/APSIPA.2018.8659722)
This paper presents an extended version of the variational autoencoder (VAE) for sequence modeling. In contrast to the original VAE, the proposed model can directly handle variable-length observation sequences. Furthermore, the discriminative model and the generative model are simultaneously learned in a unified framework. The network architecture of the proposed model is inspired by the i-vector/PLDA framework, whose effectiveness has been proven in sequence modeling tasks such as speaker recognition. Experimental results on the TIMIT database show that the proposed model outperforms the traditional i-vector/PLDA system.
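The abstract does not give the network details, so the following is only a minimal sketch of the two ideas it names: mapping a variable-length sequence to one fixed-size latent vector, and training generative and discriminative terms together. Everything specific here is an assumption made for illustration and is not taken from the paper: mean pooling over frames as the sequence summary, a diagonal Gaussian posterior with a standard normal prior, a softmax speaker classifier, the weighting factor `alpha`, and all function and parameter names. The decoder is omitted, so the reconstruction term is passed in as a number.

```python
import math
import random

def mean_pool(frames):
    """Average a variable-length list of frame vectors into one fixed-size vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def linear(x, weights, bias):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def encode(frames, params):
    """Map a whole utterance to the mean and log-variance of a single latent vector."""
    pooled = mean_pool(frames)  # assumption: pooling stands in for the paper's encoder
    mu = linear(pooled, params["w_mu"], params["b_mu"])
    logvar = linear(pooled, params["w_logvar"], params["b_logvar"])
    return mu, logvar

def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true speaker under a softmax classifier."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]

def joint_loss(recon_nll, kl, speaker_ce, alpha=1.0):
    """Hypothetical combined objective: generative terms plus a weighted discriminative term."""
    return recon_nll + kl + alpha * speaker_ce

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights, used only to make the demo runnable."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
feat_dim, latent_dim = 3, 2
params = {
    "w_mu": random_matrix(latent_dim, feat_dim, rng),
    "b_mu": [0.0] * latent_dim,
    "w_logvar": random_matrix(latent_dim, feat_dim, rng),
    "b_logvar": [0.0] * latent_dim,
}

# Two utterances of different lengths (5 and 12 frames) with the same feature size.
short_utt = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(5)]
long_utt = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(12)]

latents = []
for utt in (short_utt, long_utt):
    mu, logvar = encode(utt, params)
    latents.append(sample_latent(mu, logvar, rng))
    print(len(utt), "frames ->", len(latents[-1]), "latent dims")
```

Both utterances end up with a latent vector of the same dimension regardless of their frame count, which is the property a fixed-length utterance representation such as an i-vector also provides. In a trained model, `joint_loss` would be minimized over all parameters at once, so the latent vector is shaped by both the reconstruction and the speaker-classification terms.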