Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Published: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659722

Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda

Citations: 1

Abstract

This paper presents an extended version of the variational autoencoder (VAE) for sequence modeling. In contrast to the original VAE, the proposed model can directly handle variable-length observation sequences. Furthermore, the discriminative model and the generative model are simultaneously learned in a unified framework. The network architecture of the proposed model is inspired by the i-vector/PLDA framework, whose effectiveness has been proven in sequence modeling tasks such as speaker recognition. Experimental results on the TIMIT database show that the proposed model outperforms the traditional i-vector/PLDA system.
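The abstract does not specify the architecture or the training objective, so the following is only a minimal sketch of the general idea it describes: a single fixed-size latent variable summarizes a variable-length sequence, and a generative loss (reconstruction plus KL divergence) is combined with a discriminative loss (speaker classification) in one objective. Everything here is an illustrative assumption rather than the authors' method: the mean pooling over frames, the linear encoder, decoder, and classifier, the weight `alpha`, and the names `init_params`, `encode`, and `joint_loss`.

```python
import math
import random


def init_params(frame_dim, latent_dim, num_speakers, rng):
    """Small random linear maps (hypothetical stand-ins for real networks)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "w_mu": mat(latent_dim, frame_dim),    # encoder: pooled frame -> latent mean
        "w_lv": mat(latent_dim, frame_dim),    # encoder: pooled frame -> latent log-variance
        "w_dec": mat(frame_dim, latent_dim),   # decoder: latent -> frame mean
        "w_cls": mat(num_speakers, latent_dim) # classifier: latent -> speaker logits
    }


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def kl_std_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def encode(frames, w_mu, w_lv):
    """Mean-pool over time so any sequence length maps to one latent posterior."""
    dim = len(frames[0])
    pooled = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return matvec(w_mu, pooled), matvec(w_lv, pooled)


def joint_loss(frames, speaker, params, alpha=1.0, rng=random):
    """Negative ELBO (up to a constant) plus alpha times the cross-entropy."""
    mu, log_var = encode(frames, params["w_mu"], params["w_lv"])
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    # Generative part: unit-variance Gaussian likelihood, summed over all frames.
    frame_mean = matvec(params["w_dec"], z)
    recon = 0.5 * sum((x - r) ** 2 for f in frames for x, r in zip(f, frame_mean))
    kl = kl_std_normal(mu, log_var)
    # Discriminative part: softmax cross-entropy on the sequence-level latent.
    logits = matvec(params["w_cls"], z)
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    cross_entropy = log_norm - logits[speaker]
    return recon + kl + alpha * cross_entropy


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(frame_dim=3, latent_dim=2, num_speakers=4, rng=rng)
    for length in (5, 40):  # two utterances of different lengths
        utterance = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(length)]
        print(length, joint_loss(utterance, speaker=1, params=params, rng=rng))
```

Because the encoder pools over frames before producing the posterior, the same parameters apply to utterances of any length. In this sketch `alpha` simply trades off the generative and discriminative terms; how the paper balances them is not stated in the abstract.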



