{"title":"Low-rank bases for factorized hidden layer adaptation of DNN acoustic models","authors":"Lahiru Samarakoon, K. Sim","doi":"10.1109/SLT.2016.7846332","DOIUrl":null,"url":null,"abstract":"Recently, the factorized hidden layer (FHL) adaptation method is proposed for speaker adaptation of deep neural network (DNN) acoustic models. An FHL contains a speaker-dependent (SD) transformation matrix using a linear combination of rank-1 matrices and an SD bias using a linear combination of vectors, in addition to the standard affine transformation. On the other hand, full-rank bases are used with a similar DNN adaptation method which is based on cluster adaptive training (CAT). Therefore, it is interesting to investigate the effect of the rank of the bases used for adaptation. The increase of the rank of the bases improves the speaker subspace representation, without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on Aurora 4, AMI IHM and AMI SDM tasks. Experimental results have shown that when one FHL layer is used, it is optimal to use low-ranked bases of rank-50, instead of full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846332","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Recently, the factorized hidden layer (FHL) adaptation method was proposed for speaker adaptation of deep neural network (DNN) acoustic models. In addition to the standard affine transformation, an FHL contains a speaker-dependent (SD) transformation matrix formed as a linear combination of rank-1 matrices, and an SD bias formed as a linear combination of vectors. In contrast, a similar DNN adaptation method based on cluster adaptive training (CAT) uses full-rank bases. It is therefore interesting to investigate the effect of the rank of the bases used for adaptation: increasing the rank of the bases improves the speaker subspace representation without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on the Aurora 4, AMI IHM, and AMI SDM tasks. Experimental results show that when a single FHL is used, it is optimal to use low-rank bases of rank 50 rather than full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.
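To make the factorization concrete, below is a minimal NumPy sketch of an FHL forward pass for one speaker. The layer sizes, the number of bases, the function and variable names (fhl_forward, n_bases, etc.), and the activation choice are our own illustrative assumptions, not taken from the paper. Each rank-r basis is stored in factorized form U_i V_i^T, so raising r enriches the speaker subspace while the per-speaker parameter count stays fixed at n_bases interpolation weights, which is the trade-off the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper).
d_in, d_out = 512, 512   # hidden layer width
n_bases = 64             # number of SD bases (speaker subspace dimension)
rank = 50                # rank of each basis; rank=1 recovers the original FHL

# Speaker-independent affine transformation.
W = rng.standard_normal((d_out, d_in)) * 0.01
b = np.zeros(d_out)

# Rank-r transformation bases, stored factorized as U_i @ V_i.T so the
# rank can grow without materializing full d_out x d_in matrices.
U = rng.standard_normal((n_bases, d_out, rank)) * 0.01
V = rng.standard_normal((n_bases, d_in, rank)) * 0.01
bias_bases = rng.standard_normal((n_bases, d_out)) * 0.01  # SD bias vectors

def fhl_forward(x, lam):
    """Forward pass of a factorized hidden layer for one speaker.

    x:   input activation vector, shape (d_in,)
    lam: speaker-dependent interpolation weights, shape (n_bases,).
         Note len(lam) does not depend on `rank`, so raising the basis
         rank adds no learnable speaker parameters.
    """
    # SD transformation: sum_i lam_i * (U_i @ V_i.T) @ x, computed
    # without ever forming the d_out x d_in basis matrices.
    sd_term = np.einsum('i,ior,idr,d->o', lam, U, V, x)
    sd_bias = lam @ bias_bases
    z = W @ x + b + sd_term + sd_bias
    return 1.0 / (1.0 + np.exp(-z))  # activation choice is illustrative

# Example: one speaker's interpolation weights, one frame of features.
lam = rng.standard_normal(n_bases)
h = fhl_forward(rng.standard_normal(d_in), lam)
```

In this sketch, setting rank=1 corresponds to the original FHL formulation, while letting rank approach min(d_in, d_out) approaches the full-rank bases used in CAT-style adaptation; the abstract's finding is that an intermediate rank (50, for a single FHL) works best.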