{"title":"Low-rank bases for factorized hidden layer adaptation of DNN acoustic models","authors":"Lahiru Samarakoon, K. Sim","doi":"10.1109/SLT.2016.7846332","DOIUrl":null,"url":null,"abstract":"Recently, the factorized hidden layer (FHL) adaptation method is proposed for speaker adaptation of deep neural network (DNN) acoustic models. An FHL contains a speaker-dependent (SD) transformation matrix using a linear combination of rank-1 matrices and an SD bias using a linear combination of vectors, in addition to the standard affine transformation. On the other hand, full-rank bases are used with a similar DNN adaptation method which is based on cluster adaptive training (CAT). Therefore, it is interesting to investigate the effect of the rank of the bases used for adaptation. The increase of the rank of the bases improves the speaker subspace representation, without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on Aurora 4, AMI IHM and AMI SDM tasks. Experimental results have shown that when one FHL layer is used, it is optimal to use low-ranked bases of rank-50, instead of full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846332","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Recently, the factorized hidden layer (FHL) adaptation method was proposed for speaker adaptation of deep neural network (DNN) acoustic models. In addition to the standard affine transformation, an FHL contains a speaker-dependent (SD) transformation matrix formed as a linear combination of rank-1 matrices, and an SD bias formed as a linear combination of vectors. In contrast, a similar DNN adaptation method based on cluster adaptive training (CAT) uses full-rank bases. It is therefore interesting to investigate the effect of the rank of the bases used for adaptation: increasing the rank of the bases improves the speaker subspace representation without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on the Aurora 4, AMI IHM, and AMI SDM tasks. Experimental results show that when a single FHL is used, it is optimal to use low-rank bases of rank 50 rather than full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.
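To make the factorization concrete, below is a minimal NumPy sketch of an FHL forward pass for one speaker. The layer sizes, the number of bases, the function and variable names (fhl_forward, n_bases, etc.), and the activation choice are our own illustrative assumptions, not taken from the paper. Each rank-r basis is stored in factorized form U_i V_i^T, so raising r enriches the speaker subspace while the per-speaker parameter count stays fixed at n_bases interpolation weights, which is the trade-off the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper).
d_in, d_out = 512, 512   # hidden layer width
n_bases = 64             # number of SD bases (speaker subspace dimension)
rank = 50                # rank of each basis; rank=1 recovers the original FHL

# Speaker-independent affine transformation.
W = rng.standard_normal((d_out, d_in)) * 0.01
b = np.zeros(d_out)

# Rank-r transformation bases, stored factorized as U_i @ V_i.T so the
# rank can grow without materializing full d_out x d_in matrices.
U = rng.standard_normal((n_bases, d_out, rank)) * 0.01
V = rng.standard_normal((n_bases, d_in, rank)) * 0.01
bias_bases = rng.standard_normal((n_bases, d_out)) * 0.01  # SD bias vectors

def fhl_forward(x, lam):
    """Forward pass of a factorized hidden layer for one speaker.

    x:   input activation vector, shape (d_in,)
    lam: speaker-dependent interpolation weights, shape (n_bases,).
         Note len(lam) does not depend on `rank`, so raising the basis
         rank adds no learnable speaker parameters.
    """
    # SD transformation: sum_i lam_i * (U_i @ V_i.T) @ x, computed
    # without ever forming the d_out x d_in basis matrices.
    sd_term = np.einsum('i,ior,idr,d->o', lam, U, V, x)
    sd_bias = lam @ bias_bases
    z = W @ x + b + sd_term + sd_bias
    return 1.0 / (1.0 + np.exp(-z))  # activation choice is illustrative

# Example: one speaker's interpolation weights, one frame of features.
lam = rng.standard_normal(n_bases)
h = fhl_forward(rng.standard_normal(d_in), lam)
```

In this sketch, setting rank=1 corresponds to the original FHL formulation, while letting rank approach min(d_in, d_out) approaches the full-rank bases used in CAT-style adaptation; the abstract's finding is that an intermediate rank (50, for a single FHL) works best.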