Authors: Andros Tjandra, S. Sakti, Satoshi Nakamura
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
Published: 2017-09-01
DOI: 10.1109/MLSP.2017.8168174 (https://doi.org/10.1109/MLSP.2017.8168174)
Speech recognition features based on deep latent Gaussian models
This paper constructs speech features from a generative model, the deep latent Gaussian model (DLGM). The DLGM is trained with the stochastic gradient variational Bayes (SGVB) algorithm, which performs efficient approximate inference and learning in a directed probabilistic graphical model. The trained DLGM then generates Gaussian-distributed latent variables, which are used as new features for a deep neural network (DNN) acoustic model. We compare results with and without the DLGM-transformed features, and we also examine the benefit of combining the proposed and original features in a single DNN. Our experimental results show that the proposed DLGM features improve automatic speech recognition (ASR) performance. Furthermore, the DNN acoustic model that combines the proposed and original features gives the best performance.
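The feature pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer encoder, the layer sizes, the randomly initialized (untrained) weights, and all function names are hypothetical, and using the latent mean rather than a sample as the feature is an assumption the abstract does not confirm. It shows a Gaussian encoder q(z | x), the reparameterized sampling step that SGVB relies on, the KL regularizer of the variational bound, and the concatenation of latent and original features.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_params(in_dim, hid_dim, z_dim, rng):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "w_h": mat(hid_dim, in_dim), "b_h": [0.0] * hid_dim,
        "w_mu": mat(z_dim, hid_dim), "b_mu": [0.0] * z_dim,
        "w_lv": mat(z_dim, hid_dim), "b_lv": [0.0] * z_dim,
    }

def encode(frame, params):
    """Assumed Gaussian encoder q(z | x): returns (mean, log-variance)."""
    hidden = [math.tanh(h) for h in linear(frame, params["w_h"], params["b_h"])]
    mean = linear(hidden, params["w_mu"], params["b_mu"])
    log_var = linear(hidden, params["w_lv"], params["b_lv"])
    return mean, log_var

def sample_latent(mean, log_var, rng):
    """Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """KL(q(z | x) || N(0, I)) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))

def dlgm_features(frame, params, combine=True):
    """Latent mean as the new feature (an assumption), optionally
    concatenated with the original frame for the combined DNN input."""
    mean, _ = encode(frame, params)
    return list(frame) + mean if combine else mean

rng = random.Random(0)
params = init_params(in_dim=4, hid_dim=8, z_dim=2, rng=rng)
frame = [0.5, -1.2, 0.3, 0.8]  # toy stand-in for one acoustic feature vector
mean, log_var = encode(frame, params)
z = sample_latent(mean, log_var, rng)    # stochastic latent used during training
combined = dlgm_features(frame, params)  # original + latent features
```

In a real system the encoder would be trained jointly with a decoder by maximizing the variational lower bound, and `combined` would be computed per frame and fed to the DNN acoustic model in place of the original features alone.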