{"title":"基于熵的隐单元剪枝以减少深度神经网络参数","authors":"G. Mantena, K. Sim","doi":"10.1109/SLT.2016.7846335","DOIUrl":null,"url":null,"abstract":"For acoustic modeling, the use of DNN has become popular due to its superior performance improvements observed in many automatic speech recognition (ASR) tasks. Typically, DNNs with deep (many layers) and wide (many hidden units per layer) architectures are chosen in order to achieve good gains. An issue with such approaches is that there is an explosion in the number of learnable parameters. Thus, it is often difficult to build models in cases where there is no sufficient amount of training data (or data for adaptation), and also limits the usage of ASR systems on hand-held devices such as mobile phones. A method to overcome this issue is to reduce the number of parameters. In this work, we provide a framework to effectively reduce the number of parameters by removing the hidden units. Each hidden unit is represented by an activity vector associated with speech attributes such as phones. A normalized entropy-based measure is computed from these activity vectors which reflects the significance of these units in the DNN model. For comparison we also use low-rank matrix factorization to reduce the number of parameters. We show that low-rank matrix factorization can reduce the number of parameters only to a certain extent. Thus, we extend the pruning technique in combination with low-rank matrix factorization to further reduce the model. In this work, we provide detailed experimental results on the Aurora-4 and TEDLIUM databases and show that the models can be reduced to approximately 20 – 30% of its initial size without much loss in the ASR performance.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Entropy-based pruning of hidden units to reduce DNN parameters\",\"authors\":\"G. Mantena, K. Sim\",\"doi\":\"10.1109/SLT.2016.7846335\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For acoustic modeling, the use of DNN has become popular due to its superior performance improvements observed in many automatic speech recognition (ASR) tasks. Typically, DNNs with deep (many layers) and wide (many hidden units per layer) architectures are chosen in order to achieve good gains. An issue with such approaches is that there is an explosion in the number of learnable parameters. Thus, it is often difficult to build models in cases where there is no sufficient amount of training data (or data for adaptation), and also limits the usage of ASR systems on hand-held devices such as mobile phones. A method to overcome this issue is to reduce the number of parameters. In this work, we provide a framework to effectively reduce the number of parameters by removing the hidden units. Each hidden unit is represented by an activity vector associated with speech attributes such as phones. A normalized entropy-based measure is computed from these activity vectors which reflects the significance of these units in the DNN model. For comparison we also use low-rank matrix factorization to reduce the number of parameters. We show that low-rank matrix factorization can reduce the number of parameters only to a certain extent. 
Thus, we extend the pruning technique in combination with low-rank matrix factorization to further reduce the model. In this work, we provide detailed experimental results on the Aurora-4 and TEDLIUM databases and show that the models can be reduced to approximately 20 – 30% of its initial size without much loss in the ASR performance.\",\"PeriodicalId\":281635,\"journal\":{\"name\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2016.7846335\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846335","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Entropy-based pruning of hidden units to reduce DNN parameters
For acoustic modeling, the use of DNNs has become popular due to the superior performance improvements observed in many automatic speech recognition (ASR) tasks. Typically, DNNs with deep (many layers) and wide (many hidden units per layer) architectures are chosen in order to achieve good gains. An issue with such approaches is that there is an explosion in the number of learnable parameters. This makes it difficult to build models when there is not a sufficient amount of training data (or data for adaptation), and it also limits the use of ASR systems on hand-held devices such as mobile phones. A method to overcome this issue is to reduce the number of parameters. In this work, we provide a framework to effectively reduce the number of parameters by removing hidden units. Each hidden unit is represented by an activity vector associated with speech attributes such as phones. A normalized entropy-based measure computed from these activity vectors reflects the significance of these units in the DNN model. For comparison, we also use low-rank matrix factorization to reduce the number of parameters, and we show that low-rank matrix factorization can reduce the number of parameters only to a certain extent. Thus, we combine the pruning technique with low-rank matrix factorization to reduce the model further. We provide detailed experimental results on the Aurora-4 and TEDLIUM databases and show that the models can be reduced to approximately 20-30% of their initial size without much loss in ASR performance.
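The abstract describes computing a normalized entropy-based significance measure from per-unit "activity vectors" tied to speech attributes such as phones. The following is a minimal sketch of that idea, not the paper's exact formulation: the function names, the use of mean activations per phone class, and the choice to keep the lowest-entropy (most phone-selective) units are all assumptions made here for illustration.

```python
import numpy as np

def activity_vectors(activations, phone_labels, num_phones):
    """Build an (num_units, num_phones) matrix of mean activations.

    activations: (num_frames, num_units) non-negative hidden-layer outputs
                 (e.g. sigmoid or ReLU), one row per frame.
    phone_labels: (num_frames,) integer phone index per frame.
    """
    num_units = activations.shape[1]
    acts = np.zeros((num_units, num_phones))
    for p in range(num_phones):
        frames = activations[phone_labels == p]
        if len(frames) > 0:
            acts[:, p] = frames.mean(axis=0)
    return acts

def normalized_entropy(acts, eps=1e-12):
    """Normalized entropy per hidden unit, in [0, 1].

    A unit whose activity is spread uniformly over all phones has entropy
    near 1; a unit that fires for only a few phones has entropy near 0.
    """
    probs = acts / (acts.sum(axis=1, keepdims=True) + eps)
    ent = -(probs * np.log(probs + eps)).sum(axis=1)
    return ent / np.log(acts.shape[1])

def prune_mask(acts, keep_fraction=0.7):
    """Keep the most phone-selective units (lowest normalized entropy).

    Assumption: high-entropy (non-selective) units are the pruning
    candidates; the paper defines the actual significance criterion.
    """
    scores = normalized_entropy(acts)
    threshold = np.quantile(scores, keep_fraction)
    return scores <= threshold
```

The mask can then be used to drop the corresponding rows of a layer's weight matrix and the matching columns of the next layer's weight matrix, which is what removing a hidden unit amounts to.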
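The abstract also uses low-rank matrix factorization as a comparison point for parameter reduction. Below is a minimal sketch, not the paper's exact setup, of factorizing a weight matrix with a truncated SVD; the layer size and rank in the example are illustrative assumptions.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)
    via truncated SVD, reducing m*n parameters to rank*(m + n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * np.sqrt(s[:rank])          # scale columns of U
    B = np.sqrt(s[:rank])[:, None] * Vt[:rank, :]  # scale rows of Vt
    return A, B

# Example: a 2048 x 2048 layer factorized with rank 256 keeps
# 256 * (2048 + 2048) ~ 1.05M parameters instead of ~4.2M.
W = np.random.randn(2048, 2048)
A, B = low_rank_factorize(W, rank=256)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative approximation error
```

In a network, the single affine layer W is replaced by two smaller layers A and B (usually with no nonlinearity between them), which is why factorization alone can only shrink the model so far: the rank cannot be lowered indefinitely without hurting accuracy, motivating the combination with hidden-unit pruning described above.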