Context-dependent modelling of deep neural network using logistic regression

2013 IEEE Workshop on Automatic Speech Recognition and Understanding Pub Date : 2013-12-01 DOI:10.1109/ASRU.2013.6707753

Guangsen Wang, K. Sim

{"title":"Context-dependent modelling of deep neural network using logistic regression","authors":"Guangsen Wang, K. Sim","doi":"10.1109/ASRU.2013.6707753","DOIUrl":null,"url":null,"abstract":"The data sparsity problem of context-dependent acoustic modelling in automatic speech recognition is addressed by using the decision tree state clusters as the training targets in the standard context-dependent (CD) deep neural network (DNN) systems. As a result, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In this paper, we formulate the CD DNN as an instance of the canonical state modelling technique based on a set of broad phone classes to address both the data sparsity and the clustering problems. The triphone is clustered into multiple sets of shorter biphones using broad phone contexts to address the data sparsity issue. A DNN is trained to discriminate the biphones within each set. The canonical states are represented by the concatenated log posteriors of all the broad phone DNNs. Logistic regression is used to transform the canonical states into the triphone state output probability. Clustering of the regression parameters is used to reduce model complexity while still achieving unique acoustic scores for all possible triphones. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD DNN significantly outperforms the standard CD DNN. The best system provides a 2.7% absolute WER reduction compared to the best standard CD DNN system.","PeriodicalId":265258,"journal":{"name":"2013 IEEE Workshop on Automatic Speech Recognition and Understanding","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Workshop on Automatic Speech Recognition and Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2013.6707753","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

The data sparsity problem of context-dependent acoustic modelling in automatic speech recognition is addressed by using the decision tree state clusters as the training targets in the standard context-dependent (CD) deep neural network (DNN) systems. As a result, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In this paper, we formulate the CD DNN as an instance of the canonical state modelling technique based on a set of broad phone classes to address both the data sparsity and the clustering problems. The triphone is clustered into multiple sets of shorter biphones using broad phone contexts to address the data sparsity issue. A DNN is trained to discriminate the biphones within each set. The canonical states are represented by the concatenated log posteriors of all the broad phone DNNs. Logistic regression is used to transform the canonical states into the triphone state output probability. Clustering of the regression parameters is used to reduce model complexity while still achieving unique acoustic scores for all possible triphones. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD DNN significantly outperforms the standard CD DNN. The best system provides a 2.7% absolute WER reduction compared to the best standard CD DNN system.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于逻辑回归的深度神经网络上下文相关建模

采用决策树状态聚类作为标准上下文相关深度神经网络(DNN)系统的训练目标，解决了自动语音识别中上下文相关声学建模的数据稀疏性问题。因此，在解码期间无法区分集群内的CD状态。这个问题被称为聚类问题，在目前的文献中没有明确地解决。在本文中，我们将CD DNN作为基于一组广义电话类的规范状态建模技术的一个实例来表述，以解决数据稀疏性和聚类问题。使用广泛的电话上下文将三通电话聚类成多组较短的双通电话，以解决数据稀疏性问题。DNN被训练来区分每组中的双话筒。规范状态由所有广义电话dnn的连接日志后验表示。采用逻辑回归将正则态转化为三音态输出概率。回归参数的聚类用于降低模型复杂性，同时仍然为所有可能的三音器获得独特的声学分数。在广播新闻转录任务上的实验结果表明，基于回归的CD深度神经网络显著优于标准CD深度神经网络。与最佳标准CD DNN系统相比，最佳系统提供2.7%的绝对WER降低。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2013 IEEE Workshop on Automatic Speech Recognition and Understanding

自引率

0.00%

发文量