Michael Hentschel, A. Ogawa, Marc Delcroix, T. Nakatani, Yuji Matsumoto
{"title":"Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs","authors":"Michael Hentschel, A. Ogawa, Marc Delcroix, T. Nakatani, Yuji Matsumoto","doi":"10.1109/APSIPA.2017.8282099","DOIUrl":null,"url":null,"abstract":"There have been many attempts in the past to exploit various sources of information in language modelling besides words, for instance prosody or topic information. With neural network based language models, it became easier to make use of this continuous valued information, because the neural network transforms the discrete valued space into a continuous valued space. So far, models incorporating prosodic information were jointly trained on the auxiliary and the textual information from the beginning. However, in practice the auxiliary information is usually only available for a small amount of the training data. In order to fully exploit text and acoustic data, we propose to re-train a recurrent neural network language model, rather than training a language model from scratch. Using this method we achieved perplexity and word error rate reductions for N-best rescoring on the MIT-OCW lecture corpus.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPA.2017.8282099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
There have been many attempts in the past to exploit sources of information beyond words in language modelling, for instance prosody or topic information. With neural network based language models, it became easier to make use of such continuous-valued information, because the neural network transforms the discrete-valued space into a continuous-valued space. So far, models incorporating prosodic information have been jointly trained on the auxiliary and the textual information from the start. In practice, however, the auxiliary information is usually available for only a small portion of the training data. In order to fully exploit both text and acoustic data, we propose to re-train a recurrent neural network language model, rather than training a language model from scratch. Using this method, we achieved perplexity and word error rate reductions for N-best rescoring on the MIT-OCW lecture corpus.
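The two-stage idea behind the abstract can be illustrated with a minimal sketch. The code below is a hypothetical PyTorch example, not the authors' exact architecture or hyperparameters: an LSTM language model takes the word embedding concatenated with a prosodic feature vector, is first pre-trained on the large text-only corpus with the prosody input zeroed out, and is then re-trained on the smaller subset for which acoustic alignments and real prosodic features exist. The class and function names, feature dimensionality, and training loop are all assumptions for illustration.

```python
# Hypothetical sketch of a prosodically-enhanced RNNLM with two-stage training.
# Stage 1: pre-train on text-only data (prosody input zeroed).
# Stage 2: re-train the same parameters on the prosody-annotated subset.
import torch
import torch.nn as nn


class ProsodyRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, prosody_dim=3, hidden_dim=512):
        super().__init__()
        self.prosody_dim = prosody_dim
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The LSTM input is the word embedding concatenated with prosodic features.
        self.lstm = nn.LSTM(emb_dim + prosody_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, prosody=None):
        # words: (batch, seq) token ids; prosody: (batch, seq, prosody_dim) or None
        emb = self.embed(words)
        if prosody is None:
            # Text-only data: substitute zeros for the missing prosodic features.
            prosody = emb.new_zeros(emb.size(0), emb.size(1), self.prosody_dim)
        x = torch.cat([emb, prosody], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # next-word logits: (batch, seq, vocab_size)


def train_epoch(model, batches, optimizer, loss_fn):
    # Each batch is (words, targets, prosody); prosody is None for text-only data.
    for words, targets, prosody in batches:
        logits = model(words, prosody)
        loss = loss_fn(logits.transpose(1, 2), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# Stage 1: pre-train on the large text-only corpus (prosody=None in every batch).
# Stage 2: re-train ("fine-tune") the resulting model on the small subset with
# real prosodic features, typically with a lower learning rate, instead of
# training a joint model from scratch on the small subset alone.
```

A usage note: under these assumptions, stage 2 would reuse the stage-1 weights directly, so the model retains what it learned from the abundant text while adapting the input layer and recurrent weights to the prosodic features available only for the acoustic data.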