{"title":"Improved language modelling by unsupervised acquisition of structure","authors":"K. Ries, F. D. Buø, Ye-Yi Wang","doi":"10.1109/ICASSP.1995.479397","DOIUrl":null,"url":null,"abstract":"The perplexity of corpora is typically reduced by more than 30% compared to advanced n-gram models by a new method for the unsupervised acquisition of structural text models. This method is based on new algorithms for the classification of words and phrases from context and on new sequence finding procedures. These procedures are designed to work fast and accurately on small and large corpora. They are iterated to build a structural model of a corpus. The structural model can be applied to recalculate the scores of a speech recogniser and improves the word accuracy. Further applications such as preprocessing for neural networks and (hidden) Markov models in language processing, which exploit the structure finding capabilities of this model, are proposed.","PeriodicalId":300119,"journal":{"name":"1995 International Conference on Acoustics, Speech, and Signal Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"1995 International Conference on Acoustics, Speech, and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.1995.479397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 17
Abstract
A new method for the unsupervised acquisition of structural text models typically reduces corpus perplexity by more than 30% compared to advanced n-gram models. The method is based on new algorithms for the classification of words and phrases from context and on new sequence-finding procedures. These procedures are designed to work fast and accurately on both small and large corpora, and they are iterated to build a structural model of a corpus. The structural model can be applied to rescore the output of a speech recogniser and improves word accuracy. Further applications that exploit the structure-finding capabilities of this model, such as preprocessing for neural networks and (hidden) Markov models in language processing, are proposed.
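The abstract does not spell out the algorithms, so the following is only a minimal sketch of the kind of iterated loop it describes: alternately grouping words and phrases into classes by their shared contexts and merging frequently co-occurring sequences into new phrase units. The function names, the context-overlap similarity measure, and the thresholds (`min_shared`, `min_count`) are illustrative assumptions, not the authors' procedures.

```python
# Sketch of an iterated structure-finding loop (assumed, not the paper's algorithm):
# 1) classify tokens by shared (left, right) contexts, 2) merge frequent adjacent
# pairs into phrase units, 3) repeat until nothing changes.
from collections import Counter, defaultdict

def context_classes(corpus, min_shared=2):
    """Group tokens that share at least `min_shared` (left, right) contexts."""
    contexts = defaultdict(set)
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
    classes, assigned = [], {}
    for tok, ctx in contexts.items():
        for cid, (members, cctx) in enumerate(classes):
            if len(ctx & cctx) >= min_shared:
                members.add(tok)
                cctx |= ctx
                assigned[tok] = cid
                break
        else:
            assigned[tok] = len(classes)
            classes.append(({tok}, set(ctx)))
    return assigned

def merge_frequent_bigrams(corpus, min_count=3):
    """Join adjacent token pairs occurring at least `min_count` times into phrase units."""
    counts = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
    frequent = {pair for pair, c in counts.items() if c >= min_count}
    new_corpus, changed = [], False
    for sent in corpus:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in frequent:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
                changed = True
            else:
                out.append(sent[i])
                i += 1
        new_corpus.append(out)
    return new_corpus, changed

def build_structural_model(corpus, max_iters=5):
    """Iterate classification and sequence merging to build a structural model."""
    assigned = {}
    for _ in range(max_iters):
        assigned = context_classes(corpus)                 # word/phrase classes from context
        corpus, changed = merge_frequent_bigrams(corpus)   # new phrase (sequence) units
        if not changed:
            break
    return assigned, corpus
```

In this sketch the resulting class assignments and phrase units could then feed a class-based language model used to rescore recogniser hypotheses; how the paper actually computes the rescored probabilities is not described in the abstract.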