Multilingual acoustic models for the recognition of non-native speech
V. Fischer, E. Janke, S. Kunzmann, T. Ross
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01), published 2001-12-09
DOI: https://doi.org/10.1109/ASRU.2001.1034654
Citations: 13
Abstract
We report on the use of multilingual hidden Markov models for the recognition of non-native speech. Based on a common phoneme set that provides a phone compression rate of almost 80 percent relative to a conglomerate of language-dependent phone sets, we create acoustic models that share training data from up to five languages. Results obtained on two different databases of non-native English demonstrate the feasibility of the approach, showing improved recognition accuracy when training material is sparse, and also for speakers whose native language is not represented in the training data.