{"title":"Automatic Allophone Deriving for Korean Speech Recognition","authors":"Ji Xu, Yujing Si, Jielin Pan, Yonghong Yan","doi":"10.1109/CIS.2013.169","DOIUrl":null,"url":null,"abstract":"In Korean, the pronunciations of phonemes are severely affected by their contexts. Thus, using phonemes directly translated from their written forms as basic units for acoustic modeling is problematic, as these units lack the ability to capture the complex pronunciation variations occurred in continuous speech. Allophone, a sub-phone unit in phonetics but served as independent phoneme in speech recognition, is considered to have the ability to describe complex pronunciation variations. This paper presents a novel approach called Automatic Allophone Deriving (AAD). In this approach, statistics from Gaussian Mixture Models are used to create measurements for allophone candidates, and decision trees are used to derive allophones. Question set used by the decision tree is also generated automatically, since we assumed no linguistic knowledge would be used in this approach. This paper also adopts long-time features over conventional cepstral features to capture acoustic information over several hundred milliseconds for AAD, as co-articulation effects are unlikely to be limited to a single phoneme. Experiment shows that AAD outperforms previous approaches which derive allophones from linguistic knowledge. Additional experiments use long-time features directly in acoustic modeling. The results show that performance improvement achieved by using the same allophones can be significantly improved by using long-time features, compared with corresponding baselines.","PeriodicalId":294223,"journal":{"name":"2013 Ninth International Conference on Computational Intelligence and Security","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Ninth International Conference on Computational Intelligence and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIS.2013.169","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In Korean, the pronunciations of phonemes are strongly affected by their contexts. Using phonemes directly transliterated from their written forms as basic units for acoustic modeling is therefore problematic, as such units cannot capture the complex pronunciation variations that occur in continuous speech. An allophone, a sub-phonemic unit in phonetics that serves as an independent phoneme in speech recognition, is considered capable of describing these complex pronunciation variations. This paper presents a novel approach called Automatic Allophone Deriving (AAD). In this approach, statistics from Gaussian Mixture Models are used to construct measurements for allophone candidates, and decision trees are used to derive the allophones. The question set used by the decision trees is also generated automatically, since we assume no linguistic knowledge is available to the approach. This paper further adopts long-time features in place of conventional cepstral features to capture acoustic information over several hundred milliseconds for AAD, as co-articulation effects are unlikely to be limited to a single phoneme. Experiments show that AAD outperforms previous approaches that derive allophones from linguistic knowledge. Additional experiments use long-time features directly in acoustic modeling. The results show that, compared with the corresponding baselines, the performance gain obtained with the same allophones is significantly larger when long-time features are used.
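As a rough illustration of the kind of decision-tree derivation the abstract describes, the sketch below scores candidate splits of a set of allophone candidates by the likelihood gain of pooled single-Gaussian statistics, the standard criterion used in data-driven state tying. It is a minimal sketch under that assumption, not the paper's actual AAD implementation; the names (`pooled_loglik`, `best_split`) and the predicate form of the automatically generated questions are hypothetical.

```python
import numpy as np

def pooled_loglik(gamma, mean, var):
    """Log likelihood of a set of allophone candidates modeled by one
    shared diagonal Gaussian, computed from sufficient statistics.
    gamma: (n,) occupancy counts; mean, var: (n, d) per-candidate stats."""
    g = gamma.sum()
    if g <= 0:
        return 0.0
    mu = (gamma[:, None] * mean).sum(0) / g
    # pooled second moment minus squared pooled mean -> pooled variance
    ex2 = (gamma[:, None] * (var + mean**2)).sum(0) / g
    pooled_var = np.maximum(ex2 - mu**2, 1e-8)
    d = mean.shape[1]
    return -0.5 * g * (d * np.log(2 * np.pi) + np.log(pooled_var).sum() + d)

def best_split(candidates, questions, gamma, mean, var):
    """Greedily pick the question whose yes/no partition of the
    candidates maximizes the pooled-likelihood gain; each question is
    assumed to be a boolean predicate over a context-dependent unit."""
    base = pooled_loglik(gamma, mean, var)
    best_q, best_gain = None, 0.0
    for q in questions:
        yes = np.array([q(c) for c in candidates])
        if yes.all() or (~yes).all():
            continue  # question does not actually split the set
        gain = (pooled_loglik(gamma[yes], mean[yes], var[yes])
                + pooled_loglik(gamma[~yes], mean[~yes], var[~yes])
                - base)
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain
```

Applied recursively until no split yields a sufficient gain, this procedure groups context-dependent candidates into leaves, each of which would act as one derived allophone.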