M. N. Huda, Manoj Banik, G. Muhammad, Bernd J. Kroger
2009 12th International Conference on Computers and Information Technology, 2009-12-01. DOI: 10.1109/ICCIT.2009.5407123 (https://doi.org/10.1109/ICCIT.2009.5407123)
Phoneme recognition based on distinctive phonetic features (DPFs) incorporating a syllable based language model
This paper presents a phoneme recognition method based on distinctive phonetic features (DPFs). The method comprises three stages. The first stage uses three multilayer neural networks (MLNs) to extract three 15-dimensional DPF vectors from local features (LFs) of the input speech signal. The second stage incorporates an Inhibition/Enhancement (In/En) network to obtain more categorical DPF movement and decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure. The third stage embeds acoustic models (AMs) and language models (LMs) of syllable-based subwords to output more precise phoneme strings. The proposed method provides a higher phoneme correct rate and higher phoneme accuracy with fewer mixture components in the hidden Markov models (HMMs).
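The decorrelation step named in the abstract relies on Gram-Schmidt orthogonalization. The sketch below illustrates only the general procedure, not the authors' implementation: the abstract does not say how the procedure is applied to the DPF vectors, so treating one frame as three 15-dimensional row vectors is an assumption, and the function name `gram_schmidt`, the tolerance `eps`, and the random input are hypothetical. It uses the modified Gram-Schmidt variant, which is numerically more stable than the classical form.

```python
import numpy as np


def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize the rows of `vectors` with modified Gram-Schmidt.

    Rows that are (numerically) linear combinations of earlier rows
    are dropped, so the result may have fewer rows than the input.
    """
    basis = []
    for v in np.asarray(vectors, dtype=float):
        w = v.copy()
        # Remove the component of w along each basis vector found so far.
        for q in basis:
            w -= np.dot(q, w) * q
        norm = np.linalg.norm(w)
        if norm > eps:
            basis.append(w / norm)
    return np.array(basis)


# Hypothetical stand-in for one frame: three 15-dimensional DPF vectors.
# Real DPF vectors would come from the MLN and In/En stages.
rng = np.random.default_rng(0)
dpf_vectors = rng.random((3, 15))

ortho = gram_schmidt(dpf_vectors)

# Rows of `ortho` are mutually orthogonal with unit length, so the
# Gram matrix ortho @ ortho.T is (approximately) the identity.
gram = ortho @ ortho.T
```

After orthogonalization the pairwise inner products between the output vectors vanish, which is the sense in which the vectors no longer overlap; how this feeds the HMM stage is not described in the abstract.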