A statistical speech recognition of Ningbo dialect monosyllables
Qinru Fan, Donghong Wang
2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering, pp. 266-269, November 2010
DOI: 10.1109/ISKE.2010.5680873
Citations: 2
Abstract
So far, most research on speech recognition has focused on Mandarin Chinese or English. Because such work assumes that a given word is always pronounced the same way, the factors it treats as affecting recognition are primarily environmental. The Ningbo dialect differs substantially from Mandarin Chinese and English: its pronunciation and intonation vary from region to region even within the Ningbo area, so changes in pronunciation or intonation matter more than other factors. Finding a modeling approach that accommodates these pronunciation and intonation changes is therefore a prerequisite for recognizing Ningbo dialect speech. This paper investigates speech recognition of the Ningbo dialect, focusing on Fenghua County, Cixi County, Yinzhou District, and central Ningbo. We study modeling methods for the Ningbo dialect with respect to pronunciation changes, intonation changes, and recognition running time. We recorded 64 speech samples of the 10 digits (1–10) as spoken in these four regions and used Mel-frequency cepstral coefficients (MFCCs) to extract the features of each spoken digit. Through extensive experiments based on the pronunciation and intonation variants of the digits, we constructed 20 models from the training samples of the digits (1–10). A simplified Bayes decision rule is used to classify the Ningbo dialect digits. Experimental results show that the recognition rate exceeds 75%, compared with 52.5% for a general modeling method whose training-sample models do not account for regional variations in pronunciation and intonation. The approach thus improves the robustness of speech recognition for the Ningbo dialect. The modeling and recognition method used in this paper is easy to implement and to generalize.
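The abstract states that each spoken digit is represented by MFCC features but does not give the extraction parameters. The following is a minimal pure-Python sketch of a standard MFCC pipeline (pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filterbank, log, DCT-II). The 8 kHz sample rate, 25 ms frames with a 10 ms hop, 256-point DFT, 26 filters, and 13 coefficients are common defaults assumed here, not values reported by the paper, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters evenly spaced in mel over the n_fft // 2 + 1 spectrum bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Map each mel edge to a DFT bin index, clamped against float overshoot.
    bins = [min(int((n_fft + 1) * mel_to_hz(m) / sample_rate), n_bins - 1) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):  # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def power_spectrum(frame: Sequence[float], n_fft: int) -> List[float]:
    """Naive O(n_fft^2) DFT power spectrum; a real system would use an FFT."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))  # zero-pad to n_fft
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec


def dct2(values: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * i * (m + 0.5) / n) for m, v in enumerate(values))
            for i in range(n_coeffs)]


def mfcc(signal: Sequence[float], sample_rate: int = 8000, frame_len: int = 200,
         hop: int = 80, n_fft: int = 256, n_filters: int = 26, n_ceps: int = 13,
         pre_emph: float = 0.97) -> List[List[float]]:
    """Return one n_ceps-dimensional MFCC vector per frame (assumed default settings)."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1].
    emph = [signal[0]] + [signal[i] - pre_emph * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(emph[start:start + frame_len], window)]  # Hamming window
        spec = power_spectrum(frame, n_fft)
        energies = [sum(f * p for f, p in zip(filt, spec)) for filt in bank]
        log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
        features.append(dct2(log_e, n_ceps))
    return features
```

With these defaults, a 1,000-sample signal (0.125 s at 8 kHz) yields 11 frames of 13 coefficients each. How the paper turns per-frame MFCCs into the input of its classifier is not stated in the abstract.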
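The central idea in the abstract is that each digit is covered by several models reflecting pronunciation and intonation variants, and a simplified Bayes decision rule picks the winner. The abstract does not define the simplification, so this sketch assumes one common form: each (digit, variant) model is a diagonal-covariance Gaussian over a fixed-length utterance vector (for example, frame-averaged MFCCs), all models have equal priors, and the rule therefore reduces to choosing the model with the highest log-likelihood and reporting its digit. The variant labels, `GaussianModel`, `fit_models`, `classify`, and `recognition_rate` are all hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class GaussianModel:
    """Diagonal-covariance Gaussian for one (digit, variant) pair."""
    digit: int
    variant: str
    mean: List[float]
    var: List[float]

    def log_likelihood(self, x: Vector) -> float:
        # Sum of per-dimension Gaussian log densities (dimensions assumed independent).
        return -0.5 * sum(
            math.log(2 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, self.mean, self.var)
        )


def fit_models(training: Dict[Tuple[int, str], List[Vector]],
               var_floor: float = 1e-3) -> List[GaussianModel]:
    """Estimate one Gaussian per (digit, variant) group of training vectors."""
    models = []
    for (digit, variant), vectors in training.items():
        n, dim = len(vectors), len(vectors[0])
        mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
        # Variance floor keeps tiny variant groups from producing degenerate models.
        var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
               for d in range(dim)]
        models.append(GaussianModel(digit, variant, mean, var))
    return models


def classify(x: Vector, models: List[GaussianModel]) -> int:
    """Equal-prior Bayes rule: return the digit of the most likely variant model."""
    best = max(models, key=lambda m: m.log_likelihood(x))
    return best.digit


def recognition_rate(test: List[Tuple[int, Vector]], models: List[GaussianModel]) -> float:
    """Fraction of (true_digit, vector) test pairs classified correctly."""
    correct = sum(1 for digit, x in test if classify(x, models) == digit)
    return correct / len(test)
```

Because several variant models map to the same digit, an utterance only has to resemble one regional variant of the correct digit to be recognized. A single pooled model per digit, by contrast, has to stretch across all variants at once, which is the kind of general modeling method the abstract uses as its baseline.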