J. Sirigos, V. Darsinos, N. Fakotakis, G. Kokkinakis
Vowel-non vowel decision using neural networks and rules
Proceedings of Third International Conference on Electronics, Circuits, and Systems, 13 October 1996. DOI: 10.1109/ICECS.1996.582917
Citations: 4
Abstract
This paper describes a speaker-independent vowel/non-vowel classifier based on neural networks combined with a set of rules. The feature vectors for the multilayer perceptron (MLP) are obtained from RASTA-PLP analysis of the speech signal, which yields mel-cepstral coefficients, together with a formant tracking method. To train and test the system, we used a subset of the TIMIT database. The results indicate a speaker-independent vowel classification rate of approximately 98.5%, so the classifier is well suited for speaker recognition and speech labeling.
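The abstract does not specify the network topology, the feature dimensions, or the rules, so the following is only a minimal pure-Python sketch of how such a frame-level pipeline might be organized. It assumes (1) the commonly cited RASTA band-pass filter, H(z) = 0.1(2 + z^-1 - z^-3 - 2z^-4)/(1 - 0.98z^-1), applied causally to each log-spectral band trajectory (this is only the temporal filtering step, not the full PLP analysis); (2) a one-hidden-layer MLP with tanh hidden units and a sigmoid output read as P(vowel), whose per-frame input would be the cepstral coefficients concatenated with formant features; and (3) a hypothetical minimum-duration rule standing in for the paper's unspecified rules. All function names and weights are illustrative placeholders, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Commonly cited RASTA band-pass filter, implemented causally
# (a 4-frame delay instead of the z^4 advance):
#   y[n] = sum_k RASTA_B[k] * x[n-k] + POLE * y[n-1]
RASTA_B = [0.2, 0.1, 0.0, -0.1, -0.2]  # 0.1 * (2, 1, 0, -1, -2); sums to zero
POLE = 0.98  # some implementations use 0.94 instead

# Hypothetical parameter layout: (w_hidden, b_hidden, w_out, b_out)
Params = Tuple[List[List[float]], List[float], List[float], float]


def rasta_filter(trajectory: Sequence[float], pole: float = POLE) -> List[float]:
    """Band-pass filter one log-spectral band trajectory across frames."""
    out: List[float] = []
    prev_y = 0.0
    for n in range(len(trajectory)):
        # FIR part: weighted difference of the current and 4 previous frames.
        acc = sum(b * trajectory[n - k] for k, b in enumerate(RASTA_B) if n - k >= 0)
        # IIR part: a single pole re-integrates the differenced signal.
        prev_y = acc + pole * prev_y
        out.append(prev_y)
    return out


def mlp_vowel_prob(x: Sequence[float], w_hidden: List[List[float]],
                   b_hidden: List[float], w_out: List[float], b_out: float) -> float:
    """Hypothetical one-hidden-layer MLP: tanh hidden units, sigmoid P(vowel)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def apply_min_duration(labels: List[int], min_frames: int = 3) -> List[int]:
    """Hypothetical rule: relabel vowel runs shorter than min_frames as non-vowel."""
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] == 1:
            j = i
            while j < len(out) and out[j] == 1:
                j += 1  # find the end of this vowel run
            if j - i < min_frames:
                out[i:j] = [0] * (j - i)
            i = j
        else:
            i += 1
    return out


def classify_frames(features: Sequence[Sequence[float]], params: Params,
                    threshold: float = 0.5, min_frames: int = 3) -> List[int]:
    """Frame-wise MLP decision (1 = vowel, 0 = non-vowel) followed by the rule."""
    labels = [1 if mlp_vowel_prob(f, *params) >= threshold else 0 for f in features]
    return apply_min_duration(labels, min_frames)
```

With toy, hand-set weights (one input, one hidden unit), `classify_frames([[1.0], [1.0], [1.0], [-1.0], [1.0], [-1.0], [-1.0]], ([[1.0]], [0.0], [5.0], 0.0))` returns `[1, 1, 1, 0, 0, 0, 0]`: the isolated single-frame vowel decision at index 4 is removed by the duration rule. In a real system the weights would be learned from labeled TIMIT frames, and the rules would be whatever the authors designed.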