Title: Vowel-category based Short Utterance Speaker Recognition
Authors: N. Fatima, T. Zheng
Venue: 2012 International Conference on Systems and Informatics (ICSAI2012)
Publication date: 2012-05-19
DOI: https://doi.org/10.1109/ICSAI.2012.6223387
Citations: 12
Abstract
The impact of short utterances on speaker recognition is of significant importance. Despite advances in short utterance speaker recognition (SUSR), text dependence and the role of phonemes in carrying speaker information need further investigation. This paper presents a novel method of using vowel categories for SUSR. We define Vowel Categories (VCs) that cover both Chinese and English. After phonemes are recognized and extracted, the resulting vowels are divided into VCs, and a Universal Background VC Model (UBVCM) is developed for each VC. A conventional GMM-UBM system is used for training and testing. The proposed categories give minimum EERs of 13.76%, 14.03% and 16.18% for utterance durations of 3, 2 and 1 seconds, respectively. Experimental results show that in text-dependent SUSR, significant speaker-specific information is present at the phoneme level. Because phonemes within a category share similar properties, accurate speech recognition is not required; phoneme categories can instead be used effectively for SUSR. The results also show that vowels contain a large amount of speaker information, which is preserved when VCs are employed.
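The results above are reported as equal error rates (EER), the operating point at which the false-acceptance and false-rejection rates are equal. The following is a minimal sketch of how an EER can be estimated from verification scores. It is a generic illustration, not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made for this example.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    Hypothetical helper for illustration only. A trial is accepted when its
    score is greater than or equal to the threshold.
    Returns (eer, threshold) at the point where the false-acceptance rate
    (FAR) and false-rejection rate (FRR) are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None  # (|FAR - FRR|, estimated EER, threshold)
    for t in thresholds:
        # FAR: fraction of impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # FRR: fraction of genuine (target) trials wrongly rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented for this example (not data from the paper).
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With a finite set of trials, FAR and FRR rarely coincide exactly, so this sketch reports their average at the threshold where they are closest. Evaluation toolkits often interpolate between neighbouring thresholds instead.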