Title: X-vectors based Urdu Speaker Identification for short utterances
Authors: M. Farooq, F. Adeeba, S. Hussain
Venue: 2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
Publication date: 2019-10-01
DOI: 10.1109/O-COCOSDA46868.2019.9041237
URL: https://doi.org/10.1109/O-COCOSDA46868.2019.9041237
Citations: 1
Abstract
In the context of commercial applications, the robustness of a Speaker Identification (SI) system is adversely affected by short utterances. The performance of SI systems depends considerably on the extracted feature sets. This paper investigates the effect of various feature extraction techniques on the performance of i-vector and x-vector based Urdu speaker identification models. The scope of this paper is restricted to text-independent speaker identification for short utterances (up to 4 seconds). SI systems demand a large amount of data covering sufficient inter-speaker and intra-speaker variability. An available Urdu speech corpus is used to measure the performance of various feature sets on SI systems. A minimum percentage Equal Error Rate (%EER) of 0.113 is achieved using x-vectors with the Linear Frequency Cepstral Coefficients (LFCCs) feature set.
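The %EER figure quoted in the abstract is the operating point at which the false acceptance rate equals the false rejection rate, expressed as a percentage. The abstract does not describe how trials were scored, so the following is only a minimal sketch of how such a value could be computed from two lists of trial scores. The function name `equal_error_rate`, the score lists, and the threshold sweep are all illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (EER in percent, threshold) using every observed score as a candidate threshold.

    Assumption: a higher score means the trial is more likely a same-speaker (target) trial.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: share of target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: share of non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest to each other.
        if best is None or gap < best[0]:
            best = (gap, 100.0 * (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical toy scores, not data from the paper.
eer_percent, threshold = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.2, 0.5, 0.3])
print(eer_percent, threshold)  # 25.0 0.5
```

With only a handful of trials the two error rates rarely cross exactly, which is why the sketch averages them at the closest point. A fine-grained value such as 0.113% would need a much larger trial list.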