B. C. Haris, G. Pradhan, A. Misra, S. Shukla, R. Sinha, S. Prasanna
2011 National Conference on Communications (NCC), published 2011-03-17. DOI: 10.1109/NCC.2011.5734775
Multi-variability speech database for robust speaker recognition
In this paper, we present an initial study of a recently collected speech database for developing robust speaker recognition systems in the Indian context. The database contains speech from 200 speakers, collected across different sensors, languages, speaking styles, and environments. The data were recorded over five sensors in parallel, in English and multiple Indian languages, in reading and conversational speaking styles, and in office as well as uncontrolled environments such as laboratories, hostel rooms, and corridors. The database is evaluated with an adapted Gaussian mixture model based speaker verification system following the NIST 2003 speaker recognition evaluation protocol, and the performance is comparable to that obtained on NIST data sets. Our initial study of mismatch between training and test conditions on the collected data finds that mismatch in sensor, speaking style, or environment results in significant performance degradation relative to the matched case, whereas the degradation for language mismatch is relatively smaller.
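The abstract names an adapted Gaussian mixture model verification system but gives no implementation details. As a rough illustration only, the sketch below shows one common form of that idea: mean-only MAP adaptation of a background model to a speaker's enrollment frames, followed by a log-likelihood-ratio score. The diagonal covariances, the relevance factor of 16, the two-component toy background model, all feature values, and every function name are assumptions made for this example, not details taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def component_log_probs(x, weights, means, variances):
    """Log of (mixture weight * component density) for each component."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of the frames under the mixture."""
    total = sum(
        log_sum_exp(component_log_probs(x, weights, means, variances))
        for x in frames
    )
    return total / len(frames)

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a background model to enrollment frames."""
    dim = len(means[0])
    counts = [0.0] * len(weights)          # soft frame counts per component
    sums = [[0.0] * dim for _ in weights]  # posterior-weighted feature sums
    for x in frames:
        logs = component_log_probs(x, weights, means, variances)
        norm = log_sum_exp(logs)
        for k, lp in enumerate(logs):
            post = math.exp(lp - norm)
            counts[k] += post
            for d in range(dim):
                sums[k][d] += post * x[d]
    adapted = []
    for k, mean in enumerate(means):
        # alpha grows with the amount of data assigned to component k
        alpha = counts[k] / (counts[k] + relevance)
        if counts[k] > 0.0:
            data_mean = [s / counts[k] for s in sums[k]]
        else:
            data_mean = mean
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, mean)]
        )
    return adapted

def verification_score(test_frames, speaker_means, weights, means, variances):
    """Log-likelihood ratio: adapted speaker model versus background model."""
    return (
        avg_log_likelihood(test_frames, weights, speaker_means, variances)
        - avg_log_likelihood(test_frames, weights, means, variances)
    )

# Hypothetical toy background model and made-up two-dimensional features.
ubm_weights = [0.5, 0.5]
ubm_means = [[-1.0, 0.0], [1.0, 0.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

enroll = [[1.5, 0.5], [1.7, 0.4], [1.4, 0.6], [1.6, 0.5]] * 10
target_test = [[1.5, 0.5], [1.6, 0.4]]
impostor_test = [[-1.5, -0.5], [-1.4, -0.6]]

speaker_means = map_adapt_means(enroll, ubm_weights, ubm_means, ubm_vars)
target_score = verification_score(
    target_test, speaker_means, ubm_weights, ubm_means, ubm_vars
)
impostor_score = verification_score(
    impostor_test, speaker_means, ubm_weights, ubm_means, ubm_vars
)
print(target_score, impostor_score)
```

In a sketch like this, a trial is accepted when the score exceeds a threshold. A mismatch of the kind the abstract studies (sensor, speaking style, environment, or language) would correspond to the test frames being drawn from a different condition than the enrollment frames, which tends to lower the scores of genuine trials.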