{"authors":"Khushboo Jha, Arun Jain, S. Srivastava","doi":"10.1109/RAIT57693.2023.10127101","journal":{"name":"2023 5th International Conference on Recent Advances in Information Technology (RAIT)"},"publicationDate":"2023-03-03","ListUrlMain":"https://doi.org/10.1109/RAIT57693.2023.10127101"}
An Efficient Speaker Identification Approach for Biometric Access Control System
This work proposes an efficient acoustic feature, based on the cepstral and frequency domains, as a speaker identification solution for a reliable biometric access control system. A Convolutional Neural Network (CNN) is trained on a combination of two such features, Power Normalized Cepstral Coefficients (PNCC) and formants, referred to together as PNCC-F. PNCC-F with the CNN classifier yields higher identification accuracy than PNCC alone. Speaker identification accuracy in both clean and noisy environments is used to evaluate PNCC on its own and in tandem with the formant feature. The work was carried out in a Python 3.8.8 environment on VidTIMIT, a standard database with 43 speakers. PNCC-F was further evaluated under realistic noisy conditions by mixing babble, factory, and machine gun noise from the NOISEX-92 database into the speech samples at signal-to-noise ratios (SNR) from 0 to 20 dB. The proposed PNCC-F feature surpassed the conventional PNCC feature by 2.34% in the clean environment and outperformed it at every SNR level for all three noise types.
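The noisy-condition evaluation described above depends on adding noise to clean speech at a controlled SNR. The abstract does not give the mixing procedure, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. The function names `mix_at_snr` and `measured_snr_db`, the 5 dB step between levels, and the synthetic sine and white-noise signals (stand-ins for VidTIMIT speech and NOISEX-92 recordings) are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB)."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR_dB = 10 * log10(Ps / (g^2 * Pn))  =>  g = sqrt(Ps / (Pn * 10^(SNR_dB / 10)))
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins (assumption): a 220 Hz tone for speech, white noise for the noise file.
rng = np.random.default_rng(0)
speech = np.sin(2.0 * np.pi * 220.0 * np.arange(16000) / 16000.0)
noise = rng.standard_normal(4000)

# Hypothetical grid of SNR levels covering the 0 to 20 dB range from the abstract.
for snr_db in (0, 5, 10, 15, 20):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 2))
```

In practice the noise segment would be read from a recording and usually cropped at a random offset; the sketch tiles it from the start to keep the example deterministic. It also assumes the noise is not all zeros, since the gain would otherwise be undefined.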