Text-Independent Speaker Identification using Mel-Frequency Energy Coefficients and Convolutional Neural Networks
Déhia Abdiche, K. Harrar
2020 2nd International Workshop on Human-Centric Smart Environments for Health and Well-being (IHSH), published 2021-02-09
DOI: 10.1109/IHSH51661.2021.9378726 (https://doi.org/10.1109/IHSH51661.2021.9378726)
Abstract
Automatic Speaker Identification (ASI) is a biometric technique that has achieved reliability in real applications with standard feature extraction methods such as Linear Predictive Cepstral Coefficients (LPCC) and Perceptual Linear Prediction (PLP), and modeling methods such as the Gaussian Mixture Model (GMM). However, the success of these hand-crafted approaches was quickly hampered by the emergence of big data and the difficulty of manipulating large amounts of data, which led researchers to move towards automatic methods such as deep neural networks. In this work, a Convolutional Neural Network (CNN) is proposed for speaker identification in text-independent mode. The Mel-Frequency Energy Coefficients (MFEC) method was used to extract the characteristics of the audio signals, and the obtained coefficients were fed into the convolutional neural network for classification (identification). In addition, the proposed method was compared with existing traditional methods. Experimental results show that the proposed structure achieved a speaker identification rate of 97.89%, considerably higher than the rates obtained by earlier state-of-the-art methods.
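As commonly defined in the literature, MFEC features are log mel-filterbank energies, i.e. MFCC computation without the final DCT, which preserves the local time-frequency structure that a CNN can exploit. The paper does not give its extraction parameters, so the sketch below uses illustrative assumptions (16 kHz sampling, 512-point FFT, 10 ms hop, 40 mel filters); it is a minimal NumPy implementation, not the authors' pipeline.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Build a triangular mel filterbank of shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfec(signal, sr=16000, n_fft=512, hop=160, n_filters=40):
    """Log mel-filterbank energies (MFEC): frame, window, power spectrum,
    mel-weighted sum, log. Returns an array of shape (n_frames, n_filters)."""
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return np.log(energies + 1e-10)            # log compression, avoid log(0)

# Demo on 1 s of noise; a real system would use speech utterances.
sig = np.random.randn(16000)
feats = mfec(sig)
print(feats.shape)  # (97, 40)
```

The resulting (frames x filters) matrix is a 2-D "image" that can be fed directly to a CNN's input layer, which is the role MFEC plays in the proposed system.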