Improvement Model for Speaker Recognition using MFCC-CNN and Online Triplet Mining

Ayu Wirdiani, Steven Ndung'u Machetho, Ketut Gede Darma Putra, Rukmi Sari Hartati, Made Sudarma, Henrico Aldy Ferdian
DOI: 10.18517/ijaseit.14.2.19396 (https://doi.org/10.18517/ijaseit.14.2.19396)
Published: 2024-04-14

Abstract

Many biometric security systems have been developed, based on traits such as the face, fingerprint, voice, hand geometry, and iris. Beyond being a medium of communication, the human voice is a biometric that can be used for identification, because it has characteristics that distinguish one person from another. A reliable speaker recognition system must therefore capture the features that characterize a person's voice. This study develops a speaker recognition system based on a Convolutional Neural Network (CNN) and proposes improvements to the fine-tuning layer of the CNN architecture to improve accuracy. The system combines three components: Mel Frequency Cepstral Coefficients (MFCC), which extract features from the raw audio; a CNN trained with triplet loss, which maps those features to a 128-dimensional embedding; and K-Nearest Neighbor (KNN), which classifies the embedding output. The research was conducted on 50 speakers from the TIMIT dataset, with eight utterances per speaker, and on 60 speakers recorded live with a smartphone. The system achieves high recognition accuracy. Further research could combine different biometric modalities, an approach commonly known as multimodal biometrics, to improve recognition accuracy further.
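
The last two stages of the pipeline described in the abstract, triplet loss with online mining and KNN classification of the embeddings, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not state the mining strategy, margin, distance metric, or value of k, so the sketch uses the batch-hard variant of online mining, Euclidean distance, a hypothetical margin of 0.2, and k = 3. The function names are illustrative, and toy 2-dimensional vectors stand in for the 128-dimensional CNN embeddings.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Online triplet mining, batch-hard variant (an assumed strategy).

    For each anchor in the batch, pick the farthest same-speaker sample
    (hardest positive) and the closest other-speaker sample (hardest
    negative), then average max(0, d_pos - d_neg + margin) over anchors.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue  # this anchor cannot form a valid triplet in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def knn_predict(query, ref_embeddings, ref_labels, k=3):
    """Majority vote among the k reference embeddings nearest to the query."""
    ranked = sorted(zip(ref_embeddings, ref_labels),
                    key=lambda pair: euclidean(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy batch: two hypothetical speakers, two utterance embeddings each.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
toy_labels = ["spk_a", "spk_a", "spk_b", "spk_b"]

loss = batch_hard_triplet_loss(toy_embeddings, toy_labels)
speaker = knn_predict([2.5, 0.5], toy_embeddings, toy_labels)
```

In a real training loop the loss would be computed on the CNN's outputs for each mini-batch and differentiated by a deep learning framework. The pure-Python version above only shows which triplets an online miner selects and how the resulting embeddings are classified.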
Source journal
International Journal on Advanced Science, Engineering and Information Technology