Implementation of Mel-Frequency Cepstral Coefficient as Feature Extraction using K-Nearest Neighbor for Emotion Detection Based on Voice Intonation

Revanto Alif Nawasta, Nurheri Cahyana, H. Heriyanto
{"title":"Implementation of Mel-Frequency Cepstral Coefficient as Feature Extraction using K-Nearest Neighbor for Emotion Detection Based on Voice Intonation","authors":"Revanto Alif Nawasta, Nurheri Cahyana, H. Heriyanto","doi":"10.31315/telematika.v20i1.9518","DOIUrl":null,"url":null,"abstract":"Purpose: To determine emotions based on voice intonation by implementing MFCC as a feature extraction method and KNN as an emotion detection method.Design/methodology/approach: In this study, the data used was downloaded from several video podcasts on YouTube. Some of the methods used in this study are pitch shifting for data augmentation, MFCC for feature extraction on audio data, basic statistics for taking the mean, median, min, max, standard deviation for each coefficient, Min max scaler for the normalization process and KNN for the method classification.Findings/result: Because testing is carried out separately for each gender, there are two classification models. In the male model, the highest accuracy was obtained at 88.8% and is included in the good fit model. In the female model, the highest accuracy was obtained at 92.5%, but the model was unable to correctly classify emotions in the new data. This condition is called overfitting. After testing, the cause of this condition was because the pitch shifting augmentation process of one tone in women was unable to solve the problem of the training data size being too small and not containing enough data samples to accurately represent all possible input data values.Originality/value/state of the art: The research data used in this study has never been used in previous studies because the research data is obtained by downloading from Youtube and then processed until the data is ready to be used for research.","PeriodicalId":31716,"journal":{"name":"Telematika","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telematika","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.31315/telematika.v20i1.9518","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Purpose: To determine emotions from voice intonation by implementing MFCC as the feature-extraction method and KNN as the classification method.

Design/methodology/approach: The data used in this study were downloaded from several video podcasts on YouTube. The methods applied include pitch shifting for data augmentation; MFCC for feature extraction from the audio data; basic statistics (mean, median, minimum, maximum, and standard deviation) computed for each MFCC coefficient; a min-max scaler for normalization; and KNN for classification. A sketch of this pipeline is given after the abstract.

Findings/result: Because testing was carried out separately for each gender, two classification models were built. The male model achieved a highest accuracy of 88.8% and is considered a good fit. The female model achieved a highest accuracy of 92.5% but was unable to classify emotions correctly on new data; this condition is called overfitting. Further testing showed that the cause was that augmenting the female recordings by pitch shifting of a single tone could not compensate for a training set that was too small and did not contain enough samples to represent all possible input values.

Originality/value/state of the art: The data used in this study have not been used in previous research, because they were obtained by downloading from YouTube and then processed until ready for use in this study.
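The abstract outlines the full pipeline: pitch-shifting augmentation, MFCC extraction, per-coefficient statistics, min-max normalization, and KNN classification. The sketch below illustrates that pipeline under assumptions not stated in the abstract: librosa and scikit-learn as the libraries, 13 MFCC coefficients, a one-semitone shift, k = 3 neighbors, and hypothetical file names. The paper's actual parameter values may differ.

```python
# Minimal sketch of the pipeline described in the abstract.
# Assumptions (not given in the paper): 13 MFCCs, a one-semitone
# pitch shift for augmentation, k=3 neighbors, placeholder file names.
import numpy as np
import librosa
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

def extract_features(y, sr, n_mfcc=13):
    """Summarize each MFCC coefficient with basic statistics."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    stats = [np.mean(mfcc, axis=1), np.median(mfcc, axis=1),
             np.min(mfcc, axis=1), np.max(mfcc, axis=1),
             np.std(mfcc, axis=1)]
    return np.concatenate(stats)  # fixed-length vector: 5 * n_mfcc values

def augment(y, sr, n_steps=1.0):
    """Pitch-shift a clip by n_steps semitones (data augmentation)."""
    return librosa.effects.pitch_shift(y=y, sr=sr, n_steps=n_steps)

# Hypothetical training data: (audio path, emotion label) pairs.
dataset = [("clip_angry_01.wav", "angry"),
           ("clip_happy_01.wav", "happy")]

X, labels = [], []
for path, label in dataset:
    y, sr = librosa.load(path, sr=None)
    for clip in (y, augment(y, sr)):        # original + pitch-shifted copy
        X.append(extract_features(clip, sr))
        labels.append(label)

scaler = MinMaxScaler()                      # min-max normalization
X_scaled = scaler.fit_transform(np.array(X))

knn = KNeighborsClassifier(n_neighbors=3)    # k is an assumption here
knn.fit(X_scaled, labels)
```

Note that the scaler fitted on the training data would also be applied (via `scaler.transform`) to any new clip before prediction, since KNN distances are sensitive to feature ranges.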