Real-Time Indonesian Language Speech Recognition with MFCC Algorithms and Python-Based SVM

Wening Mustikarini, Risanuri Hidayat, Agus Bejo
IJITEE (International Journal of Information Technology and Electrical Engineering), published 2019-10-29
DOI: 10.22146/IJITEE.49426
Citations: 4

Abstract

Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize the human voice. One way to increase the recognition rate is to use a model of the language to be recognized. In this paper, a speech recognition application is introduced to recognize the Indonesian words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). This research used 400 speech samples: for each word, 75 samples for training and 25 samples for testing. The system uses 13 Mel Frequency Cepstral Coefficients (MFCCs) as features and a Support Vector Machine (SVM) as the classifier. It was tested with linear and radial basis function (RBF) kernels, various cost values, and three sample sizes per class (n = 25, 50, 75). The best average accuracy was obtained with a linear kernel, a cost value of 100, and a dataset of 75 samples per class. In the training phase, the system achieved an F1-score (the harmonic mean of precision and recall) of 80% for "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". In the testing phase, using 25 new samples per class, the F1-score was 76% for "atas", 54% for "bawah", 44% for "kanan", and 100% for "kiri".
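The abstract fixes only the number of MFCCs (13). The NumPy sketch below shows one common way to compute them. The remaining settings are assumptions, not the authors' configuration: 16 kHz audio, 0.97 pre-emphasis, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, and 26 triangular mel filters. The helper names are hypothetical. The abstract also does not say how frame-level coefficients become one SVM input, so `utterance_features` averages over time purely as an illustrative choice.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; keeps only the first n_out coefficients."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc(signal, sr=16000, n_mfcc=13, frame_ms=25, hop_ms=10, n_fft=512, n_filters=26):
    """Return an (n_frames, n_mfcc) array of MFCCs for a mono signal."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before framing
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(emphasized) < frame_len:  # guarantee at least one full frame
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    return log_energies @ dct_matrix(n_mfcc, n_filters).T

def utterance_features(signal, sr=16000):
    # Hypothetical pooling: mean over frames gives one 13-dim vector per word
    return mfcc(signal, sr=sr).mean(axis=0)
```

With these settings, a one-second clip at 16 kHz yields 98 frames of 13 coefficients. Averaging discards temporal order, which is a simplification. Other fixed-length encodings are equally plausible for an SVM.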
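The per-class F1-scores reported above combine precision and recall as $F_1 = 2PR/(P+R)$. The stdlib-only sketch below computes them from true and predicted labels. The function name and the example labels are illustrative and are not the paper's data.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Map each label to (precision, recall, F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs
    scores = {}
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Illustrative labels only
y_true = ["atas", "atas", "bawah", "bawah", "kanan", "kanan", "kiri", "kiri"]
y_pred = ["atas", "bawah", "bawah", "kanan", "kanan", "atas", "kiri", "kiri"]
for label, (p, r, f) in per_class_f1(y_true, y_pred).items():
    print(f"{label:6s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The term "f1-score" in the abstract matches the column name printed by scikit-learn's `classification_report`, which computes the same per-class quantity.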