Denoising Speech for MFCC Feature Extraction Using Wavelet Transformation in Speech Recognition System

Risanuri Hidayat, Agus Bejo, Sujoko Sumaryono, A. Winursito
2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE), July 2018. DOI: 10.1109/ICITEED.2018.8534807
Citations: 32

Abstract

Mel frequency cepstral coefficients (MFCC) are a popular feature extraction method for speech recognition systems. Although MFCC achieves high accuracy, it is susceptible to noise: the performance of the conventional MFCC method degrades when the input signal is noisy. This paper presents the implementation of wavelet denoising on the speech input of the MFCC feature extraction method. Adding a wavelet-transform denoising stage was expected to improve MFCC performance on noisy signals. The study used 120 speech recordings, of which 30 served as reference data and the remaining 90 as test data. The test data were mixed with white Gaussian noise and then presented to the speech recognition system, which already held the reference data. The wavelet denoising process used soft thresholding with the Minimaxi thresholding rule. Eleven wavelets were tested in the denoising process at decomposition level 10. Classification used the K-nearest neighbor (KNN) method. The Fejer-Korovkin 6 wavelet was the best method for denoising the speech signal, achieving the highest accuracy on input signals with SNRs of 5-15 dB. Meanwhile, the Daubechies 5 wavelet achieved high accuracy on input signals with an SNR of 3 dB. All of the tested wavelet denoising methods improved the accuracy of the speech recognition system on input signals with SNRs of 0-10 dB compared with the system without denoising.
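The abstract states that the test utterances were mixed with white Gaussian noise at fixed SNRs (0-15 dB) but not how the noise was scaled. A minimal sketch, assuming the common convention of measuring the clean signal's mean power and setting the noise variance to P_signal / 10^(SNR/10); the helper names `add_white_gaussian_noise` and `measured_snr_db` are hypothetical, not the authors' code.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, seed=None):
    """Return signal + white Gaussian noise scaled to the requested SNR in dB.

    Assumption: noise variance = measured signal power / 10**(snr_db / 10).
    """
    rng = random.Random(seed)
    p_signal = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the clean reference."""
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum((y - s) ** 2 for s, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate, not stated in the abstract
    clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
    for snr in (0, 3, 5, 10, 15):
        noisy = add_white_gaussian_noise(clean, snr, seed=snr)
        print(snr, round(measured_snr_db(clean, noisy), 2))
```

With one second of audio the measured SNR typically lands within a few hundredths of a dB of the target, because the noise power estimate averages over thousands of samples.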
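The denoising stage described above (multi-level wavelet decomposition, soft thresholding of the detail coefficients, reconstruction) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the Haar wavelet as a stand-in because its filters are trivial to write out (the paper's best performers were Fejer-Korovkin 6 and Daubechies 5), it estimates the noise level as the median absolute finest-scale detail divided by 0.6745, and it takes the Minimaxi threshold to be σ(0.3936 + 0.1829 log2 n) for n > 32 (else 0), which is our understanding of MATLAB's `thselect` 'minimaxi' rule. All function names are hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Exact inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def wavedec(x, level):
    """Multi-level decomposition; details[0] is the finest scale."""
    if len(x) % (2 ** level):
        raise ValueError("len(x) must be a multiple of 2**level")
    approx, details = list(x), []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def waverec(approx, details):
    """Rebuild the signal, starting from the coarsest detail band."""
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx


def minimaxi_threshold(n):
    """Minimax threshold for unit-variance noise and n samples (assumed form)."""
    return 0.0 if n <= 32 else 0.3936 + 0.1829 * math.log2(n)


def soft_threshold(c, thr):
    """Shrink c toward zero by thr; values with |c| <= thr become 0."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def wavelet_denoise(x, level=10):
    approx, details = wavedec(x, level)
    # Noise std estimated from the finest detail band (MAD / 0.6745).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thr = sigma * minimaxi_threshold(len(x))
    # Soft-threshold every detail band; keep the coarsest approximation as is.
    details = [[soft_threshold(c, thr) for c in d] for d in details]
    return waverec(approx, details)
```

Level 10 requires the frame or signal length to be a multiple of 1024 in this sketch; a real implementation would pad or use a library with boundary handling (PyWavelets, for example, provides Daubechies filters such as 'db5').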
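For reference, a textbook MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) can be sketched with NumPy. The abstract does not give the authors' MFCC settings, so every parameter below (13 coefficients, 26 filters, 25 ms frames with a 10 ms hop, 512-point FFT, pre-emphasis 0.97, mel scale 2595 log10(1 + f/700)) is an assumed common default, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2+1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sample_rate, n_mfcc=13, n_filters=26, frame_len=0.025,
         frame_step=0.010, n_fft=512, pre_emphasis=0.97):
    """Return an (n_frames, n_mfcc) array of cepstral coefficients."""
    signal = np.asarray(signal, dtype=float)
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    # Zero-pad so the last partial frame is kept.
    n_frames = 1 + int(np.ceil(max(len(emphasized) - flen, 0) / fstep))
    pad = (n_frames - 1) * fstep + flen - len(emphasized)
    emphasized = np.append(emphasized, np.zeros(pad))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_e = np.log(np.maximum(energies, np.finfo(float).eps))  # avoid log(0)
    # Orthonormal DCT-II over the filter axis, keeping the first n_mfcc terms.
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_e @ dct.T
```

At 16 kHz, one second of audio yields 99 frames of 13 coefficients with these settings.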
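The classification step (30 reference utterances, 90 test utterances, K-nearest neighbor) can be sketched as below. The abstract does not state k, the distance metric, or how each utterance's frame-by-frame MFCC matrix is reduced to a single vector, so this sketch assumes one fixed-length feature vector per utterance (for example, the mean MFCC over frames) and Euclidean distance; `knn_classify` and `accuracy` are hypothetical helpers.

```python
import math
from collections import Counter


def knn_classify(reference, query, k=1):
    """Label `query` by majority vote among its k nearest reference vectors.

    reference: list of (feature_vector, label) pairs; distance is Euclidean.
    Ties in the vote go to the tied label that appears first among the
    distance-sorted neighbours.
    """
    nearest = sorted(reference, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(reference, tests, k=1):
    """Fraction of (vector, true_label) test pairs that are classified correctly."""
    correct = sum(knn_classify(reference, vec, k) == label for vec, label in tests)
    return correct / len(tests)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for per-utterance MFCC summaries.
    ref = [((0.0, 0.0), "word_a"), ((0.1, 0.0), "word_a"),
           ((5.0, 5.0), "word_b"), ((5.1, 4.9), "word_b")]
    tests = [((0.0, 0.3), "word_a"), ((5.0, 4.0), "word_b")]
    print(accuracy(ref, tests, k=1))
```

Recognition accuracy here is simply the share of test utterances whose predicted label matches the true label, which is how the per-SNR comparisons in the abstract would be tallied under these assumptions.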