Amazigh Spoken Digit Recognition using a Deep Learning Approach based on MFCC

International Journal of Electrical and Computer Engineering Systems (IF 0.8, JCR Q4: Engineering, Electrical & Electronic). Pub Date: 2023-09-11. DOI: 10.32985/ijeces.14.7.6

Hossam Boulal, Mohamed Hamidi, Mustapha Abarkan, Jamal Barkani


Citations: 0

Abstract

Speech recognition has made human-machine voice interaction more convenient. Recognizing spoken digits is particularly useful for communication that involves numbers, such as providing a registration code, cellphone number, score, or account number. This article discusses our experience with automatic speech recognition (ASR) for Amazigh using a deep learning-based approach. Our method uses a convolutional neural network (CNN) with Mel-Frequency Cepstral Coefficients (MFCC) to analyze audio samples and generate spectrograms. To recognize Amazigh numerals, we gathered a database of the digits zero to nine spoken by 42 native Amazigh speakers, men and women between the ages of 20 and 40. Our experimental results show that spoken digits in Amazigh can be recognized with 91.75% accuracy, 93% precision, and 92% recall. These preliminary results are very satisfactory given the size of the training database, and they motivate us to further improve the system's performance to attain a higher recognition rate. Our findings align with those reported in the existing literature.
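To make the MFCC step concrete, the following is a minimal sketch of how MFCC features can be computed from raw audio before they are passed to a classifier such as a CNN. It is not the authors' implementation: the abstract does not state the sample rate, frame length, hop size, number of mel filters, or number of coefficients, so every default below is an assumption chosen for illustration, and the function names are hypothetical. A naive DFT is used only to keep the example dependency-free; a real system would normally use an FFT from a signal-processing library.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one windowed frame (bins 0..N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):       # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):       # peak and falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=256, hop=128,
         n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame.

    All default parameter values are illustrative assumptions.
    """
    # Hamming window reduces spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        spec = power_spectrum(frame)
        # Log filterbank energies; the small constant avoids log(0).
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                 for row in bank]
        # DCT-II decorrelates the log energies into cepstral coefficients.
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_e))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features


# Synthetic 440 Hz tone, 0.1 s at an assumed 8 kHz sample rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
features = mfcc(tone, 8000)
print(len(features), len(features[0]))
```

The result is a frames-by-coefficients matrix. Treating that matrix as a two-dimensional, image-like input is one common way to feed MFCC features to a CNN; whether the authors arrange their input this way is not stated in the abstract.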


Source journal

International Journal of Electrical and Computer Engineering Systems (Engineering, Electrical & Electronic)

CiteScore: 1.20

Self-citation rate: 11.80%

Journal description: The International Journal of Electrical and Computer Engineering Systems publishes original research in the form of full papers, case studies, reviews and surveys. It covers theory and application of electrical and computer engineering, synergy of computer systems and computational methods with electrical and electronic systems, as well as interdisciplinary research. Topics: Power systems; Renewable electricity production; Power electronics; Electrical drives; Industrial electronics; Communication systems; Advanced modulation techniques; RFID devices and systems; Signal and data processing; Image processing; Multimedia systems; Microelectronics; Instrumentation and measurement; Control systems; Robotics; Modeling and simulation; Modern computer architectures; Computer networks; Embedded systems; High-performance computing; Engineering education; Parallel and distributed computer systems; Human-computer systems; Intelligent systems; Multi-agent and holonic systems; Real-time systems; Software engineering; Internet and web applications and systems; Applications of computer systems in engineering and related disciplines; Mathematical models of engineering systems; Engineering management.

Latest articles in this journal

A Four Slot Dual Feed and Dual Band Reconfigurable Antenna for Fixed Satellite Service Applications
Improving Scientific Literature Classification: A Parameter-Efficient Transformer-Based Approach
The New ADE-TLM Algorithm for Modeling Debye Medium
Multi-Head CNN-based Software Development Risk Classification
FOE NET: Segmentation of Fetal in Ultrasound Images Using V-NET