Deep convolutional neural networks-based features for Indonesian large vocabulary speech recognition

IAES International Journal of Artificial Intelligence (Q2, Decision Sciences) Pub Date: 2023-06-01 DOI: 10.11591/ijai.v12.i2.pp610-617

H. Pardede, Purwoko Adhi, Vicky Zilvan, A. Ramdan, Dikdik Krisnandi


Citations: 0

Abstract

There is great interest in developing speech recognition with deep learning technologies, because they model the complexity of pronunciation, syntax, and language rules in speech data better than traditional hidden Markov models (HMMs) do. However, deep learning-based speech recognition needs a large amount of data to be effective. While this is not a problem for mainstream languages such as English or Chinese, it is a limitation for non-mainstream languages such as Indonesian. To overcome this limitation, this paper presents deep features based on convolutional neural networks (CNNs) for Indonesian large vocabulary continuous speech recognition (LVCSR). The CNN is trained discriminatively, unlike usual deep learning implementations, in which the networks are trained generatively. Evaluations on Indonesian speech data show that the proposed method achieves error reduction rates of 7.26% and 9.01% over state-of-the-art deep belief network-deep neural network (DBN-DNN) systems for LVCSR, with Mel frequency cepstral coefficients (MFCC) and filterbank (FBANK) used as features, respectively. Compared with a CNN-DNN with generative training, it achieves an error reduction rate of 6.13%.
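The abstract reports its results as error reduction rates but does not say how they are computed. As a minimal sketch, assuming the common relative definition and assuming the underlying metric is a word error rate (WER), such a figure could be derived as follows. The function name and the input values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Relative error reduction of a proposed system over a baseline, in percent.

    Assumes the common definition (baseline - proposed) / baseline; the abstract
    does not state which definition the authors used.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical word error rates in percent, chosen only for illustration.
baseline = 20.0
proposed = 18.0
print(f"{relative_error_reduction(baseline, proposed):.2f}%")  # prints "10.00%"
```

Under this definition a figure such as 7.26% is a fraction of the baseline's errors that were removed, not an absolute drop of 7.26 percentage points in error rate.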


Source journal: IAES International Journal of Artificial Intelligence (Decision Sciences: Information Systems and Management)

CiteScore: 3.90

Self-citation rate: 0.00%

Articles published: 170