Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition

Arash Dehghani, S. Seyyedsalehi
{"title":"Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition","authors":"Arash Dehghani, S. Seyyedsalehi","doi":"10.1109/ICBME.2018.8703593","DOIUrl":null,"url":null,"abstract":"In this paper, various structures and methods of Deep Artificial Neural Networks (DNN) will be evaluated and compared for the purpose of continuous Persian speech recognition. One of the first models of neural networks used in speech recognition applications were fully connected Neural Networks (FCNNs) and, consequently, Deep Neural Networks (DNNs). Although these models have better performance compared to GMM / HMM models, they do not have the proper structure to model local speech information. Convolutional Neural Network (CNN) is a good option for modeling the local structure of biological signals, including speech signals. Another issue that Deep Artificial Neural Networks face, is the convergence of networks on training data. The main inhibitor of convergence is the presence of local minima in the process of training. Deep Neural Network Pre-training methods, despite a large amount of computing, are powerful tools for crossing the local minima. But the use of appropriate neuronal models in the network structure seems to be a better solution to this problem. The Rectified Linear Unit neuronal model and the Maxout model are the most suitable neuronal models presented to this date. Several experiments were carried out to evaluate the performance of the methods and structures mentioned. After verifying the proper functioning of these methods, a combination of all models was implemented on FARSDAT speech database for continuous speech recognition. The results obtained from the experiments show that the combined model (CMDNN) improves the performance of ANNs in speech recognition versus the pre-trained fully connected NNs with sigmoid neurons by about 3%.","PeriodicalId":338286,"journal":{"name":"2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBME.2018.8703593","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this paper, various structures and methods of Deep Artificial Neural Networks (DNNs) are evaluated and compared for the purpose of continuous Persian speech recognition. Among the first neural network models used in speech recognition applications were fully connected Neural Networks (FCNNs) and, subsequently, Deep Neural Networks (DNNs). Although these models perform better than GMM/HMM models, they lack the structure needed to model local speech information. The Convolutional Neural Network (CNN) is a good option for modeling the local structure of biological signals, including speech signals. Another issue that deep artificial neural networks face is convergence on the training data; the main obstacle to convergence is the presence of local minima during training. Deep neural network pre-training methods, despite their large computational cost, are powerful tools for escaping local minima, but using appropriate neuronal models in the network structure appears to be a better solution to this problem. The Rectified Linear Unit (ReLU) and Maxout models are the most suitable neuronal models proposed to date. Several experiments were carried out to evaluate the performance of the methods and structures mentioned. After verifying that these methods function properly, a combination of all the models was implemented on the FARSDAT speech database for continuous speech recognition. The experimental results show that the combined model (CMDNN) improves the performance of ANNs in speech recognition by about 3% compared with pre-trained fully connected NNs with sigmoid neurons.
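As background for the maxout neuronal model mentioned above: in the standard maxout formulation, each unit outputs the maximum of k affine "pieces" of its input, h(x) = max_i (W_i·x + b_i), which yields a learned, piecewise-linear activation. The sketch below is a minimal illustrative NumPy implementation of that formulation; the function name, tensor shapes, and k = 3 pieces are assumptions chosen for illustration and do not describe the specific network configuration used in this paper.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: each output unit takes the maximum over k affine pieces.

    x : (batch, d_in)        input features
    W : (d_in, d_out, k)     weights of the k pieces per output unit
    b : (d_out, k)           biases of the k pieces per output unit
    returns : (batch, d_out)
    """
    # z[n, j, i] = sum_m x[n, m] * W[m, j, i] + b[j, i]
    z = np.einsum('nm,mji->nji', x, W) + b
    # take the maximum over the k pieces for each output unit
    return z.max(axis=-1)

# Illustrative usage: batch of 4 inputs, 10 features, 5 maxout units, k = 3 pieces
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))
W = rng.standard_normal((10, 5, 3)) * 0.1
b = np.zeros((5, 3))
print(maxout(x, W, b).shape)  # (4, 5)
```

Because the maximum of linear functions is convex and piecewise linear, a maxout unit can approximate activations such as ReLU as a special case, which is one reason it is attractive as an alternative to sigmoid neurons in deep architectures.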