Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition

Arash Dehghani, S. Seyyedsalehi
{"title":"Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition","authors":"Arash Dehghani, S. Seyyedsalehi","doi":"10.1109/ICBME.2018.8703593","DOIUrl":null,"url":null,"abstract":"In this paper, various structures and methods of Deep Artificial Neural Networks (DNN) will be evaluated and compared for the purpose of continuous Persian speech recognition. One of the first models of neural networks used in speech recognition applications were fully connected Neural Networks (FCNNs) and, consequently, Deep Neural Networks (DNNs). Although these models have better performance compared to GMM / HMM models, they do not have the proper structure to model local speech information. Convolutional Neural Network (CNN) is a good option for modeling the local structure of biological signals, including speech signals. Another issue that Deep Artificial Neural Networks face, is the convergence of networks on training data. The main inhibitor of convergence is the presence of local minima in the process of training. Deep Neural Network Pre-training methods, despite a large amount of computing, are powerful tools for crossing the local minima. But the use of appropriate neuronal models in the network structure seems to be a better solution to this problem. The Rectified Linear Unit neuronal model and the Maxout model are the most suitable neuronal models presented to this date. Several experiments were carried out to evaluate the performance of the methods and structures mentioned. After verifying the proper functioning of these methods, a combination of all models was implemented on FARSDAT speech database for continuous speech recognition. The results obtained from the experiments show that the combined model (CMDNN) improves the performance of ANNs in speech recognition versus the pre-trained fully connected NNs with sigmoid neurons by about 3%.","PeriodicalId":338286,"journal":{"name":"2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBME.2018.8703593","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this paper, various structures and methods of Deep Artificial Neural Networks (DNNs) are evaluated and compared for the purpose of continuous Persian speech recognition. Among the first neural network models used in speech recognition applications were fully connected Neural Networks (FCNNs) and, subsequently, Deep Neural Networks (DNNs). Although these models perform better than GMM/HMM models, they lack the structure needed to model local speech information. The Convolutional Neural Network (CNN) is a good option for modeling the local structure of biological signals, including speech signals. Another issue that deep artificial neural networks face is convergence on the training data; the main obstacle to convergence is the presence of local minima during training. Deep neural network pre-training methods, despite their large computational cost, are powerful tools for escaping local minima, but using appropriate neuronal models in the network structure appears to be a better solution to this problem. The Rectified Linear Unit (ReLU) and Maxout models are the most suitable neuronal models proposed to date. Several experiments were carried out to evaluate the performance of the methods and structures mentioned. After verifying that these methods function properly, a combination of all the models was implemented on the FARSDAT speech database for continuous speech recognition. The experimental results show that the combined model (CMDNN) improves the performance of ANNs in speech recognition by about 3% compared with pre-trained fully connected NNs with sigmoid neurons.
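As background for the maxout neuronal model mentioned above: in the standard maxout formulation, each unit outputs the maximum of k affine "pieces" of its input, h(x) = max_i (W_i·x + b_i), which yields a learned, piecewise-linear activation. The sketch below is a minimal illustrative NumPy implementation of that formulation; the function name, tensor shapes, and k = 3 pieces are assumptions chosen for illustration and do not describe the specific network configuration used in this paper.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: each output unit takes the maximum over k affine pieces.

    x : (batch, d_in)        input features
    W : (d_in, d_out, k)     weights of the k pieces per output unit
    b : (d_out, k)           biases of the k pieces per output unit
    returns : (batch, d_out)
    """
    # z[n, j, i] = sum_m x[n, m] * W[m, j, i] + b[j, i]
    z = np.einsum('nm,mji->nji', x, W) + b
    # take the maximum over the k pieces for each output unit
    return z.max(axis=-1)

# Illustrative usage: batch of 4 inputs, 10 features, 5 maxout units, k = 3 pieces
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))
W = rng.standard_normal((10, 5, 3)) * 0.1
b = np.zeros((5, 3))
print(maxout(x, W, b).shape)  # (4, 5)
```

Because the maximum of linear functions is convex and piecewise linear, a maxout unit can approximate activations such as ReLU as a special case, which is one reason it is attractive as an alternative to sigmoid neurons in deep architectures.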