Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer

Journal of Biomolecular Techniques (Q4, Biochemistry, Genetics and Molecular Biology) Pub Date: 2023-03-21 DOI: 10.51173/jt.v5i1.749
Mohanad Sameer, Ahmed Talib, Alla Hussein

Abstract

Recognizing and transcribing human speech has become an increasingly important task. Recently, researchers have shown growing interest in automatic speech recognition (ASR) using end-to-end models. Previous architectures for Arabic ASR have included time-delay neural networks, recurrent neural networks (RNN), and long short-term memory (LSTM) networks. Previous end-to-end approaches have suffered from slow training and inference because of limited training parallelization, and they require a large amount of data to achieve acceptable results in recognizing Arabic speech. This research presents an Arabic speech recognition system based on a transformer encoder-decoder architecture with self-attention that transcribes Arabic audio speech segments into text and can be trained faster and more efficiently. The proposed model outperforms previous end-to-end approaches on the Common Voice dataset from Mozilla. We introduce a speech-transformer model trained for 110 epochs using only 112 hours of speech. Although Arabic is considered one of the more difficult languages for speech recognition systems, we achieved a best word error rate (WER) of 3.2, compared with other systems whose training requires a very large amount of data. The proposed system was evaluated on the Common Voice 8.0 dataset without using a language model.
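The abstract reports its headline result as a word error rate but does not say how the score was computed or whether 3.2 is a percentage. As a rough illustration only, the sketch below uses the conventional definition, WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level edit distance. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not details from the paper; real Arabic evaluation usually also involves text normalization (for instance, handling of diacritics), which is left out here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference. Multiply by 100 to express it as a percentage.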