Speech Recognition Model of Civil Aviation Radiotelephony Communication Based on Improved Conformer
Authors: Ze-ping Xiao, Guimin Jia, Bo Shi
Published in: 2022 International Conference on Automation, Robotics and Computer Engineering (ICARCE)
Publication date: 2022-12-16
DOI: 10.1109/ICARCE55724.2022.10046493 (https://doi.org/10.1109/ICARCE55724.2022.10046493)
Citations: 0
Abstract
Radiotelephony communication has a specialized grammatical structure and pronunciation, so generic speech recognition models are difficult to apply to it directly. We propose a Conv1DSlide-Conformer model for radiotelephony speech recognition. A sliding-window attention mechanism replaces the self-attention mechanism to speed up decoding and improve the model's adaptability to radiotelephony communication. A convolutional module replaces the feedforward neural network module so that the encoder focuses more on local information. The improved Conformer model processes FBANK features of radiotelephony speech and extracts high-dimensional features that better fit its characteristics. Connectionist temporal classification (CTC), combined with a data augmentation strategy, assists training, which speeds up convergence and reduces training complexity. Decoding is assisted by CTC and language models to improve recognition performance. Experimental results show that the improved Conformer model reduces the word error rate to 8.1% and 7.8% on a real Chinese radiotelephony communication speech dataset.
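The abstract does not specify the window size, whether the window is symmetric, or how the mechanism is implemented in the Conv1DSlide-Conformer encoder. As an illustration of the general idea only, the following minimal sketch restricts scaled dot-product attention so that each frame attends to its neighbours within a fixed distance. The function name `sliding_window_attention`, the symmetric window, and the plain-list representation are assumptions made for this example, not the authors' method.

```python
import math


def sliding_window_attention(queries, keys, values, window):
    """Scaled dot-product attention restricted to a local window.

    Frame i attends only to frames j with |i - j| <= window.
    The symmetric window is an assumption of this sketch; the
    abstract does not state the window shape or size.
    """
    n = len(queries)
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Scores are computed only inside the window, so the cost grows
        # with n * window rather than n * n as in full self-attention.
        scores = [
            scale * sum(q * k for q, k in zip(queries[i], keys[j]))
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the windowed scores.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors inside the window.
        out = [
            sum(w * values[j][d] for w, j in zip(weights, range(lo, hi)))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

With `window=0` each frame attends only to itself and the output equals `values`; with `window >= n - 1` the function reduces to ordinary full self-attention. Limiting the window is what bounds the per-frame cost, which is consistent with the decoding-speed motivation stated in the abstract.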