Real-time translation of English speech through speech feature extraction

Artificial Life and Robotics (IF 0.8, JCR Q4, Robotics) | Pub Date: 2024-05-27 | DOI: 10.1007/s10015-024-00951-w

Xiaoyan Lei

Artificial Life and Robotics, volume 29, issue 3, pages 410-415. Journal article, not open access. https://link.springer.com/article/10.1007/s10015-024-00951-w

Citations: 0

Abstract




Real-time English speech translation is useful in many settings, including business and travel. This study aims to improve the effectiveness of real-time English speech translation. First, filter bank (FBank) features were extracted from English speech. An enhanced Transformer model was then introduced, with a causal convolution module at the front end of the encoder to capture English speech features together with their positional information. The optimized model was evaluated on the MuST-C dataset for translating English speech into several target languages, and translation quality differed across target languages. Spanish text obtained the highest bilingual evaluation understudy (BLEU) score, 20.84, while Russian text obtained the lowest, 10.56. The average BLEU score was 18.51, with an average time delay (lag) of 1202.33 ms. Compared with the conventional Transformer model, the improved model achieved higher BLEU scores and lower time delay, and it performed best with a 3 × 3 convolutional kernel. The results indicate that the improved Transformer model is dependable for real-time English speech translation and practically useful.
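The abstract does not give the details of the causal convolution module, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the input is a time × frequency grid of FBank-like features, that the 3 × 3 kernel is causal along the time axis only (each output frame sees the current frame and the two before it), and that the frequency axis is zero-padded symmetrically. The function name `causal_conv2d` and the toy data are hypothetical and are not taken from the paper. As is common in deep learning, the "convolution" here is a cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def causal_conv2d(features: Matrix, kernel: Matrix) -> Matrix:
    """Filter (time x frequency) features so output frame t uses only frames <= t.

    Assumption: time is zero-padded on the past side only, frequency is
    zero-padded on both sides, so the output has the same shape as the input.
    """
    k_t, k_f = len(kernel), len(kernel[0])
    n_t, n_f = len(features), len(features[0])
    pad_f = k_f // 2
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            acc = 0.0
            for i in range(k_t):
                src_t = t - (k_t - 1) + i  # last kernel row aligns with frame t
                if src_t < 0:
                    continue  # zero padding for frames before the start
                for j in range(k_f):
                    src_f = f - pad_f + j
                    if 0 <= src_f < n_f:
                        acc += kernel[i][j] * features[src_t][src_f]
            out[t][f] = acc
    return out


# Toy stand-in for FBank frames (not real audio): 5 frames x 4 channels.
frames = [[float(t + f) for f in range(4)] for t in range(5)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # 3 x 3 averaging kernel

full = causal_conv2d(frames, kernel)
# Streaming check: outputs for the first 3 frames are identical whether or
# not the later frames have arrived yet.
prefix = causal_conv2d(frames[:3], kernel)
assert prefix == full[:3]
```

The final assertion illustrates why causality matters for real-time use: because no output frame depends on future input, the encoder front end can emit features as audio arrives instead of waiting for the whole utterance.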


Source journal

Artificial Life and Robotics (Robotics)

CiteScore: 2.00

Self-citation rate: 22.20%

Articles published: 101

About the journal: Artificial Life and Robotics is an international journal publishing original technical papers and authoritative state-of-the-art reviews on the development of new technologies concerning artificial life and robotics, especially computer-based simulation and hardware for the twenty-first century. The journal covers a broad multidisciplinary field, including areas such as artificial brain research, artificial intelligence, artificial life, artificial living, artificial mind research, brain science, chaos, cognitive science, complexity, computer graphics, evolutionary computations, fuzzy control, genetic algorithms, innovative computations, intelligent control and modelling, micromachines, micro-robot world cup soccer tournament, mobile vehicles, neural networks, neurocomputers, neurocomputing technologies and applications, robotics, robust virtual engineering, and virtual reality. Hardware-oriented submissions are particularly welcome. Publishing body: International Symposium on Artificial Life and Robotics. Editor-in-Chief: Hiroshi Tanaka, Hatanaka R Apartment 101, Hatanaka 8-7A, Ooaza-Hatanaka, Oita city, Oita, Japan 870-0856. © International Symposium on Artificial Life and Robotics