End-to-End Speech-to-Text Translation: A Survey

Impact Factor: 3.1 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Computer Speech and Language | Pub Date: 2024-11-14 | DOI: 10.1016/j.csl.2024.101751
Nivedita Sethiya, Chandresh Kumar Maurya
Computer Speech and Language, volume 90, Article 101751. https://www.sciencedirect.com/science/article/pii/S0885230824001347

Abstract

Speech-to-Text (ST) translation is the task of converting speech signals in one language into text in another language. It has applications in various domains, such as hands-free communication, dictation, video lecture transcription, and translation. Traditional ST translation relies on Automatic Speech Recognition (ASR) and Machine Translation (MT) models, which together convert spoken language into written text and enable cross-lingual communication: ASR recognizes the spoken words, while MT translates the transcribed text into the target language. Such integrated models suffer from cascaded error propagation and high resource and training costs. As a result, researchers have been exploring end-to-end (E2E) models for ST translation. However, to our knowledge, there is no comprehensive review of existing work on E2E ST. The present survey therefore discusses the work in this direction. We attempt to provide a comprehensive review of the models, metrics, and datasets used for ST tasks, and we outline challenges and future research directions with new insights. We believe this review will be helpful to researchers working on various applications of ST models.
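The contrast between the cascaded and end-to-end settings described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not any model from the survey: the function names (`cascade_st`, `e2e_st`), the `Speech` alias, the tiny lexicon, and the stand-in "models" are all assumptions made for the example. It shows how one recognition error in the ASR stage is carried unchanged into the MT stage.

```python
from typing import Callable, Sequence

# Hypothetical alias: a "speech signal" is just a sequence of samples here.
Speech = Sequence[float]


def cascade_st(speech: Speech,
               asr: Callable[[Speech], str],
               mt: Callable[[str], str]) -> str:
    """Cascaded ST: transcribe in the source language, then translate."""
    transcript = asr(speech)  # any recognition error is fixed into this string
    return mt(transcript)     # MT sees only the text, never the audio


def e2e_st(speech: Speech, model: Callable[[Speech], str]) -> str:
    """End-to-end ST: one model maps speech directly to target text."""
    return model(speech)


# Toy stand-ins (assumptions for illustration, not real models).
TOY_LEXICON = {
    "good morning": "guten Morgen",
    "good mourning": "gute Trauer",
}


def toy_mt(text: str) -> str:
    # Look up the whole transcript; unknown inputs map to a placeholder.
    return TOY_LEXICON.get(text, "<unk>")


def perfect_asr(speech: Speech) -> str:
    return "good morning"


def noisy_asr(speech: Speech) -> str:
    return "good mourning"  # a single homophone error


def toy_e2e_model(speech: Speech) -> str:
    return "guten Morgen"


signal = [0.0, 0.1, -0.1]
clean_output = cascade_st(signal, perfect_asr, toy_mt)
noisy_output = cascade_st(signal, noisy_asr, toy_mt)
direct_output = e2e_st(signal, toy_e2e_model)

print(clean_output)
print(noisy_output)
print(direct_output)
```

In the cascade, `toy_mt` has no access to the audio, so it cannot correct the homophone error made by `noisy_asr`. An end-to-end model has no intermediate transcript to corrupt, although this toy says nothing about how accurate such a model would be in practice.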
Source journal: Computer Speech and Language (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.