End-to-End Speech Translation with the Transformer

Laura Cross Vila, Carlos Escolano, José A. R. Fonollosa, M. Costa-jussà
{"title":"End-to-End Speech Translation with the Transformer","authors":"Laura Cross Vila, Carlos Escolano, José A. R. Fonollosa, M. Costa-jussà","doi":"10.21437/IBERSPEECH.2018-13","DOIUrl":null,"url":null,"abstract":"Speech Translation has been traditionally addressed with the concatenation of two tasks: Speech Recognition and Machine Translation. This approach has the main drawback that errors are concatenated. Recently, neural approaches to Speech Recognition and Machine Translation have made possible facing the task by means of an End-to-End Speech Translation architecture. In this paper, we propose to use the architecture of the Transformer which is based solely on attention-based mechanisms to address the End-to-End Speech Translation system. As a contrastive architecture, we use the same Transformer to built the Speech Recognition and Machine Translation systems to perform Speech Translation through concatenation of systems. Results on a Spanish-to-English standard task show that the end-to-end architecture is able to outperform the concatenated systems by half point BLEU.","PeriodicalId":115963,"journal":{"name":"IberSPEECH Conference","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"57","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IberSPEECH Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/IBERSPEECH.2018-13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 57

Abstract

Speech Translation has traditionally been addressed by concatenating two tasks: Speech Recognition and Machine Translation. The main drawback of this approach is that errors accumulate across the pipeline. Recently, neural approaches to Speech Recognition and Machine Translation have made it possible to address the task with an End-to-End Speech Translation architecture. In this paper, we propose to use the Transformer architecture, which relies solely on attention mechanisms, to build the End-to-End Speech Translation system. As a contrastive baseline, we use the same Transformer to build the Speech Recognition and Machine Translation systems and perform Speech Translation by concatenating them. Results on a standard Spanish-to-English task show that the end-to-end architecture outperforms the concatenated systems by half a BLEU point.
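The contrast between the two setups can be sketched as a toy pipeline. This is a minimal conceptual illustration, not the paper's implementation: all functions and the tiny lookup tables below are hypothetical placeholders standing in for the Transformer ASR, MT, and end-to-end ST models.

```python
def asr(audio_features):
    """Toy speech recognizer: maps audio frames to a Spanish transcript."""
    # A tiny hand-made lookup stands in for a Transformer ASR model.
    lexicon = {(0.1, 0.9): "hola", (0.5, 0.5): "mundo"}
    return " ".join(lexicon.get(frame, "<unk>") for frame in audio_features)

def mt(source_text):
    """Toy machine translation: Spanish words to English."""
    table = {"hola": "hello", "mundo": "world", "<unk>": "<unk>"}
    return " ".join(table.get(w, "<unk>") for w in source_text.split())

def cascaded_st(audio_features):
    # Concatenated systems: errors made by asr() propagate into mt();
    # a misrecognized frame cannot be recovered by the translation step.
    return mt(asr(audio_features))

def end_to_end_st(audio_features):
    """Toy end-to-end model: audio frames mapped directly to English."""
    table = {(0.1, 0.9): "hello", (0.5, 0.5): "world"}
    return " ".join(table.get(frame, "<unk>") for frame in audio_features)

audio = [(0.1, 0.9), (0.5, 0.5)]
print(cascaded_st(audio))    # -> hello world
print(end_to_end_st(audio))  # -> hello world

# A frame outside the ASR lexicon becomes <unk>, and that error is
# carried into translation: the "concatenated errors" the paper cites.
noisy = [(0.1, 0.9), (0.3, 0.3)]
print(cascaded_st(noisy))    # -> hello <unk>
```

The single end-to-end model avoids the intermediate transcript entirely, which is the structural reason the paper's Transformer-based end-to-end system can avoid compounding recognition errors.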