
Proceedings of the Third Workshop on Automatic Simultaneous Translation: Latest Publications

USST’s System for AutoSimTrans 2022
DOI: 10.18653/v1/2022.autosimtrans-1.7
Zhu Hui, Yu Jun
This paper describes our submitted text-to-text simultaneous translation (ST) system, which won second place in the Chinese→English streaming translation task of AutoSimTrans 2022. Our baseline system is a BPE-based Transformer model trained with the PaddlePaddle framework. In our experiments, we employ data synthesis and ensemble approaches to enhance the base model. To bridge the gap between the general domain and the spoken domain, we select in-domain data from a general corpus and mix it with a spoken corpus for mixed fine-tuning. Finally, we adopt a fixed wait-k policy to transfer our full-sentence translation model to a simultaneous translation model. Experiments on the development data show that our system outperforms the baseline system.
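The fixed wait-k policy mentioned in the abstract can be illustrated with a small scheduling sketch (a generic illustration of the policy, not the authors' implementation): the decoder first reads k source tokens, then alternates one target write per source read until the source is exhausted, after which it writes the remaining target tokens.

```python
# Generic sketch of a fixed wait-k read/write schedule
# (illustrative only, not the authors' code).
def wait_k_schedule(num_src, num_tgt, k):
    """Return the action sequence: 'R' = read a source token,
    'W' = write a target token."""
    actions, read, written = [], 0, 0
    while written < num_tgt:
        # Keep reading until k + written source tokens are available,
        # or the source is exhausted.
        if read < min(k + written, num_src):
            actions.append("R")
            read += 1
        else:
            actions.append("W")
            written += 1
    return actions

# With k=3 the decoder lags the source by three tokens:
# wait_k_schedule(5, 5, 3)
#   -> ['R', 'R', 'R', 'W', 'R', 'W', 'R', 'W', 'W', 'W']
```

At k equal to the source length this degenerates to full-sentence translation, which is why a full-sentence model can be transferred to the simultaneous setting by retraining or decoding with a small fixed k.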
Citations: 1
Findings of the Third Workshop on Automatic Simultaneous Translation
DOI: 10.18653/v1/2022.autosimtrans-1.1
Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Haifeng Wang, Liang Huang, Qun Liu, Julia Ive, Wolfgang Macherey
This paper reports the results of the shared task we hosted at the Third Workshop on Automatic Simultaneous Translation (AutoSimTrans). The shared task aims to promote the development of text-to-text and speech-to-text simultaneous translation, and includes Chinese-English and English-Spanish tracks. The number of systems submitted this year increased fourfold compared with last year. Additionally, the top-ranked system in the speech-to-text track is the first end-to-end submission we have received in the past three years, which shows great potential. This paper reports the results and descriptions of the 14 participating teams, compares different evaluation metrics, and revisits the ranking method.
Citations: 0
End-to-End Simultaneous Speech Translation with Pretraining and Distillation: Huawei Noah’s System for AutoSimTranS 2022
DOI: 10.18653/v1/2022.autosimtrans-1.5
Xingshan Zeng, Pengfei Li, Liangyou Li, Qun Liu
This paper describes the system submitted to AutoSimTrans 2022 from Huawei Noah’s Ark Lab, which won first place in the audio input track of the Chinese-English translation task. Our system is based on RealTranS, an end-to-end simultaneous speech translation model. We enhance the model with pretraining, initializing the acoustic encoder with an ASR encoder and the semantic encoder and decoder with an NMT encoder and decoder, respectively. To relieve data scarcity, we further construct a pseudo training corpus as a form of knowledge distillation, using ASR data and the pretrained NMT model. We also apply several techniques to improve robustness and domain generalizability, including punctuation removal, token-level knowledge distillation, and multi-domain fine-tuning. Experiments show that our system significantly outperforms the baselines at all latency levels and verify the effectiveness of our proposed methods.
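Token-level knowledge distillation, one of the techniques listed above, trains a student model to match the teacher's output distribution at each target position. A minimal per-position sketch in plain Python (illustrative only; the actual system presumably computes this over batched logits in its training framework):

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / t) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def token_kd_loss(student_logits, teacher_logits, t=1.0):
    """KL(teacher || student) for one token position, scaled by t^2.
    The loss is zero when the student already matches the teacher."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * t * t
```

Summing this loss over target positions, with teacher logits produced by the pretrained NMT model on ASR transcripts, is one common way to realize the pseudo-corpus distillation the abstract describes.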
Citations: 2
BIT-Xiaomi’s System for AutoSimTrans 2022
DOI: 10.18653/v1/2022.autosimtrans-1.6
Mengge Liu, Xiang Li, Bao Chen, Yanzhi Tian, Tianwei Lan, Silin Li, Yuhang Guo, Jian Luan, Bin Wang
This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge. We participated in three tracks: the Zh-En text-to-text track, the Zh-En audio-to-text track, and the En-Es text-to-text track. In our system, wait-k is employed to train prefix-to-prefix translation models. We integrate streaming chunking to detect boundaries as the source stream is read in. We further improve our system with data selection, data augmentation, and R-Drop training. Results show that our wait-k implementation outperforms the organizer’s baseline by up to 8 BLEU, and our proposed streaming chunking method adds about 2 BLEU in the low-latency regime.
Citations: 1
System Description on Third Automatic Simultaneous Translation Workshop
DOI: 10.18653/v1/2022.autosimtrans-1.4
Zhang Yiqiao
This paper describes my submission to the Third Automatic Simultaneous Translation Workshop at NAACL 2022. The submission covers the Chinese audio to English text task, the Chinese text to English text task, and the English text to Spanish text task. For the two text-to-text tasks, I use the STACL model from PaddleNLP. For the audio-to-text task, I first use DeepSpeech2 to transcribe the audio into text, then apply the STACL model to the resulting text-to-text task. The submission results show that this method achieves low latency with only a few training samples.
Citations: 1
System Description on Automatic Simultaneous Translation Workshop
DOI: 10.18653/v1/2022.autosimtrans-1.3
Zecheng Li, Yue Sun, Haoze Li
This paper describes our system submitted to the third Automatic Simultaneous Translation Workshop at NAACL 2022. We participate in the Chinese audio→English text direction of the Chinese-to-English translation task. Our speech-to-text system is a pipeline system in which we use prosodic features for audio segmentation, the ASRT model for speech recognition, and the STACL model for streaming text translation. To translate streaming text, we use a wait-k policy trained to generate the target sentence concurrently with the source sentence, but always k words behind. We propose a competitive simultaneous translation system and ranked 3rd in the audio input track. The code will be released soon.
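The cascaded pipeline described above, incremental speech recognition feeding a wait-k text translator, can be sketched as follows. Here asr and translate_next are illustrative placeholders standing in for the ASRT and STACL components, not their real APIs:

```python
# Skeleton of a cascaded simultaneous speech translation pipeline
# (function names are hypothetical placeholders, not real APIs).
def cascaded_pipeline(audio_chunks, asr, translate_next, k=3):
    """asr(chunk) returns newly recognized source tokens;
    translate_next(src, tgt) returns the next target token,
    or None when the translation is complete."""
    src, tgt = [], []
    for chunk in audio_chunks:
        src.extend(asr(chunk))  # incremental speech recognition
        # wait-k: write once the source leads the target by k tokens
        while len(src) - len(tgt) >= k:
            tgt.append(translate_next(src, tgt))
    # source exhausted: flush the remaining target tokens
    while (tok := translate_next(src, tgt)) is not None:
        tgt.append(tok)
    return tgt
```

The flush loop at the end is what lets the target "catch up" once the speaker stops, so the output is always a complete translation despite the k-token lag during streaming.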
Citations: 0