
European Association for Machine Translation Conferences/Workshops: Latest Publications

Incorporating Human Translator Style into English-Turkish Literary Machine Translation
Pub Date : 2023-07-21 DOI: 10.48550/arXiv.2307.11457
Zeynep Yirmibeşoğlu, Olgun Dursun, Harun Dalli, Mehmet Şahin, Ena Hodzik, Sabri Gürses, Tunga Güngör
Although machine translation systems are mostly designed to serve in the general domain, there is a growing tendency to adapt these systems to other domains like literary translation. In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. We fine-tune a pre-trained machine translation model on the manually-aligned works of a particular translator. We make a detailed analysis of the effects of manual and automatic alignments, data augmentation methods, and corpus size on the translations. We propose an approach based on stylistic features to evaluate the style of a translator in the output translations. We show that the human translator style can be highly recreated in the target machine translations by adapting the models to the style of the translator.
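The adaptation step described here is essentially standard fine-tuning of a pretrained sequence-to-sequence MT model on a small translator-specific parallel corpus. A minimal sketch with Hugging Face Transformers follows; the checkpoint name, file paths and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: adapt a pretrained EN->TR MT model to one translator's
# manually aligned sentence pairs. Checkpoint, paths and hyperparameters
# are illustrative assumptions, not the paper's setup.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "Helsinki-NLP/opus-mt-tc-big-en-tr"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# translator_pairs.json (hypothetical): [{"en": "...", "tr": "..."}, ...]
raw = load_dataset("json", data_files="translator_pairs.json", split="train")

def preprocess(batch):
    enc = tokenizer(batch["en"], truncation=True, max_length=256)
    enc["labels"] = tokenizer(text_target=batch["tr"],
                              truncation=True, max_length=256)["input_ids"]
    return enc

train_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mt-translator-style",
    learning_rate=2e-5,            # small LR to avoid forgetting generic MT
    num_train_epochs=5,
    per_device_train_batch_size=16,
)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```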
Citations: 0
Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios
Pub Date : 2023-05-31 DOI: 10.48550/arXiv.2305.19757
Mălina Chichirău, Rik van Noord, Antonio Toral
We tackle the task of automatically discriminating between human and machine translations. As opposed to most previous work, we perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models. We show that a classifier trained on parallel data with a single source language (in our case German–English) can still perform well on English translations that come from different source languages, even when the machine translations were produced by other systems than the one it was trained on. Additionally, we demonstrate that incorporating the source text in the input of a multilingual classifier improves (i) its accuracy and (ii) its robustness on cross-system evaluation, compared to a monolingual classifier. Furthermore, we find that using training data from multiple source languages (German, Russian and Chinese) tends to improve the accuracy of both monolingual and multilingual classifiers. Finally, we show that bilingual classifiers and classifiers trained on multiple source languages benefit from being trained on longer text sequences, rather than on sentences.
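The classification setup can be pictured as fine-tuning a multilingual pretrained encoder on (source, translation) pairs with a binary head. The sketch below illustrates the source-aware input format; the checkpoint, label convention and example sentences are assumptions for illustration only.

```python
# Minimal sketch: binary human-vs-machine classifier over (source, translation)
# pairs with a multilingual pretrained encoder. Checkpoint and label meanings
# are illustrative assumptions; the head would first be fine-tuned on labelled
# human/MT data before inference makes sense.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

src = "Das ist ein Beispielsatz."          # German source
tgt = "This is an example sentence."       # English translation to classify

# Encoding source and translation together mirrors the "source-aware" setting;
# passing only `tgt` would give the translation-only (monolingual) variant.
inputs = tokenizer(src, tgt, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()        # assumed labels: 0 = human, 1 = machine
print(pred)
```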
Citations: 0
Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages
Pub Date : 2023-05-04 DOI: 10.48550/arXiv.2305.03207
Sonal Sannigrahi, Rachel Bawden
Multilingual language models have shown impressive cross-lingual transfer ability across a diverse set of languages and tasks. To improve the cross-lingual ability of these models, some strategies include transliteration and finer-grained segmentation into characters as opposed to subwords. In this work, we investigate lexical sharing in multilingual machine translation (MT) from Hindi, Gujarati, Nepali into English. We explore the trade-offs that exist in translation performance between data sampling and vocabulary size, and we explore whether transliteration is useful in encouraging cross-script generalisation. We also verify how the different settings generalise to unseen languages (Marathi and Bengali). We find that transliteration does not give pronounced improvements and our analysis suggests that our multilingual MT models trained on original scripts are already robust to cross-script differences even for relatively low-resource languages.
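One way to probe the vocabulary-size and segmentation trade-off discussed above is to train subword models of different sizes over the mixed-language corpus and inspect how words are split. The sketch below uses SentencePiece; the file names, vocabulary sizes and settings are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: train SentencePiece subword models of several vocabulary
# sizes over a mixed Hindi/Gujarati/Nepali + English corpus and inspect the
# resulting segmentation. File names and sizes are illustrative assumptions.
import sentencepiece as spm

for vocab_size in (8000, 16000, 32000):
    spm.SentencePieceTrainer.train(
        input="hi_gu_ne_en.txt",        # hypothetical file, one sentence per line
        model_prefix=f"spm_{vocab_size}",
        vocab_size=vocab_size,
        character_coverage=0.9995,      # high coverage for Indic scripts
        model_type="bpe",
    )

sp = spm.SentencePieceProcessor(model_file="spm_16000.model")
print(sp.encode("नमस्ते दुनिया", out_type=str))  # inspect how the model segments Devanagari
```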
Citations: 0
State Spaces Aren’t Enough: Machine Translation Needs Attention
Pub Date : 2023-04-25 DOI: 10.48550/arXiv.2304.12776
Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz
Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modelling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT’14 and WMT’16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4’s inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.
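The mechanism whose absence explains the gap is ordinary scaled dot-product cross-attention, in which each decoder position consults all encoder states rather than a single compressed hidden state. A minimal PyTorch sketch follows; the dimensions are illustrative.

```python
# Minimal sketch of scaled dot-product cross-attention: the decoder queries
# every encoder state instead of one compressed hidden state. Shapes are
# illustrative.
import math
import torch

def cross_attention(queries, keys, values):
    # queries: (tgt_len, d); keys, values: (src_len, d)
    scores = queries @ keys.T / math.sqrt(queries.size(-1))
    weights = torch.softmax(scores, dim=-1)   # one distribution per target position
    return weights @ values                   # weighted sum over all source states

src_len, tgt_len, d = 50, 7, 64
enc_states = torch.randn(src_len, d)   # full source sequence, not a single state
dec_states = torch.randn(tgt_len, d)
context = cross_attention(dec_states, enc_states, enc_states)
print(context.shape)  # torch.Size([7, 64])
```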
Citations: 2
An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models
Pub Date : 2023-04-19 DOI: 10.48550/arXiv.2304.09388
Varun Gumma, Raj Dabre, Pratyush Kumar
Knowledge distillation (KD) is a well-known method for compressing neural models. However, works focusing on distilling knowledge from large multilingual neural machine translation (MNMT) models into smaller ones are practically nonexistent, despite the popularity and superiority of MNMT. This paper bridges this gap by presenting an empirical investigation of knowledge distillation for compressing MNMT models. We take Indic to English translation as a case study and demonstrate that commonly used language-agnostic and language-aware KD approaches yield models that are 4-5x smaller but also suffer from performance drops of up to 3.5 BLEU. To mitigate this, we then experiment with design considerations such as shallower versus deeper models, heavy parameter sharing, multistage training, and adapters. We observe that deeper compact models tend to be as good as shallower non-compact ones and that fine-tuning a distilled model on a high-quality subset slightly boosts translation quality. Overall, we conclude that compressing MNMT models via KD is challenging, indicating immense scope for further research.
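A common way to implement such distillation is to mix the usual cross-entropy loss against the references with a KL term that pulls the student's output distribution towards the teacher's. The sketch below shows this word-level KD loss in PyTorch; the temperature and mixing weight are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a word-level knowledge-distillation loss for seq2seq MT:
# cross-entropy on the reference mixed with KL divergence against the
# teacher's softened distribution. T and alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # student_logits, teacher_logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    ce = F.cross_entropy(student_logits.transpose(1, 2), labels, ignore_index=-100)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)   # rescale gradients for the temperature-softened targets
    return alpha * ce + (1.0 - alpha) * kd
```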
Citations: 0
Tailoring Domain Adaptation for Machine Translation Quality Estimation
Pub Date : 2023-04-18 DOI: 10.48550/arXiv.2304.08891
Javad Pourmostafa Roshan Sharami, D. Shterionov, F. Blain, Eva Vanmassenhove, M. D. Sisto, Chris Emmery, P. Spronck
While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues — data scarcity and domain mismatch — this paper combines domain adaptation and data augmentation within a robust QE system. Our method is to first train a generic QE model and then fine-tune it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and a superior performance in zero-shot learning scenarios as compared to state-of-the-art baselines.
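The two-stage recipe (generic training followed by in-domain fine-tuning at a lower learning rate) could look as sketched below for a sentence-level QE regressor; the encoder checkpoint, data files and hyperparameters are illustrative assumptions, not the paper's system.

```python
# Minimal sketch: train a sentence-level QE regressor on generic
# (src, mt, score) triplets, then continue on in-domain data with a smaller
# learning rate so the model specialises without losing generic knowledge.
# Checkpoint, file names and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=1)

def encode(batch):
    enc = tokenizer(batch["src"], batch["mt"], truncation=True, max_length=256)
    enc["labels"] = [float(s) for s in batch["score"]]   # regression targets
    return enc

def stage(data_file, lr, output_dir):
    ds = load_dataset("json", data_files=data_file, split="train")
    ds = ds.map(encode, batched=True, remove_columns=ds.column_names)
    args = TrainingArguments(output_dir=output_dir, learning_rate=lr,
                             num_train_epochs=3, per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=ds, tokenizer=tokenizer).train()

stage("qe_generic.json", 2e-5, "qe-generic")   # stage 1: generic QE training
stage("qe_domain.json", 5e-6, "qe-domain")     # stage 2: in-domain fine-tuning
```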
Citations: 2
Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM
Pub Date : 2023-03-03 DOI: 10.48550/arXiv.2303.01911
Rachel Bawden, François Yvon
The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022) covering 46 languages. We focus on BLOOM’s multilingual ability by evaluating its machine translation performance across several datasets (WMT, Flores-101 and DiaBLa) and language pairs (high- and low-resourced). Our results show that 0-shot performance suffers from overgeneration and generating in the wrong language, but this is greatly improved in the few-shot setting, with very good results for a number of language pairs. We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
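Few-shot prompting for translation amounts to prepending a handful of source/target example pairs to the sentence to be translated and letting the model continue the pattern. The sketch below illustrates this with a small BLOOM checkpoint; the model size, language pair, examples and prompt wording are illustrative assumptions (the paper evaluates the full BLOOM model across many pairs).

```python
# Minimal sketch of few-shot prompted translation with a small BLOOM
# checkpoint. Model size, examples and prompt wording are illustrative
# assumptions, not the paper's exact setup.
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

shots = [
    ("Le chat dort sur le canapé.", "The cat is sleeping on the sofa."),
    ("Il pleut depuis ce matin.", "It has been raining since this morning."),
]
src = "La réunion commence à dix heures."
prompt = "".join(f"French: {f}\nEnglish: {e}\n\n" for f, e in shots)
prompt += f"French: {src}\nEnglish:"

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
# strip the prompt tokens and keep only the model's continuation
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```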
Citations: 19
Large Language Models Are State-of-the-Art Evaluators of Translation Quality
Pub Date : 2023-02-28 DOI: 10.48550/arXiv.2302.14520
Tom Kocmi, C. Federmann
We describe GEMBA, a GPT-based metric for assessment of translation quality, which works both with a reference translation and without. In our evaluation, we focus on zero-shot prompting, comparing four prompt variants in two modes, based on the availability of the reference. We investigate seven versions of GPT models, including ChatGPT. We show that our method for translation quality assessment only works with GPT 3.5 and larger models. Comparing to results from WMT22’s Metrics shared task, our method achieves state-of-the-art accuracy in both modes when compared to MQM-based human labels. Our results are valid on the system level for all three WMT22 Metrics shared task language pairs, namely English into German, English into Russian, and Chinese into English. This provides a first glimpse into the usefulness of pre-trained, generative large language models for quality assessment of translations. We publicly release all our code and prompt templates used for the experiments described in this work, as well as all corresponding scoring results, to allow for external validation and reproducibility.
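In practice the metric reduces to sending a scoring prompt per segment to a GPT model and parsing the returned score. The sketch below shows a GEMBA-style, reference-free direct-assessment prompt; the wording paraphrases the idea rather than reproducing the released templates, and the model name is an assumption.

```python
# Minimal sketch of GEMBA-style zero-shot quality scoring with an OpenAI chat
# model. Prompt wording is a paraphrase of the direct-assessment variant, not
# the released template; the model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gemba_style_score(src, hyp, src_lang="English", tgt_lang="German"):
    prompt = (
        f"Score the following translation from {src_lang} to {tgt_lang} "
        f"on a continuous scale from 0 to 100, where 0 means no meaning "
        f"preserved and 100 means perfect meaning and grammar.\n\n"
        f'{src_lang} source: "{src}"\n{tgt_lang} translation: "{hyp}"\nScore:'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model; the paper studies GPT-3.5 and larger
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(gemba_style_score("The weather is nice today.", "Das Wetter ist heute schön."))
```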
Citations: 92
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
Pub Date : 2020-05-16 DOI: 10.17863/CAM.49422
Felix Stahlberg
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks (Hochreiter and Schmidhuber, 1997) are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as bioinformatics (Min et al., 2016). Recent advances in contextual word embeddings like BERT (Devlin et al., 2019) boast with achieving state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common as systems were much more tailored towards the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break of deep learning methods with previous research in the specific area. This thesis can be understood as an antithesis to the prevailing paradigm. We show how traditional symbolic statistical machine translation (Koehn, 2009) models can still improve neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015, NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural models to correct grammatical errors in text.
Citations: 4
An English-Swahili parallel corpus and its use for neural machine translation in the news domain
Pub Date : 2020-03-31 DOI: 10.5281/ZENODO.3923590
F. Sánchez-Martínez, V. M. Sánchez-Cartagena, J. A. Pérez-Ortiz, M. Forcada, M. Esplà-Gomis, Andrew Secker, Susie Coleman, J. Wall
This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.
Citations: 7