A Quantitative Analysis of Discourse Phenomena in Machine Translation

Discours-Revue de Linguistique Psycholinguistique et Informatique (IF 0.5, Q3, LINGUISTICS) Pub Date: 2015-09-09 DOI: 10.4000/DISCOURS.9047

Carolina Scarton, Lucia Specia


Citations: 8

Abstract

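State-of-the-art Machine Translation (MT) systems translate documents by considering isolated sentences, disregarding information beyond sentence level. As a result, machine-translated documents often contain problems related to discourse coherence and cohesion. Recently, some initiatives in the evaluation and quality estimation of MT outputs have attempted to detect discourse problems in order to assess the quality of these machine translations. However, a quantitative analysis of discourse phenomena in MT outputs is still needed in order to better understand the phenomena and identify possible solutions or ways to improve evaluation. This paper aims to answer the following questions: What is the impact of discourse phenomena on MT quality? Can we capture and measure quantitatively any issues related to discourse in MT outputs? In order to answer these questions, we present a quantitative analysis of several discourse phenomena and correlate the resulting figures with scores from automatic translation quality evaluation metrics. We show that figures related to discourse phenomena present a higher correlation with quality scores than the baseline counts widely used for quality estimation of MT.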
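The correlation step mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the connective list `CONNECTIVES`, the helper `count_connectives`, the toy documents, and the placeholder quality scores are all hypothetical, and Pearson correlation is used only as one common choice of correlation measure.

```python
import math
import re

# Hypothetical connective list, chosen only for illustration.
CONNECTIVES = {"however", "therefore", "because", "although", "moreover"}


def count_connectives(document: str) -> int:
    """Count tokens in the document that appear in CONNECTIVES."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return sum(1 for tok in tokens if tok in CONNECTIVES)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy documents and scores, not data from the paper.
docs = [
    "It rained. However, we went out because we had promised.",
    "It rained. We went out.",
    "Although it rained, we went out; therefore we got wet. Moreover, it was cold.",
]
scores = [0.42, 0.30, 0.55]  # placeholder document-level quality scores

# One discourse-related count per document, correlated with the scores.
counts = [count_connectives(d) for d in docs]
r = pearson(counts, scores)
```

In a real study the counts would come from the discourse phenomena under analysis and the scores from an automatic evaluation metric; the same `pearson` call could then be repeated for baseline counts to compare the two correlations.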




Source journal

Discours-Revue de Linguistique Psycholinguistique et Informatique (LINGUISTICS)

Self-citation rate

0.00%

Articles published

Review time

12 weeks