A Quantitative Analysis of Discourse Phenomena in Machine Translation

Carolina Scarton, Lucia Specia
Journal: Discours-Revue de Linguistique Psycholinguistique et Informatique
Publication type: Journal Article
Publication date: 2015-09-09
DOI: 10.4000/DISCOURS.9047 (https://doi.org/10.4000/DISCOURS.9047)
Citations: 8

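Abstract

State-of-the-art Machine Translation (MT) systems translate documents by considering isolated sentences, disregarding information beyond sentence level. As a result, machine-translated documents often contain problems related to discourse coherence and cohesion. Recently, some initiatives in the evaluation and quality estimation of MT outputs have attempted to detect discourse problems in order to assess the quality of these machine translations. However, a quantitative analysis of discourse phenomena in MT outputs is still needed in order to better understand the phenomena and identify possible solutions or ways to improve evaluation. This paper aims to answer the following questions: What is the impact of discourse phenomena on MT quality? Can we capture and measure quantitatively any issues related to discourse in MT outputs? In order to answer these questions, we present a quantitative analysis of several discourse phenomena and correlate the resulting figures with scores from automatic translation quality evaluation metrics. We show that figures related to discourse phenomena present a higher correlation with quality scores than the baseline counts widely used for quality estimation of MT.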