Automatic Meta-evaluation of Low-Resource Machine Translation Evaluation Metrics

Junting Yu, Wuying Liu, Hongye He, Lin Wang
2019 International Conference on Asian Language Processing (IALP), November 2019.
DOI: 10.1109/IALP48816.2019.9037658

Abstract

Meta-evaluation is a method for assessing machine translation (MT) evaluation metrics against established theories and standards. This paper presents an automatic meta-evaluation method for MT evaluation based on ORANGE, called Limited ORANGE, which is designed for low-resource MT evaluation and is adopted when resources are limited. Three n-gram-based metrics, BLEUS, ROUGE-L, and ROUGE-S, are compared experimentally in what we call a horizontal comparison, while a vertical comparison contrasts different forms of the same evaluation metric. Unlike traditional human meta-evaluation, this method evaluates metrics automatically, with no human involvement beyond providing a set of references: it requires only the average rank of the references, and is therefore not influenced by subjective factors. It also costs less and takes less time than the traditional approach, which benefits MT system parameter optimization and shortens the system development period. In this paper, we use this automatic meta-evaluation method to evaluate BLEUS, ROUGE-L, ROUGE-S, and their different Cilin-based forms on a Russian-Chinese dataset. The results agree with those of traditional human meta-evaluation, verifying the consistency and effectiveness of Limited ORANGE.
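The core idea behind ORANGE-style meta-evaluation ("only needs the average rank of the references") can be sketched as follows. This is a minimal illustration, not the paper's Limited ORANGE: the function names (`orange_score`, `unigram_overlap`), the toy overlap metric, and the leave-one-out scoring of each reference are assumptions made for the sketch. A metric under test ranks each human reference among the machine candidates; the better the metric, the higher (numerically lower) the average rank it assigns to references, with no human judgment required.

```python
def unigram_overlap(hyp, refs):
    """Toy stand-in for a metric such as BLEUS or ROUGE:
    fraction of hypothesis unigrams found in any reference."""
    toks = hyp.split()
    ref_vocab = {t for r in refs for t in r.split()}
    return sum(t in ref_vocab for t in toks) / max(len(toks), 1)

def orange_score(metric, candidate_lists, reference_lists):
    """Average normalized rank of the references among machine candidates.

    candidate_lists:  per-sentence lists of machine translations.
    reference_lists:  per-sentence lists of human references.
    metric(hyp, refs) -> float, higher = better.
    Lower returned value = the metric ranks references higher = better metric.
    """
    total, count = 0.0, 0
    for cands, refs in zip(candidate_lists, reference_lists):
        for i, ref in enumerate(refs):
            # Score the held-out reference against the *other* references
            # (leave-one-out; falls back to the full set if only one exists).
            other_refs = refs[:i] + refs[i + 1:] or refs
            pool = cands + [ref]
            ranked = sorted(pool, key=lambda h: metric(h, other_refs),
                            reverse=True)
            total += (ranked.index(ref) + 1) / len(pool)  # normalized 1-based rank
            count += 1
    return total / count
```

Comparing two metrics then reduces to comparing their `orange_score` values on the same candidate and reference lists, which is what makes the horizontal (BLEUS vs. ROUGE-L vs. ROUGE-S) and vertical (plain vs. Cilin-based forms) comparisons fully automatic.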