Semantic Similarity Based Evaluation for Abstractive News Summarization

Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021). DOI: 10.18653/v1/2021.gem-1.3

Figen Beken Fikri, Kemal Oflazer, B. Yanikoglu

Citations: 10

Abstract

ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for evaluating abstractive summarization systems, because it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish, which also gives the first semantic textual similarity dataset for Turkish. We show that our best similarity models align better with average human judgments than ROUGE does, in both Pearson and Spearman correlations.
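The comparison described in the abstract, scoring how well a metric tracks average human judgments, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy scores below are hypothetical, and the example only shows how Pearson and Spearman correlations between metric scores and human ratings might be computed in plain Python.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data, NOT results from the paper.
    human_scores = [1.0, 2.5, 3.0, 4.0, 4.5]       # averaged human ratings
    metric_scores = [0.20, 0.35, 0.30, 0.60, 0.80]  # some automatic metric
    print("Pearson :", pearson(metric_scores, human_scores))
    print("Spearman:", spearman(metric_scores, human_scores))
```

Under this setup, a metric whose scores yield higher Pearson and Spearman values against the human ratings would be said to align better with human judgment. In practice, library routines such as `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would normally replace the hand-written functions.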
