Semantic Similarity Based Evaluation for Abstractive News Summarization

Figen Beken Fikri, Kemal Oflazer, B. Yanikoglu
{"title":"Semantic Similarity Based Evaluation for Abstractive News Summarization","authors":"Figen Beken Fikri, Kemal Oflazer, B. Yanikoglu","doi":"10.18653/v1/2021.gem-1.3","DOIUrl":null,"url":null,"abstract":"ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.","PeriodicalId":431658,"journal":{"name":"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2021.gem-1.3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于语义相似度的新闻文摘评价
ROUGE是一种广泛应用于文本摘要的评价度量。然而,它不适合评估抽象摘要系统,因为它依赖于金标准和生成的摘要之间的词汇重叠。对于具有非常大的词汇表和高类型/标记比率的粘合语言,这种限制变得更加明显。在本文中,我们提出了土耳其语的语义相似度模型,并将其作为抽象摘要任务的评估指标。为了实现这一目标,我们将英文STSb数据集翻译成土耳其语,并提出了土耳其语的第一个语义文本相似度数据集。我们发现,在Pearson和Spearman相关性中,与ROUGE相比,我们的最佳相似性模型与人类平均判断有更好的一致性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
NUIG-DSI’s submission to The GEM Benchmark 2021 Human Perception in Natural Language Generation SimpleNER Sentence Simplification System for GEM 2021 System Description for the CommonGen task with the POINTER model Semantic Similarity Based Evaluation for Abstractive News Summarization
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1