How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation

Proceedings of COLING (International Conference on Computational Linguistics), pp. 6035-6049. Published 2022-09-14. DOI: 10.48550/arXiv.2209.06517

Julius Steen, K. Markert

Abstract

Automatically evaluating the coherence of summaries is of great significance both to enable cost-efficient summarizer evaluation and as a tool for improving coherence by selecting high-scoring candidate summaries. While many different approaches have been suggested to model summary coherence, they are often evaluated using disparate datasets and metrics. This makes it difficult to understand their relative performance and identify ways forward towards better summary coherence modelling. In this work, we conduct a large-scale investigation of various methods for summary coherence modelling on an even playing field. Additionally, we introduce two novel analysis measures, _intra-system correlation_ and _bias matrices_, that help identify biases in coherence measures and provide robustness against system-level confounders. While none of the currently available automatic coherence measures are able to assign reliable coherence scores to system summaries across all evaluation metrics, large-scale language models fine-tuned on self-supervised tasks show promising results, as long as fine-tuning takes into account that they need to generalize across different summary lengths.
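
The abstract names intra-system correlation but does not define it. The sketch below assumes one reading that is consistent with the stated goal of robustness against system-level confounders. Under this reading, a coherence measure's scores are correlated with human coherence ratings separately within each system's summaries, and the per-system correlations are then averaged. The helper names, the use of Pearson correlation, and the unweighted average are illustrative assumptions, not the paper's exact procedure.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)


def intra_system_correlation(records):
    """Hypothetical helper. records: (system_id, metric_score, human_score) triples.

    Correlates metric and human scores within each system's summaries,
    then returns the unweighted mean over systems plus the per-system values.
    """
    by_system = defaultdict(lambda: ([], []))
    for system, metric, human in records:
        by_system[system][0].append(metric)
        by_system[system][1].append(human)
    per_system = {
        system: pearson(metric, human)
        for system, (metric, human) in by_system.items()
        if len(metric) >= 2  # a correlation needs at least two summaries
    }
    return mean(per_system.values()), per_system


# Toy data (invented for illustration): within each system the measure orders
# summaries exactly as humans do, but system B gets high metric scores and low
# human scores, so pooling all summaries mixes in a between-system effect.
records = [
    ("A", 1, 7), ("A", 2, 8), ("A", 3, 9),
    ("B", 7, 1), ("B", 8, 2), ("B", 9, 3),
]
isc, per_system = intra_system_correlation(records)
pooled = pearson([m for _, m, _ in records], [h for _, _, h in records])
print(per_system)  # {'A': 1.0, 'B': 1.0}
print(round(pooled, 2))  # -0.86
```

On this toy data, the pooled correlation is negative because it is dominated by the gap between the two systems. The per-system view shows that the measure agrees perfectly with humans inside each system. The abstract does not state whether per-system correlations are averaged uniformly, weighted, or computed with a rank correlation instead.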
