Unified Multi-Scenario Summarization Evaluation and Explanation

IF 8.9 | CAS Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Knowledge and Data Engineering | Pub Date: 2024-12-05 | DOI: 10.1109/TKDE.2024.3509715
Shuo Shang; Zhitao Yao; Hao Fu; Chongyang Tao; Xiuying Chen; Feng Wang; Yongbo Wang; Zhaochun Ren; Shen Gao
Vol. 37, No. 2, pp. 991-1003
Citations: 0

Abstract

Summarization quality evaluation is a non-trivial task in text summarization. Contemporary methods fall mainly into two scenarios: (1) reference-based, which evaluates a summary against a human-labeled reference summary; and (2) reference-free, which evaluates the consistency of a summary with its source document. Recent studies mainly focus on one of these scenarios, training neural models to align with human judgments and output a numeric score. However, models for different scenarios are optimized individually, which may lead to sub-optimal performance because they neglect the knowledge shared across scenarios. Designing a separate model for each scenario is also inconvenient for users. Moreover, a numeric quality score alone cannot help users improve their summarization models, since it does not tell them why the score is low. Motivated by these issues, we propose the Unified Multi-scenario Summarization Evaluator (UMSE) and the Multi-Agent Summarization Evaluation Explainer (MASEE). More specifically, we propose a perturbed prefix tuning method to share knowledge across scenarios and use a self-supervised training paradigm to optimize the model without extra human labeling. UMSE is the first unified summarization evaluation framework that can be used in three evaluation scenarios. MASEE is a multi-agent summary evaluation explanation method that employs several LLM-based agents to generate detailed natural language explanations in four different aspects. Experimental results across three typical scenarios on the benchmark dataset SummEval indicate that UMSE achieves performance comparable to several strong existing methods designed specifically for each scenario. Extensive quantitative and qualitative experiments also demonstrate the effectiveness of the proposed explanation method, which generates consistent and accurate explanations.
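The abstract's central claim is that a single evaluator can serve every input scenario instead of requiring one model per scenario. The sketch below illustrates only that interface idea, under stated assumptions: the third scenario is assumed to be the case where both the source document and the reference are available (the abstract names only two scenarios explicitly); a plain scenario tag stands in for what a scenario-specific prefix would signal to a prefix-tuned model; and a toy unigram-overlap F1 is swapped in as a placeholder scorer. It is not UMSE's neural model, its perturbed prefix tuning, or its self-supervised training. The names `evaluate` and `_unigram_f1` are hypothetical.

```python
from collections import Counter
from typing import Optional


def _unigram_f1(candidate: str, target: str) -> float:
    """Placeholder scorer (NOT UMSE): unigram-overlap F1 between two texts."""
    cand = Counter(candidate.lower().split())
    tgt = Counter(target.lower().split())
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((cand & tgt).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(tgt.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(summary: str,
             document: Optional[str] = None,
             reference: Optional[str] = None) -> tuple[str, float]:
    """Hypothetical single entry point covering three input scenarios.

    The returned scenario tag plays the role that a scenario-specific
    prefix might play for one shared model: it records which inputs
    were available. The scoring itself is only a toy stand-in.
    """
    if document is None and reference is None:
        raise ValueError("need a source document, a reference, or both")
    if document is not None and reference is not None:
        # Assumed third scenario: average the two toy comparisons.
        scenario = "doc+ref"
        score = (_unigram_f1(summary, document)
                 + _unigram_f1(summary, reference)) / 2
    elif reference is not None:
        scenario = "ref"  # reference-based
        score = _unigram_f1(summary, reference)
    else:
        scenario = "doc"  # reference-free
        score = _unigram_f1(summary, document)
    return scenario, score
```

For example, `evaluate("the cat sat", reference="the cat sat")` returns `("ref", 1.0)`. The design point is that callers use one function whatever inputs they have, which mirrors the convenience argument in the abstract; a real unified evaluator would replace the placeholder scorer with a single trained model conditioned on the scenario.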
Source Journal
IEEE Transactions on Knowledge and Data Engineering
Subject category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months
Journal description: The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering aspects of computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.