Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence

arXiv publication date: 2024-02-15; DOI: 10.48550/arXiv.2402.10175

Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier

Citations: 0

Abstract

Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. For long-form text generation, there has been growing interest in generation from a discourse coherence perspective. However, existing lexical or semantic metrics such as BLEU, ROUGE, and BERTScore cannot effectively capture discourse coherence. Discourse-specific automatic evaluation methods for assessing the output of LLMs therefore warrant greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing evaluation methods.
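To make the claim about lexical metrics concrete, here is a minimal sketch of an n-gram overlap F1 in the spirit of ROUGE-N. It is a simplified stand-in, not the official ROUGE package, and it does not implement PDD, whose details the abstract does not give. The helper names `ngrams` and `ngram_f1` and the example sentences are hypothetical, chosen only for illustration. Reversing the sentence order of a reference leaves the unigram score unchanged and only slightly lowers the bigram score.

```python
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    # Hypothetical helper: lowercase whitespace tokenization, no stemming.
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(candidate: str, reference: str, n: int = 1) -> float:
    # Toy ROUGE-N-style F1: clipped n-gram overlap between candidate and reference.
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Made-up example article: three sentences in a coherent order.
sentences = [
    "The storm hit the coast at dawn.",
    "Residents were evacuated before noon.",
    "Crews began repairs the next day.",
]
reference = " ".join(sentences)
shuffled = " ".join(reversed(sentences))  # same content, broken discourse order

print(ngram_f1(shuffled, reference, n=1))            # 1.0
print(round(ngram_f1(shuffled, reference, n=2), 3))  # 0.882
```

In this toy case, the bigram score drops only because the two bigrams spanning sentence boundaries change. A fully reversed article still scores about 0.88, which is the kind of insensitivity to discourse order that motivates a discourse-specific metric.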




Source journal: arXiv

Self-citation rate: 0.00%

Publication count: