Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem

arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07123

Qianli Wang, Tatiana Anikina, Nils Feldhus, Simon Ostermann, Sebastian Möller, Vera Schmitt



Abstract

Natural language explanations (NLEs) are vital for elucidating the reasoning behind large language model (LLM) decisions. Many techniques have been developed to generate NLEs using LLMs. However, like humans, LLMs might not always produce optimal NLEs on the first attempt. Inspired by human learning processes, we introduce Cross-Refine, which employs role modeling by deploying two LLMs as generator and critic, respectively. The generator outputs an initial NLE and then refines it using feedback and suggestions provided by the critic. Cross-Refine does not require any supervised training data or additional training. We validate Cross-Refine across three NLP tasks using three state-of-the-art open-source LLMs through automatic and human evaluation. We select Self-Refine (Madaan et al., 2023), which only utilizes self-feedback to refine the explanations, as the baseline. Our findings from automatic evaluation and a user study indicate that Cross-Refine outperforms Self-Refine. Moreover, Cross-Refine can perform effectively with less powerful LLMs, whereas Self-Refine only yields strong results with ChatGPT. Additionally, we conduct an ablation study to assess the importance of feedback and suggestions; both play an important role in refining explanations. We further evaluate Cross-Refine on a bilingual dataset in English and German.
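The generator/critic workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `cross_refine` function, the `LLM` callable type, and all prompt wording are hypothetical, and the abstract does not specify how prompts are built or how models are invoked. Each model is assumed to be any function that maps a prompt string to a completion string.

```python
from typing import Callable, Dict

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def cross_refine(task_input: str, generator: LLM, critic: LLM) -> Dict[str, str]:
    """Sketch of one generate -> critique -> refine pass (prompts are assumptions)."""
    # Step 1: the generator produces an initial explanation.
    initial = generator(
        f"Input: {task_input}\nExplain the reasoning behind the prediction."
    )

    # Step 2: the critic, a second model, returns feedback and a suggestion.
    feedback = critic(
        f"Input: {task_input}\nExplanation: {initial}\n"
        "Give feedback on this explanation."
    )
    suggestion = critic(
        f"Input: {task_input}\nExplanation: {initial}\n"
        "Suggest an improved explanation."
    )

    # Step 3: the generator refines its own explanation using both signals.
    refined = generator(
        f"Input: {task_input}\nInitial explanation: {initial}\n"
        f"Feedback: {feedback}\nSuggestion: {suggestion}\n"
        "Refine the initial explanation."
    )
    return {
        "initial": initial,
        "feedback": feedback,
        "suggestion": suggestion,
        "refined": refined,
    }
```

Under these assumptions, passing the same callable as both `generator` and `critic` corresponds roughly to the self-feedback setting that the abstract attributes to Self-Refine, while passing two different models gives the cross-model setting.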


