Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

IEEE Transactions on Affective Computing (IF 9.8; CAS Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Published: 2025-02-21. DOI: 10.1109/TAFFC.2025.3544594

Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, Xiao Sun

Volume 16, Issue 3, pp. 1873-1884. IEEE Xplore: https://ieeexplore.ieee.org/document/10899840/

Citations: 0

Abstract

Empathetic response generation, which aims to understand the user's situation and feelings and respond empathically, is crucial for building human-like dialogue systems. Traditional approaches typically use maximum likelihood estimation as the training objective, but they fail to align the empathy levels of generated and target responses. To this end, we propose an empathetic response generation framework based on reinforcement learning (EmpRL). The framework develops an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. EmpRL uses the pre-trained T5 model as the generator and further fine-tunes it to initialize the policy. To align the empathy levels of generated and target responses within a given context, an empathy reward function covering three empathy communication mechanisms (emotional reaction, interpretation, and exploration) is constructed using pre-designed and pre-trained empathy identifiers. During reinforcement learning training, the proximal policy optimization (PPO) algorithm is used to fine-tune the policy, enabling the generation of empathetic responses. Both automatic and human evaluations demonstrate that the proposed EmpRL framework significantly improves the quality of generated responses, increases the similarity in empathy levels between generated and target responses, and produces empathetic responses covering both affective and cognitive aspects.
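The abstract does not give the exact form of the empathy reward, so the following is only a minimal sketch of what "aligning empathy levels" between a generated and a target response could look like. It assumes each empathy identifier outputs a discrete level per mechanism (0 = none, 1 = weak, 2 = strong, following the EPITOME-style scale) and scores alignment as one minus the normalized level gap, averaged over the three mechanisms. The function name `empathy_alignment_reward`, the level scale, and the averaging scheme are hypothetical choices for illustration, not the authors' formula; the identifiers themselves are replaced here by precomputed level dictionaries.

```python
from typing import Dict

# The three empathy communication mechanisms named in the abstract.
MECHANISMS = ("emotional_reaction", "interpretation", "exploration")
# Assumption: identifiers predict levels 0 (none), 1 (weak), 2 (strong).
MAX_LEVEL = 2


def empathy_alignment_reward(
    gen_levels: Dict[str, int], target_levels: Dict[str, int]
) -> float:
    """Hypothetical reward in [0, 1].

    Returns 1.0 when the generated response matches the target response's
    level on every mechanism, and decreases as the levels diverge.
    """
    total = 0.0
    for mech in MECHANISMS:
        g, t = gen_levels[mech], target_levels[mech]
        if not (0 <= g <= MAX_LEVEL and 0 <= t <= MAX_LEVEL):
            raise ValueError(f"level out of range for {mech}: {g}, {t}")
        # Per-mechanism similarity: 1 minus the normalized absolute level gap.
        total += 1.0 - abs(g - t) / MAX_LEVEL
    # Average over mechanisms so the reward stays in [0, 1].
    return total / len(MECHANISMS)


if __name__ == "__main__":
    # Levels would come from the pre-trained empathy identifiers in EmpRL;
    # here they are hard-coded for illustration.
    target = {"emotional_reaction": 2, "interpretation": 1, "exploration": 0}
    generated = {"emotional_reaction": 2, "interpretation": 0, "exploration": 0}
    print(round(empathy_alignment_reward(generated, target), 3))  # prints 0.833
```

In an RL setup such as the one the abstract describes, a scalar like this would be assigned to each sampled response and maximized by the policy-gradient optimizer (PPO in EmpRL); rewarding level agreement rather than maximal empathy is one simple way to express "alignment" with the target response instead of pushing every reply toward the strongest empathy level.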

Source Journal

IEEE Transactions on Affective Computing (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)

CiteScore: 15.00

Self-citation rate: 6.20%

Articles published: 174

Journal description: The IEEE Transactions on Affective Computing is an international, interdisciplinary journal. Its primary goal is to share research findings on the development of systems that can recognize, interpret, and simulate human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interaction. It also covers how techniques for sensing and simulating affect can deepen our understanding of human emotions and affective processes. Additionally, the journal explores the design, implementation, and evaluation of systems that treat affect as a central consideration in their usability. We also welcome surveys of existing work that offer new perspectives on the history and future directions of the field.