Towards Multimodal Emotional Support Conversation Systems

Yuqi Chu, Lizi Liao, Zhiyuan Zhou, Chong-Wah Ngo, Richang Hong
{"title":"开发多模态情感支持对话系统","authors":"Yuqi Chu, Lizi Liao, Zhiyuan Zhou, Chong-Wah Ngo, Richang Hong","doi":"arxiv-2408.03650","DOIUrl":null,"url":null,"abstract":"The integration of conversational artificial intelligence (AI) into mental\nhealth care promises a new horizon for therapist-client interactions, aiming to\nclosely emulate the depth and nuance of human conversations. Despite the\npotential, the current landscape of conversational AI is markedly limited by\nits reliance on single-modal data, constraining the systems' ability to\nempathize and provide effective emotional support. This limitation stems from a\npaucity of resources that encapsulate the multimodal nature of human\ncommunication essential for therapeutic counseling. To address this gap, we\nintroduce the Multimodal Emotional Support Conversation (MESC) dataset, a\nfirst-of-its-kind resource enriched with comprehensive annotations across text,\naudio, and video modalities. This dataset captures the intricate interplay of\nuser emotions, system strategies, system emotion, and system responses, setting\na new precedent in the field. Leveraging the MESC dataset, we propose a general\nSequential Multimodal Emotional Support framework (SMES) grounded in\nTherapeutic Skills Theory. Tailored for multimodal dialogue systems, the SMES\nframework incorporates an LLM-based reasoning model that sequentially generates\nuser emotion recognition, system strategy prediction, system emotion\nprediction, and response generation. Our rigorous evaluations demonstrate that\nthis framework significantly enhances the capability of AI systems to mimic\ntherapist behaviors with heightened empathy and strategic responsiveness. By\nintegrating multimodal data in this innovative manner, we bridge the critical\ngap between emotion recognition and emotional support, marking a significant\nadvancement in conversational AI for mental health support.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Multimodal Emotional Support Conversation Systems\",\"authors\":\"Yuqi Chu, Lizi Liao, Zhiyuan Zhou, Chong-Wah Ngo, Richang Hong\",\"doi\":\"arxiv-2408.03650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The integration of conversational artificial intelligence (AI) into mental\\nhealth care promises a new horizon for therapist-client interactions, aiming to\\nclosely emulate the depth and nuance of human conversations. Despite the\\npotential, the current landscape of conversational AI is markedly limited by\\nits reliance on single-modal data, constraining the systems' ability to\\nempathize and provide effective emotional support. This limitation stems from a\\npaucity of resources that encapsulate the multimodal nature of human\\ncommunication essential for therapeutic counseling. To address this gap, we\\nintroduce the Multimodal Emotional Support Conversation (MESC) dataset, a\\nfirst-of-its-kind resource enriched with comprehensive annotations across text,\\naudio, and video modalities. This dataset captures the intricate interplay of\\nuser emotions, system strategies, system emotion, and system responses, setting\\na new precedent in the field. Leveraging the MESC dataset, we propose a general\\nSequential Multimodal Emotional Support framework (SMES) grounded in\\nTherapeutic Skills Theory. 
Tailored for multimodal dialogue systems, the SMES\\nframework incorporates an LLM-based reasoning model that sequentially generates\\nuser emotion recognition, system strategy prediction, system emotion\\nprediction, and response generation. Our rigorous evaluations demonstrate that\\nthis framework significantly enhances the capability of AI systems to mimic\\ntherapist behaviors with heightened empathy and strategic responsiveness. By\\nintegrating multimodal data in this innovative manner, we bridge the critical\\ngap between emotion recognition and emotional support, marking a significant\\nadvancement in conversational AI for mental health support.\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"4 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.03650\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The integration of conversational artificial intelligence (AI) into mental health care promises a new horizon for therapist-client interactions, aiming to closely emulate the depth and nuance of human conversations. Despite the potential, the current landscape of conversational AI is markedly limited by its reliance on single-modal data, constraining the systems' ability to empathize and provide effective emotional support. This limitation stems from a paucity of resources that encapsulate the multimodal nature of human communication essential for therapeutic counseling. To address this gap, we introduce the Multimodal Emotional Support Conversation (MESC) dataset, a first-of-its-kind resource enriched with comprehensive annotations across text, audio, and video modalities. This dataset captures the intricate interplay of user emotions, system strategies, system emotion, and system responses, setting a new precedent in the field. Leveraging the MESC dataset, we propose a general Sequential Multimodal Emotional Support framework (SMES) grounded in Therapeutic Skills Theory. Tailored for multimodal dialogue systems, the SMES framework incorporates an LLM-based reasoning model that sequentially generates user emotion recognition, system strategy prediction, system emotion prediction, and response generation. Our rigorous evaluations demonstrate that this framework significantly enhances the capability of AI systems to mimic therapist behaviors with heightened empathy and strategic responsiveness. By integrating multimodal data in this innovative manner, we bridge the critical gap between emotion recognition and emotional support, marking a significant advancement in conversational AI for mental health support.
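To make the dataset's annotation schema concrete, here is a minimal sketch of what one annotated MESC-style turn might look like. The container name `MESCTurn` and all field names (`user_emotion`, `system_strategy`, `system_emotion`, etc.) are illustrative assumptions based on the abstract's description, not the paper's released schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MESCTurn:
    """One hypothetical annotated turn in an MESC-style dialogue.

    Field names are illustrative; the released dataset may differ.
    """
    speaker: str                    # "user" or "system"
    text: str                       # transcript of the utterance
    audio_path: Optional[str]       # clip carrying prosodic/vocal cues
    video_path: Optional[str]       # clip carrying facial/gestural cues
    user_emotion: Optional[str]     # e.g. "anxious" (annotated on user turns)
    system_strategy: Optional[str]  # e.g. "reflection" (on system turns)
    system_emotion: Optional[str]   # e.g. "warm" (on system turns)

# Example: a user turn followed by the annotated supportive reply.
dialogue: List[MESCTurn] = [
    MESCTurn(speaker="user",
             text="I can't stop worrying about my exams.",
             audio_path="turn_001.wav", video_path="turn_001.mp4",
             user_emotion="anxious", system_strategy=None,
             system_emotion=None),
    MESCTurn(speaker="system",
             text="It sounds like the pressure has been building up.",
             audio_path="turn_002.wav", video_path="turn_002.mp4",
             user_emotion=None, system_strategy="reflection",
             system_emotion="warm"),
]
```

Annotating all four elements on the same dialogue is what lets a model learn the mapping from a recognized user emotion to a strategy, a system emotion, and finally a response.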
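The framework portion of the abstract describes a fixed reasoning order: user emotion, then system strategy, then system emotion, then response. Below is a minimal sketch of that sequential chaining, assuming a generic text-in/text-out `llm` callable; the prompt wording and the `smes_pipeline` name are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, Dict

def smes_pipeline(llm: Callable[[str], str],
                  multimodal_context: str) -> Dict[str, str]:
    """Sketch of the four-step reasoning order the abstract describes.

    `llm` stands in for any text-in/text-out model call; a real system
    would fuse audio and video features upstream of this text stage.
    """
    state: Dict[str, str] = {"context": multimodal_context}

    # Step 1: recognize the user's emotion from the fused context.
    state["user_emotion"] = llm(
        f"Context: {state['context']}\n"
        "What emotion is the user expressing?")

    # Step 2: predict the support strategy, conditioned on that emotion.
    state["strategy"] = llm(
        f"Context: {state['context']}\n"
        f"User emotion: {state['user_emotion']}\n"
        "Which therapeutic strategy should the system use?")

    # Step 3: predict the emotion the system itself should convey.
    state["system_emotion"] = llm(
        f"Strategy: {state['strategy']}\n"
        "What emotional tone should the system's reply carry?")

    # Step 4: generate the response, conditioned on all earlier steps.
    state["response"] = llm(
        f"Context: {state['context']}\n"
        f"User emotion: {state['user_emotion']}\n"
        f"Strategy: {state['strategy']}\n"
        f"Tone: {state['system_emotion']}\n"
        "Write the system's supportive reply.")
    return state
```

Each step's output is threaded into the next prompt, which is one plausible reading of "sequentially generates": the response is never produced in isolation, but only after the emotion, strategy, and tone decisions have been made explicit.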