MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu
arXiv - CS - Artificial Intelligence, published 2024-08-08. arXiv:2408.04203

Abstract

Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality and cannot simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs) and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single-turn or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, in which a reward model is trained to score MRPAs against the constructed ground-truth data. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
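
The abstract describes MMRole-Eval only at a high level: eight metrics grouped into three dimensions, with scores compared against ground-truth data. As a rough illustration of how such per-metric scores might be rolled up, the Python sketch below divides each agent score by the reference score for the same metric and averages the results per dimension. The metric and dimension names, the grouping, the ratio-based comparison, and all numbers are assumptions made for illustration; none of them come from the abstract, and this is not the authors' implementation.

```python
from statistics import mean

# Hypothetical layout: three dimensions holding eight metrics in total.
# The abstract gives only the counts; names and grouping are placeholders.
DIMENSIONS = {
    "dimension_a": ["metric_1", "metric_2", "metric_3"],
    "dimension_b": ["metric_4", "metric_5"],
    "dimension_c": ["metric_6", "metric_7", "metric_8"],
}


def relative_scores(agent, reference):
    """Divide each agent score by the reference score for the same metric.

    The ratio is an assumed comparison rule, not one stated in the abstract.
    """
    return {name: agent[name] / reference[name] for name in reference}


def aggregate(relative):
    """Average relative scores within each dimension, then over all metrics."""
    per_dimension = {
        dim: mean(relative[m] for m in metrics)
        for dim, metrics in DIMENSIONS.items()
    }
    all_metrics = [m for metrics in DIMENSIONS.values() for m in metrics]
    per_dimension["overall"] = mean(relative[m] for m in all_metrics)
    return per_dimension


if __name__ == "__main__":
    metric_names = [m for group in DIMENSIONS.values() for m in group]
    reference = {m: 8.0 for m in metric_names}  # made-up reference scores
    agent = {m: 6.0 for m in metric_names}      # made-up agent scores
    print(aggregate(relative_scores(agent, reference)))
```

In this sketch a value below 1 means the agent was scored lower than the reference response on that dimension. Averaging over metrics rather than over dimensions for the overall figure is a design choice that keeps a two-metric dimension from being weighted more heavily than a three-metric one.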