MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

arXiv - CS - Artificial Intelligence. Pub Date: 2024-08-08. DOI: arxiv-2408.04203

Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu

Abstract

Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, where a reward model is trained to score MRPAs with the constructed ground-truth data for comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
