MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu

arXiv - CS - Artificial Intelligence, arXiv:2408.04203, published 2024-08-08
Abstract
Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality and cannot simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs) and propose a comprehensive framework, MMRole, for their development and evaluation, comprising a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single- or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, in which a reward model is trained to score MRPAs against the constructed ground-truth data. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
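
To make the comparative, reward-model-based scoring described above more concrete, the sketch below shows one way such an evaluator could be wired up: each sample pairs a candidate MRPA reply with a constructed ground-truth reply, and a scoring function returns per-metric scores. This is a minimal illustrative sketch, not the authors' implementation; the metric names, data fields, and overlap-based scoring proxy are all hypothetical placeholders standing in for a trained multimodal reward model.

```python
# Toy sketch of reward-model-style evaluation in the spirit of MMRole-Eval.
# All names below (METRICS, EvalSample, score_with_reward_model) are hypothetical;
# the real method trains a reward model rather than using token overlap.
from dataclasses import dataclass
from typing import Dict

# Hypothetical placeholder names for eight metrics across three dimensions.
METRICS = [
    "instruction_adherence", "fluency", "coherence",                          # conversational skill
    "image_text_relevance", "response_accuracy",                              # multimodal understanding
    "personality_consistency", "knowledge_consistency", "tone_consistency",   # role-playing quality
]

@dataclass
class EvalSample:
    character_profile: str   # description of the character being played
    image_caption: str       # textual stand-in for the image input
    dialogue_context: str    # preceding dialogue turns
    candidate_reply: str     # reply produced by the MRPA under evaluation
    reference_reply: str     # constructed ground-truth reply used for comparison

def score_with_reward_model(sample: EvalSample) -> Dict[str, float]:
    """Return a per-metric score for the candidate relative to the reference.

    A real evaluator would query a trained multimodal reward model here; this
    toy version uses token overlap with the reference as a crude proxy.
    """
    cand = set(sample.candidate_reply.lower().split())
    ref = set(sample.reference_reply.lower().split())
    overlap = len(cand & ref) / max(len(ref), 1)
    return {metric: round(overlap, 3) for metric in METRICS}

if __name__ == "__main__":
    sample = EvalSample(
        character_profile="Sherlock Holmes, observant and terse.",
        image_caption="A muddy boot print on a marble floor.",
        dialogue_context="User: What do you make of this?",
        candidate_reply="The print suggests a tall man who left in haste.",
        reference_reply="A tall man, recently departed in a hurry, judging by the stride.",
    )
    print(score_with_reward_model(sample))
```

In the actual framework, the per-metric scores would come from a model trained on the constructed ground-truth data, so the comparison reflects learned judgments of role-playing quality rather than surface-level text overlap.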