The Social Cognition Ability Evaluation of LLMs: A Dynamic Gamified Assessment and Hierarchical Social Learning Measurement Approach

IF 7.2 4区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-06-18 DOI:10.1145/3673238
Qin Ni, Yangze Yu, Yiming Ma, Xin Lin, Ciping Deng, Tingjiang Wei, Mo Xuan
{"title":"The Social Cognition Ability Evaluation of LLMs: A Dynamic Gamified Assessment and Hierarchical Social Learning Measurement Approach","authors":"Qin Ni, Yangze Yu, Yiming Ma, Xin Lin, Ciping Deng, Tingjiang Wei, Mo Xuan","doi":"10.1145/3673238","DOIUrl":null,"url":null,"abstract":"<p>Large Language Model(LLM) has shown amazing abilities in reasoning tasks, theory of mind(ToM) has been tested in many studies as part of reasoning tasks, and social learning, which is closely related to theory of mind, are still lack of investigation. However, the test methods and materials make the test results unconvincing. We propose a dynamic gamified assessment(DGA) and hierarchical social learning measurement to test ToM and social learning capacities in LLMs. The test for ToM consists of five parts. First, we extract ToM tasks from ToM experiments and then design game rules to satisfy the ToM task requirement. After that, we design ToM questions to match the game’s rules and use these to generate test materials. Finally, we go through the above steps to test the model. To assess the social learning ability, we introduce a novel set of social rules (three in total). Experiment results demonstrate that, except GPT-4, LLMs performed poorly on the ToM test but showed a certain level of social learning ability in social learning measurement.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":null,"pages":null},"PeriodicalIF":7.2000,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Intelligent Systems and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3673238","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Model(LLM) has shown amazing abilities in reasoning tasks, theory of mind(ToM) has been tested in many studies as part of reasoning tasks, and social learning, which is closely related to theory of mind, are still lack of investigation. However, the test methods and materials make the test results unconvincing. We propose a dynamic gamified assessment(DGA) and hierarchical social learning measurement to test ToM and social learning capacities in LLMs. The test for ToM consists of five parts. First, we extract ToM tasks from ToM experiments and then design game rules to satisfy the ToM task requirement. After that, we design ToM questions to match the game’s rules and use these to generate test materials. Finally, we go through the above steps to test the model. To assess the social learning ability, we introduce a novel set of social rules (three in total). Experiment results demonstrate that, except GPT-4, LLMs performed poorly on the ToM test but showed a certain level of social learning ability in social learning measurement.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
法律硕士的社会认知能力评估:动态游戏化评估和分层社会学习测量方法
大语言模型(LLM)在推理任务中表现出了惊人的能力,心智理论(ToM)作为推理任务的一部分已在许多研究中进行了测试,而与心智理论密切相关的社会学习仍缺乏研究。然而,测试方法和材料使得测试结果缺乏说服力。我们提出了一种动态游戏化测评(DGA)和分层社会学习测评的方法来测试低年级学生的心智理论和社会学习能力。ToM 测试包括五个部分。首先,我们从 ToM 实验中提取 ToM 任务,然后设计游戏规则以满足 ToM 任务要求。然后,我们设计与游戏规则相匹配的 ToM 问题,并利用这些问题生成测试材料。最后,我们通过上述步骤对模型进行测试。为了评估社交学习能力,我们引入了一套新的社交规则(共三套)。实验结果表明,除 GPT-4 外,LLM 在 ToM 测试中表现较差,但在社会学习测量中表现出一定的社会学习能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
ACM Transactions on Intelligent Systems and Technology
ACM Transactions on Intelligent Systems and Technology COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
9.30
自引率
2.00%
发文量
131
期刊介绍: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.
期刊最新文献
A Survey of Trustworthy Federated Learning: Issues, Solutions, and Challenges DeepSneak: User GPS Trajectory Reconstruction from Federated Route Recommendation Models WC-SBERT: Zero-Shot Topic Classification Using SBERT and Light Self-Training on Wikipedia Categories Self-supervised Text Style Transfer using Cycle-Consistent Adversarial Networks Federated Learning Survey: A Multi-Level Taxonomy of Aggregation Techniques, Experimental Insights, and Future Frontiers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1