Personality testing of large language models: limited temporal stability, but highlighted prosociality.

Royal Society Open Science | IF 2.9 | Zone 3, multidisciplinary journals | JCR Q1, MULTIDISCIPLINARY SCIENCES | Pub Date: 2024-10-09 | eCollection Date: 2024-10-01 | DOI: 10.1098/rsos.240180

Bojana Bodroža, Bojana M Dinić, Ljubiša Bojić

Royal Society Open Science, volume 11, issue 10, article 240180. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461045/pdf/

Citations: 0

Abstract

As large language models (LLMs) continue to gain popularity due to their human-like traits and the intimacy they offer to users, their societal impact inevitably expands. This creates a growing need for comprehensive studies to understand LLMs and reveal their potential opportunities, drawbacks and overall societal impact. With that in mind, this research investigated seven LLMs, aiming to assess the temporal stability of, and inter-rater agreement on, their responses to personality instruments at two time points. In addition, the LLMs' personality profiles were analysed and compared with human normative data. The findings revealed varying levels of inter-rater agreement in the LLMs' responses over a short time, with some LLMs showing higher agreement (e.g. Llama3 and GPT-4o) than others (e.g. GPT-4 and Gemini). Furthermore, agreement depended on the instruments used as well as on the domain or trait. This implies variable robustness in LLMs' ability to reliably simulate stable personality characteristics. On scales that showed at least fair agreement, LLMs displayed a mostly socially desirable profile in both agentic and communal domains, as well as a prosocial personality profile reflected in higher agreeableness and conscientiousness and lower Machiavellianism. Exhibiting temporal stability and coherent responses on personality traits is crucial for AI systems due to their societal impact and AI safety concerns.
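The abstract does not name the agreement statistic or the thresholds behind "at least fair agreement". As a rough illustration only, the sketch below assumes unweighted Cohen's kappa between one model's item responses at two time points, and assumes the commonly cited Landis and Koch bands for the verbal labels. The function names and the response data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Observed agreement: share of items answered identically at both time points.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal category frequencies, summed.
    count_first, count_second = Counter(first), Counter(second)
    expected = sum(count_first[c] * count_second[c] for c in count_first) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists use the same single category throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal label for kappa (assumed convention, not confirmed by the abstract)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical 1-5 Likert responses from one model to the same ten items.
time_1 = [4, 5, 4, 3, 5, 4, 2, 4, 5, 3]
time_2 = [4, 5, 3, 3, 5, 4, 2, 5, 5, 3]

kappa = cohen_kappa(time_1, time_2)
print(f"kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
```

Unweighted kappa treats a shift from 4 to 5 the same as a shift from 1 to 5; for ordinal Likert items a weighted variant may be more appropriate, which is one reason to treat this as a sketch and not the study's procedure.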


Source journal

Royal Society Open Science Multidisciplinary-Multidisciplinary

CiteScore: 6.00

Self-citation rate: 0.00%

Articles published: 508

Review time: 14 weeks

About the journal: Royal Society Open Science is a new open journal publishing high-quality original research across the entire range of science on the basis of objective peer-review. The journal covers the entire range of science and mathematics and will allow the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.