{"title":"人工智能能否取代人类实验对象?大规模复制 LLM 心理实验","authors":"Ziyan Cui, Ning Li, Huaikang Zhou","doi":"arxiv-2409.00128","DOIUrl":null,"url":null,"abstract":"Artificial Intelligence (AI) is increasingly being integrated into scientific\nresearch, particularly in the social sciences, where understanding human\nbehavior is critical. Large Language Models (LLMs) like GPT-4 have shown\npromise in replicating human-like responses in various psychological\nexperiments. However, the extent to which LLMs can effectively replace human\nsubjects across diverse experimental contexts remains unclear. Here, we conduct\na large-scale study replicating 154 psychological experiments from top social\nscience journals with 618 main effects and 138 interaction effects using GPT-4\nas a simulated participant. We find that GPT-4 successfully replicates 76.0\npercent of main effects and 47.0 percent of interaction effects observed in the\noriginal studies, closely mirroring human responses in both direction and\nsignificance. However, only 19.44 percent of GPT-4's replicated confidence\nintervals contain the original effect sizes, with the majority of replicated\neffect sizes exceeding the 95 percent confidence interval of the original\nstudies. Additionally, there is a 71.6 percent rate of unexpected significant\nresults where the original studies reported null findings, suggesting potential\noverestimation or false positives. Our results demonstrate the potential of\nLLMs as powerful tools in psychological research but also emphasize the need\nfor caution in interpreting AI-driven findings. While LLMs can complement human\nstudies, they cannot yet fully replace the nuanced insights provided by human\nsubjects.","PeriodicalId":501273,"journal":{"name":"arXiv - ECON - General Economics","volume":"35 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs\",\"authors\":\"Ziyan Cui, Ning Li, Huaikang Zhou\",\"doi\":\"arxiv-2409.00128\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial Intelligence (AI) is increasingly being integrated into scientific\\nresearch, particularly in the social sciences, where understanding human\\nbehavior is critical. Large Language Models (LLMs) like GPT-4 have shown\\npromise in replicating human-like responses in various psychological\\nexperiments. However, the extent to which LLMs can effectively replace human\\nsubjects across diverse experimental contexts remains unclear. Here, we conduct\\na large-scale study replicating 154 psychological experiments from top social\\nscience journals with 618 main effects and 138 interaction effects using GPT-4\\nas a simulated participant. We find that GPT-4 successfully replicates 76.0\\npercent of main effects and 47.0 percent of interaction effects observed in the\\noriginal studies, closely mirroring human responses in both direction and\\nsignificance. However, only 19.44 percent of GPT-4's replicated confidence\\nintervals contain the original effect sizes, with the majority of replicated\\neffect sizes exceeding the 95 percent confidence interval of the original\\nstudies. Additionally, there is a 71.6 percent rate of unexpected significant\\nresults where the original studies reported null findings, suggesting potential\\noverestimation or false positives. 
Our results demonstrate the potential of\\nLLMs as powerful tools in psychological research but also emphasize the need\\nfor caution in interpreting AI-driven findings. While LLMs can complement human\\nstudies, they cannot yet fully replace the nuanced insights provided by human\\nsubjects.\",\"PeriodicalId\":501273,\"journal\":{\"name\":\"arXiv - ECON - General Economics\",\"volume\":\"35 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - ECON - General Economics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.00128\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - ECON - General Economics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.00128","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs
Artificial Intelligence (AI) is increasingly being integrated into scientific research, particularly in the social sciences, where understanding human behavior is critical. Large Language Models (LLMs) such as GPT-4 have shown promise in replicating human-like responses in various psychological experiments. However, the extent to which LLMs can effectively replace human subjects across diverse experimental contexts remains unclear. Here, we conduct a large-scale study replicating 154 psychological experiments (618 main effects and 138 interaction effects) from top social science journals, using GPT-4 as a simulated participant.
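The abstract does not spell out the prompting setup, but for illustration, a simulated-participant call might look like the minimal sketch below. The persona text, stimulus, and 7-point response scale are hypothetical placeholders, not the authors' actual protocol, and an OPENAI_API_KEY environment variable is assumed.

```python
# Minimal sketch: querying GPT-4 as a simulated participant.
# Persona, stimulus, and scale are illustrative placeholders,
# not the protocol used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simulate_participant(condition_text: str, persona: str) -> str:
    """Ask GPT-4 to respond to one experimental condition in character."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=1.0,  # sampling noise stands in for between-subject variance
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": (
                condition_text
                + "\nAnswer on a scale from 1 (not at all) to 7 (very much). "
                  "Reply with a single number."
            )},
        ],
    )
    return response.choices[0].message.content

# One simulated "subject" per call; repeating across personas and
# conditions builds a sample for each cell of the design.
rating = simulate_participant(
    condition_text="Imagine your flight was delayed by three hours...",
    persona="You are a 34-year-old participant in a psychology study.",
)
```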
We find that GPT-4 successfully replicates 76.0 percent of main effects and 47.0 percent of interaction effects observed in the original studies, closely mirroring human responses in both direction and significance. However, only 19.44 percent of GPT-4's replicated confidence intervals contain the original effect sizes, and the majority of replicated effect sizes exceed the 95 percent confidence intervals of the original studies. Additionally, where the original studies reported null findings, GPT-4 produces unexpectedly significant results at a rate of 71.6 percent, suggesting potential overestimation or false positives.
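For concreteness, the three comparisons reported above (direction-plus-significance replication, confidence-interval coverage of the original effect size, and unexpected significance under an original null) can be sketched as follows. The `Effect` container and its field names are hypothetical conveniences, not the paper's materials.

```python
# Illustrative check of the replication criteria described in the abstract,
# assuming each effect is summarized by an estimate, a p-value, and a 95% CI.
from dataclasses import dataclass

@dataclass
class Effect:
    estimate: float   # e.g., a standardized effect size
    p_value: float
    ci_low: float     # lower bound of the 95% confidence interval
    ci_high: float    # upper bound of the 95% confidence interval

def replicates(original: Effect, gpt4: Effect, alpha: float = 0.05) -> dict:
    """Compare a GPT-4 effect to the original on three criteria."""
    # "Successful replication": same sign and significant in both studies.
    same_direction = (original.estimate * gpt4.estimate) > 0
    both_significant = original.p_value < alpha and gpt4.p_value < alpha
    # CI coverage: does the replication's 95% CI contain the original estimate?
    ci_contains_original = gpt4.ci_low <= original.estimate <= gpt4.ci_high
    # Unexpected significance: original reports a null, GPT-4 is significant.
    unexpected_significant = original.p_value >= alpha and gpt4.p_value < alpha
    return {
        "direction_and_significance": same_direction and both_significant,
        "ci_contains_original": ci_contains_original,
        "unexpected_significant": unexpected_significant,
    }

# Aggregating these booleans over all 618 main effects and 138 interaction
# effects would yield rates analogous to those reported in the abstract.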
Our results demonstrate the potential of LLMs as powerful tools in psychological research, but they also emphasize the need for caution in interpreting AI-driven findings. While LLMs can complement human studies, they cannot yet fully replace the nuanced insights provided by human subjects.