Assessing novelty, feasibility and value of creative ideas with an unsupervised approach using GPT-4
Authors: Felix B Kern, Chien-Te Wu, Zenas C Chao
Journal: British Journal of Psychology (impact factor 3.2; JCR Q1, Psychology, Multidisciplinary)
Publication date: 2024-07-22
DOI: 10.1111/bjop.12720 (https://doi.org/10.1111/bjop.12720)
Citations: 0
Abstract
Creativity is defined by three key factors: novelty, feasibility and value. While many creativity tests focus primarily on novelty, they often neglect feasibility and value, thereby limiting their reflection of real-world creativity. In this study, we employ GPT-4, a large language model, to assess these three dimensions in a Japanese-language Alternative Uses Test (AUT). Using a crowdsourced evaluation method, we acquire ground truth data for 30 question items and test various GPT prompt designs. Our findings show that asking for multiple responses in a single prompt, using an 'explain first, rate later' design, is both cost-effective and accurate (r = .62, .59 and .33 for novelty, feasibility and value, respectively). Moreover, our method offers comparable accuracy to existing methods in assessing novelty, without the need for training data. We also evaluate additional models such as GPT-4 Turbo, GPT-4 Omni and Claude 3.5 Sonnet. Comparable performance across these models demonstrates the universal applicability of our prompt design. Our results contribute a straightforward platform for instant AUT evaluation and provide valuable ground truth data for future methodological research.
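The two design choices named in the abstract (several responses rated in one prompt, with an explanation requested before the ratings) and the accuracy measure (correlation r against crowdsourced ratings) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' prompt or code: the prompt wording, the 1-5 rating scale, the reply line format, and the function names `build_prompt`, `parse_ratings` and `pearson_r` are all hypothetical, and no model API is called.

```python
import math
import re

DIMENSIONS = ("novelty", "feasibility", "value")


def build_prompt(item: str, ideas: list[str]) -> str:
    """Build one prompt covering several ideas for the same AUT item.

    Hypothetical wording: it only illustrates batching many responses into a
    single prompt and asking for an explanation before the ratings.
    """
    numbered = "\n".join(f"{i}. {idea}" for i, idea in enumerate(ideas, start=1))
    return (
        f"Object: {item}\n"
        f"Proposed alternative uses:\n{numbered}\n\n"
        "For each use, first write a one-sentence explanation, then rate it.\n"
        "Answer with one line per use in exactly this form:\n"
        "<number>. <explanation> | novelty=<1-5> feasibility=<1-5> value=<1-5>"
    )


# Matches the (assumed) reply format requested in build_prompt.
LINE_RE = re.compile(
    r"^(\d+)\..*\|\s*novelty=(\d)\s+feasibility=(\d)\s+value=(\d)\s*$"
)


def parse_ratings(reply: str) -> dict[int, dict[str, int]]:
    """Extract per-idea ratings; lines that do not match the format are skipped."""
    ratings: dict[int, dict[str, int]] = {}
    for line in reply.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            idx, *scores = (int(group) for group in match.groups())
            ratings[idx] = dict(zip(DIMENSIONS, scores))
    return ratings


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between model ratings and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Toy inputs for illustration only; none of these values come from the study.
    print(build_prompt("brick", ["paperweight", "doorstop", "garden border"]))
```

In such a setup, the reply from whichever model is used would be passed to `parse_ratings`, and `pearson_r` would then be computed separately for each of the three dimensions against the averaged human ratings.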
About the journal:
The British Journal of Psychology publishes original research on all aspects of general psychology including cognition; health and clinical psychology; developmental, social and occupational psychology. For information on specific requirements, please view Notes for Contributors. We attract a large number of international submissions each year which make major contributions across the range of psychology.