Homogenizing effect of large language models (LLMs) on creative diversity: An empirical comparison of human and ChatGPT writing
Pub Date: 2025-12-01 | Epub Date: 2025-09-15 | DOI: 10.1016/j.chbah.2025.100207
Kibum Moon, Adam E. Green, Kostadin Kushlev
Generative AI systems, especially Large Language Models (LLMs) such as ChatGPT, have recently emerged as significant contributors to creative processes. While LLMs can produce creative content that might be as good as or even better than human-created content, their widespread use risks reducing creative diversity across groups of people. In the present research, we aimed to quantify this homogenizing effect of LLMs on creative diversity, not only at the individual level but also at the collective level. Across three preregistered studies, we analyzed 2,200 college admissions essays. Using a novel measure—the diversity growth rate—we showed that each additional human-written essay contributed more new ideas than did each additional GPT-4 essay. Notably, this difference became more pronounced as more essays were included in the analysis and persisted despite efforts to enhance AI-generated content through both prompt and parameter modifications. Overall, our findings suggest that, despite their potential to enhance individual creativity, the widespread use of LLMs could diminish the collective diversity of creative ideas.
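The abstract does not spell out how the diversity growth rate is computed. The sketch below is one plausible operationalization under stated assumptions, not the authors' method: each essay is reduced to a set of idea-cluster labels (in practice these might come from clustering sentence embeddings), and the growth rate is the average number of previously unseen clusters each additional essay contributes.

```python
# Illustrative sketch (not the paper's code) of a "diversity growth rate":
# how many previously unseen idea clusters each additional essay adds.
import numpy as np

rng = np.random.default_rng(0)

def cumulative_new_ideas(essays):
    """essays: list of sets of idea-cluster labels, in sampling order."""
    seen, curve = set(), []
    for ideas in essays:
        seen |= ideas
        curve.append(len(seen))
    return np.array(curve)

def growth_rate(curve):
    """Average number of previously unseen ideas per additional essay."""
    return np.diff(curve).mean()

# Simulated data: human writers draw ideas from a broader pool than the
# LLM, so their cumulative-unique-ideas curve rises more steeply, and the
# gap widens as more essays are included -- mirroring the reported pattern.
human = [set(rng.choice(1000, size=8)) for _ in range(500)]
llm = [set(rng.choice(200, size=8)) for _ in range(500)]

for name, essays in (("human", human), ("GPT-4", llm)):
    curve = cumulative_new_ideas(essays)
    print(f"{name}: {curve[-1]} unique ideas; "
          f"{growth_rate(curve):.2f} new ideas per additional essay")
```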
{"title":"Homogenizing effect of large language models (LLMs) on creative diversity: An empirical comparison of human and ChatGPT writing","authors":"Kibum Moon, Adam E. Green, Kostadin Kushlev","doi":"10.1016/j.chbah.2025.100207","DOIUrl":"10.1016/j.chbah.2025.100207","url":null,"abstract":"<div><div>Generative AI systems, especially Large Language Models (LLMs) such as ChatGPT, have recently emerged as significant contributors to creative processes. While LLMs can produce creative content that might be as good as or even better than human-created content, their widespread use risks reducing creative diversity across groups of people. In the present research, we aimed to quantify this homogenizing effect of LLMs on creative diversity, not only at the individual level but also at the collective level. Across three preregistered studies, we analyzed 2,200 college admissions essays. Using a novel measure—the diversity growth rate—we showed that each additional human-written essay contributed more new ideas than did each additional GPT-4 essay. Notably, this difference became more pronounced as more essays were included in the analysis and persisted despite efforts to enhance AI-generated content through both prompt and parameter modifications. Overall, our findings suggest that, despite their potential to enhance individual creativity, the widespread use of LLMs could diminish the collective diversity of creative ideas.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100207"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whose agent are you? Relational norms shape expectation from algorithmic and human advisors in social decisions
Pub Date: 2025-12-01 | Epub Date: 2025-10-10 | DOI: 10.1016/j.chbah.2025.100218
Lior Gazit , Ofer Arazy , Uri Hertz
As technology companies develop AI agents designed to function as friends, therapists, and personal advisors, a fundamental question arises: can algorithms fulfill these intimate social roles? Relational Models Theory (RMT) suggests that relationships shape normative expectations in social decisions. Our research examines the perceived relationship between human or algorithmic advisors and their advisees. Across two experiments (N = 492), participants reported their expectations of advisors that recommended splitting money between the advisee and an unknown other. Participants expected algorithmic advisors to exhibit higher consistency and higher sensitivity to others' payoffs, even when this resulted in smaller gains for the advisee, reflecting expectations of institutional fairness rather than personal favoritism. In contrast, participants anticipated that human advisors would prioritize the advisee's welfare, consistent with personal relational norms. To validate that relational norms indeed drive these expectations, a follow-up experiment framed advisors as either "Institutional" or "Personal". Participants expected both human and algorithmic advisors to show higher sensitivity to others' payoffs and greater consistency when framed as Institutional, in line with RMT. Regardless of framing, however, participants expected algorithmic advisors to exhibit higher sensitivity to others' payoffs and greater consistency than they expected from human advisors. Our findings extend the human-AI interaction literature by showing that people apply different normative standards to algorithmic versus human advisors. The results suggest that while relational framing can influence perceptions, attempts to position AI as a replacement for humans must account for the persistent tendency to view algorithms through an institutional lens.
{"title":"Whose agent are you? Relational norms shape expectation from algorithmic and human advisors in social decisions","authors":"Lior Gazit , Ofer Arazy , Uri Hertz","doi":"10.1016/j.chbah.2025.100218","DOIUrl":"10.1016/j.chbah.2025.100218","url":null,"abstract":"<div><div>As technology companies develop AI agents designed to function as friends, therapists, and personal advisors, a fundamental question arises: can algorithms fulfill these intimate social roles? Relational Models Theory (RMT) suggests that relationships shape normative expectations in social decisions. Our research examines the perceived relationship between human/algorithmic advisors and advisee. Across two experiments (N = 492), participants reported their expectations from advisors that recommended splitting money between the advisee and an unknown other. Participants expected algorithmic advisors to exhibit higher consistency and higher sensitivity to others' payoffs, even when this resulted in smaller gains for the advisee, reflecting expectations of institutional fairness rather than personal favoritism. In contrast, participants anticipated that human advisors would prioritize their own welfare, consistent with personal relational norms. Seeking to validate that relational norms indeed drive expectations, in a follow-up experiment, we framed advisors as either \"Institutional\" or \"Personal\". Participants expected both human and algorithmic advisors to show higher sensitivity to others' payoffs and greater consistency when framed as Institutional, in line with RMT. However, regardless of framing, participants expected algorithmic advisors to exhibit higher sensitivity to others’ payoffs and greater consistency than the expectations from human advisors. Our findings extend Human-AI interaction literature by showing that people apply different normative standards to algorithmic versus human advisors. Results suggest that while relational framing can influence perceptions, attempts to position AI as replacements for humans must account for the persistent tendency to view algorithms through an institutional lens.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100218"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145320647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Why human mistakes hurt more? Emotional responses in human-AI errors
Pub Date: 2025-12-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.chbah.2025.100238
Ying Qin, Wanhui Zhou, Bu Zhong
Understanding user responses to AI versus human errors is crucial, as these responses shape trust, acceptance, and interaction outcomes. This study investigates the emotional dynamics of human-AI interactions by examining how agent identity (human vs. AI) and error severity (low vs. high) influence negative emotional reactions. Using a 2 × 2 factorial design (N = 250), the findings reveal that human agents consistently elicit stronger negative emotions than AI agents, regardless of error severity. Moreover, perceived experience moderates this relationship under specific conditions: individuals who view AI as less experienced than humans exhibit stronger negative emotions toward human errors, while this effect diminishes when AI is perceived as more experienced. However, perceived agency does not significantly influence emotional responses. These findings highlight the critical role of agent identity and perceived experience in shaping emotional reactions to errors, adding insights into the dynamics of human-AI interactions. The research shows that developing effective AI systems requires managing user emotional responses and trust, with perceived experience and competence playing pivotal roles in adoption. The findings can guide the design of AI systems that adjust user expectations and emotional responses in accordance with the AI's perceived level of experience.
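A minimal sketch of the 2 × 2 analysis described above, assuming a between-subjects design with a continuous negative-emotion score; the data, cell means, and variable names are simulated illustrations, not the authors' materials.

```python
# Hedged sketch: two-way between-subjects ANOVA, agent identity x error
# severity on negative emotion, with simulated data (~250 participants).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n_per_cell = 62  # 4 cells x 62 ~ 250 participants

rows = []
for agent in ("human", "AI"):
    for severity in ("low", "high"):
        # Assumed effect pattern: human agents elicit stronger negative
        # emotion; severity adds a smaller bump.
        base = 3.0 + (0.6 if agent == "human" else 0.0) \
                   + (0.3 if severity == "high" else 0.0)
        for _ in range(n_per_cell):
            rows.append({"agent": agent, "severity": severity,
                         "neg_emotion": rng.normal(base, 1.0)})
df = pd.DataFrame(rows)

# Two-way ANOVA with interaction term.
model = smf.ols("neg_emotion ~ C(agent) * C(severity)", data=df).fit()
print(anova_lm(model, typ=2))
```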
{"title":"Why human mistakes hurt more? Emotional responses in human-AI errors","authors":"Ying Qin, Wanhui Zhou, Bu Zhong","doi":"10.1016/j.chbah.2025.100238","DOIUrl":"10.1016/j.chbah.2025.100238","url":null,"abstract":"<div><div>Understanding user responses to AI versus human errors is crucial, as they shape trust, acceptance, and interaction outcomes. This study investigates the emotional dynamics of human-AI interactions by examining how agent identity (human vs. AI) and error severity (low vs. high) influence negative emotional reactions. Using a 2 × 2 factorial design (<em>N</em> = 250), the findings reveal that human agents consistently elicit stronger negative emotions than AI agents, regardless of error severity. Moreover, perceived experience moderates this relationship under specific conditions: individuals who view AI less experienced than humans exhibit stronger negative emotions toward human errors, while this effect diminishes when AI is perceived as having higher experience. However, perceived agency does not significantly influence emotional responses. These findings highlight the critical role of agent identity and perceived experience in shaping emotional reactions to errors, adding insights into the dynamics of human-AI interactions. This research shows that developing effective AI systems needs to manage user emotional responses and trust, in which perceived experience and competency play pivotal roles in adoption. The findings can guide the design of AI systems that adjust user expectations and emotional responses in accordance with the AI's perceived level of experience.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100238"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The early wave of ChatGPT research: A review and future agenda
Pub Date: 2025-12-01 | Epub Date: 2025-10-04 | DOI: 10.1016/j.chbah.2025.100213
Peter André Busch , Geir Inge Hausvik , Jeppe Agger Nielsen
Researchers and practitioners are increasingly engaged in discussions about the hopes and fears surrounding artificial intelligence (AI). In this article, we critically examine the early scholarly response to one prominent form of generative and conversational AI: ChatGPT. The launch of ChatGPT has sparked a surge in research, resulting in a fast-growing but fragmented body of literature. Against this backdrop, we undertook a systematic literature review of 192 empirical articles about ChatGPT to examine, synthesize, and evaluate the foci and gaps in this early wave of research, capturing the dominant and immediate scholarly reactions to ChatGPT's release. Our analysis covered the following main aspects: perspectives on the purpose, usage, attitudes, and impacts of ChatGPT, as well as the theories and methods scholars apply in studying it. Most studies in our sample focus on performance tests of ChatGPT, highlighting its strengths in remembering, understanding, and analyzing content, while revealing limitations in its capacity to generate novel ideas and its tendency to hallucinate. Although the initial wave of ChatGPT research has generated valuable first insights, much of this early research remains atheoretical, descriptive, and narrowly scoped, with limited attention to broader social, ethical, and institutional implications. These patterns reflect both the rapid publication pace and the early stage of scholarly engagement with this emerging technology. In response, we propose a conceptual model that maps key focus areas of ChatGPT research, together with a research agenda aimed at advancing more theoretically informed, contextually grounded, and socially responsive studies of generative and conversational AI.
{"title":"The early wave of ChatGPT research: A review and future agenda","authors":"Peter André Busch , Geir Inge Hausvik , Jeppe Agger Nielsen","doi":"10.1016/j.chbah.2025.100213","DOIUrl":"10.1016/j.chbah.2025.100213","url":null,"abstract":"<div><div>Researchers and practitioners are increasingly engaged in discussions about the hopes and fears of artificial intelligence (AI). In this article, we critically examine the early scholarly response to one prominent form of generative and conversational AI: ChatGPT. The launch of ChatGPT has sparked a surge in research, resulting in a fast-growing but fragmented body of literature. Against this backdrop, we undertook a systematic literature review of 192 empirical articles about ChatGPT to examine, synthesize, and evaluate the foci and gaps in this early wave of research to capture the dominating and immediate scholarly reactions to ChatGPT's release. Our analytical focus covered the following main aspects: perspectives on the purpose, usage, attitudes, and impacts of ChatGPT, as well as the theories and methods scholars apply in studying ChatGPT. Most studies in our sample focus on performance tests of ChatGPT, highlighting its strengths in remembering, understanding, and analyzing content, while revealing limitations in its capacity to generate novel ideas and its hallucination habit. Although the initial wave of ChatGPT research has generated valuable first insights, much of this early research remains a-theoretical, descriptive, and narrowly scoped, with limited attention to broader social, ethical, and institutional implications. These patterns reflect both the rapid publication pace and the early stage of scholarly engagement with this emerging technology. In response, we propose a conceptual model that maps key focus areas of ChatGPT research and suggest ways of strengthening ChatGPT research by proposing a research agenda aimed at advancing more theoretically informed, contextually grounded, and socially responsive studies of generative and conversational AI.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100213"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The threat of synthetic harmony: The effects of AI vs. human origin beliefs on listeners' cognitive, emotional, and physiological responses to music
Pub Date: 2025-12-01 | Epub Date: 2025-09-05 | DOI: 10.1016/j.chbah.2025.100205
Rohan L. Dunham, Gerben A. van Kleef, Eftychia Stamkou
People generally evaluate music less favourably if they believe it is created by artificial intelligence (AI) rather than by humans, but the psychological mechanisms underlying this tendency remain unclear. Prior research has relied entirely on self-reports, which are vulnerable to bias. This leaves open the question of whether negative reactions reflect motivated reasoning – a controlled, cognitive process in which people justify their scepticism about AI's creative capacity – or whether they stem from deeper, embodied feelings of threat to human creative uniqueness that manifest physiologically. We address this question across two lab-in-field studies, measuring participants' self-reported and physiological responses to the same piece of music framed as having either AI or human origins. Study 1 (N = 50) revealed that individuals in the AI condition appreciated the music less, reported less intense emotions, and experienced decreased parasympathetic nervous system activity compared to those in the human condition. Study 2 (N = 372) showed that these effects were more pronounced among individuals who more strongly endorsed the belief that creativity is uniquely human, and that this could largely be explained by the perceived threat posed by AI. Together, these findings suggest that unfavourable responses to AI-generated music are driven not solely by controlled cognitive justifications but also by automatic, embodied threat reactions to creative AI. Strategies addressing the perceived threats posed by AI may therefore be key to fostering more harmonious human-AI collaboration and acceptance.
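The abstract does not name the physiological index used. A common marker of parasympathetic (vagal) activity is heart-rate variability, for example RMSSD over interbeat (RR) intervals; the sketch below shows how such an index is typically computed, purely for illustration.

```python
# Illustrative only: RMSSD, a standard heart-rate-variability index of
# parasympathetic activity. The paper's exact metric is not specified.
import numpy as np

def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    return np.sqrt(np.mean(np.diff(rr) ** 2))

# Simulated RR series: lower beat-to-beat variability (lower RMSSD) is
# consistent with reduced parasympathetic activity.
rng = np.random.default_rng(2)
rr_human_cond = 800 + rng.normal(0, 40, size=300)  # higher variability
rr_ai_cond = 780 + rng.normal(0, 25, size=300)     # lower variability

print(f"human-condition RMSSD: {rmssd(rr_human_cond):.1f} ms")
print(f"AI-condition RMSSD: {rmssd(rr_ai_cond):.1f} ms")
```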
{"title":"The threat of synthetic harmony: The effects of AI vs. human origin beliefs on listeners' cognitive, emotional, and physiological responses to music","authors":"Rohan L. Dunham, Gerben A. van Kleef, Eftychia Stamkou","doi":"10.1016/j.chbah.2025.100205","DOIUrl":"10.1016/j.chbah.2025.100205","url":null,"abstract":"<div><div>People generally evaluate music less favourably if they believe it is created by artificial intelligence (AI) rather than humans. But the psychological mechanisms underlying this tendency remain unclear. Prior research has relied entirely on self-reports that are vulnerable to bias. This leaves open the question as to whether negative reactions are a reflection of motivated reasoning – a controlled, cognitive process in which people justify their scepticism about AI's creative capacity – or whether they stem from deeper, embodied feelings of threat to human creative uniqueness manifested physiologically. We address this question across two lab-in-field studies, measuring participants' self-reported and physiological responses to the same piece of music framed either as having AI or human origins. Study 1 (<em>N</em> = 50) revealed that individuals in the AI condition appreciated music less, reported less intense emotions, and experienced decreased parasympathetic nervous system activity as compared to those in the human condition. Study 2 (<em>N</em> = 372) showed that these effects were more pronounced among individuals who more strongly endorsed the belief that creativity is uniquely human, and that this could largely be explained by the perceived threat posed by AI. Together, these findings suggest that unfavourable responses to AI-generated music are not driven solely by controlled cognitive justifications but also by automatic, embodied threat reactions in response to creative AI. They suggest that strategies addressing perceived threats posed by AI may be key to fostering more harmonious human-AI collaboration and acceptance.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100205"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145020444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Determinants of self-reported and behavioral trust in an AI advisor within a cooperative problem-solving game
Pub Date: 2025-12-01 | Epub Date: 2025-10-30 | DOI: 10.1016/j.chbah.2025.100235
Simon Schreibelmayr, Martina Mara
The widespread adoption of artificially intelligent advisory systems in everyday decision-making draws attention to the topic of user trust. Based on psychological theories of trust formation, several key determinants of Trust in Automation (TiA) have been proposed, though systematic empirical validation remains limited. To test them under highly controlled conditions, we implemented an immersive Virtual Reality trust game in which 165 participants solved riddles together with a voice-based AI assistant, evaluated it along multiple theoretically derived dimensions, and indicated how much they would rely on its advice. Largely consistent with the TiA model by Körber (2019), we found perceived system competence, understandability, assumed intentions of developers, and participants' individual trust propensity to significantly predict user trust in the AI advisor, with perceived competence having the largest influence. Additionally, familiarity moderated the relation between perceived system competence and trust. This model, derived from subjective trust measures (self-report scales), was then re-evaluated using behavioral reliance (i.e., the number of accepted in-game AI recommendations) as the outcome variable. Theoretical, empirical, and practical implications of the results are discussed.
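A hedged sketch of the kind of moderated regression this design implies: trust regressed on the TiA determinants, with a competence × familiarity interaction. The data and variable names are illustrative assumptions, not the study's measures.

```python
# Sketch: trust predicted from TiA determinants, with familiarity
# moderating the competence effect. Simulated data (n = 165, as reported).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 165
df = pd.DataFrame({
    "competence": rng.normal(0, 1, n),
    "understandability": rng.normal(0, 1, n),
    "intentions": rng.normal(0, 1, n),       # assumed developer intentions
    "propensity": rng.normal(0, 1, n),       # individual trust propensity
    "familiarity": rng.normal(0, 1, n),
})
# Assumed effect pattern: competence carries the largest weight, and its
# effect grows with familiarity (the reported moderation).
df["trust"] = (0.50 * df.competence + 0.20 * df.understandability
               + 0.15 * df.intentions + 0.15 * df.propensity
               + 0.10 * df.familiarity * df.competence
               + rng.normal(0, 0.5, n))

model = smf.ols(
    "trust ~ competence * familiarity + understandability"
    " + intentions + propensity", data=df).fit()
print(model.summary().tables[1])
```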
{"title":"Determinants of self-reported and behavioral trust in an AI advisor within a cooperative problem-solving game","authors":"Simon Schreibelmayr, Martina Mara","doi":"10.1016/j.chbah.2025.100235","DOIUrl":"10.1016/j.chbah.2025.100235","url":null,"abstract":"<div><div>The widespread adoption of artificially intelligent advisory systems in everyday decision-making situations draws attention to the topic of user trust. Based on psychological theories of trust formation, several key determinants of Trust in Automation (TiA) have been proposed, though systematic empirical validation remains limited. To test them under highly controlled conditions, we implemented an immersive Virtual Reality trust game in which 165 participants solved riddles together with a voice-based AI assistant, evaluated it along multiple theoretically derived dimensions, and indicated how much they would rely on its advice. Largely consistent with the TiA model by Körber (2019), we found perceived system competence, understandability, assumed intentions of developers, and participants’ individual trust propensity to significantly predict user trust in the AI advisor, with the first having the largest influence. Additionally, familiarity moderated the relation between perceived system competence and trust. This model, derived from subjective trust measures (self-report scales), was then re-evaluated using behavioral reliance (i.e., the number of accepted in-game AI recommendations) as the outcome variable. Theoretical, empirical, and practical implications of the results are discussed.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100235"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Navigating the human-AI divide: Boundary work in the age of generative AI
Pub Date: 2025-12-01 | Epub Date: 2025-10-04 | DOI: 10.1016/j.chbah.2025.100214
Young Ji Kim , Ceciley Xinyi Zhang , Chengyu Fang
Generative artificial intelligence (GenAI), such as ChatGPT, has recently attracted vast public attention for its remarkable ability to produce sophisticated, human-like content. As these technologies increasingly blur the boundaries between artificial and human intelligence, understanding how users perceive and manage this boundary becomes essential. Drawing on the concept of boundary work, this paper examines how GenAI users discursively and practically navigate the ontological boundaries between human intelligence and GenAI. Through a qualitative analysis of nine focus groups involving 45 college students from diverse academic backgrounds, this study identifies three types of human-GenAI boundaries: complementary, competitive, and co-evolving. Complementary boundaries highlight GenAI's supportive and instrumental role; competitive boundaries emphasize human superiority and concerns over GenAI's threats; and co-evolving boundaries acknowledge a dynamic interplay and reflective collaboration between humans and GenAI. The paper contributes theoretically by demonstrating that human-machine boundaries are dynamic, multifaceted, and actively negotiated. Practically, it offers insights into user strategies and implications for the responsible adoption of GenAI technologies in educational and organizational contexts.
{"title":"Navigating the human-AI divide: Boundary work in the age of generative AI","authors":"Young Ji Kim , Ceciley Xinyi Zhang , Chengyu Fang","doi":"10.1016/j.chbah.2025.100214","DOIUrl":"10.1016/j.chbah.2025.100214","url":null,"abstract":"<div><div>Generative artificial intelligence (GenAI), such as ChatGPT, has recently attracted vast public attention for its remarkable ability to produce sophisticated, human-like content. As these technologies increasingly blur the boundaries between artificial and human intelligence, understanding how users perceive and manage this boundary becomes essential. Drawing on the concept of boundary work, this paper examines how GenAI users discursively and practically navigate the ontological boundaries between human intelligence and GenAI. Through a qualitative analysis of nine focus groups involving 45 college students from diverse academic backgrounds, this study identifies three types of human-GenAI boundaries: <em>complementary, competitive, and co-evolving</em>. Complementary boundaries highlight GenAI's supportive and instrumental role and competitive boundaries emphasize human superiority and concerns over GenAI's threats, while co-evolving boundaries acknowledge dynamic interplay and reflective collaboration between humans and GenAI. The paper contributes theoretically by demonstrating that human-machine boundaries are dynamic, multifaceted, and actively negotiated. Practically, it offers insights into user strategies and implications for responsible adoption of GenAI technologies in educational and organizational contexts.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100214"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing intercultural sensitivity in large language models: A comparative study of GPT-3.5 and GPT-4 across eight languages
Pub Date: 2025-12-01 | Epub Date: 2025-11-17 | DOI: 10.1016/j.chbah.2025.100241
Yiwen Jin , Lies Sercu , Feng Guo
As large language models (LLMs) such as ChatGPT are increasingly used across cultures and languages, concerns have arisen about their ability to respond in culturally sensitive ways. This study evaluated the intercultural sensitivity of GPT-3.5 and GPT-4 using the Intercultural Sensitivity Scale (ISS) translated into eight languages. Each model completed ten randomized iterations of the 24-item ISS per language, and the results were analyzed using descriptive statistics and a three-way ANOVA. GPT-4 achieved significantly higher intercultural sensitivity scores than GPT-3.5 across all dimensions, with "respect for cultural differences" scoring highest and "interaction confidence" lowest. Significant interactions were found between model version and language, and between model version and ISS dimension, indicating that GPT-4's improvements vary by linguistic context. The interaction between language and dimension, however, was not significant. Future research should focus on increasing the amount of training data for less widely spoken languages, as well as adding rich emotional and cultural background data, to improve the models' understanding of cultural norms and nuances.
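A minimal sketch of the reported three-way design (model version × language × ISS dimension) on simulated scores; only a subset of the eight languages is shown, and the effect sizes are assumptions chosen to mimic the reported pattern. The dimension labels follow the standard ISS factors.

```python
# Hedged sketch: three-way ANOVA on simulated ISS scores,
# model version x language x dimension, ten iterations per cell.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(4)
models = ["GPT-3.5", "GPT-4"]
languages = ["en", "zh", "es", "fr"]  # illustrative subset of the eight
dimensions = ["engagement", "respect", "confidence",
              "enjoyment", "attentiveness"]

rows = []
for m in models:
    for lang in languages:
        for dim in dimensions:
            # Assumed pattern: GPT-4 scores higher overall; "respect"
            # highest and "confidence" lowest, as reported.
            base = 3.5 + (0.5 if m == "GPT-4" else 0.0) \
                       + (0.3 if dim == "respect" else 0.0) \
                       - (0.3 if dim == "confidence" else 0.0)
            for _ in range(10):  # ten randomized iterations per cell
                rows.append({"model": m, "language": lang,
                             "dimension": dim,
                             "score": rng.normal(base, 0.4)})
df = pd.DataFrame(rows)

fit = smf.ols("score ~ C(model) * C(language) * C(dimension)", data=df).fit()
print(anova_lm(fit, typ=2).round(3))
```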
{"title":"Assessing intercultural sensitivity in large language models: A comparative study of GPT-3.5 and GPT-4 across eight languages","authors":"Yiwen Jin , Lies Sercu , Feng Guo","doi":"10.1016/j.chbah.2025.100241","DOIUrl":"10.1016/j.chbah.2025.100241","url":null,"abstract":"<div><div>As large language models (LLMs) such as ChatGPT are increasingly used across cultures and languages, concerns have arisen about their ability to respond in culturally sensitive ways. This study evaluated the intercultural sensitivity of GPT-3.5 and GPT-4 using the Intercultural Sensitivity Scale (ISS) translated into eight languages. Each model completed ten randomized iterations of the 24-item ISS per language, and the results were analyzed using descriptive statistics and three-way ANOVA. GPT-4 achieved significantly higher intercultural sensitivity scores than GPT-3.5 across all dimensions, with “respect for cultural differences” scoring highest and “interaction confidence” lowest. Significant interactions were found between model version and language, and between model version and ISS dimensions, indicating that GPT-4's improvements vary by linguistic context. Nonetheless, the interaction between language and dimensions did not yield significant results. Future research should focus on increasing the amount of training data for the less spoken languages, as well as adding rich emotional and cultural background data to improve the model's understanding of cultural norms and nuances.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100241"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digitally created body positivity: The effects of virtual influencers with different body types on viewer perceptions
Pub Date: 2025-12-01 | Epub Date: 2025-10-27 | DOI: 10.1016/j.chbah.2025.100231
Jiyeon Yeo, Jan-Philipp Stein
Exposure to idealized body imagery on social media has been linked to lower body satisfaction and appreciation, negative mood effects, and mental health risks. Serving as a potential counterforce to these issues, body-positive content creators advocate for broader conceptualizations of beauty, more inclusivity, and self-acceptance among social media users. Amidst this ongoing discourse, hyper-realistic virtual influencers (VIs) have emerged as novel social agents—some reinforcing traditional beauty ideals and others promoting more diversity. Experiment 1 (N = 337) examined how VIs with different body types (larger-sized versus thin-ideal) influence women's state body appreciation and perceptions of ideal body shapes. Experiment 2 (N = 462) further investigated whether VIs elicit user responses comparable to those elicited by human influencers, considering ontological distinctions and perceived self-similarity. Across both experiments, neither body type nor influencer type significantly influenced women's body appreciation or body-related ideals. Whereas several proposed moderating variables did not yield significant effects, perceptions of self-similarity were ultimately found to play a meaningful role: human influencers were perceived as more self-similar, and this perception was positively linked to body appreciation. Taken together, our mixed findings indicate that VIs may exert a weaker impact on young women's body perceptions than expected—at least in the short term. Future research might therefore benefit from focusing on potential long-term effects.
{"title":"Digitally created body positivity: The effects of virtual influencers with different body types on viewer perceptions","authors":"Jiyeon Yeo, Jan-Philipp Stein","doi":"10.1016/j.chbah.2025.100231","DOIUrl":"10.1016/j.chbah.2025.100231","url":null,"abstract":"<div><div>Exposure to idealized body imagery on social media has been linked to lower body satisfaction/appreciation, negative mood effects, and mental health risks. Serving as a potential counterforce to these severe issues, body-positive content creators advocate for broader conceptualizations of beauty, more inclusivity, and self-acceptance among social media users. Amidst this on-going discourse, hyper-realistic virtual influencers (VIs) have emerged as novel social agents—some reinforcing traditional beauty ideals and others promoting more diversity. Experiment 1 (<em>N</em> = 337) examined how VIs with different body types (larger-sized versus thin-ideal) influence women’s state body appreciation and perceptions of ideal body shapes. Experiment 2 (<em>N</em> = 462) further investigated whether VIs elicit user responses in a way comparable to human influencers, considering ontological distinctions and perceived self-similarity. Across both experiments, neither body type nor influencer type significantly influenced women’s body appreciation or body-related ideals. Whereas several proposed moderating variables did not result in significant findings, perceptions of self-similarity were ultimately found to play a meaningful role: Human influencers were perceived as more self-similar, and this perception was positively linked to body appreciation. Taken together, our mixed findings indicate that VIs may exert a weaker impact on young women’s body perceptions than expected—at least in the short term. As such, future research might benefit from focusing more on potential long-term effects.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100231"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145465776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating the agreement between human preferences, GPT-4V and Gemini Pro Vision assessments: Can AI recognize what people might like?
Dino Krupić, Domagoj Matijević, Nenad Šuvak, Jurica Maltar, Domagoj Ševerdija
Pub Date: 2025-12-01 | DOI: 10.1016/j.chbah.2025.100234
This study introduces a methodology for assessing the agreement between AI and human ratings, focusing specifically on visual large language models (LLMs). The paper presents empirical findings on the alignment of ratings generated by GPT-4 Vision (GPT-4V) and Gemini Pro Vision with human subjective evaluations of environmental visuals. Using photographs of restaurant interior design and food, the study estimates the degree of agreement with human preferences. The intraclass correlation reveals that GPT-4V, unlike Gemini Pro Vision, achieves moderate agreement with participants' general restaurant preferences. Similar results are observed for ratings of food photos. Additionally, there is good agreement in categorizing restaurants into low-cost, mid-range, and exclusive categories based on interior quality. Finally, differences in ratings were observed at the subsample level based on age, gender, and socioeconomic status across the human sample and the LLMs. The results of repeated-measures ANOVAs indicate varying degrees of alignment between humans and LLMs across sociodemographic characteristics. Overall, GPT-4V currently demonstrates only a limited ability to provide meaningful ratings of visual stimuli relative to human ratings, though it performs better on this task than Gemini Pro Vision.
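A self-contained sketch of one standard agreement index used in such human-AI comparisons: the two-way intraclass correlation ICC(2,1) (Shrout & Fleiss), applied here to simulated photo ratings. The abstract does not specify which ICC variant the authors used, so this is an illustration of the general technique, not their analysis.

```python
# Sketch: ICC(2,1), absolute agreement between two "raters" (e.g., mean
# human rating and an LLM's rating) over the same set of photos.
import numpy as np

def icc2_1(X):
    """ICC(2,1) for an (n targets) x (k raters) rating matrix X."""
    n, k = X.shape
    grand = X.mean()
    ms_rows = k * ((X.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((X.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    resid = (X - X.mean(axis=1, keepdims=True)
               - X.mean(axis=0, keepdims=True) + grand)
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Simulated stand-ins for 40 hypothetical restaurant/food photos:
# both raters track a latent quality, the model more noisily.
rng = np.random.default_rng(5)
true_quality = rng.normal(0, 1, 40)
human = true_quality + rng.normal(0, 0.5, 40)   # mean human rating
gpt4v = true_quality + rng.normal(0, 0.8, 40)   # noisier model rating

ratings = np.column_stack([human, gpt4v])
print(f"ICC(2,1) human vs. GPT-4V: {icc2_1(ratings):.2f}")
```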
{"title":"Evaluating the agreement between human preferences, GPT-4V and Gemini Pro Vision assessments: Can AI recognize what people might like?","authors":"Dino Krupić , Domagoj Matijević , Nenad Šuvak , Jurica Maltar , Domagoj Ševerdija","doi":"10.1016/j.chbah.2025.100234","DOIUrl":"10.1016/j.chbah.2025.100234","url":null,"abstract":"<div><div>This study aims to introduce a methodology for assessing the agreement between AI and human ratings, specifically focusing on visual large language models (LLMs). This paper presents empirical findings on the alignment between ratings generated by GPT-4 Vision (GPT-4V) and Gemini Pro Vision with human subjective evaluations of environmental visuals. Using photographs of restaurant interior design and food, the study estimates the degree of agreement with human preferences. The intraclass correlation reveals that GPT-4V, unlike Gemini Pro Vision, achieves moderate agreement with participants’ general restaurant preferences. Similar results are observed for rating food photos. Additionally, there is good agreement in categorizing restaurants into low-cost, mid-range and exclusive categories based on interior quality. Finally, differences in ratings were observed at the subsample level based on age, gender, and socioeconomic status across the human sample and LLMs. The results of repeated-measures ANOVAs indicate varying degrees of alignment between humans and LLMs across different sociodemographic characteristics. Overall, GPT-4V currently demonstrates limited ability to provide meaningful ratings of visual stimuli compared to human ratings and performs better in this task compared to Gemini Pro Vision.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"6 ","pages":"Article 100234"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145465775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}