
Computers in Human Behavior Reports: Latest Articles

Advanced robot interfaces are unnecessary for effective psychological health interventions
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.chbr.2026.100955
Ivy S. Huang, Johan F. Hoorn
Depression is prevalent among young adults, many of whom face obstacles to accessing traditional interventions. This study investigated whether the modality of robotic delivery influences outcomes when identical psychological health intervention content is administered. We compared three modalities (a text-based chatbot, an audio bot, and a video telepresence robot), each delivering the same imagery-enhanced interpretation bias modification (eiIBM) intervention. Forty-nine young adults with depressive symptoms (M_age = 22.71, SD = 3.30) were randomly assigned to one of the three robot conditions and completed six eiIBM sessions over two weeks. An additional control group (n = 18) received no intervention. User experience was assessed using the I-PEFiC framework, and measures of depression severity and interpretation bias were collected. All three robot modalities yielded comparable outcomes, with substantial reductions in depressive symptoms (Hedges' g = 1.11–1.33) and decreases of approximately 40% in negative interpretation bias. Bayesian analyses focusing on modality provided substantial evidence that intervention outcomes did not differ between modalities (BF_incl < 0.026). Notably, user experience emerged as a significant predictor of intervention efficacy: participants who reported positive user experiences showed markedly greater reductions in interpretation bias (Cohen's d > 3.0), regardless of robot modality. These findings suggest that, when intervention content is standardized, increasing the sensory richness of the delivery modality does not enhance intervention outcomes. For structured cognitive interventions such as eiIBM, the fidelity of content delivery and the quality of user experience are more important determinants of effectiveness than sensory richness.
Volume 21, Article 100955
Citations: 0
Walking experience in real and virtual environments: A comparative study
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.chbr.2026.100950
Marzieh Ghanbari, Martin Dijst, Reza Aghanejad, Sébastien Claramunt, Camille Perchoux
Virtual reality (VR) offers new opportunities to promote active behaviors by enhancing engagement and allowing controlled modifications of urban environments. This study investigates whether virtual environments (VEs) can evoke affective responses comparable to those evoked by real environments (REs), both psychologically and physiologically, using an immersive VE combined with a walking simulator that replicates walking motion. In a crossover design, forty-nine healthy adults aged 18–65 (Luxembourg residents or cross-border commuters, including students, university staff, and members of the general public) walked two contrasting street segments, one walking-friendly and one car-friendly, in both the RE and the VE. Affective responses were assessed through questions on aesthetics, safety, enjoyment, comfort, relaxation, and momentary stress, and through real-time physiological data collected with an E4 wristband.
Significant differences between the RE and VE emerged on all affective measures except nonspecific skin conductance responses, with the RE consistently eliciting more positive affective responses. Nevertheless, similar affective trends across the two segments were observed in both the RE and the VE. Moreover, environmental characteristics significantly influenced affective responses in both the RE and the VE, with the walking-friendly segment yielding more positive affective ratings than the car-friendly one. The interaction between environment type (RE vs. VE) and segment type (car-friendly vs. walking-friendly) was not significant for most measures, indicating that the effect of environment type on affective responses was consistent across segments. These findings indicate that VEs can mimic the overall patterns of affective responses observed in REs. This research highlights VR's potential for planning healthier cities and offers insights into its benefits and limitations for future research.
Volume 21, Article 100950
Citations: 0
Human-autonomy-teaming in e-sports: An exploratory study
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2025-12-23 | DOI: 10.1016/j.chbr.2025.100919
Tilman Nols, Anna-Sophie Ulfert, Josette Gevers
Advances in Artificial Intelligence (AI) have enabled the formation of Human-Autonomy Teams (HATs), in which humans and AI collaborate interdependently toward shared goals. Despite this progress, research on HAT effectiveness remains limited and often relies on constrained laboratory settings and quantitative methods, raising concerns about ecological validity. To address these limitations, this study proposes e-sports, specifically professional competitive gaming, as a viable and underutilized context for HAT research. E-sports involve complex human-AI interactions characterized by emotional intensity and fast-paced decision-making, offering a naturalistic alternative research environment for studying HAT dynamics. To establish the psychological fidelity of e-sports as a HAT setting, we interviewed professional EA FC (formerly FIFA) players and coaches from various European leagues, exploring critical aspects of human-AI collaboration. Reflexive thematic analysis revealed that EA FC meets key criteria for HATs and mirrors teamwork constructs identified in prior research. Additionally, the findings highlight underexplored dynamics such as adaptive learning, training, and team identity formation. We conclude by discussing the implications for HAT theory and call for greater use of e-sports as a testbed for advancing HAT research.
Volume 21, Article 100919
Citations: 0
Validation of the German version of the problematic media use measure (PMUM-short form) and its relation to child and parental mental health
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2025-12-22 | DOI: 10.1016/j.chbr.2025.100911
Carolin Konrad, Sarah E. Domoff, Silvia Schneider
Problematic media use in children and adolescents has been linked to various mental health concerns. Given the increasing prevalence of media use among German children, a valid tool for assessing problematic media use is crucial. The present study validated the 9-item short form of the problematic media use measure (PMUM-SF; Domoff et al., 2019) in a German sample of N = 240 parents of children aged 4–16 years and explored its relationship with child and parental mental health. Parents completed questionnaires on their child's problematic media use (PMUM-SF, Domoff et al., 2019; BSMAS, Andreassen et al., 2016), psychosocial functioning (SDQ; Goodman, 1997), their own mental health (DASS bubbles, Brailovskaia et al., 2024), and media use online (CAFE MAQ, Barr et al., 2020; DISRUPT, McDaniel, 2021). The German version of the PMUM-SF demonstrated high internal consistency and good convergent and incremental validity. Problematic media use was significantly associated with age and with child functioning, such as externalizing and internalizing problems. Problematic media use, rather than duration of media use, predicted child psychosocial functioning. Furthermore, parental media use, parental stress, and parental depression were positively correlated with child problematic media use. Controlling for the child's age, more problematic media use was associated with parents setting more media rules. The study thus supports the use of the German version of the PMUM-SF as a measure of problematic media use in children aged 4 to 16 years.
Volume 21, Article 100911
Citations: 0
When the game turns toxic: Exploring gendered effects on well-being and self-esteem
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2025-12-20 | DOI: 10.1016/j.chbr.2025.100914
Beate Wold Hygen, Stian Lydersen, Daria J. Kuss, Tobias Scholz, Christian Wendelborg

Background

Toxic behavior remains a significant problem in online games, but little is known about the psychological effects of being subjected to such behavior, especially how these effects differ across genders. The present study addresses this gap by examining how toxic behavior affects players’ well-being and self-esteem.

Methods

The present study combined elements of an Intensive Longitudinal Study (ILS) and Ecological Momentary Assessment (EMA), with daily assessments of 88 gamers over a 15-day period, to investigate self-esteem, well-being, and experiences of verbal toxic behavior. Data were analysed using a logistic random effects model.

Results

Compared with men, women received significantly more derogatory comments related to gender, sexuality, religion, or ethnicity, as well as more sexual comments or sounds directed at them. Toxic behavior also had a significantly stronger effect on women’s well-being and self-esteem than on men’s.

Conclusions

The present study demonstrates that experiencing toxicity can indeed have significant effects on those who are targeted, especially women. The gaming industry and the gaming community should each take these results into account to create and maintain safe and welcoming gaming environments.
Volume 21, Article 100914
Citations: 0
From interaction to impact: Examining the role of chatbots in enhancing social sustainability using SEM-ANN approach
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.chbr.2026.100942
Abdulla Alsharhan, Mostafa Al-Emran, Khaled Shaalan
Despite the growing integration of Artificial Intelligence (AI) in educational settings, there is a notable gap in the literature regarding the role of chatbots in promoting social sustainability in higher education. This study aims to fill this gap by developing a model that combines constructs from Task-Technology Fit (TTF), Source Credibility Theory (SCT), Fogg's Model of Web Credibility, and Social Presence Theory (SPT). It uses a hybrid Structural Equation Modeling (SEM) and Artificial Neural Network (ANN) approach to evaluate the proposed model with data collected from 341 students. The results supported 13 of the 16 hypotheses, underscoring the pivotal roles of credibility, social presence, and TTF in enhancing chatbot use, which in turn supports social sustainability. The ANN findings showed that TTF is the most important factor influencing chatbot use, with a normalized importance of 99.1%. The significance of this research lies in its potential to guide the development of chatbot applications that effectively support universities' educational and social objectives, contributing to the discourse on technology's role in sustainable educational practices.
Volume 21, Article 100942
Citations: 0
Country-level differences in socio-economic development and cultural dimensions are associated with workers’ economic expectations of AI: Evidence from 31 countries
IF 5.8 | Q1 PSYCHOLOGY, EXPERIMENTAL | Pub Date: 2026-03-01 | Epub Date: 2025-12-12 | DOI: 10.1016/j.chbr.2025.100905
Leonhard Reiter, Robert Böhm, Christoph Fuchs
Although artificial intelligence (AI) affects many jobs and is fundamentally changing labor, surprisingly little is known about workers' economic expectations of AI. Understanding these expectations is important because it can inform the design of effective AI adoption strategies by firms and governments. Such expectations are likely shaped jointly by workers' economic and cultural realities. In the present research, we therefore examined how workers' economic expectations of AI differ between countries and explored the role of socio-economic and cultural dimensions in shaping these differences. Using data from 14,651 workers across 31 countries (M_age = 41.4, SD_age = 12.5; 46% female), spanning a wide range of economies and cultures, we find that (i) overall, workers hold positive economic expectations of AI, but (ii) there is substantial cross-country variance, and (iii) this variance is associated with a country's level of human development and cultural tightness–looseness. Specifically, higher levels of human development are negatively associated with workers' expectations of AI, whereas cultural tightness is positively associated with them. Additionally, workers' demographics, knowledge of AI, and perceived replacement likelihood are associated with their economic expectations. The findings remain robust across different model specifications and control variables. Our research highlights that workers' economic expectations of AI are associated with both socio-economic development and cultural tightness–looseness, underscoring the importance of country context when studying how workers anticipate technological change.
Volume 21, Article 100905
Citations: 0
Being transparent about personalization: The case of personalized digital “just-in-time” nudges for healthier food choice
IF 5.8 Q1 PSYCHOLOGY, EXPERIMENTAL Pub Date : 2026-03-01 Epub Date: 2025-12-02 DOI: 10.1016/j.chbr.2025.100894
Rachelle de Vries, Nadine Bol, Nynke van der Laan
Digital “just-in-time” (JIT) nudging is a potentially promising strategy for promoting healthier online food choices: nudges can intervene when a user adds an unhealthy product to their virtual shopping basket, and they can be personalized to match the user's preferences or motivations. However, both nudging and personalization raise (ethical) concerns: disclosing the presence and/or purpose of (non)personalized nudges is necessary for users to make informed decisions, but it might also produce counterproductive effects. We shed light on this tension by (1) examining whether transparency moderates the impact of personalized (versus non-personalized) digital JIT nudges on healthier food choices, and (2) exploring the underlying mechanisms (i.e., increased perceived nudge acceptability and experienced autonomy, and decreased psychological reactance). In a 2 (Nudge Personalization: non-personalized vs. personalized) × 2 (Nudge Transparency: non-transparent vs. transparent) between-subjects lab experiment, 200 healthy participants completed an online grocery shopping task in a mock supermarket app. Neither nudge personalization nor transparency led to healthier food choices. Furthermore, transparency did not affect perceived nudge acceptability, experienced autonomy, or psychological reactance for personalized JIT nudges, nor did it compromise these outcomes for non-personalized nudges. We conclude that, although nudge personalization offered no added benefit for healthier online food choices, there was no indication that transparency harms either the effectiveness or the perception of digital JIT nudges in general, which has implications for the design of, and policy on, nudging in practice.
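In a 2 × 2 between-subjects design, the question of whether transparency moderates the effect of personalization is a question about the interaction term. The following is a minimal sketch, not the authors' analysis: it assumes a balanced allocation of the 200 participants (50 per cell, an assumption; the abstract does not give cell sizes), uses hypothetical function names (`assign_balanced`, `interaction_contrast`), and computes only the difference-in-differences of cell means that a test of the interaction evaluates. A full analysis would add a significance test (e.g., a two-way ANOVA or a regression with an interaction term).

```python
import random
from itertools import product

# Factor levels of the 2 x 2 between-subjects design (labels are illustrative).
PERSONALIZATION = ("non-personalized", "personalized")
TRANSPARENCY = ("non-transparent", "transparent")
CELLS = list(product(PERSONALIZATION, TRANSPARENCY))  # four (p, t) tuples


def assign_balanced(n_participants, seed=0):
    """Randomly assign participants to the four cells with equal cell sizes."""
    if n_participants % len(CELLS):
        raise ValueError("n_participants must be a multiple of 4")
    per_cell = n_participants // len(CELLS)
    labels = [cell for cell in CELLS for _ in range(per_cell)]
    random.Random(seed).shuffle(labels)  # reproducible shuffle of cell labels
    return labels


def interaction_contrast(cell_means):
    """Difference-in-differences of the four cell means.

    The personalization effect is computed separately within each
    transparency condition; a non-zero difference between the two effects
    is what "transparency moderates personalization" means.
    """
    effect_transparent = (cell_means[("personalized", "transparent")]
                          - cell_means[("non-personalized", "transparent")])
    effect_opaque = (cell_means[("personalized", "non-transparent")]
                     - cell_means[("non-personalized", "non-transparent")])
    return effect_transparent - effect_opaque


# Usage with hypothetical cell means of a healthiness score (not study data).
labels = assign_balanced(200)
example_means = {
    ("non-personalized", "non-transparent"): 2.0,
    ("non-personalized", "transparent"): 2.0,
    ("personalized", "non-transparent"): 3.0,
    ("personalized", "transparent"): 5.0,
}
print(labels.count(CELLS[0]), interaction_contrast(example_means))  # 50 2.0
```

A contrast near zero, as the study's null moderation result would imply, means the personalization effect is the same whether or not the nudge is disclosed.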
Computers in Human Behavior Reports, Vol. 21, Article 100894.
Citations: 0
Using machine learning approaches to predict Taiwanese eighth graders' computational thinking performance in ICILS 2023 study
IF 5.8 Q1 PSYCHOLOGY, EXPERIMENTAL Pub Date : 2026-03-01 Epub Date: 2025-12-03 DOI: 10.1016/j.chbr.2025.100896
Nitesh Kumar Jha, Meng-Jung Tsai
This study employs machine learning (ML) approaches to examine how socio-demographic, student-related, and school-related variables predict the computational thinking (CT) performance of 5,211 Taiwanese eighth graders in the ICILS 2023 study (Fraillon, 2024), and to identify the key predictors of Taiwanese students' CT scores in this international assessment. Seven models were trained (Multinomial Logistic Regression, Random Forest, AdaBoost, XGBoost, LightGBM, a Gradient Boosting classifier, and a Stacking Ensemble) to identify and rank the variables that predict CT scores. CT performance was dichotomized into two classes: below-average and above-average scores. In terms of precision, recall, and F1 score, XGBoost performed best at classifying below-average CT scores, and the Stacking Ensemble performed best at classifying above-average scores. Student-related variables had the greatest impact on students' CT skills, followed by school-related and then socio-demographic variables. Among student-related variables, CT disposition was the most important predictor, followed by ICT self-efficacy and academic multitasking. Among school-related factors, learning special applications in class had a significant impact, whereas socio-demographic variables such as home literacy and parents' education had a low impact. This study offers practical implications for educators, policymakers, and curriculum designers by underscoring the role of CT disposition and recommending targeted support for enhancing students' digital self-efficacy. Additionally, the study shows the potential of ML for creating adaptive learning environments and guiding data-informed decisions in educational policy and practice.
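The evaluation described above rests on two steps: turning a continuous CT score into a two-class label, and scoring a classifier separately for each class, which is what lets one model win on the below-average class and another on the above-average class. The sketch below illustrates only those two steps in plain Python; it is not the authors' pipeline and does not reproduce the seven classifiers. It assumes that "average" means the sample mean (ties assigned to the below-average class), and the function names and data are hypothetical.

```python
from statistics import mean


def binarize_at_mean(scores):
    """Label each score 1 (above average) or 0 (at or below average).

    Assumption: "average" means the sample mean; ties fall into class 0.
    """
    cutoff = mean(scores)
    return [1 if s > cutoff else 0 for s in scores]


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage with hypothetical scores and predictions (not ICILS data).
scores = [400, 450, 500, 550, 600]
y_true = binarize_at_mean(scores)   # [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 0]            # output of some trained classifier
for label, name in [(0, "below average"), (1, "above average")]:
    print(name, per_class_metrics(y_true, y_pred, positive=label))
```

Reporting metrics per class rather than a single pooled accuracy matters when the two classes are not equally easy to predict.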
Computers in Human Behavior Reports, Vol. 21, Article 100896.
Citations: 0
When AI meets medical assessment: A comparative study of GPT-4O and Claude 3 Opus in China's standardized resident physician examinations
IF 5.8 Q1 PSYCHOLOGY, EXPERIMENTAL Pub Date : 2026-03-01 Epub Date: 2026-02-19 DOI: 10.1016/j.chbr.2026.100974
Wenjie Zhong, Yidan Hu, Ruiqiang Su, Lingcong Xu, Yating Chen, Niezhenghao He, Caiyuan Liu, Ke Xu, Mao Zhao, Wenao Liao, Wei Zhang, Jiang Hu, Fei Wang, Haowen Cui
Previous research has highlighted the importance of standardized residency training (SRT) in cultivating competent medical specialists, with qualification examinations serving as a decisive step. Recent advances in large language models (LLMs) have drawn growing interest in their potential role in medical assessment. The present research investigates the performance of GPT-4O (gpt-4o-2024-05-13) and Claude 3 Opus (claude-3-opus-20240229) in China's SRT assessments, focusing on two distinct roles: as AI examinees and as automated exam item generators. We conducted a comparative evaluation using real-world orthopedic and general surgery SRT exam questions in both Chinese and English. In addition, both models were tasked with generating exam questions, which independent medical experts reviewed for content validity, curriculum alignment, and psychometric properties. Statistical analyses covered answer accuracy, item qualification rates, content coverage, internal consistency, and criterion validity. GPT-4O achieved over 79% answer accuracy across languages and specialties, consistently outperforming Claude 3 Opus. Items generated by GPT-4O exhibited higher qualification rates (89.3% vs. 62.9%), better curriculum alignment (91.7% vs. 62.2%), and stronger psychometric quality. Moreover, a strong positive correlation (r = 0.707) between scores on GPT-4O-generated exams and historical student performance supported their practical relevance. The present study demonstrates that LLMs can serve dual roles in medical education, functioning both as reliable test-takers and as effective question generators. However, their application requires expert oversight and adherence to ethical standards to ensure validity in high-stakes assessments.
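Two of the statistics named above, answer accuracy and criterion validity, reduce to simple computations: the proportion of items answered correctly, and a correlation between examinees' scores on the generated exam and their historical performance. The abstract reports r = 0.707 without naming the coefficient, so the sketch below assumes a Pearson product-moment correlation. The function names and all data are hypothetical, and this is not the authors' analysis code.

```python
from math import sqrt


def accuracy(answers, key):
    """Proportion of exam items answered correctly."""
    if not key or len(answers) != len(key):
        raise ValueError("answers and key must be non-empty and of equal length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def pearson_r(x, y):
    """Pearson product-moment correlation between paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Usage with hypothetical data (not the study's).
print(accuracy(list("ABCDA"), list("ABCDB")))  # 0.8
generated_exam = [62, 70, 75, 81, 90]   # scores on a model-generated exam
historical = [58, 72, 70, 85, 88]       # the same examinees' past results
print(round(pearson_r(generated_exam, historical), 3))
```

Criterion validity of this kind only shows that the generated exam ranks examinees similarly to past assessments; it does not by itself establish content validity, which the study assessed through expert review.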
Computers in Human Behavior Reports, Vol. 21, Article 100974.
Citations: 0