Machine learning and deep learning systems for automated measurement of "advanced" theory of mind: Reliability and validity in children and adolescents.

IF 3.3 | Zone 2 (Psychology) | Q1, PSYCHOLOGY, CLINICAL | Psychological Assessment | Pub Date: 2023-02-01 | DOI: 10.1037/pas0001186
Rory T Devine, Venelin Kovatchev, Imogen Grumley Traynor, Phillip Smith, Mark Lee
Psychological Assessment, 35(2), 165-177.
Citations: 2

Abstract

Understanding individual differences in theory of mind (ToM; the ability to attribute mental states to others) in middle childhood and adolescence hinges on the availability of robust and scalable measures. Open-ended response tasks yield valid indicators of ToM, but they are labor-intensive to score and difficult to compare across studies. We examined the reliability and validity of new machine learning and deep learning neural network systems for automatically scoring ToM in children and adolescents. Two large samples of British children and adolescents aged 7 to 13 years (Sample 1: N = 1,135, mean age = 10.22 years, SD = 1.45; Sample 2: N = 1,020, mean age = 10.36 years, SD = 1.27) completed the silent film and strange stories tasks. Teachers rated the social competence with peers of the children in Sample 2. A single latent factor explained variation in performance on both the silent film and strange stories tasks (in Samples 1 and 2), and test performance was sensitive both to age-related differences and to individual differences within each age group. A deep learning neural network automated scoring system trained on Sample 1 showed interrater reliability and measurement invariance with manual ratings in Sample 2. The validity of ratings from the automated scoring system was supported by unique positive associations between ToM and teacher-rated social competence. The results demonstrate that reliable and valid measures of ToM can be obtained by using the new, freely available deep learning neural network automated scoring system to rate open-ended text responses. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
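The abstract reports interrater reliability between the automated system and manual ratings but does not say which agreement statistic was used. As a minimal sketch, assuming each open-ended response receives an integer score on an ordinal scale (for example 0, 1, 2; this scale is an assumption here, not taken from the paper), agreement between a manual rater and an automated scorer could be summarized with quadratic-weighted Cohen's kappa. The function name and the ratings below are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Quadratic-weighted Cohen's kappa for two raters scoring the same items.

    Hypothetical helper: assumes integer scores on an ordinal
    0..num_categories-1 scale (e.g. 0, 1, 2 for an open-ended ToM answer).
    """
    if num_categories < 2:
        raise ValueError("need at least two score categories")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if any(not 0 <= s < num_categories for s in list(rater_a) + list(rater_b)):
        raise ValueError("scores must lie in 0..num_categories-1")

    n = len(rater_a)
    k = num_categories
    max_dist = (k - 1) ** 2  # largest possible squared disagreement

    # Observed contingency table: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal score counts for each rater, used for chance-expected agreement.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            expected = counts_a[i] * counts_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for eight responses (illustrative values, not study data).
manual = [2, 1, 0, 2, 1, 2, 0, 1]
automated = [2, 1, 0, 1, 1, 2, 0, 2]
print(round(quadratic_weighted_kappa(manual, automated, 3), 3))  # prints 0.795
```

Quadratic weights penalize a 0-versus-2 disagreement four times as heavily as a 0-versus-1 disagreement, which suits ordinal codes. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(manual, automated, weights="quadratic")` should return the same value.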

Source journal: Psychological Assessment (PSYCHOLOGY, CLINICAL)
CiteScore: 5.70
Self-citation rate: 5.60%
Articles published: 167
Journal description: Psychological Assessment is concerned mainly with empirical research on measurement and evaluation relevant to the broad field of clinical psychology. Submissions are welcome in the areas of assessment processes and methods. Included are:
- clinical judgment and the application of decision-making models
- paradigms derived from basic psychological research in cognition, personality–social psychology, and biological psychology
- development, validation, and application of assessment instruments, observational methods, and interviews
Latest articles in this journal:
- Development and validation of a method for deriving MMPI-3 scores from MMPI-2/MMPI-2-RF item responses.
- Evaluation of the Multidimensional Personality Questionnaire (MPQ) Unlikely Virtues Scale in the detection of underreporting.
- Prospectively predicting violent and aggressive incidents in prison practice with the Risk Screener Violence (RS-V): Results from a multisite prison study.
- Development of the Food Addiction Symptom Inventory: The first clinical interview to assess ultra-processed food addiction.
- Does the Bayley-4 measure the same constructs across girls and boys and infants, toddlers, and preschoolers?