Denis Dumas, Selcuk Acar, Kelly Berthiaume, Peter Organisciak, David Eby, Katalin Grajzel, Theadora Vlaamster, Michele Newman, Melanie Carrera
{"title":"是什么让孩子对创造力评估的反应难以可靠判断?","authors":"Denis Dumas, Selcuk Acar, Kelly Berthiaume, Peter Organisciak, David Eby, Katalin Grajzel, Theadora Vlaamster, Michele Newman, Melanie Carrera","doi":"10.1002/jocb.588","DOIUrl":null,"url":null,"abstract":"<p>Open-ended verbal creativity assessments are commonly administered in psychological research and in educational practice to elementary-aged children. Children's responses are then typically rated by teams of judges who are trained to identify original ideas, hopefully with a degree of inter-rater agreement. Even in cases where the judges are reliable, some residual disagreement on the originality of the responses is inevitable. Here, we modeled the predictors of inter-rater disagreement in a large (i.e., 387 elementary school students and 10,449 individual item responses) dataset of children's creativity assessment responses. Our five trained judges rated the responses with a high degree of consistency reliability (<i>α</i> = 0.844), but we undertook this study to predict the residual disagreement. We used an adaptive LASSO model to predict 72% of the variance in our judges' residual disagreement and found that there were certain types of responses on which our judges tended to disagree more. The main effects in our model showed that responses that were less original, more elaborate, prompted by a Uses task, from younger children, or from male students, were all more difficult for the judges to rate reliably. Among the interaction effects, we found that our judges were also more likely to disagree on highly original responses from Gifted/Talented students, responses from Latinx students who were identified as English Language Learners, or responses from Asian students who took a lot of time on the task. Given that human judgments such as these are currently being used to train artificial intelligence systems to rate responses to creativity assessments, we believe understanding their nuances is important.</p>","PeriodicalId":39915,"journal":{"name":"Journal of Creative Behavior","volume":"57 3","pages":"419-438"},"PeriodicalIF":2.8000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jocb.588","citationCount":"1","resultStr":"{\"title\":\"What Makes Children's Responses to Creativity Assessments Difficult to Judge Reliably?\",\"authors\":\"Denis Dumas, Selcuk Acar, Kelly Berthiaume, Peter Organisciak, David Eby, Katalin Grajzel, Theadora Vlaamster, Michele Newman, Melanie Carrera\",\"doi\":\"10.1002/jocb.588\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Open-ended verbal creativity assessments are commonly administered in psychological research and in educational practice to elementary-aged children. Children's responses are then typically rated by teams of judges who are trained to identify original ideas, hopefully with a degree of inter-rater agreement. Even in cases where the judges are reliable, some residual disagreement on the originality of the responses is inevitable. Here, we modeled the predictors of inter-rater disagreement in a large (i.e., 387 elementary school students and 10,449 individual item responses) dataset of children's creativity assessment responses. Our five trained judges rated the responses with a high degree of consistency reliability (<i>α</i> = 0.844), but we undertook this study to predict the residual disagreement. 
We used an adaptive LASSO model to predict 72% of the variance in our judges' residual disagreement and found that there were certain types of responses on which our judges tended to disagree more. The main effects in our model showed that responses that were less original, more elaborate, prompted by a Uses task, from younger children, or from male students, were all more difficult for the judges to rate reliably. Among the interaction effects, we found that our judges were also more likely to disagree on highly original responses from Gifted/Talented students, responses from Latinx students who were identified as English Language Learners, or responses from Asian students who took a lot of time on the task. Given that human judgments such as these are currently being used to train artificial intelligence systems to rate responses to creativity assessments, we believe understanding their nuances is important.</p>\",\"PeriodicalId\":39915,\"journal\":{\"name\":\"Journal of Creative Behavior\",\"volume\":\"57 3\",\"pages\":\"419-438\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jocb.588\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Creative Behavior\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jocb.588\",\"RegionNum\":2,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PSYCHOLOGY, EDUCATIONAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Creative Behavior","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jocb.588","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PSYCHOLOGY, EDUCATIONAL","Score":null,"Total":0}
What Makes Children's Responses to Creativity Assessments Difficult to Judge Reliably?
Open-ended verbal creativity assessments are commonly administered to elementary-aged children in psychological research and educational practice. Children's responses are then typically rated by teams of judges who are trained to identify original ideas, hopefully with a degree of inter-rater agreement. Even in cases where the judges are reliable, some residual disagreement on the originality of the responses is inevitable. Here, we modeled the predictors of inter-rater disagreement in a large (i.e., 387 elementary school students and 10,449 individual item responses) dataset of children's creativity assessment responses. Our five trained judges rated the responses with a high degree of consistency reliability (α = 0.844), but we undertook this study to predict the residual disagreement. We used an adaptive LASSO model to predict 72% of the variance in our judges' residual disagreement and found that there were certain types of responses on which our judges tended to disagree more. The main effects in our model showed that responses that were less original, more elaborate, prompted by a Uses task, from younger children, or from male students, were all more difficult for the judges to rate reliably. Among the interaction effects, we found that our judges were also more likely to disagree on highly original responses from Gifted/Talented students, responses from Latinx students who were identified as English Language Learners, or responses from Asian students who took a lot of time on the task. Given that human judgments such as these are currently being used to train artificial intelligence systems to rate responses to creativity assessments, we believe understanding their nuances is important.
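The two techniques the abstract names can be sketched in code: Cronbach's alpha as the consistency-reliability check across judges, and a two-stage adaptive LASSO predicting residual rater disagreement. The sketch below is a minimal illustration under simulated data; the predictor stand-ins, the ridge pilot estimator, the weight exponent, the cross-validated penalty, and the operationalization of disagreement as a per-response standard deviation are all assumptions for this example, not the authors' actual variables or pipeline.

```python
# Illustrative sketch, not the published analysis: (1) Cronbach's alpha
# across judges, (2) a two-stage adaptive LASSO on rater disagreement.
import numpy as np
from sklearn.linear_model import Ridge, LassoCV

rng = np.random.default_rng(0)
n_responses, n_judges, n_predictors = 10_449, 5, 8

# Simulated originality ratings from five judges (shared signal + noise).
signal = rng.normal(size=n_responses)
ratings = signal[:, None] + rng.normal(scale=0.6, size=(n_responses, n_judges))

def cronbach_alpha(r):
    """Consistency reliability of a (responses x raters) rating matrix."""
    k = r.shape[1]
    return k / (k - 1) * (1 - r.var(axis=0, ddof=1).sum() / r.sum(axis=1).var(ddof=1))

print(f"alpha = {cronbach_alpha(ratings):.3f}")

# Residual disagreement per response: spread of the judges' ratings.
disagreement = ratings.std(axis=1, ddof=1)

# Hypothetical response-level predictors (originality, elaboration,
# task type, age, gender, ...), here just random stand-ins.
X = rng.normal(size=(n_responses, n_predictors))
y = disagreement + X @ np.array([0.5, -0.3, 0.2, 0, 0, 0.1, 0, 0])

# Stage 1: a ridge pilot fit supplies data-driven penalty weights.
pilot = Ridge(alpha=1.0).fit(X, y)
w = np.maximum(np.abs(pilot.coef_), 1e-8)  # gamma = 1; guard against zeros

# Stage 2: rescaling each column by its weight gives every coefficient
# its own effective penalty, so weakly supported predictors are shrunk
# to exactly zero while strong ones are penalized less.
lasso = LassoCV(cv=5).fit(X * w, y)
beta = lasso.coef_ * w  # map back to the original scale of X

print("R^2:", round(lasso.score(X * w, y), 3))
print("selected predictors:", np.flatnonzero(beta))
```

The column-rescaling trick is equivalent to fitting a LASSO with coefficient-specific penalty weights of 1/|β_pilot|, which is what distinguishes the adaptive LASSO from the ordinary LASSO and gives it its variable-selection consistency.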
Journal Introduction:
The Journal of Creative Behavior is a quarterly academic journal publishing the most current research in creative thinking. For nearly four decades, JCB has been the benchmark scientific periodical in the field. It provides up-to-date, cutting-edge ideas about creativity in education, psychology, business, the arts, and more.