Using automated analysis to assess middle school students' competence with scientific argumentation

Journal of Research in Science Teaching (IF 3.6; Zone 1, Education; Q1, Education & Educational Research) Pub Date: 2023-05-04 DOI: 10.1002/tea.21864

Christopher D. Wilson, Kevin C. Haudek, Jonathan F. Osborne, Zoë E. Buck Bracey, Tina Cheuk, Brian M. Donovan, Molly A. M. Stuhlsatz, Marisol M. Santiago, Xiaoming Zhai



Abstract

Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning, a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the measurement of knowledge and understanding to the measurement of performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are both expensive and resource-intensive to score. In this study, we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed-response measures of students' competency with written scientific argumentation that are aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we were able to develop computer scoring models that achieve substantial to almost perfect agreement between human-assigned and computer-predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher-level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or the provision of sufficient evidence), developing computer models for each component, and combining scores from these analytic components into a holistic score produced better results than holistic scoring approaches. However, this analytic approach was found to be differentially biased when scoring responses from English learner (EL) students as compared to responses from non-EL students on some items. Differences in severity between human and computer scores for EL students under the two approaches are explored, and potential sources of bias in automated scoring are discussed.
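The agreement between human-assigned and computer-predicted scores mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the agreement statistic; the phrase "substantial to almost perfect" matches the conventional Landis and Koch bands for Cohen's kappa, so the example below assumes unweighted Cohen's kappa. The function names and the sample scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(human, computer):
    """Unweighted Cohen's kappa between two equal-length lists of scores.

    Assumption: the study's exact statistic is not given in the abstract;
    this is the textbook definition, used here only for illustration.
    """
    if len(human) != len(computer) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    # Proportion of responses where the two raters assign the same score.
    observed = sum(h == c for h, c in zip(human, computer)) / n
    # Agreement expected by chance, from each rater's marginal distribution.
    human_counts = Counter(human)
    computer_counts = Counter(computer)
    expected = sum(
        human_counts[k] * computer_counts[k] for k in human_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the conventional Landis and Koch description."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical learning-progression levels (0-2) for eight responses.
    human_scores = [0, 1, 1, 2, 2, 2, 0, 1]
    computer_scores = [0, 1, 1, 2, 2, 1, 0, 1]
    kappa = cohen_kappa(human_scores, computer_scores)
    print(round(kappa, 3), landis_koch_label(kappa))
```

The same function could be applied separately to EL and non-EL responses to compare agreement across groups, although the abstract does not say how the authors quantified differential bias.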




Source journal

Journal of Research in Science Teaching (Education & Educational Research)

CiteScore: 8.80

Self-citation rate: 19.60%

Journal description: Journal of Research in Science Teaching, the official journal of NARST: A Worldwide Organization for Improving Science Teaching and Learning Through Research, publishes reports for science education researchers and practitioners on issues of science teaching and learning and science education policy. Scholarly manuscripts within the domain of the Journal of Research in Science Teaching include, but are not limited to, investigations employing qualitative, ethnographic, historical, survey, philosophical, case study research, quantitative, experimental, quasi-experimental, data mining, and data analytics approaches; position papers; policy perspectives; critical reviews of the literature; and comments and criticism.
