Using automated analysis to assess middle school students' competence with scientific argumentation
Christopher D. Wilson, Kevin C. Haudek, Jonathan F. Osborne, Zoë E. Buck Bracey, Tina Cheuk, Brian M. Donovan, Molly A. M. Stuhlsatz, Marisol M. Santiago, Xiaoming Zhai
Journal of Research in Science Teaching, 61(1), 38-69. Publication date: 2023-05-04. DOI: 10.1002/tea.21864
Abstract
Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning—a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the measurement of knowledge and understanding to the measurement of performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are both expensive and resource-intensive to score. In this study we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed-response measures of students' competency with written scientific argumentation that are aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we have been able to develop computer scoring models that can achieve substantial to almost perfect agreement between human-assigned and computer-predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher-level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or providing sufficient evidence), developing computer models for each component, and combining scores from these analytic components into a holistic score produced better results than holistic scoring approaches. 
However, on some items this analytic approach was found to be differentially biased when scoring responses from English learner (EL) students compared with responses from non-EL students. Differences between the analytic and holistic approaches in the severity of computer scores relative to human scores for EL students are explored, and potential sources of bias in automated scoring are discussed.
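The abstract names three technical steps without specifying how they work: combining analytic component scores into a holistic score, measuring human-computer agreement, and comparing score severity for EL and non-EL students. The Python sketch below illustrates all three, but every specific is an assumption rather than the authors' method. The component names (`claim` and `evidence` echo the abstract's examples; `reasoning` is invented), the ordered aggregation rule, the use of unweighted Cohen's kappa, and the mean machine-minus-human difference as a severity measure are all hypothetical choices. The verbal labels follow the Landis and Koch (1977) kappa benchmarks, whose terms match the abstract's phrase "substantial to almost perfect agreement." The toy data are invented.

```python
from collections import Counter
from statistics import mean

# Hypothetical analytic components. "claim" and "evidence" echo the abstract's
# examples; "reasoning" is an assumed third component.
COMPONENTS = ("claim", "evidence", "reasoning")


def holistic_from_components(components: dict[str, int]) -> int:
    """Combine binary (0/1) component predictions into a holistic level.

    Assumed rule, for illustration only: walk the components in order and
    count how many are present before the first absent one, so a response
    cannot reach a higher level without satisfying the lower ones.
    """
    level = 0
    for name in COMPONENTS:
        if not components.get(name, 0):
            break
        level += 1
    return level


def cohens_kappa(human: list[int], machine: list[int]) -> float:
    """Unweighted Cohen's kappa: chance-corrected agreement between two raters."""
    if not human or len(human) != len(machine):
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, machine)) / n
    human_counts, machine_counts = Counter(human), Counter(machine)
    # Agreement expected by chance from each rater's marginal distribution.
    expected = sum(human_counts[c] * machine_counts[c] for c in human_counts) / n**2
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined here, so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Map kappa to the Landis and Koch (1977) verbal benchmarks."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


def mean_severity(human: list[int], machine: list[int]) -> float:
    """Mean machine-minus-human difference; negative means harsher machine scores."""
    return mean(m - h for h, m in zip(human, machine))


# Toy data invented for illustration; NOT data from the study.
# Each entry: (is_english_learner, human_holistic_score, machine_component_predictions)
TOY = [
    (False, 3, {"claim": 1, "evidence": 1, "reasoning": 1}),
    (False, 2, {"claim": 1, "evidence": 1, "reasoning": 0}),
    (False, 1, {"claim": 1, "evidence": 0, "reasoning": 0}),
    (False, 0, {"claim": 0, "evidence": 0, "reasoning": 0}),
    (True, 2, {"claim": 1, "evidence": 0, "reasoning": 0}),
    (True, 1, {"claim": 1, "evidence": 0, "reasoning": 0}),
    (True, 3, {"claim": 1, "evidence": 1, "reasoning": 1}),
    (True, 0, {"claim": 0, "evidence": 1, "reasoning": 0}),
]

if __name__ == "__main__":
    human = [h for _, h, _ in TOY]
    machine = [holistic_from_components(c) for _, _, c in TOY]
    k = cohens_kappa(human, machine)
    print(f"kappa = {k:.3f} ({landis_koch_label(k)})")
    for is_el in (False, True):
        pairs = [(h, m) for (el, h, _), m in zip(TOY, machine) if el == is_el]
        hs, ms = zip(*pairs)
        print(f"{'EL' if is_el else 'non-EL'}: mean severity = "
              f"{mean_severity(list(hs), list(ms)):+.2f}")
```

On the invented toy data, the single human-machine disagreement falls on an EL response that the machine scored lower than the human. Kappa is therefore high overall (about 0.83, "almost perfect"), yet the EL group shows a negative mean difference (harsher machine scoring) while the non-EL group shows none. This is how an aggregate agreement statistic can mask group-level differences. With real data, such a descriptive gap would need an appropriate statistical test and adequate group sizes before it could be read as bias.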
About the journal:
Journal of Research in Science Teaching, the official journal of NARST: A Worldwide Organization for Improving Science Teaching and Learning Through Research, publishes reports for science education researchers and practitioners on issues of science teaching and learning and science education policy. Scholarly manuscripts within the domain of the Journal of Research in Science Teaching include, but are not limited to, investigations employing qualitative, ethnographic, historical, survey, philosophical, case study research, quantitative, experimental, quasi-experimental, data mining, and data analytics approaches; position papers; policy perspectives; critical reviews of the literature; and comments and criticism.