{"title":"Accuracy of progress monitoring decision rules to evaluate response to instruction with two computer adaptive tests","authors":"Ethan R. Van Norman, Emily R. Forcht","doi":"10.1016/j.jsp.2024.101319","DOIUrl":null,"url":null,"abstract":"<div><p>Computer adaptive tests have become popular assessments to screen students for academic risk. Research is emerging regarding their use as progress monitoring tools to measure response to instruction. We evaluated the accuracy of the trend-line decision rule when applied to outcomes from a frequently used reading computer adaptive test (i.e., Star Reading [SR]) and frequently used math computer adaptive test (i.e., Star Math [SM]). Analyses of extant SR and SM data were conducted to inform conditions for simulations to determine the number of assessments required to yield sufficient sensitivity (i.e., probability of recommending an instructional change when a change was warranted) and specificity (i.e., probability of recommending maintaining an intervention when a change was not warranted) when comparing performance to goal lines based upon a future target score (i.e., benchmark) as well as normative comparisons (50th and 75th percentiles). The extant dataset of SR outcomes consisted of monthly progress monitoring data from 993 Grade 3, 804 Grade 4, and 709 Grade 5 students from multiple states in the United States northwest. Data for SM were also drawn from the northwest and contained outcomes from 518 Grade 3, 474 Grade 4, and 391 Grade 5 students. Grade level samples were predominately White (range = 59.89%–67.72%) followed by Latinx (range = 9.65%–15.94%). Results of simulations suggest that when data were collected once a month, seven, eight, and nine observations were required to support low-stakes decisions with SR for Grades 3, 4, and 5, respectively. For SM, nine, ten, and eight observations were required for Grades, 3, 4, and 5, respectively. Given the length of time required to support reasonably accurate decisions, recommendations to consider other types of assessments and decision-making frameworks for academic progress monitoring are provided.</p></div>","PeriodicalId":48232,"journal":{"name":"Journal of School Psychology","volume":"105 ","pages":"Article 101319"},"PeriodicalIF":3.8000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of School Psychology","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0022440524000396","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, SOCIAL","Score":null,"Total":0}
引用次数: 0
Abstract
Computer adaptive tests have become popular assessments to screen students for academic risk. Research is emerging regarding their use as progress monitoring tools to measure response to instruction. We evaluated the accuracy of the trend-line decision rule when applied to outcomes from a frequently used reading computer adaptive test (i.e., Star Reading [SR]) and frequently used math computer adaptive test (i.e., Star Math [SM]). Analyses of extant SR and SM data were conducted to inform conditions for simulations to determine the number of assessments required to yield sufficient sensitivity (i.e., probability of recommending an instructional change when a change was warranted) and specificity (i.e., probability of recommending maintaining an intervention when a change was not warranted) when comparing performance to goal lines based upon a future target score (i.e., benchmark) as well as normative comparisons (50th and 75th percentiles). The extant dataset of SR outcomes consisted of monthly progress monitoring data from 993 Grade 3, 804 Grade 4, and 709 Grade 5 students from multiple states in the United States northwest. Data for SM were also drawn from the northwest and contained outcomes from 518 Grade 3, 474 Grade 4, and 391 Grade 5 students. Grade level samples were predominately White (range = 59.89%–67.72%) followed by Latinx (range = 9.65%–15.94%). Results of simulations suggest that when data were collected once a month, seven, eight, and nine observations were required to support low-stakes decisions with SR for Grades 3, 4, and 5, respectively. For SM, nine, ten, and eight observations were required for Grades, 3, 4, and 5, respectively. Given the length of time required to support reasonably accurate decisions, recommendations to consider other types of assessments and decision-making frameworks for academic progress monitoring are provided.
期刊介绍:
The Journal of School Psychology publishes original empirical articles and critical reviews of the literature on research and practices relevant to psychological and behavioral processes in school settings. JSP presents research on intervention mechanisms and approaches; schooling effects on the development of social, cognitive, mental-health, and achievement-related outcomes; assessment; and consultation. Submissions from a variety of disciplines are encouraged. All manuscripts are read by the Editor and one or more editorial consultants with the intent of providing appropriate and constructive written reviews.