How Do the Impacts of Healthcare Training Vary with Credential Length? Evidence from the Health Profession Opportunity Grants Program
Daniel Litwok, Laura R. Peck, Doug Walton
Pub Date: 2022-11-23 | DOI: 10.1080/19345747.2022.2128486 | Journal of Research on Educational Effectiveness, 16(1), 246–270
Abstract: This article estimates earnings impacts for those who completed long-term college credentials (degrees or college certificates requiring a year or more of study) and those who did not in an experimental evaluation of a federally funded sectoral job training program. The experimental evaluation found no overall impact of the program on earnings, but we explore whether impacts vary by long-term credential receipt. In theory, we expect program impacts to be larger for those who earn long-term credentials. We test this theory using Analysis of Symmetrically-Predicted Endogenous Subgroups (ASPES), an approach that leverages the experimental design to create experimentally valid treatment and control subgroups associated with some endogenous activity and estimates impacts for these subgroups (subject to assumptions required for identification). We find weak evidence that those who earned long-term credentials experienced meaningfully larger program impacts than those who did not. We posit that these differences are largely due to engagement with support services.
{"title":"How Do the Impacts of Healthcare Training Vary with Credential Length? Evidence from the Health Profession Opportunity Grants Program","authors":"Daniel Litwok, Laura R. Peck, Doug Walton","doi":"10.1080/19345747.2022.2128486","DOIUrl":"https://doi.org/10.1080/19345747.2022.2128486","url":null,"abstract":"Abstract This article estimates earnings impacts for those who completed long-term college credentials (degrees or college certificates requiring a year or more of study) and those who did not in an experimental evaluation of a federally-funded sectoral job training program. The experimental evaluation found no overall impact of the program on earnings, but we explore whether impacts vary by the long-term credential receipt. In theory, we expect that program impacts should be larger for those who earn long-term credentials. We test this theory using Analysis of Symmetrically-Predicted Endogenous Subgroups (ASPES)—an approach that leverages the experimental design to create experimentally valid treatment and control subgroups associated with some endogenous activity and estimates impacts for these subgroups (subject to assumptions required for identification). We find weak evidence that those who earned long-term credentials experienced meaningfully larger program impacts than those who did not. We posit that these differences are largely due to engagement with support services.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"16 1","pages":"246 - 270"},"PeriodicalIF":1.8,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44948158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overlap Violations in Clustered Observational Studies of Educational Interventions
L. Keele, M. Lenard, Lindsay C. Page
Pub Date: 2022-11-21 | DOI: 10.1080/19345747.2022.2144563 | Journal of Research on Educational Effectiveness
{"title":"Overlap Violations in Clustered Observational Studies of Educational Interventions","authors":"L. Keele, M. Lenard, Lindsay C. Page","doi":"10.1080/19345747.2022.2144563","DOIUrl":"https://doi.org/10.1080/19345747.2022.2144563","url":null,"abstract":"","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44961083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selecting Districts and Schools for Impact Studies in Education: A Simulation Study of Different Strategies
Daniel Litwok, A. Nichols, Azim Shivji, Robert B. Olsen
Pub Date: 2022-11-08 | DOI: 10.1080/19345747.2022.2128952 | Journal of Research on Educational Effectiveness, pp. 501–531
Abstract: Experimental studies of educational interventions are rarely based on representative samples of the target population. This simulation study tests two formal sampling strategies for selecting districts and schools from within strata when they may not agree to participate if selected: (1) balanced selection of the most typical district or school within each stratum; and (2) random selection. We compared the generalizability of the resulting impact estimates, both to each other and to a stylized approach to purposive selection (the typical approach for experimental studies in education). We found that balanced and random selection of schools within randomly selected districts were the most consistent strategies in terms of generalizability, with minimal difference between the two. Separately, for random selection, we tested two strategies for replacing districts that refused to participate: random and nearest neighbor replacement. Random replacement outperformed nearest neighbor replacement in many, but not all, scenarios. Overall, the findings suggest that formal sampling strategies for selecting districts and schools for experimental studies of educational interventions can substantially improve the generalizability of their impact findings.
{"title":"Selecting Districts and Schools for Impact Studies in Education: A Simulation Study of Different Strategies","authors":"Daniel Litwok, A. Nichols, Azim Shivji, Robert B. Olsen","doi":"10.1080/19345747.2022.2128952","DOIUrl":"https://doi.org/10.1080/19345747.2022.2128952","url":null,"abstract":"Abstract Experimental studies of educational interventions are rarely based on representative samples of the target population. This simulation study tests two formal sampling strategies for selecting districts and schools from within strata when they may not agree to participate if selected: (1) balanced selection of the most typical district or school within each stratum; and (2) random selection. We compared the generalizability of the resulting impact estimates, both to each other and to a stylized approach to purposive selection (the typical approach for experimental studies in education). We found that balanced and random selection of schools within randomly selected districts were the most consistent strategies in terms of generalizability, with minimal difference between the two. Separately, for random selection, we tested two strategies for replacing districts that refused to participate—random and nearest neighbor replacement. Random replacement outperformed nearest neighbor replacement in many, but not all, scenarios. Overall, the findings suggest that formal sampling strategies for selecting districts and schools for experimental studies of educational interventions can substantially improve the generalizability of their impact findings.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"12 1","pages":"501 - 531"},"PeriodicalIF":1.8,"publicationDate":"2022-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"60072041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Propensity Score Methods and Difference-in-Differences with an Exogenous Time-Varying Confounder: Evaluation of Methods
Peter Boedeker
Pub Date: 2022-11-01 | DOI: 10.1080/19345747.2022.2128485 | Journal of Research on Educational Effectiveness, 16(1), 377–397
Abstract: Quasi-experimental designs (QEDs) are used to estimate a treatment effect without randomization. Confounders causally affect both the outcome and the probability of treatment adoption and, if unaccounted for, can bias treatment effect estimates. A variable considered a confounder prior to treatment can change after treatment has occurred (i.e., a time-varying confounder), and it can do so for reasons unrelated to treatment (what we call an exogenous time-varying confounder). If the post-treatment value of such a confounder causally changes the outcome and is unaccounted for, the treatment effect estimate may be biased. We review the Rubin Causal Model and QED assumptions, and the effect an exogenous time-varying confounder has on the ability of QEDs to produce an appropriate counterfactual. We conduct a simulation study evaluating propensity score and difference-in-differences based methods for estimating a treatment effect in the presence of an exogenous time-varying confounder. Propensity-score-weighted two-way fixed effects, inverse probability weighted, and doubly robust difference-in-differences methods, each with propensity scores estimated using post-implementation values of the exogenous time-varying confounder, proved least biased when the exogenous time-varying confounder changed differentially for members of the treatment and control groups.
{"title":"Propensity Score Methods and Difference-in-Differences with an Exogenous Time-Varying Confounder: Evaluation of Methods","authors":"Peter Boedeker","doi":"10.1080/19345747.2022.2128485","DOIUrl":"https://doi.org/10.1080/19345747.2022.2128485","url":null,"abstract":"Abstract Quasi-experimental designs (QEDs) are used to estimate a treatment effect without randomization. Confounders have a causal relationship with the outcome and probability of treatment adoption and if unaccounted for can bias treatment effect estimates. A variable considered a confounder prior to treatment can change after treatment has occurred (i.e., a time-varying confounder) not as a result of treatment (what we call an exogenous time-varying confounder). If the post-treatment value causally affects the outcome to change and this post-treatment value of the exogenous time-varying confounder is unaccounted for, then the treatment effect may be biased. We review the Rubin Causal Model and QED assumptions and the effect an exogenous time-varying confounder has on the ability of QEDs to produce an appropriate counterfactual. We conduct a simulation study evaluating propensity score and difference-in-differences based methods for estimating a treatment effect with an exogenous time-varying confounder. Propensity score weighted two-way fixed effects, inverse probability weighted, or doubly robust difference-in-differences methods, each with propensity scores estimated using post-implementation values of the exogenous time-varying confounder, proved least biased when the exogenous time-varying confounder changed differentially for members of the treatment and control groups.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"16 1","pages":"377 - 397"},"PeriodicalIF":1.8,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46664319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Cautionary Tale of Tutoring Hard-to-Reach Students in Kenya
Beth E. Schueler, Daniel Rodriguez-Segura
Pub Date: 2022-11-01 | DOI: 10.1080/19345747.2022.2131661 | Journal of Research on Educational Effectiveness, 16(1), 442–472
Abstract: Covid-19 school closures generated interest in tutoring to make up for lost learning time. Tutoring is backed by rigorous research, but it is unclear whether it can be delivered effectively remotely. We study the effect of teacher-student phone calls in Kenya while schools were closed. Schools (n = 105) serving 3rd, 5th, and 6th graders (n = 8,319) were randomly assigned to receive one of two versions of a 7-week weekly math intervention—5-minute accountability checks or 15-minute mini-tutoring sessions—or to a control group. Although calls increased perceptions that teachers cared, accountability checks had no effect on math performance four months later, and tutoring decreased achievement among students who returned to their schools after reopening. This was, in part, because the relatively low-achieving students most likely to benefit from calls were least likely to return and take assessments. Tutoring also substituted away from more productive uses of time, at least among returners. Neither intervention affected enrollment. Tutoring remains a valuable tool, but to avoid unintended consequences, careful attention should be paid to aligning interventions with best practices and targeting them to those who benefit most.
{"title":"A Cautionary Tale of Tutoring Hard-to-Reach Students in Kenya","authors":"Beth E. Schueler, Daniel Rodriguez-Segura","doi":"10.1080/19345747.2022.2131661","DOIUrl":"https://doi.org/10.1080/19345747.2022.2131661","url":null,"abstract":"Abstract Covid-19 school closures generated interest in tutoring to make up for lost learning time. Tutoring is backed by rigorous research, but it is unclear whether it can be delivered effectively remotely. We study the effect of teacher-student phone calls in Kenya when schools were closed. Schools (n = 105) were randomly assigned for 3rd, 5th and 6th graders (n = 8,319) to receive one of two versions of a 7-week weekly math intervention—5-minute accountability checks or 15-min mini-tutoring sessions—or to the control group. Although calls increased perceptions that teachers cared, accountability checks had no effect on math performance four months later and tutoring decreased achievement among students who returned to their schools after reopening. This was, in part, because the relatively low-achieving students most likely to benefit from calls were least likely to return and take assessments. Tutoring substituted away from more productive uses of time, at least among returners. Neither intervention affected enrollment. Tutoring remains a valuable tool but to avoid unintended consequences, careful attention should be paid to aligning interventions with best practices and targeting to those who benefit most.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"16 1","pages":"442 - 472"},"PeriodicalIF":1.8,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48319071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effects of Cross-Age Peer Mentoring Program within a Randomized Controlled Trial
E. Jenner, Katherine Lass, Sarah L. Walsh, H. Demby, Rebekah Leger, Gretchen Falk
Pub Date: 2022-11-01 | DOI: 10.1080/19345747.2022.2130119 | Journal of Research on Educational Effectiveness, 16(1), 473–500
Abstract: This paper summarizes results from an impact study that employed a randomized controlled trial to estimate the efficacy of a cross-age peer mentoring program designed to prevent school dropout during the transition from middle to high school. We present findings from the intent-to-treat (ITT) analyses, which included 1,351 ninth-grade students, alongside those of two different methods that estimate the complier average causal effect (CACE) of participating in the program. Although the confirmatory analyses, which investigated impact on attendance and credit accrual in ninth grade, yielded null results, ITT analyses of exploratory outcomes indicate modest yet potentially meaningful program impacts on ninth-grade discipline, school attachment, and expectations of degree attainment across varying dosage levels. CACE estimates also suggest that a threshold level of program participation broadens the program's impact to additional exploratory academic achievement and social and emotional learning outcomes. Given the adverse effects of the transition to high school, this promising evidence indicates that the cross-age peer mentoring intervention could be an effective strategy for high schools to implement that leverages existing staff and students.
{"title":"Effects of Cross-Age Peer Mentoring Program within a Randomized Controlled Trial","authors":"E. Jenner, Katherine Lass, Sarah L. Walsh, H. Demby, Rebekah Leger, Gretchen Falk","doi":"10.1080/19345747.2022.2130119","DOIUrl":"https://doi.org/10.1080/19345747.2022.2130119","url":null,"abstract":"Abstract This paper summarizes results from an impact study that employed a randomized controlled trial to estimate the efficacy of a cross-age peer mentor program designed to prevent school dropout during the transition from middle to high school. We present findings from the intent-to-treat (ITT) analyses, which included 1,351 ninth-grade students, alongside those of two different methods that estimate the complier average causal effect (CACE) of participating in the program. Although the confirmatory study, which investigated impact on attendance and credit accrual in ninth grade, was null, ITT analyses on exploratory outcomes indicate modest, yet potentially meaningful program impact on ninth-grade outcomes of discipline, school attachment, and expectations of degree attainment across varying dosage levels. CACE estimates also suggest that a threshold level of program participation broadens the program’s impact on additional exploratory academic achievement and social and emotional learning outcomes. Given the adverse effects of the transition to high school, this promising evidence indicates that the cross-age peer mentoring intervention could be an effective strategy for high schools to implement that leverages existing staff and students.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"16 1","pages":"473 - 500"},"PeriodicalIF":1.8,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45733877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of a State-Wide Mathematics Support Program for at-Risk Students in Grade 1 and 2 in Germany
Ann-Katrin van den Ham, Aiso Heinze
Pub Date: 2022-10-02 | DOI: 10.1080/19345747.2022.2051651 | Journal of Research on Educational Effectiveness, 15(1), 687–716
Abstract: Supporting students with difficulties in learning mathematics is a challenge for teachers and educational administrators. Formative assessment is considered to play a successful role in supporting at-risk students as well as students without difficulties in mathematics. There is a need for intervention programs, including formative assessment techniques, that (a) are easy to implement in the regular classroom without requiring radical changes in teachers' individual teaching style, and (b) are effective in supporting at-risk students at the earliest stage possible in their school careers. This article analyzes an effectiveness trial of a formative assessment program developed to meet these goals and conducted in the first two years of elementary school. The examination of the longitudinal dataset from Grades 1–3 (N = 2,330) revealed an effect after the implementation, which was maintained at nearly the same effect size one year after completion of the program. The findings imply that formative assessment can foster the arithmetic achievement of students at risk as well as that of the entire class without changing the curriculum or teachers' individual teaching style.
{"title":"Evaluation of a State-Wide Mathematics Support Program for at-Risk Students in Grade 1 and 2 in Germany","authors":"Ann-Katrin van den Ham, Aiso Heinze","doi":"10.1080/19345747.2022.2051651","DOIUrl":"https://doi.org/10.1080/19345747.2022.2051651","url":null,"abstract":"Abstract Supporting students with difficulties in learning mathematics is a challenge for teachers and educational administrators. Formative assessment is considered to play a successful role in supporting at-risk students as well as students without difficulties in mathematics. There is a need for intervention programs, including formative assessment techniques, that (a) are easy to implement in the regular classroom without requiring radical changes in teachers’ individual teaching style, and (b) are effective in supporting at-risk students at the earliest stage possible in their school careers. This article analyzes an effectiveness trial of a formative assessment program developed to meet these goals and conducted in the first two years of elementary school. The examination of the longitudinal dataset from Grades 1–3 (N = 2,330) revealed an effect after the implementation, which was maintained at nearly the same effect size one year after completion of the program. The findings imply that formative assessment can foster the arithmetic achievement of students at risk as well as that of the entire class without changing the curriculum or teachers’ individual teaching style.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"15 1","pages":"687 - 716"},"PeriodicalIF":1.8,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42378193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Covariate Balance for Observational Effectiveness Studies: A Comparison of Matching and Weighting
Joseph M. Kush, Elise T. Pas, R. Musci, Catherine P. Bradshaw
Pub Date: 2022-09-07 | DOI: 10.1080/19345747.2022.2110545 | Journal of Research on Educational Effectiveness, 16(1), 189–212
Abstract: Propensity score matching and weighting methods are often used in observational effectiveness studies to reduce imbalance between treated and untreated groups on a set of potential confounders. However, much of the prior methodological literature on matching and weighting has yet to examine performance for scenarios with a majority of treated units, as is often encountered with programs and interventions that have been widely disseminated or "scaled up." Using a series of Monte Carlo simulations, we compare the performance of k:1 matching with replacement and weighting methods with respect to covariate balance, bias, and mean squared error. Results indicate that the accuracy of all methods declined as treatment prevalence increased. While weighting produced the largest reduction in covariate imbalance, 1:1 matching with replacement provided the least biased treatment effect estimates. An applied example using empirical school-level data is provided to further illustrate the application and interpretation of these methods in a real-world scale-up effort. We conclude by considering the implications of propensity score methods for observational effectiveness studies, with a particular focus on educational research.
{"title":"Covariate Balance for Observational Effectiveness Studies: A Comparison of Matching and Weighting","authors":"Joseph M. Kush, Elise T. Pas, R. Musci, Catherine P. Bradshaw","doi":"10.1080/19345747.2022.2110545","DOIUrl":"https://doi.org/10.1080/19345747.2022.2110545","url":null,"abstract":"Abstract Propensity score matching and weighting methods are often used in observational effectiveness studies to reduce imbalance between treated and untreated groups on a set of potential confounders. However, much of the prior methodological literature on matching and weighting has yet to examine performance for scenarios with a majority of treated units, as is often encountered with programs and interventions that have been widely disseminated or “scaled-up.” Using a series of Monte Carlo simulations, we compare the performance of k:1 matching with replacement and weighting methods with respect to covariate balance, bias, and mean squared error. Results indicate that the accuracy of all methods declined as treatment prevalence increased. While weighting produced the largest reduction in covariate imbalance, 1:1 matching with replacement provided the most unbiased treatment effect estimates. An applied example using empirical school-level data is provided to further illustrate the application and interpretation of these methods to a real-world scale-up effort. We conclude by considering the implications of propensity score methods for observational effectiveness studies with a particular focus on educational research.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"16 1","pages":"189 - 212"},"PeriodicalIF":1.8,"publicationDate":"2022-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43304081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Robust Standard Errors for the Analysis of Binary Outcomes with a Small Number of Clusters
Francis L. Huang, Bixiu Zhang, Xintong Li
Pub Date: 2022-08-08 | DOI: 10.1080/19345747.2022.2100301 | Journal of Research on Educational Effectiveness, 16(1), 213–245
Abstract: Binary outcomes are often analyzed in cluster randomized trials (CRTs) using logistic regression, and cluster-robust standard errors (CRSEs) are routinely used to account for the dependent nature of nested data in such models. However, CRSEs can be problematic when the number of clusters is low (e.g., < 50), and with CRTs a low number of clusters is quite common. We investigate the use of the CR2 CRSE and an empirical degrees-of-freedom adjustment (dofBM) proposed by Bell and McCaffrey in a simulation with binary outcomes and illustrate its use with an applied example. Findings show that the CR2 (with dofBM) standard errors are relatively unbiased, with coverage and power rates for group-level predictors comparable to those of a multilevel logistic regression model, and can be used even with as few as 10 clusters. To promote its use, a free graphical SPSS extension is provided that can fit logistic (and linear) regression models with a variety of CRSEs and dof adjustments.
{"title":"Using Robust Standard Errors for the Analysis of Binary Outcomes with a Small Number of Clusters","authors":"Francis L. Huang, Bixiu Zhang, Xintong Li","doi":"10.1080/19345747.2022.2100301","DOIUrl":"https://doi.org/10.1080/19345747.2022.2100301","url":null,"abstract":"Abstract Binary outcomes are often analyzed in cluster randomized trials (CRTs) using logistic regression and cluster robust standard errors (CRSEs) are routinely used to account for the dependent nature of nested data in such models. However, CRSEs can be problematic when the number of clusters is low (e.g., < 50) and, with CRTs, a low number of clusters is quite common. We investigate the use of the CR2 CRSE and an empirical degrees of freedom adjustment (dofBM) proposed by Bell and McCaffrey with a simulation using binary outcomes and illustrate its use with an applied example. Findings show that the CR2 (w/dofBM) standard errors are relatively unbiased with coverage and power rates for group-level predictors that are comparable to that of a multilevel logistic regression model and can be used even with as few as 10 clusters. To promote its use, a free graphical SPSS extension is provided that can fit logistic (and linear) regression models with a variety of CRSEs and dof adjustments.","PeriodicalId":47260,"journal":{"name":"Journal of Research on Educational Effectiveness","volume":"16 1","pages":"213 - 245"},"PeriodicalIF":1.8,"publicationDate":"2022-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41874328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making Sense of Effect Sizes: Systematic Differences in Intervention Effect Sizes by Outcome Measure Type
Betsy Wolf, Erica Harbatkin
Pub Date: 2022-08-01 | DOI: 10.1080/19345747.2022.2071364 | Journal of Research on Educational Effectiveness, 16(1), 134–161
Abstract: One challenge in understanding "what works" in education is that effect sizes may not be comparable across studies, raising questions for practitioners and policymakers using research to select interventions. One factor that consistently relates to the magnitude of effect sizes is the type of outcome measure. This article uses study data from the What Works Clearinghouse to determine average effect sizes by outcome measure type. Outcome measures were categorized by whether the group who developed the measure potentially had a stake in the intervention (non-independent) or not (independent). Using meta-analysis and controlling for study quality and intervention characteristics, we find larger average effect sizes for non-independent measures than for independent measures. Results suggest that larger effect sizes for non-independent measures are not due to differences in implementation fidelity, study quality, or intervention or sample characteristics. Instead, non-independent and independent measures appear to represent partially but minimally overlapping latent constructs. Findings call into question whether policymakers and practitioners should make decisions based on non-independent measures when they are ultimately responsible for improving outcomes on independent measures.