
Journal of Educational and Behavioral Statistics: Latest Articles

A Diagnostic Tree Model for Adaptive Assessment of Complex Cognitive Processes Using Multidimensional Response Options
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-04-05; DOI: 10.3102/10769986231158301
M. Davison, David J. Weiss, Joseph N. DeWeese, Ozge Ersan, Gina Biancarosa, Patrick C. Kennedy
A tree model for diagnostic educational testing is described along with Monte Carlo simulations designed to evaluate measurement accuracy based on the model. The model is implemented in an assessment of inferential reading comprehension, the Multiple-Choice Online Causal Comprehension Assessment (MOCCA), through a sequential, multidimensional, computerized adaptive testing (CAT) strategy. Assessment of the first dimension, reading comprehension (RC), is based on the three-parameter logistic model. For diagnostic and intervention purposes, the second dimension, called process propensity (PP), is used to classify struggling students based on their pattern of incorrect responses. In the simulation studies, CAT item selection rules and stopping rules were varied to evaluate their effect on measurement accuracy along dimension RC and classification accuracy along dimension PP. For dimension RC, methods that improved accuracy tended to increase test length. For dimension PP, however, item selection and stopping rules increased classification accuracy without materially increasing test length. A small live-testing pilot study confirmed some of the findings of the simulation studies. Development of the assessment has been guided by psychometric theory, Monte Carlo simulation results, and a theory of instruction and diagnosis.
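The three-parameter logistic (3PL) model mentioned for the RC dimension can be sketched as follows. This is a minimal illustration of the standard 3PL item response function, not the MOCCA implementation; the function name and the parameter values in the usage note are hypothetical, and the optional scaling constant D = 1.7 used in some parameterizations is omitted.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the standard 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

For an examinee whose ability equals the item difficulty (theta == b), the probability sits halfway between c and 1; for example, `p_correct_3pl(0.0, 1.0, 0.0, 0.2)` is approximately 0.6.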
Citations: 0
The Restricted DINA Model: A Comprehensive Cognitive Diagnostic Model for Classroom-Level Assessments
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-03-30; DOI: 10.3102/10769986231158829
P. Nájera, F. J. Abad, Chia-Yi Chiu, M. Sorrel
The nonparametric classification (NPC) method has been shown to be a suitable procedure for cognitive diagnostic assessments at the classroom level. However, its nonparametric nature precludes obtaining a model likelihood, which hinders the exploration of crucial psychometric aspects such as model fit and reliability. Reporting the reliability and validity of scores is imperative in any applied context. The present study proposes the restricted deterministic input, noisy “and” gate (R-DINA) model, a parametric cognitive diagnosis model based on the NPC method that provides the same attribute profile classifications as the nonparametric method while allowing a model likelihood to be derived and, subsequently, fit and reliability indices to be computed. The suitability of the new proposal is examined by means of an exhaustive simulation study and a real data illustration. The results show that the R-DINA model properly recovers the posterior probabilities of attribute mastery, making it a suitable alternative for comprehensive small-scale diagnostic assessments.
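As background for the "deterministic input, noisy and gate" terminology, the response probability of the standard (unrestricted) DINA model can be sketched as below. The abstract does not state which restrictions define R-DINA, so this sketch covers only the general DINA form; the function name, the guessing and slip values, and the attribute vectors in the usage note are hypothetical.

```python
from typing import Sequence


def dina_prob(alpha: Sequence[int], q: Sequence[int],
              guess: float, slip: float) -> float:
    """P(correct response) for one item under the standard DINA model.

    alpha: examinee attribute profile (1 = mastered, 0 = not mastered)
    q: the item's Q-matrix row (1 = attribute required by the item)
    guess: probability of a correct answer without all required attributes
    slip: probability of an incorrect answer despite having all of them
    """
    # The "and" gate: the ideal response is 1 only if every required
    # attribute is mastered.
    has_all_required = all(a >= k for a, k in zip(alpha, q))
    return 1.0 - slip if has_all_required else guess
```

With `guess=0.2` and `slip=0.1`, an examinee who has mastered both attributes required by an item answers correctly with probability 0.9, while an examinee missing either one answers correctly with probability 0.2.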
Citations: 1
Finding the Right Grain-Size for Measurement in the Classroom
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-03-30; DOI: 10.3102/10769986231159006
M. Wilson
This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational measurement, but one of the contributions of this article is that it mainly focuses on the latter two levels. Co-ordination of the content across these two levels can be achieved using the concept of a construct map, which articulates the substantive target property at levels of detail that are appropriate for both teacher planning and within-classroom use. This article then describes a statistical model designed to span these two levels and discusses how best to relate this to the macrolevel. Results from a curriculum and instruction development project on the topic of measurement in the elementary school are demonstrated, showing how they are empirically related.
Citations: 0
Optimizing Diagnostic Classification Models Application Considering Real-Life Constraints
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-03-30; DOI: 10.3102/10769986231159137
Kun Su, R. Henson
This article provides a process for carefully evaluating whether a content domain is suitable for diagnostic classification models (DCMs), optimized steps for constructing a test blueprint for applying DCMs, and a real-life example illustrating this process. The content domains were evaluated using a set of criteria purposely defined to improve the success rate of DCM implementation. Given the domain, the Q-matrix is determined by a simulation-based approach using correct classification rates as criteria. Finally, a physics test based on the final Q-matrix was developed, administered, and analyzed by the author and subject-matter experts (SMEs).
Citations: 1
Expertise on Offer: Why Isn’t Anyone Buying?
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-03-29; DOI: 10.3102/10769986231160671
H. Braun
It is a much-lamented fact that research with the potential to inform or influence education policy instead remains policy inert. There are many reasons for this frustrating state of affairs, including a lack of strategic thinking on the part of researchers on how to successfully accomplish outreach—as opposed to communication with peers (in-reach). Another, and a principal focus of this article, is the failure of researchers to appreciate the power of employing compelling narratives to bring their findings to the attention of policymakers and other stakeholders. Accordingly, this article presents some examples of narratives specifically designed for outreach and discusses some of their features. It also considers the challenges in gaining traction with counternarratives once a particular narrative has achieved currency. Researchers should also be mindful of the tenor of the times, with experts now often viewed with skepticism, if not downright hostility. In some quarters, excessive reliance on technocrats is even seen as a threat to democratic governance. The article concludes with some recommendations on how to appropriately enhance the role of research in education policymaking.
Journal of Educational and Behavioral Statistics, vol. 48, pp. 547-572.
Citations: 0
Detecting Item Preknowledge Using Revisits With Speed and Accuracy
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-02-28; DOI: 10.3102/10769986231153403
Onur Demirkaya, Ummugul Bezirhan, Jinming Zhang
Examinees with item preknowledge tend to obtain inflated test scores that undermine test score validity. With the availability of process data collected in computer-based assessments, research on detecting item preknowledge has progressed to using both item scores and response times. Item revisit patterns of examinees can also be utilized as an additional source of information. This study proposes a new statistic, based on the hierarchical speed–accuracy revisits model, for detecting item preknowledge when the compromised items are known. By simultaneously evaluating abnormal changes in the latent abilities, speeds, and revisit propensities of examinees, the procedure was found to provide greater statistical power and stronger substantive evidence that an examinee had indeed benefited from item preknowledge.
Journal of Educational and Behavioral Statistics, vol. 48, pp. 521-542.
Citations: 0
A Causal Latent Transition Model With Multivariate Outcomes and Unobserved Heterogeneity: Application to Human Capital Development
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-02-09; DOI: 10.3102/10769986221150033
F. Bartolucci, F. Pennoni, G. Vittadini
To evaluate the effect of a policy or treatment with pre- and post-treatment outcomes, we propose an approach based on a transition model, which may be applied with multivariate outcomes and accounts for unobserved heterogeneity. This model is based on potential versions of discrete latent variables representing the individual characteristic of interest and may be cast in the hidden (latent) Markov literature for panel data. Therefore, it can be estimated by maximum likelihood in a relatively simple way. The approach extends the difference-in-difference method because it can deal with multivariate outcomes. Moreover, causal effects may be expressed with respect to transition probabilities. The proposal is validated through a simulation study, and it is applied to evaluate educational programs administered to pupils in the sixth and seventh grades during their middle school period. These programs are carried out in an Italian region to improve noncognitive skills. We study whether they also affect students’ cognitive skills (CSs) in Italian and Mathematics in the eighth grade, exploiting the pretreatment test scores available in the fifth grade. The main conclusion is that the educational programs aimed at developing noncognitive abilities help the best students to maintain their higher cognitive abilities over time.
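For readers unfamiliar with the difference-in-difference method that this approach extends, the classic two-group, two-period estimator for a single observed outcome can be sketched as follows. This is only the textbook baseline, not the authors' latent transition model; the function name and the scores in the usage note are hypothetical.

```python
from statistics import mean
from typing import Sequence


def did_estimate(treated_pre: Sequence[float], treated_post: Sequence[float],
                 control_pre: Sequence[float], control_post: Sequence[float]) -> float:
    """Classic difference-in-differences estimate for one outcome.

    The change in the control group is subtracted from the change in the
    treated group, so that any time trend shared by both groups cancels out.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

If treated pupils move from a mean score of 2 to 4 while control pupils move from 2 to 3, the estimate is (4 - 2) - (3 - 2) = 1.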
Journal of Educational and Behavioral Statistics, vol. 48, pp. 387-419.
Citations: 0
Handling Missing Data in Growth Mixture Models
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-02-08; DOI: 10.3102/10769986221149140
D. Y. Lee, Jeffrey R. Harring
A Monte Carlo simulation was performed to compare methods for handling missing data in growth mixture models. The methods considered in the current study were (a) a fully Bayesian approach using a Gibbs sampler, (b) full information maximum likelihood using the expectation–maximization algorithm, (c) multiple imputation, (d) a two-stage multiple imputation method, and (e) listwise deletion. Of the five methods, it was found that the Bayesian approach and two-stage multiple imputation methods generally produce less biased parameter estimates compared to maximum likelihood or single imputation methods, although key differences were observed. Similarities and disparities among methods are highlighted and general recommendations articulated.
Journal of Educational and Behavioral Statistics, vol. 48, pp. 320-348.
Citations: 3
Clinical (In)Efficiency in the Prediction of Dangerous Behavior
IF 2.4; Zone 3 (Psychology); Q2 EDUCATION & EDUCATIONAL RESEARCH; Pub Date: 2023-01-11; DOI: 10.3102/10769986221144727
Ehsan Bokhari
The prediction of dangerous and/or violent behavior is particularly important to the conduct of the U.S. criminal justice system when it makes decisions about restrictions of personal freedom, such as preventive detention, forensic commitment, parole, and, in some states such as Texas, when to permit the execution of an individual found guilty of a capital crime to proceed. This article discusses the prediction of dangerous behavior both through clinical judgment and actuarial assessment. The general conclusion drawn is that for both clinical and actuarial prediction of dangerous behavior, we are far from a level of accuracy that could justify routine use. To support this latter negative assessment, two topic areas are emphasized: (1) the MacArthur Study of Mental Disorder and Violence, including the actuarial instrument developed as part of this project (the Classification of Violence Risk), along with all the data collected that helped develop the instrument; and (2) the U.S. Supreme Court case of Barefoot v. Estelle (1983) and the American Psychiatric Association “friend of the court” brief on the (in)accuracy of clinical prediction for the commission of future violence. Although now three decades old, Barefoot v. Estelle is still the controlling Supreme Court opinion regarding the prediction of future dangerous behavior and the imposition of the death penalty in states such as Texas; for example, see Coble v. Texas (2011) and the Supreme Court denial of certiorari in that case.
Journal of Educational and Behavioral Statistics, vol. 48, pp. 661-682.
Citations: 1
A Randomization P-Value Test for Detecting Copying on Multiple-Choice Exams
IF 2.4 | Zone 3, Psychology | Q2 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-01-09 | DOI: 10.3102/10769986221143515
J. Lang
This article is concerned with the statistical detection of copying on multiple-choice exams. As an alternative to existing permutation- and model-based copy-detection approaches, a simple randomization p-value (RP) test is proposed. The RP test, which is based on an intuitive match-score statistic, makes no assumptions about the distribution of examinees’ answer vectors and hence is broadly applicable. Especially important in this copy-detection setting, the RP test is shown to be exact in that its size is guaranteed to be no larger than a nominal α value. Additionally, simulation results suggest that the RP test is typically more powerful for copy detection than the existing approximate tests. The development of the RP test is based on the idea that the copy-detection problem can be recast as a causal inference and missing data problem. In particular, the observed data are viewed as a subset of a larger collection of potential values, or counterfactuals, and the null hypothesis of “no copying” is viewed as a “no causal effect” hypothesis and formally expressed in terms of constraints on potential variables.
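The abstract does not spell out how the RP test is constructed, so the sketch below is only a generic illustration of the two ingredients it names: a match-score statistic and a p-value whose size cannot exceed the nominal level. It is not Lang's procedure. It assumes, purely for illustration, that under "no copying" the suspected examinee is exchangeable with a reference group of other examinees. The function names and the answer strings are hypothetical.

```python
def match_score(answers_a, answers_b):
    """Count the items on which two answer vectors agree."""
    return sum(a == b for a, b in zip(answers_a, answers_b))


def rank_p_value(observed, reference_scores):
    """Rank-based p-value with the +1 correction.

    If the observed score is exchangeable with the reference scores
    under the null, then P(p <= alpha) <= alpha for every alpha.
    """
    at_least_as_large = sum(score >= observed for score in reference_scores)
    return (1 + at_least_as_large) / (len(reference_scores) + 1)


# Hypothetical answer strings for a 10-item exam (one letter per item).
source = "ABCDABCDAB"    # examinee suspected of being copied from
suspect = "ABCDABCDAC"   # examinee suspected of copying
others = ["ACCDBBADAB", "BBCAADCDCB", "ABDDCBCAAD", "CACDABBDBB"]

observed = match_score(source, suspect)
reference = [match_score(source, other) for other in others]
p_value = rank_p_value(observed, reference)
print(observed, reference, p_value)
```

With only four reference examinees the smallest attainable p-value is 1/5, so a realistic application would need a much larger reference group before any conventional significance level could be reached.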
Citations: 0