Journal of Educational Measurement: Latest Articles

Differential and Functional Response Time Item Analysis: An Application to Understanding Paper versus Digital Reading Processes
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-04-08 DOI: 10.1111/jedm.12389
Sun-Joo Cho, Amanda Goodwin, Matthew Naveiras, Jorge Salas

Despite the growing interest in incorporating response time data into item response models, there has been a lack of research investigating how the effect of speed on the probability of a correct response varies across different groups (e.g., experimental conditions) for various items (i.e., differential response time item analysis). Furthermore, previous research has shown a complex relationship between response time and accuracy, necessitating a functional analysis to understand the patterns that manifest from this relationship. In this study, response time data are incorporated into an item response model for two purposes: (a) to examine how individuals' speed within an experimental condition affects their response accuracy on an item, and (b) to detect the differences in individuals' speed between conditions in the presence of within-condition effects. For these two purposes, by-variable smooth functions are employed to model differential and functional response time effects by experimental condition for each item. This model is illustrated using an empirical data set to describe the effect of individuals' speed on their reading comprehension ability in two experimental conditions of reading medium (paper vs. digital) by item. A simulation study showed that the recovery of parameters and by-variable smooth functions of response time was satisfactory, and that the type I error rate and power of the test for the by-variable smooth function of response time were acceptable in conditions similar to the empirical data set. In addition, the proposed method correctly identified the range of response time where between-condition differences in the effect of response time on the probability of a correct response were accurate.

Citations: 0
Modeling Hierarchical Attribute Structures in Diagnostic Classification Models with Multiple Attempts
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-03-30 DOI: 10.1111/jedm.12387
Tae Yeon Kwon, A. Corinne Huggins-Manley, Jonathan Templin, Mingying Zheng

In classroom assessments, examinees can often answer test items multiple times, resulting in sequential multiple-attempt data. Sequential diagnostic classification models (DCMs) have been developed for such data. As student learning processes may be aligned with a hierarchy of measured traits, this study aimed to develop a sequential hierarchical DCM (sequential HDCM), which combines a sequential DCM with the HDCM, and to investigate the classification accuracy of the model in the presence of hierarchies when multiple attempts are allowed in dynamic assessment. We investigated the model's impact on classification accuracy when hierarchical structures are correctly specified, misspecified, or overspecified. The results indicate that (1) a sequential HDCM accurately classified students as masters and nonmasters when the data had a hierarchical structure; (2) a sequential HDCM produced similar or slightly higher classification accuracy than a nonhierarchical sequential log-linear cognitive diagnosis model (LCDM) when the data had hierarchical structures; and (3) misspecifying the hierarchical structure of the data resulted in lower classification accuracy when the misspecified model had fewer attribute profiles than the true model. We discuss limitations and make recommendations on using the proposed model in practice. This study provides practitioners with information about the possibilities for psychometric modeling of dynamic classroom assessment data.
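
The abstract's point about a misspecified model having "fewer attribute profiles" can be made concrete with a small sketch. It is not the authors' model or code. It only enumerates which mastery profiles remain permissible once a hierarchy is imposed. The function name `permissible_profiles`, the 0-based attribute indices, and the three-attribute linear hierarchy are assumptions chosen for illustration.

```python
from itertools import product


def permissible_profiles(n_attributes, prerequisites):
    """Enumerate 0/1 mastery profiles consistent with a hierarchy.

    prerequisites is a list of (a, b) pairs, each meaning that attribute a
    must be mastered before attribute b can be mastered (hypothetical
    encoding, 0-based indices).
    """
    profiles = []
    for profile in product((0, 1), repeat=n_attributes):
        # Rule out profiles where b is mastered but its prerequisite a is not.
        if all(profile[a] >= profile[b] for a, b in prerequisites):
            profiles.append(profile)
    return profiles


# No hierarchy: all 2**3 profiles are allowed.
unstructured = permissible_profiles(3, [])

# Linear hierarchy 0 -> 1 -> 2: only nested mastery patterns remain.
linear = permissible_profiles(3, [(0, 1), (1, 2)])

print(len(unstructured), len(linear))
```

With three attributes, the unstructured case allows eight profiles while the linear hierarchy allows four. A model that assumes a stricter hierarchy than the one that generated the data therefore has no profile available for some students, which is one plausible reading of why accuracy dropped in result (3).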

Citations: 0
A Bayesian Moderated Nonlinear Factor Analysis Approach for DIF Detection under Violation of the Equal Variance Assumption
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-03-15 DOI: 10.1111/jedm.12388
Sooyong Lee, Suhwa Han, Seung W. Choi

Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how violating the equal variance assumption adversely affects the detection of nonuniform DIF and how the problem can be addressed with a moderated nonlinear factor analysis (MNLFA) model estimated with a Bayesian approach, which overcomes the limitations of the restrictive assumption. The Bayesian MNLFA approach suggested in this study better controls Type I errors by freely estimating latent factor variances across different groups. Our experiments with simulated data demonstrate that the Bayesian MNLFA models outperform the existing MIMIC models in terms of both Type I error control and parameter recovery. The results suggest that MNLFA models have the potential to be a superior choice to the existing MIMIC models, especially in situations where the equal latent variance assumption is unlikely to hold.

Citations: 0
Optimal Calibration of Items for Multidimensional Achievement Tests
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-03-14 DOI: 10.1111/jedm.12386
Mahmood Ul Hassan, Frank Miller

Multidimensional achievement tests have recently been gaining importance in educational and psychological measurement. For example, multidimensional diagnostic tests can help students determine which particular domain of knowledge they need to improve for better performance. To estimate the characteristics of candidate items (calibration) for future multidimensional achievement tests, we use optimal design theory. We generalize a previously developed exchange algorithm for optimal design computation to the multidimensional setting. We also develop an asymptotic theorem stating which items should be calibrated by examinees with extreme abilities. For several examples, we compute the optimal design numerically with the exchange algorithm. We see clear structures in these results and explain them using the asymptotic theorem. Moreover, we investigate the performance of the optimal design in a simulation study.

Citations: 0
DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-02-14 DOI: 10.1111/jedm.12384
Carmen Köhler, Lale Khorramdel, Artur Pokropek, Johannes Hartig

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The current study compares two approaches for DIF detection: a multiple-group item response theory (MG-IRT) model and a generalized linear mixed model (GLMM). In the MG-IRT model approach, item parameters are constrained to be equal across groups and DIF is evaluated for each item in each group. In the GLMM, groups are treated as random, and item difficulties are modeled as correlated random effects with a joint multivariate normal distribution. Its nested structure allows the estimation of item difficulty variances and covariances at the group level. We use an excerpt from the PISA 2015 reading domain as an exemplary empirical investigation, and conduct a simulation study to compare the performance of the two approaches. Results from the empirical investigation show that the detection of countries with DIF is similar in both approaches. Results from the simulation study confirm this finding and indicate slight advantages of the MG-IRT model approach.
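
The definition of DIF in the first sentence of this abstract can be illustrated numerically. The sketch below is not either of the two models compared in the study. It assumes a simple Rasch response function and a hypothetical difficulty shift of 0.5 logits for one group, and shows how two respondents with the same trait level end up with different response probabilities. The names `p_correct`, `shifts`, `group_A`, and `group_B` are illustrative.

```python
import math


def p_correct(theta, difficulty):
    """Rasch model: probability of a correct response given trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


theta = 0.0      # same trait level for both respondents
common_b = 0.0   # item difficulty shared across groups

# Hypothetical group-specific difficulty shifts; a nonzero shift is uniform DIF.
shifts = {"group_A": 0.0, "group_B": 0.5}

probs = {group: p_correct(theta, common_b + shift) for group, shift in shifts.items()}

for group, p in probs.items():
    print(f"{group}: P(correct) = {p:.4f}")
```

Under these assumptions the item is harder for group_B at every trait level, so equal ability no longer implies equal success probability. Detecting shifts of this kind across many groups is the task both approaches in the abstract address.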

Citations: 0
Argument-Based Approach to Validity: Developing a Living Document and Incorporating Preregistration
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-02-14 DOI: 10.1111/jedm.12385
Daria Gerasimova

I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document allows future users of the instrument to access the entire validity argument in one place. Second, I describe how preregistration can be incorporated in the argument-based approach. Specifically, I distinguish between two types of preregistration: preregistration of the argument and preregistration of validation studies. Preregistration of the argument is a single preregistration that is specified for the entire validation process. Here, the developer specifies interpretations, uses, and claims before collecting validity evidence. Preregistration of a validation study refers to preregistering a single validation study that aims to evaluate a set of claims. Here, the developer describes study components (e.g., research design, data collection, data analysis, etc.), before collecting data. Both preregistration types have the potential to reduce the risk of bias (e.g., hindsight and confirmation biases), as well as to allow others to evaluate the risk of bias and, hence, calibrate confidence, in the developer's evaluation of the validity argument.

Citations: 0
Argument‐Based Approach to Validity: Developing a Living Document and Incorporating Preregistration
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-02-14 DOI: 10.1111/jedm.12385
Daria Gerasimova
I propose two practical advances to the argument‐based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up‐to‐date summary of the validity argument. As the validation process may span across multiple studies, the living document allows future users of the instrument to access the entire validity argument in one place. Second, I describe how preregistration can be incorporated in the argument‐based approach. Specifically, I distinguish between two types of preregistration: preregistration of the argument and preregistration of validation studies. Preregistration of the argument is a single preregistration that is specified for the entire validation process. Here, the developer specifies interpretations, uses, and claims before collecting validity evidence. Preregistration of a validation study refers to preregistering a single validation study that aims to evaluate a set of claims. Here, the developer describes study components (e.g., research design, data collection, data analysis, etc.), before collecting data. Both preregistration types have the potential to reduce the risk of bias (e.g., hindsight and confirmation biases), as well as to allow others to evaluate the risk of bias and, hence, calibrate confidence, in the developer's evaluation of the validity argument.
Citations: 2
A Dual-Purpose Model for Binary Data: Estimating Ability and Misconceptions
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2024-01-04 DOI: 10.1111/jedm.12383
Wenchao Ma, Miguel A. Sorrel, Xiaoming Zhai, Yuan Ge

Most existing diagnostic models are developed to detect whether students have mastered a set of skills of interest, but few have focused on identifying what scientific misconceptions students possess. This article developed a general dual-purpose model for simultaneously estimating students' overall ability and the presence and absence of misconceptions. The expectation-maximization algorithm was developed to estimate the model parameters. A simulation study was conducted to evaluate to what extent the parameters can be accurately recovered under varied conditions. A set of real data in science education was also analyzed to examine the viability of the proposed model in practice.

Citations: 0
A Highly Adaptive Testing Design for PISA
IF 1.3 Zone 4 (Psychology) Q1 Psychology Pub Date: 2023-12-03 DOI: 10.1111/jedm.12382
Andreas Frey, Christoph König, Aron Fink
The highly adaptive testing (HAT) design is introduced as an alternative test design for the Programme for International Student Assessment (PISA). The principle of HAT is to be as adaptive as possible when selecting items while accounting for PISA's nonstatistical constraints and addressing issues concerning PISA such as item position effects. HAT combines established methods from the field of computerized adaptive testing. It is implemented in R and code is provided. HAT was compared to the PISA 2018 multistage design (MST) in a simulation study based on a factorial design with the independent variables response probability (RP; .50, .62), item pool optimality (PISA 2018, optimal), and ability level (low, medium, high). PISA-specific conditions regarding sample size, missing responses, and nonstatistical constraints were implemented. HAT clearly outperformed MST regarding test information, RMSE, and constraint management across ability groups but it showed slightly weaker item exposure. Raising RP to .62 did not decrease test information much and is therefore a viable option to foster students’ test-taking experience with HAT. Test information for HAT was up to three times higher than for MST when using a hypothetical optimal item pool. Summarizing, HAT proved to be a promising and applicable test design for PISA.
Citations: 0