
Journal of Educational Measurement: Latest Publications

Using Simulated Retests to Estimate the Reliability of Diagnostic Assessment Systems
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2023-02-19 · DOI: 10.1111/jedm.12359
W. Jake Thompson, Brooke Nash, Amy K. Clark, Jeffrey C. Hoover

As diagnostic classification models become more widely used in large-scale operational assessments, we must give consideration to the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment systems. In this article, we describe and evaluate a method for simulating retests to summarize reliability evidence at multiple reporting levels. We evaluate how the performance of reliability estimates from simulated retests compares to other measures of classification consistency and accuracy for diagnostic assessments that have previously been described in the literature, but which limit the level at which reliability can be reported. Overall, the findings show that reliability estimates from simulated retests are an accurate measure of reliability and are consistent with other measures of reliability for diagnostic assessments. We then apply this method to real data from the Examination for the Certificate of Proficiency in English to demonstrate the method in practice and compare reliability estimates from observed data. Finally, we discuss implications for the field and possible next directions.
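The core idea of a simulated retest can be conveyed with a minimal sketch: given each examinee's posterior probability of mastering an attribute, draw two independent simulated classifications and record how often they agree. The function below is a hypothetical simplification assuming a single attribute and known posteriors; the article's procedure re-simulates full item responses and rescores them through the diagnostic model.

```python
import random

def simulated_retest_consistency(posterior_probs, n_reps=1000, seed=1):
    """Draw two independent simulated classifications per examinee from
    the posterior probability of mastery and record agreement.
    posterior_probs: one P(master) per examinee (hypothetical inputs)."""
    rng = random.Random(seed)
    agree, total = 0, 0
    for _ in range(n_reps):
        for p in posterior_probs:
            test = rng.random() < p    # classification on simulated test
            retest = rng.random() < p  # classification on simulated retest
            agree += (test == retest)
            total += 1
    return agree / total

# Confident classifications agree across retests far more often than
# maximally uncertain ones.
certain = simulated_retest_consistency([0.95, 0.02, 0.99, 0.05])
uncertain = simulated_retest_consistency([0.5, 0.5, 0.5, 0.5])
```

With posteriors near 0 or 1 the agreement rate approaches 1; with posteriors of .5 it hovers near .5, which is why such summaries track the certainty of the reported classifications.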

Citations: 0
An Exploration of an Improved Aggregate Student Growth Measure Using Data from Two States
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2023-01-31 · DOI: 10.1111/jedm.12354
Katherine E. Castellano, Daniel F. McCaffrey, J. R. Lockwood

The simple average of student growth scores is often used in accountability systems, but it can be problematic for decision making. When computed using a small/moderate number of students, it can be sensitive to the sample, resulting in inaccurate representations of growth of the students, low year-to-year stability, and inequities for low-incidence groups. An alternative designed to address these issues is to use an Empirical Best Linear Prediction (EBLP), which is a weighted average of growth score data from other years and/or subjects. We apply both approaches to two statewide datasets to answer empirical questions about their performance. The EBLP outperforms the simple average in accuracy and cross-year stability with the exception that accuracy was not necessarily improved for very large districts in one of the states. In such exceptions, we show a beneficial alternative may be to use a hybrid approach in which very large districts receive the simple average and all others receive the EBLP. We find that adding more growth score data to the computation of the EBLP can improve accuracy, but not necessarily for larger schools/districts. We review key decision points in aggregate growth reporting and in specifying an EBLP weighted average in practice.
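The shrinkage logic behind an EBLP-style weighted average can be sketched as follows. The weight takes the classic best-linear-predictor form, with between_var and within_var standing in for variance components the article estimates from a fitted model; the function name and all values are illustrative, not the article's.

```python
def eblp_growth(current_mean, n, other_mean, between_var, within_var):
    """Shrink a unit's current-year mean growth score toward a prediction
    from other years/subjects. The weight approaches 1 as the number of
    students n grows: large districts keep their own mean, small ones
    borrow strength from the other data."""
    w = between_var / (between_var + within_var / n)
    return w * current_mean + (1 - w) * other_mean

small_district = eblp_growth(current_mean=12.0, n=10, other_mean=4.0,
                             between_var=4.0, within_var=100.0)
large_district = eblp_growth(current_mean=12.0, n=10_000, other_mean=4.0,
                             between_var=4.0, within_var=100.0)
```

The small district's estimate is pulled well toward the prediction from other data, while the very large district's estimate stays essentially at its own simple average, mirroring the hybrid behavior discussed in the abstract.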

Citations: 0
Classification Accuracy and Consistency of Compensatory Composite Test Scores
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2023-01-28 · DOI: 10.1111/jedm.12357
J. Carl Setzer, Ying Cheng, Cheng Liu

Test scores are often used to make decisions about examinees, such as in licensure and certification testing, as well as in many educational contexts. In some cases, these decisions are based upon compensatory scores, such as those from multiple sections or components of an exam. Classification accuracy and classification consistency are two psychometric characteristics of test scores that are often reported when decisions are based on those scores, and several techniques currently exist for estimating both accuracy and consistency. However, research on classification accuracy and consistency on compensatory test scores is scarce. This study demonstrates two techniques that can be used to estimate classification accuracy and consistency when test scores are used in a compensatory manner. First, a simulation study demonstrates that both methods provide very similar results under the studied conditions. Second, we demonstrate how the two methods could be used with a high-stakes licensure exam.
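A minimal Monte Carlo sketch conveys what is being estimated: sample true section scores, add independent measurement error to form observed compensatory composites, and tally how often the observed pass/fail decision matches the true one (accuracy) and how often two parallel administrations agree (consistency). The normal score model and every parameter value are assumptions for illustration, not either of the article's estimation techniques.

```python
import random

def composite_classification(n_examinees=20000, cut=100.0, seed=7):
    """Monte Carlo estimate of classification accuracy and consistency
    for a compensatory composite: section true scores are summed, observed
    scores add independent section-level error, and pass/fail is decided
    by the composite cut."""
    rng = random.Random(seed)
    acc, cons = 0, 0
    for _ in range(n_examinees):
        t1 = rng.gauss(60, 8)   # section 1 true score
        t2 = rng.gauss(45, 6)   # section 2 true score
        true_pass = (t1 + t2) >= cut
        # two parallel administrations, each with its own section errors
        form_a = (t1 + rng.gauss(0, 4)) + (t2 + rng.gauss(0, 3))
        form_b = (t1 + rng.gauss(0, 4)) + (t2 + rng.gauss(0, 3))
        acc += ((form_a >= cut) == true_pass)
        cons += ((form_a >= cut) == (form_b >= cut))
    return acc / n_examinees, cons / n_examinees

accuracy, consistency = composite_classification()
```

Note the compensatory feature: a weak section score can be offset by a strong one, since only the sum is compared to the cut.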

Citations: 0
Editorial for JEM issue 59-4
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2023-01-06 · DOI: 10.1111/jedm.12356
Sandip Sinharay
Citations: 0
Specifying the Three Ws in Educational Measurement: Who Uses Which Scores for What Purpose?
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2022-12-25 · DOI: 10.1111/jedm.12355
Andrew Ho

I argue that understanding and improving educational measurement requires specificity about actors, scores, and purpose: Who uses which scores for what purpose? I show how this specificity complements Briggs’ frameworks for educational measurement that he presented in his 2022 address as president of the National Council on Measurement in Education.

Citations: 1
Online Calibration in Multidimensional Computerized Adaptive Testing with Polytomously Scored Items
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2022-12-15 · DOI: 10.1111/jedm.12353
Lu Yuan, Yingshi Huang, Shuhang Li, Ping Chen

Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more common, only a few published reports focus on online calibration in MCAT with polytomously scored items (P-MCAT). Therefore, standing on the shoulders of the existing online calibration methods/designs, this study proposes four new P-MCAT online calibration methods and two new P-MCAT online calibration designs and conducts two simulation studies to evaluate their performance under varying conditions (i.e., different calibration sample sizes and correlations between dimensions). Results show that all of the newly proposed methods can accurately recover item parameters, and the adaptive designs outperform the random design in most cases. In the end, this paper provides practical guidance based on simulation results.

Citations: 0
Measuring the Uncertainty of Imputed Scores
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2022-12-14 · DOI: 10.1111/jedm.12352
Sandip Sinharay

Technical difficulties and other unforeseen events occasionally lead to incomplete data on educational tests, which necessitates the reporting of imputed scores to some examinees. While there exist several approaches for reporting imputed scores, there is a lack of any guidance on the reporting of the uncertainty of imputed scores. In this paper, several approaches are suggested for quantifying the uncertainty of imputed scores using measures that are similar in spirit to estimates of reliability and standard error of measurement. A simulation study is performed to examine the properties of the approaches. The approaches are then applied to data from a state test on which some examinees' scores had to be imputed following computer problems. Several recommendations are made for practice.
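One way to picture such measures: if each missing score can be imputed many times from a predictive distribution, the spread of those plausible values yields an SEM-like quantity, and one minus the ratio of imputation-error variance to total score variance yields a reliability-like quantity. The sketch below uses these generic analogues with made-up draws; it is not the article's estimators.

```python
import random
import statistics

def imputed_score_uncertainty(plausible_scores):
    """Summarize uncertainty from repeated plausible values of each
    examinee's imputed score: the average within-examinee SD plays the
    role of a standard error of measurement, and 1 - (error variance /
    total variance) plays the role of a reliability coefficient."""
    point_estimates = [statistics.mean(draws) for draws in plausible_scores]
    error_var = statistics.mean(statistics.variance(d) for d in plausible_scores)
    total_var = statistics.variance(point_estimates) + error_var
    sem_like = error_var ** 0.5
    reliability_like = 1 - error_var / total_var
    return sem_like, reliability_like

rng = random.Random(3)
# three examinees, 200 plausible imputed scores each (hypothetical draws)
draws = [[rng.gauss(mu, 4) for _ in range(200)] for mu in (95, 110, 120)]
sem, rel = imputed_score_uncertainty(draws)
```

The wider the predictive spread relative to the spread of examinees' point estimates, the lower the reliability-like index, which is the intuition behind reporting uncertainty alongside imputed scores.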

Citations: 1
An Exponentially Weighted Moving Average Procedure for Detecting Back Random Responding Behavior
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2022-12-09 · DOI: 10.1111/jedm.12351
Yinhong He

Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the exponentially weighted moving average (EWMA) obtains more detailed information. This study equipped the weighted residual statistic with EWMA, and proposed the EWMA-WR method to detect BRR. To make the critical values adaptive to the ability levels, this study proposed the Monte Carlo simulation with ability stratification (MC-stratification) method for calculating critical values. Compared to the original Monte Carlo simulation (MC) method, the newly proposed MC-stratification method generated a larger number of satisfactory results. The performances of CPA-WR and EWMA-WR were evaluated under different conditions that varied in the test lengths, abnormal proportions, critical values and smoothing constants used in the EWMA-WR method. The results showed that EWMA-WR was more powerful than CPA-WR in detecting BRR. Moreover, an empirical study was conducted to illustrate the utility of EWMA-WR for detecting BRR.
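The charting mechanism at the heart of such a procedure is compact: an EWMA smooths the sequence of per-item residuals, and a crossing of a critical value signals a change such as the onset of random responding. The residual values, smoothing constant, and threshold below are placeholders, not the article's weighted-residual statistic or its MC-stratified cutoffs.

```python
def ewma_flags(residuals, lam=0.2, threshold=1.0, start=0.0):
    """Run an exponentially weighted moving average over a sequence of
    standardized per-item residuals and return the (0-based) item
    positions where it crosses the critical value. lam is the smoothing
    constant; threshold stands in for a calibrated critical value."""
    flags, w = [], start
    for t, x in enumerate(residuals):
        w = lam * x + (1 - lam) * w  # EWMA update
        if abs(w) > threshold:
            flags.append(t)
    return flags

# small residuals early in the test, large ones after the examinee
# switches to random responding (values made up for illustration)
resid = [0.1, -0.2, 0.15, 0.0, -0.1, 1.8, 2.2, 1.9, 2.4, 2.1]
flagged = ewma_flags(resid)  # -> [8, 9]
```

Because the EWMA accumulates evidence gradually, the flag trails the actual change point at item 5 by a few items; smaller lam smooths more and lags more, which is the usual charting trade-off.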

Citations: 1
Multiple-Group Joint Modeling of Item Responses, Response Times, and Action Counts with the Conway-Maxwell-Poisson Distribution
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2022-12-07 · DOI: 10.1111/jedm.12349
Xin Qiao, Hong Jiao, Qiwei He

Multiple group modeling is one of the methods to address the measurement noninvariance issue. Traditional studies on multiple group modeling have mainly focused on item responses. In computer-based assessments, joint modeling of response times and action counts with item responses helps estimate the latent speed and action levels in addition to latent ability. These two new data sources can also be used to further address the measurement noninvariance issue. One challenge, however, is to correctly model action counts which can be underdispersed, overdispersed, or equidispersed in real data sets. To address this, we adopted the Conway-Maxwell-Poisson distribution that accounts for different types of dispersion in action counts and incorporated it in the multiple group joint modeling of item responses, response times, and action counts. Bayesian Markov Chain Monte Carlo method was used for model parameter estimation. To illustrate an application of the proposed model, an empirical data analysis was conducted using the Programme for International Student Assessment (PISA) 2015 collaborative problem-solving items where potential measurement noninvariance issue existed between gender groups. Results indicated that Conway-Maxwell-Poisson model yielded better model fit than alternative count data models such as negative binomial and Poisson models. In addition, response times and action counts provided further information on performance differences between groups.
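The distribution's flexibility comes from its extra dispersion parameter: P(X = x) is proportional to lam**x / (x!)**nu, where nu = 1 recovers the Poisson, nu > 1 yields underdispersion, and nu < 1 overdispersion. A small numerical sketch with a truncated normalizing constant (parameter values chosen for illustration):

```python
import math

def cmp_pmf(lam, nu, max_x=60):
    """Conway-Maxwell-Poisson pmf, P(X = x) proportional to
    lam**x / (x!)**nu, normalized over 0..max_x. Computed in log space
    (lgamma(x + 1) = log(x!)) to avoid overflow."""
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1)
             for x in range(max_x + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    return [v / z for v in w]

def mean_var(pmf):
    m = sum(x * p for x, p in enumerate(pmf))
    return m, sum((x - m) ** 2 * p for x, p in enumerate(pmf))

m_under, v_under = mean_var(cmp_pmf(4.0, 2.0))  # nu > 1: variance < mean
m_pois, v_pois = mean_var(cmp_pmf(4.0, 1.0))    # nu = 1: Poisson, var = mean
m_over, v_over = mean_var(cmp_pmf(0.9, 0.3))    # nu < 1: variance > mean
```

This single family covering all three dispersion regimes is what lets one model absorb under-, equi-, and overdispersed action counts without switching between Poisson and negative binomial forms.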

Citations: 1
NCME Presidential Address 2022: Turning the Page to the Next Chapter of Educational Measurement
IF 1.3 · Q4 (CAS, Psychology) · Q1 (Psychology) · Pub Date: 2022-11-09 · DOI: 10.1111/jedm.12350
Derek C. Briggs
Citations: 0