Cornelis Potgieter, Xin Qiao, Akihito Kamata, Yusuf Kara
As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. estimated ORF scores from a latent variable psychometric model of accuracy and speed for ORF data, using a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores, namely the maximum likelihood estimator (MLE), the maximum a posteriori (MAP) estimator, and the expected a posteriori (EAP) estimator, as well as their standard errors. The proposed estimators were demonstrated with a real ORF assessment dataset. In addition, their estimation of model-derived ORF scores and standard errors was evaluated through a simulation study. The fully Bayesian approach was included as a comparison in both the real data analysis and the simulation study. Results demonstrated that all three likelihood-based approaches performed satisfactorily in estimating the model-derived ORF scores and their standard errors.
Cornelis Potgieter, Xin Qiao, Akihito Kamata, Yusuf Kara. "Likelihood-Based Estimation of Model-Derived Oral Reading Fluency." Journal of Educational Measurement, 61(3), 542-559. Publication date: 2024-06-22. doi:10.1111/jedm.12404
Xiuxiu Tang, Yi Zheng, Tong Wu, Kit-Tai Hau, Hua-Hua Chang
Multistage adaptive testing (MST) has recently been adopted for international large-scale assessments such as the Programme for International Student Assessment (PISA). MST offers improved measurement efficiency over traditional nonadaptive tests and greater practical convenience than single-item-adaptive computerized adaptive testing (CAT). As a third adaptive test design, alternative to both MST and CAT, Zheng and Chang proposed “on-the-fly multistage adaptive testing” (OMST), which combines the benefits of MST and CAT while offsetting their limitations. In this study, we adopted the OMST design and also incorporated response time (RT) into item selection. Via simulations emulating the PISA 2018 reading test, using the real item attributes and replicating the PISA 2018 reading test's MST design, we compared the performance of our OMST designs against the simulated MST design in terms of (1) measurement accuracy of test takers’ ability, (2) test time efficiency and consistency, and (3) expected gains in precision by design. We also investigated the performance of OMST in item bank usage and constraint management. Results show great potential for the proposed RT-incorporated OMST designs to be used for PISA and potentially other international large-scale assessments.
Xiuxiu Tang, Yi Zheng, Tong Wu, Kit-Tai Hau, Hua-Hua Chang. "Utilizing Response Time for Item Selection in On-the-Fly Multistage Adaptive Testing for PISA Assessment." Journal of Educational Measurement, 62(3), 468-495. Publication date: 2024-06-06. doi:10.1111/jedm.12403
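One way to fold response time into on-the-fly module assembly is to rank the remaining items by Fisher information per expected second, an RT-aware criterion from the CAT literature. The sketch below assumes 2PL items and a lognormal response time model with known parameters; it illustrates that criterion and is not necessarily the selection rule used by Tang et al. It also omits the content constraints and exposure control a PISA-like design would need. All names and values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: str
    a: float      # 2PL discrimination
    b: float      # 2PL difficulty
    beta: float   # lognormal time intensity
    alpha: float  # lognormal time discrimination (1 / residual SD of log RT)

def fisher_info(item, theta):
    """2PL item information a^2 P (1 - P) at ability theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)

def expected_time(item, tau):
    """Mean of a lognormal RT with log-mean beta - tau and log-SD 1 / alpha."""
    return math.exp(item.beta - tau + 0.5 / item.alpha ** 2)

def assemble_module(pool, used_ids, theta_hat, tau_hat, module_size):
    """Assemble the next module on the fly: the unused items with the highest
    information per expected second at the current ability and speed estimates."""
    candidates = [it for it in pool if it.item_id not in used_ids]
    candidates.sort(key=lambda it: fisher_info(it, theta_hat) / expected_time(it, tau_hat),
                    reverse=True)
    return candidates[:module_size]

# Usage with a hypothetical four-item pool; item "D" was administered in an earlier stage
pool = [Item("A", 1.5, 0.0, 4.0, 2.0), Item("B", 1.5, 0.0, 5.0, 2.0),
        Item("C", 0.5, 0.0, 4.0, 2.0), Item("D", 2.0, 0.0, 3.0, 2.0)]
module = assemble_module(pool, {"D"}, theta_hat=0.0, tau_hat=0.0, module_size=2)
print([it.item_id for it in module])  # ['A', 'B']
```

Between stages, theta_hat and tau_hat would be re-estimated from the responses and response times collected so far before the next module is assembled.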
Tianying Feng, Li Cai
Process information collected from educational games can illuminate how students approach interactive tasks, complementing the assessment outcomes routinely examined in evaluation studies. However, the two sources of information have historically been analyzed and interpreted separately, and diagnostic process information is often underused. To tackle these issues, we present a new application of cross-classified item response theory modeling, using indicators of knowledge misconceptions and item-level assessment data collected from a multisite game-based randomized controlled trial. This application addresses (a) the joint modeling of students' pretest and posttest item responses and game-based processes described by indicators of misconceptions; (b) integration of gameplay information when gauging the intervention effect of an educational game; (c) relationships among game-based misconceptions, pretest initial status, and pre-to-post change; and (d) nesting of students within schools, a common feature of multisite research. We also demonstrate how to structure the data and set up the model to enable our proposed application, and how our application compares with three other approaches to analyzing gameplay and assessment data. Lastly, we note the implications for future evaluation studies and for using analytic results to inform learning and instruction.
Tianying Feng, Li Cai. "Sensemaking of Process Data from Evaluation Studies of Educational Games: An Application of Cross-Classified Item Response Theory Modeling." Journal of Educational Measurement, 63(1). Publication date: 2024-06-05. doi:10.1111/jedm.12396. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12396
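The cross-classified structure can be made concrete by reshaping each student's records into long format, one row per response, so that every row is classified simultaneously by student (nested in school) and by indicator (a pretest item, a posttest item, or a game-based misconception indicator). The sketch below is a hypothetical layout with invented field names and an assumed treatment coding; it is not necessarily the data structure Feng and Cai used, and fitting the model itself is not shown.

```python
from dataclasses import dataclass

@dataclass
class StudentRecord:
    # Hypothetical wide-format record for one student
    student_id: str
    school_id: str
    treated: int           # 1 = game condition, 0 = control (assumed coding)
    pretest: dict          # item id -> 0/1 score
    posttest: dict         # item id -> 0/1 score
    game_indicators: dict  # misconception indicator id -> 0/1 (empty if none observed)

def to_long(records):
    """One row per response; each row is cross-classified by student and indicator."""
    rows = []
    for r in records:
        sources = (("pre", r.pretest), ("post", r.posttest), ("game", r.game_indicators))
        for occasion, responses in sources:
            for indicator, value in sorted(responses.items()):
                rows.append({
                    "student": r.student_id,
                    "school": r.school_id,
                    "treated": r.treated,
                    "occasion": occasion,
                    "indicator": indicator,
                    "response": value,
                })
    return rows

# Usage: one treated and one control student in the same school
records = [
    StudentRecord("s1", "A", 1, {"i1": 1, "i2": 0}, {"i1": 1, "i2": 1}, {"m1": 0}),
    StudentRecord("s2", "A", 0, {"i1": 0, "i2": 0}, {"i1": 1, "i2": 0}, {}),
]
for row in to_long(records):
    print(row)
```

Keeping the same item identifiers at pretest and posttest lets a model treat those items as crossed with occasion, while school identifiers carry the nesting of students.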
Xiangyi Liao, Daniel M. Bolt, Jee-Seon Kim
Item difficulty and dimensionality often correlate, implying that unidimensional IRT approximations to multidimensional data (i.e., reference composites) can take a curvilinear form in the multidimensional space. Although this issue has previously been discussed in the context of vertical scaling applications, we illustrate how such a phenomenon can also easily occur within individual tests. Measures of reading proficiency, for example, often use different task types within a single assessment, a feature that may lead not only to multidimensionality but also to an association between item difficulty and dimensionality. Using a latent regression strategy, we demonstrate through simulations and empirical analysis how associations between dimensionality and difficulty yield a nonlinear reference composite in which the weights of the underlying dimensions change across the scale continuum according to the difficulties of the items associated with those dimensions. We further show how this form of curvilinearity produces systematic forms of misspecification in traditional unidimensional IRT models (e.g., 2PL) and can be better accommodated by models such as monotone-polynomial or asymmetric IRT models. Simulations and a real-data example from the Early Childhood Longitudinal Study—Kindergarten are provided for demonstration. Some implications for measurement modeling and for understanding the effects of 2PL misspecification on measurement metrics are discussed.
Xiangyi Liao, Daniel M. Bolt, Jee-Seon Kim. "Curvilinearity in the Reference Composite and Practical Implications for Measurement." Journal of Educational Measurement, 61(3), 511-541. Publication date: 2024-06-05. doi:10.1111/jedm.12402. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12402
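A small numerical illustration of why a difficulty-dimension association bends the reference composite: for a compensatory two-dimensional 2PL in which easy items load on dimension 1 and hard items on dimension 2, the information-weighted sum of the item discrimination vectors points in different directions at different points of the scale. The sketch below assumes hypothetical item parameters and uses this weighted direction only as a heuristic; it is not the latent regression strategy of Liao, Bolt, and Kim.

```python
import math

# Hypothetical compensatory 2D 2PL items (a1, a2, d): P = logistic(a1*t1 + a2*t2 + d).
# Easy items (large d) measure dimension 1; hard items (negative d) measure dimension 2.
ITEMS = [(1.2, 0.0, 1.5), (1.2, 0.0, 1.0), (1.2, 0.0, 0.5),
         (0.0, 1.2, -0.5), (0.0, 1.2, -1.0), (0.0, 1.2, -1.5)]

def local_direction_angle(t, items=ITEMS):
    """Angle, in degrees from the dimension-1 axis, of the information-weighted
    sum of discrimination vectors at the point (t, t)."""
    w1 = w2 = 0.0
    for a1, a2, d in items:
        p = 1.0 / (1.0 + math.exp(-(a1 * t + a2 * t + d)))
        weight = p * (1.0 - p)  # P(1 - P) term of the item information at this point
        w1 += weight * a1
        w2 += weight * a2
    return math.degrees(math.atan2(w2, w1))

for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"t = {t:+.1f}: {local_direction_angle(t):5.1f} degrees")
```

At t = 0 the two item sets contribute equally and the direction is 45 degrees; it tilts toward dimension 1 at the low end of the scale and toward dimension 2 at the high end. If both item sets had the same difficulties, the angle would stay at 45 degrees everywhere on this diagonal.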
Sijia Huang, Seungwon Chung, Carl F. Falk
In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore the impacts of observed covariates on latent constructs. We applied a recently developed variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm to address the computational challenge of estimating the proposed model. To demonstrate our new approach, we analyzed empirical student evaluation of teaching (SET) data collected from a large public university with three models: a CC-MNRM with RS, a CC-MNRM with no RS, and a multilevel MNRM with RS. Results indicated that the three models led to different inferences regarding the observed covariates. Additionally, in the example, ignoring versus incorporating RS led to changes in student substantive scores, while instructor substantive scores were less affected. Misspecifying the cross-classified data structure resulted in apparent changes in instructor scores. To further evaluate the proposed modeling approach, we conducted a preliminary simulation study and observed good parameter and score recovery. We concluded this study with discussions of limitations and future research directions.
Sijia Huang, Seungwon Chung, Carl F. Falk. "Modeling Response Styles in Cross-Classified Data Using a Cross-Classified Multidimensional Nominal Response Model." Journal of Educational Measurement, 61(3), 486-510. Publication date: 2024-05-31. doi:10.1111/jedm.12401
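The response-style part of an MNRM is usually expressed through category scoring functions, one per latent dimension. The sketch below assumes a 5-point scale with substantive, extreme, and midpoint scoring functions for a single item and, as a hypothetical stand-in for the cross-classified structure, treats each dimension's trait as the sum of a student component and an instructor component. Identification constraints, covariates, and the MH-RM estimation used by Huang, Chung, and Falk are not shown; all names are illustrative.

```python
import math

# Scoring functions for a 5-point rating scale (categories 0..4), one per latent dimension
SCORING = {
    "substantive": [0, 1, 2, 3, 4],
    "extreme":     [1, 0, 0, 0, 1],
    "midpoint":    [0, 0, 1, 0, 0],
}

def category_probs(theta_student, theta_instructor, slopes, intercepts):
    """MNRM category probabilities for one rating of one item. As a hypothetical
    stand-in for cross-classification, each trait is student + instructor component."""
    logits = []
    for k in range(len(SCORING["substantive"])):
        z = intercepts[k]
        for dim, scores in SCORING.items():
            theta = theta_student[dim] + theta_instructor[dim]
            z += slopes[dim] * scores[k] * theta  # item-specific slope per dimension
        logits.append(z)
    m = max(logits)  # subtract the max for numerical stability
    expz = [math.exp(z - m) for z in logits]
    total = sum(expz)
    return [e / total for e in expz]

# Usage: a student with a strong extreme response style rating an average instructor
zero = {"substantive": 0.0, "extreme": 0.0, "midpoint": 0.0}
ers_student = {"substantive": 0.0, "extreme": 1.5, "midpoint": 0.0}
slopes = {"substantive": 1.0, "extreme": 1.0, "midpoint": 1.0}
print([round(p, 3) for p in category_probs(ers_student, zero, slopes, [0.0] * 5)])
```

The extreme-style trait raises the probability of both endpoint categories without shifting the substantive location, which is how such models separate style from content.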
Gregory M. Hurtz, Regi Mucino
The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we demonstrate the connection between the existing LNRT model parameters and the “level” component of profile similarity, and we define two new parameters for the LNRT model representing profile “dispersion” and “shape.” We show that while the LNRT model measures level (speed), profile dispersion and shape are conflated in model residuals, and that distinguishing them provides meaningful and useful parameters for identifying anomalous testing behavior. Results from a dataset in which many test-takers had gained preknowledge of test items revealed that profile shape, which is not currently measured in the LNRT model, was the response time index most sensitive to the abnormal test-taking behavior patterns. Results strongly support expanding the LNRT model to measure not only each test-taker's level of speed but also the dispersion and shape of their response time profiles.
Gregory M. Hurtz, Regi Mucino. "Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior." Journal of Educational Measurement, 61(3), 458-485. Publication date: 2024-05-13. doi:10.1111/jedm.12395
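The three profile-similarity components can be sketched by comparing a test-taker's log response time profile with the items' time-intensity profile. Following the classic level/dispersion/shape decomposition of profile similarity, the sketch below takes level as the mean difference (which equals the LNRT maximum likelihood speed estimate when all items share the same time discrimination), dispersion as the ratio of standard deviations, and shape as the Pearson correlation. These definitions are illustrative assumptions, not Hurtz and Mucino's exact parameterization.

```python
from statistics import mean, pstdev

def profile_components(log_rt, beta):
    """Level, dispersion, and shape of one test-taker's log-RT profile
    relative to the item time intensities beta (illustrative definitions)."""
    if len(log_rt) != len(beta) or len(beta) < 2:
        raise ValueError("profiles must match in length and contain at least two items")
    sd_y, sd_b = pstdev(log_rt), pstdev(beta)
    if sd_y == 0 or sd_b == 0:
        raise ValueError("dispersion and shape are undefined for a flat profile")
    # Level: mean of beta_i - ln t_i; with equal time discriminations this is the ML speed estimate
    level = mean(b - y for b, y in zip(beta, log_rt))
    # Dispersion: spread of the test-taker's profile relative to the item profile
    dispersion = sd_y / sd_b
    # Shape: Pearson correlation between the two profiles
    my, mb = mean(log_rt), mean(beta)
    cov = mean((y - my) * (b - mb) for y, b in zip(log_rt, beta))
    shape = cov / (sd_y * sd_b)
    return level, dispersion, shape

# Usage with hypothetical time intensities and a uniformly faster test-taker
beta = [3.0, 3.5, 4.0, 4.5, 5.0]
print(profile_components([b - 0.5 for b in beta], beta))
```

A profile that mirrors the item time intensities has shape 1, whereas a reversed profile has shape -1, even though the two can share the same level and dispersion.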
Corinne Huggins-Manley, Anthony W. Raborn, Peggy K. Jones, Ted Myers
The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the overall testing population. We propose the nonparametric root expected proportion squared difference (REPSD) index, which evaluates the statistical significance of composite group DIF for relatively small focal groups stemming from multicategorical focal variables, with decisions of statistical significance based on quasi-exact p values obtained from Monte Carlo permutations of the DIF statistic under the null distribution. We conduct a simulation to evaluate the conditions under which the index produces acceptable Type I error and power rates, as well as an application to a school district assessment. Practitioners can calculate the REPSD index with a freely available R package we created.
Corinne Huggins-Manley, Anthony W. Raborn, Peggy K. Jones, Ted Myers. "A Nonparametric Composite Group DIF Index for Focal Groups Stemming from Multicategorical Variables." Journal of Educational Measurement, 61(3), 432-457. Publication date: 2024-05-12. doi:10.1111/jedm.12394
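The Monte Carlo permutation logic behind quasi-exact p values can be sketched as follows. The published REPSD formula is not given in the abstract, so the statistic here is a hypothetical stand-in: the root of the focal-weighted mean squared difference between focal-group and composite-group proportions correct within total-score strata. This is not the authors' R package. Focal labels are shuffled within score strata so that each permutation keeps the focal group's score distribution, and the p value is (exceedances + 1) / (permutations + 1).

```python
import math
import random
from collections import defaultdict

def composite_dif_stat(item_resp, total_scores, is_focal):
    """Hypothetical stand-in statistic: root of the focal-weighted mean squared
    difference between focal and composite proportions correct, matched on total score."""
    comp = defaultdict(lambda: [0, 0])  # score -> [correct, count] for all examinees
    foc = defaultdict(lambda: [0, 0])   # score -> [correct, count] for focal examinees
    for x, s, f in zip(item_resp, total_scores, is_focal):
        comp[s][0] += x
        comp[s][1] += 1
        if f:
            foc[s][0] += x
            foc[s][1] += 1
    n_focal = sum(n for _, n in foc.values())
    if n_focal == 0:
        raise ValueError("the focal group is empty")
    sq = sum(n * (c / n - comp[s][0] / comp[s][1]) ** 2 for s, (c, n) in foc.items())
    return math.sqrt(sq / n_focal)

def permutation_p_value(item_resp, total_scores, is_focal, n_perm=999, seed=0):
    """Quasi-exact Monte Carlo p value; focal labels are shuffled within score strata."""
    rng = random.Random(seed)
    observed = composite_dif_stat(item_resp, total_scores, is_focal)
    strata = defaultdict(list)
    for i, s in enumerate(total_scores):
        strata[s].append(i)
    labels = list(is_focal)
    exceed = 0
    for _ in range(n_perm):
        for idx in strata.values():
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, v in zip(idx, shuffled):
                labels[i] = v
        if composite_dif_stat(item_resp, total_scores, labels) >= observed:
            exceed += 1
    # Count the observed arrangement as one of the permutations
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the composite group includes every examinee, its stratum proportions stay fixed across permutations; only the focal subset changes.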
Frank Goldhammer, Ulf Kroehne, Carolin Hahnel, Johannes Naumann, Paul De Boeck
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between the two variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability conditional on speed by controlling speed experimentally. Item-level time limits control the stimulus presentation time and the time window for responding (timed condition). The overall goal was to examine the construct validity of effective ability scores obtained in the untimed and timed conditions by comparing the effects of theory-based item properties on item difficulty. If such effects exist, the scores reflect how well the test-takers were able to cope with the theory-based requirements. A German subsample from PISA 2012 completed two reading component skills tasks (i.e., word recognition and semantic integration) with and without item-level time limits. Overall, the included linguistic item properties showed stronger effects on item difficulty in the timed than in the untimed condition. In the semantic integration task, item properties explained the time required in the untimed condition. The results suggest that effective ability scores in the timed condition better reflect how well test-takers were able to cope with the theoretically relevant task demands.
Frank Goldhammer, Ulf Kroehne, Carolin Hahnel, Johannes Naumann, Paul De Boeck. "Does Timed Testing Affect the Interpretation of Efficiency Scores?—A GLMM Analysis of Reading Components." Journal of Educational Measurement, 61(3), 349-377. Publication date: 2024-05-12. doi:10.1111/jedm.12393. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12393
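A generic explanatory item response GLMM of the kind used to relate item properties to item difficulty can be written as below, where Y_{pi} is person p's response to item i, x_{ik} are theory-based item properties, t_{pi} indicates whether the response was given under an item-level time limit, θ_p is a random person effect, and ε_i is a random item residual. This is an illustrative specification, not necessarily the exact model of Goldhammer et al. With this logit-of-success convention, β_k is the effect of property k on item easiness in the untimed condition and β_k + γ_k its effect in the timed condition, so a larger magnitude of β_k + γ_k than of β_k corresponds to a stronger item-property effect under time limits.

```latex
\operatorname{logit} P(Y_{pi} = 1)
  = \beta_0 + \gamma_0 t_{pi}
  + \sum_{k=1}^{K} \left(\beta_k + \gamma_k t_{pi}\right) x_{ik}
  + \theta_p + \varepsilon_i,
\qquad \theta_p \sim N(0, \sigma_\theta^2), \quad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)
```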
Hong Jiao. "von Davier, Alina, Mislevy, Robert J., and Hao, Jiangang (Eds.) (2021). Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_1". Journal of Educational Measurement, 61(3), 560-566. Publication date: 2024-04-24. doi:10.1111/jedm.12392
Matthew J. Madison, Stefanie A. Wind, Lientje Maas, Kazuhiro Yamaguchi, Sergio Haab
Diagnostic classification models (DCMs) are psychometric models designed to classify examinees according to their proficiency or nonproficiency in specified latent characteristics. These models are well suited for providing diagnostic and actionable feedback to support intermediate and formative assessment efforts. Several DCMs have been developed and applied in different settings. This study examines a DCM with a functional form similar to the one-parameter logistic item response theory model. Using data from a large-scale mathematics education research study, we demonstrate and prove that the proposed DCM has measurement properties akin to those of the Rasch and one-parameter logistic item response theory models, including sum score sufficiency, item-free and person-free measurement, and invariant item and person ordering. We introduce some potential applications for this model and discuss the implications and limitations of these developments, as well as directions for future research.
Matthew J. Madison, Stefanie A. Wind, Lientje Maas, Kazuhiro Yamaguchi, Sergio Haab. "A One-Parameter Diagnostic Classification Model with Familiar Measurement Properties." Journal of Educational Measurement, 61(3), 408-431. Publication date: 2024-04-24. doi:10.1111/jedm.12390
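Sum score sufficiency can be checked numerically in the simplest case. The sketch below assumes a single binary attribute and a hypothetical one-parameter form, logit P(X_j = 1 | α) = λ_{0j} + λ_1 α, with item-specific intercepts and one main effect λ_1 shared by all items; the authors' parameterization may differ. Under this assumption the log posterior odds of mastery equal λ_1 times the sum score plus a constant, so all response patterns with the same sum score yield the same posterior.

```python
import math
from itertools import product

def posterior_mastery(x, lam0, lam1, prior=0.5):
    """P(alpha = 1 | x) for one binary attribute, item intercepts lam0[j],
    and a single main effect lam1 shared by all items (assumed form)."""
    log_l1 = log_l0 = 0.0
    for xj, l0 in zip(x, lam0):
        l1 = l0 + lam1                                # logit for masters
        log_l1 += xj * l1 - math.log1p(math.exp(l1))  # Bernoulli log-likelihood, masters
        log_l0 += xj * l0 - math.log1p(math.exp(l0))  # Bernoulli log-likelihood, nonmasters
    log_odds = math.log(prior / (1.0 - prior)) + log_l1 - log_l0
    return 1.0 / (1.0 + math.exp(-log_odds))

def posteriors_by_sum_score(lam0, lam1, prior=0.5):
    """Enumerate all response patterns and group their posteriors by sum score."""
    groups = {}
    for x in product((0, 1), repeat=len(lam0)):
        groups.setdefault(sum(x), []).append(posterior_mastery(x, lam0, lam1, prior))
    return groups

# Usage with hypothetical parameters: within each sum score the posteriors coincide
for score, post in sorted(posteriors_by_sum_score([-2.0, -1.0, 0.0, 0.5], 2.5).items()):
    print(score, round(min(post), 6), round(max(post), 6))
```

With item-specific main effects instead of a shared λ_1, patterns with the same sum score would generally receive different posteriors, which is what the one-parameter restriction rules out.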