Pub Date: 2023-01-02 | DOI: 10.1080/15366367.2022.2091358
Masoomeh Estaji, Zahra Banitalebi
"Quantitative Data Analysis for Language Assessment Volume II: Advanced Methods" (Measurement: Interdisciplinary Research and Perspectives, pp. 51-54)
Pub Date: 2022-10-02 | DOI: 10.1080/15366367.2021.1972654
K. Jin, T. Eckes
"Detecting Rater Centrality Effects in Performance Assessments: A Model-Based Comparison of Centrality Indices" (Measurement: Interdisciplinary Research and Perspectives, pp. 228-247)
ABSTRACT Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale's middle categories. In the present paper, we adopted Jin and Wang's (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters exhibiting strong central tendencies to raters exhibiting strong tendencies in the opposite direction (extremity). In two simulation studies, we examined three model-based centrality detection indices (rater infit statistics, residual–expected correlations, and rater threshold SDs) as well as the raw-score SD in terms of their efficiency in reconstructing the true rater centrality rank ordering. Findings confirmed the superiority of the residual–expected correlation, rater threshold SD, and raw-score SD statistics, particularly when the examinee sample size was large and the number of scoring criteria was high. By contrast, the infit statistic results were much less consistent and, under conditions of large differences between criterion difficulties, suggested erroneous conclusions about raters' central tendencies. Analyzing real rating data from a large-scale speaking performance assessment confirmed that infit statistics are unsuitable for identifying raters' central tendencies. The discussion focuses on detecting centrality effects under different facets models and the indices' implications for rater monitoring and fair performance assessment.
Pub Date: 2022-10-02 | DOI: 10.1080/15366367.2021.2024487
D. J. Harris
"The History of Educational Measurement: Key Advancements in Theory, Policy, and Practice" (Measurement: Interdisciplinary Research and Perspectives, pp. 248-256)
Pub Date: 2022-10-02 | DOI: 10.1080/15366367.2021.2005959
T. Raykov, Philipp Doebler, G. Marcoulides
"Applications of Bayesian Confirmatory Factor Analysis in Behavioral Measurement: Strong Convergence of a Bayesian Parameter Estimator" (Measurement: Interdisciplinary Research and Perspectives, pp. 215-227)
ABSTRACT This article is concerned with the large-sample behavior of parameter estimators in applications of Bayesian confirmatory factor analysis in behavioral measurement. The property of strong convergence of the popular Bayesian posterior median estimator is discussed, which states that the resulting estimates converge numerically, with probability 1, to the population parameter value as the sample size increases without bound. This property is stronger than the consistency and convergence in distribution of that estimator, which have been commonly referred to in the literature. A numerical example is utilized to illustrate this almost sure convergence of a Bayesian latent correlation estimator. The paper contributes to the body of research on optimal statistical features of Bayesian estimates and concludes with a discussion of the implications of this large-sample property of the Bayesian median estimator for empirical measurement studies.
Pub Date: 2022-10-02 | DOI: 10.1080/15366367.2021.1996819
R. Levy
"Conceptual Grounding for Bayesian Inference for Latent Variables in Factor Analysis" (Measurement: Interdisciplinary Research and Perspectives, pp. 195-214)
ABSTRACT Obtaining values for latent variables in factor analysis models, also referred to as factor scores, has long been of interest to researchers. However, many treatments of factor analysis do not focus on inference about the latent variables, and even fewer do so from a Bayesian perspective. Researchers may therefore be ill-acquainted with Bayesian thinking on this issue, despite the fact that certain existing procedures may be seen as Bayesian to some extent. The focus of this paper is to provide a conceptual grounding for Bayesian inference for latent variables, articulating not only what Bayesian inference has to say about values for latent variables, but why Bayesian inference is suited for this problem. As to why, it is argued that the notion of exchangeability motivates the form of factor analysis, as well as Bayesian inference for latent variables. The argument is supported by documenting the widespread use of Bayesian inference in analogous settings, including latent variables in other measurement models, multilevel models, and missing data. As to what, this work describes a Bayesian analysis when other parameters are known, as well as partially and fully Bayesian analyses when other parameters are unknown. This facilitates a discussion of various choices researchers have when adopting Bayesian approaches to inference about latent variables.
Pub Date: 2022-10-02 | DOI: 10.1080/15366367.2022.2026736
Gregory M. Hurtz
"Xcalibre Item Parameter Calibration Software for Item Response Theory and Rasch Models" (Measurement: Interdisciplinary Research and Perspectives, pp. 257-279)
ABSTRACT Item response theory (IRT) and Rasch models have many useful features for test development practitioners and measurement researchers, while some classical test theory (CTT) diagnostics remain useful for understanding items' properties and sources of model misfit. In a relatively user-friendly fashion, the Xcalibre software estimates a number of dichotomous and polytomous IRT and Rasch models, and provides CTT statistics as well. This article reviews Xcalibre, including a review of the models it calibrates, its scoring methods, diagnostics for model fit analysis, differential item functioning analysis, output file contents, and an overview of setting up the analysis. Examples are provided for a multiple-choice knowledge test, an attitude measure with a rating scale, and an analysis involving anchored item parameters. Screenshots are provided of the graphical user interface screens and sections of the output to help readers understand the look and feel of the software and the types of output it provides.
Pub Date: 2022-07-03 | DOI: 10.1080/15366367.2021.1953315
B. Leventhal, Nikole Gregg, Allison J. Ames
"Accounting for Response Styles: Leveraging the Benefits of Combining Response Process Data Collection and Response Process Analysis Methods" (Measurement: Interdisciplinary Research and Perspectives, pp. 151-174)
ABSTRACT Response styles introduce construct-irrelevant variance as a result of respondents systematically responding to Likert-type items regardless of content. Methods to account for response styles through data analysis as well as approaches to mitigating the effects of response styles during data collection have been well-documented. Recent approaches to modeling Likert responses, such as the IRTree model, rely on the response process individuals follow when answering items. In this study, we advocate for the use of IRTrees to analyze Likert items in addition to using the hypothesized response process to design new items. Combining these two approaches facilitates answering Likert item design questions that have plagued researchers. These include the interpretation of a middle response option, the optimal number of response options, and how to label the response options. We present seven research questions that could be answered using this new approach, outline methods of data collection and analysis for each, and present results from an empirical example to address one of these seven questions.
Pub Date: 2022-07-03 | DOI: 10.1080/15366367.2021.2018216
L. Feuerstahler
"Educational and Psychological Measurement" (Measurement: Interdisciplinary Research and Perspectives, pp. 175-180)
Educational and Psychological Measurement, written by W. Holmes Finch and Brian French and first published in 2018 by Routledge, is a 17-chapter textbook that provides an accessible introduction to classical and modern psychometrics. In this book, the authors provide a broad overview of a wide range of topics and regularly suggest more specialized texts for readers seeking deeper understanding. This book is intended "for students at the graduate school level, and for researchers working in the field of educational and psychological measurement who need a broad resource for understanding test theory" (Finch & French, p. ix). I imagine that this text could also be appropriate for an advanced undergraduate course on educational or psychological measurement. Several aspects of this book were designed particularly for an audience with minimal experience working with statistics or mathematics. Most chapters include "How It Works" sections in which equations are worked out for the reader with example input values. In addition, most chapters include "Psychometrics in the Real World" sections that provide extended worked examples and interpretations. Although the text is accessible to an audience new to statistics, audiences with this background will still learn much from the authors' clear but nuanced explanations of topics throughout the book. The following review is based on my reading of the e-version of this textbook as accessed through the VitalSource platform. However, errors and typos that I found in the e-version were cross-checked with a print copy of the book, and the vast majority exist in the same way in both versions. Further comments on the e-version are included toward the end of this review. The remainder of this review includes an overview of the book's 17 chapters and supplemental resources, followed by a discussion of the book's overall strengths and limitations and some concluding thoughts.
Pub Date: 2022-07-03 | DOI: 10.1080/15366367.2022.2102722
"Now in JMP® Pro: Structural Equation Modeling" (Measurement: Interdisciplinary Research and Perspectives, p. 1)
Pub Date: 2022-07-03 | DOI: 10.1080/15366367.2022.2025569
S. Y. Kim
"Using Generalizability Theory software suite: GENOVA, urGENOVA, and mGENOVA" (Measurement: Interdisciplinary Research and Perspectives, pp. 181-194)
ABSTRACT This article reviews the GENOVA Suite designed for generalizability theory analyses. The GENOVA Suite consists of three programs: GENOVA, urGENOVA, and mGENOVA. Key features and capabilities of the programs are presented and two illustrative example analyses are provided using mGENOVA. Additionally, comparisons with some existing programs are made.