The WINSTEPS software is widely used for Rasch model calibrations. Recently, SAS/STAT released the PROC IRT procedure for IRT analysis, including Rasch models. The purpose of this study is to compare the performance of PROC IRT with WINSTEPS in calibrating dichotomous and polytomous Rasch models, in order to determine whether PROC IRT is a viable alternative. A simulation study was used to compare the two programs in terms of convergence rate, run time, item parameter estimates, and ability estimates across different test lengths and sample sizes. Implications of the results and the features of each program are discussed for research applications and large-scale assessment.
{"title":"Rasch Model Calibrations with SAS PROC IRT and WINSTEPS.","authors":"Ki Cole","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The WINSTEPS software is widely used for Rasch model calibrations. Recently, SAS/STAT released the PROC IRT procedure for IRT analysis, including Rasch. The purpose of the study is compare the performance of the PROC IRT procedure with WINSTEPS to calibrate dichotomous and polytomous Rasch models in order to diagnose the possibility of using PROC IRT as a viable alternative. A simulation study was used to compare the two programs in terms of the convergence rate, run time, item parameter estimates, and ability estimates with different test lengths and sample sizes. Implications of the results and the features of each software are discussed for research applications and large-scale assessment.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"20 1","pages":"27-45"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36986092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isolated and integrated grammar instruction are two approaches to grammar teaching that can be implemented within a form-focused instruction (FFI) framework. In both approaches, instructors primarily concentrate on meaning; the difference lies in the timing of instruction on specific language forms. In previous studies, researchers have observed that the match between teachers' and learners' beliefs about the effectiveness of instructional approaches is an important component in predicting the success of grammar instruction. In this study, we report on the psychometric properties of a questionnaire designed to measure students' perceptions of isolated and integrated FFI taking place in Iranian secondary schools. The Iranian context is interesting with regard to approaches to grammar instruction in light of recent policy reforms that emphasize isolated FFI. Using a combination of principal components analysis and Rasch measurement theory techniques, we observed that Iranian students distinguish between the two forms of grammar instruction. Looking within each approach, we observed significant differences among individual students, as well as differences in how difficult students found it to endorse different instructional activities related to both isolated and integrated instruction. Together, our findings highlight the importance of examining students' beliefs about the effectiveness of approaches to grammar instruction within different instructional contexts. We discuss implications for research and practice.
{"title":"Student Perceptions of Grammar Instruction in Iranian Secondary Education: Evaluation of an Instrument using Rasch Measurement Theory.","authors":"Stefanie A Wind, Behzad Mansouri, Parvaney Yaghoubi Jami","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Isolated and integrated grammar instruction are two approaches to grammar teaching that can be implemented within a form-focused instruction (FFI) framework. In both approaches, instructors primarily concentrate on meaning, and the difference is in the timing of instruction on specific language forms. In previous studies, researchers have observed that the match between teachers' and learners' beliefs related to the effectiveness of instructional approaches is an important component in predicting the success of grammar instruction. In this study, we report on the psychometric properties of a questionnaire designed to measure students' perceptions of isolated and integrated FFI taking place in Iranian secondary schools. The Iranian context is interesting with regard to approaches to grammar instruction in light of recent policy reforms that emphasize isolated FFI. Using a combination of principal components analysis and Rasch measurement theory techniques, we observed that Iranian students distinguish among the two forms of grammar instruction. Looking within each approach, we observed significant differences among individual students as well as differences in the difficulty for students to endorse different instructional activities related to both isolated and integrated instruction. Together, our findings highlight the importance of examining students' beliefs about the effectiveness of approaches to grammar instruction within different instructional contexts. We discuss implications for research and practice.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"20 1","pages":"46-65"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36986093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Teachers are expected to use data and assessments to drive their instruction. This is accomplished at the classroom level via the assessment process. The Teachers' Knowledge and Use of Data and Assessment (tKUDA) measure was created to capture teachers' knowledge and use of this assessment process. This paper explores the measure's utility using Rasch analysis. Evidence of reliability and validity was found for both the knowledge and use factors. The rating scale functioned as expected, and item analyses demonstrate good spread, with a few items identified for future revision. Item difficulties and results are connected back to the literature. Findings support the use of this measure to identify teachers' knowledge and use of data and assessment in classroom practice.
{"title":"Rasch Analysis of the Teachers' Knowledge and Use of Data and Assessment (tKUDA) Measure.","authors":"Courtney Donovan","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Teachers are expected to use data and assessments to drive their instruction. This is accomplished at a classroom level via the assessment process. The teachers Knowledge and Use of Data and Assessment (tKUDA) measure was created to capture teachers' knowledge and use of this assessment process. This paper explores the measure's utility using Rasch analysis. Evidence of reliability and validity was seen for both knowledge and use factors. Scale was used as expected and item analyses demonstrates good spread with a few items identified for future revision. Item difficulty and results are connected back to literature. Findings support use of this measure to identify teachers' knowledge and use of data and assessment in classroom practice.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 1","pages":"76-92"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35932761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we consider hierarchical and higher-order factor models and the relationship between them; in particular, we use Rasch models to focus the exploration of these models. We present these models and their similarities and differences from within the Rasch modeling perspective, and discuss their use in various settings. One motivation for this work is that certain well-known similarities and differences between the equivalent models in the two-parameter logistic (2PL) approach do not apply in the Rasch modeling tradition. Another motivation is that there is some ambiguity as to the potential uses of these models, and we seek to clarify those uses. In recent work in the Item Response Theory (IRT) literature, the estimation of these models has mostly been presented within the Bayesian framework; here we show the use of these models with traditional maximum likelihood methods. We also show how to re-parameterize these models, which in some cases can improve estimation and convergence. These alternative parameterizations are also useful in "translating" suggestions for 2PL models to the Rasch tradition (since such suggestions involve the interpretation of item discriminations, which are required to be unity in the Rasch tradition). Alternative parameterizations can also be used to clarify the relationship among these models. We discuss the use of these models for modeling multidimensionality and testlet effects, and compare the interpretation of the obtained solutions to that for the multidimensional Rasch model - a more common approach to accounting for multidimensionality in the Rasch tradition. We demonstrate the use of these models using the partial credit model.
{"title":"Hierarchical and Higher-Order Factor Structures in the Rasch Tradition: A Didactic.","authors":"Perman Gochyyev, Mark Wilson","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In this paper, we consider hierarchical and higher-order factor models and the relationship between them, and, in particular, we use Rasch models to focus on the exploration of these models. We present these models, their similarities and/or differences from within the Rasch modeling perspective and discuss their use in various settings. One motivation for this work is that certain well-known similarities and differences between the equivalent models in the two-parameter logistic model (2PL) approach do not apply in the Rasch modeling tradition. Another motivation is that there is some ambiguity as to the potential uses of these models, and we seek to clarify those uses. In recent work in the Item Response Theory (IRT) literature, the estimation of these models has been mostly presented using the Bayesian framework: here we show the use of these models using traditional maximum likelihood methods. We also show how to re-parameterize these models, which in some cases can improve estimation and convergence. These alternative parameterizations are also useful in \"translating\" suggestions for the 2PL models to the Rasch tradition (since these suggestions involve the interpretation of item discriminations, which are required to be unity in the Rasch tradition). Alternative parameterizations can also be used to clarify the relationship among these models. We discuss the use of these models for modeling multidimensionality and testlet effects and compare the interpretation of the obtained solutions to the interpretation for the multidimenisional Rasch model - a more common approach for accounting multidimensionality in the Rasch tradition. We demonstrate the use of these models using the partial credit model.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 4","pages":"338-362"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36668563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Current Statistics Self-Efficacy (CSSE) scale, developed by Finney and Schraw (2003), is a 14-item instrument that assesses students' statistics self-efficacy. No previous research has used Rasch measurement models to evaluate the psychometric structure of its scores at the item level, and only a few studies have applied the CSSE in a graduate-school setting. A modified 30-item CSSE scale was tested on a graduate student population (N = 179). A Rasch rating scale analysis identified 26 items forming a unidimensional measure. The assumptions of sample-free and test-free measurement were confirmed, showing that scores from the CSSE-26 are reliable and valid for assessing graduate students' level of statistics self-efficacy. Findings suggest the CSSE-26 could help facilitate professors' understanding and enhancement of students' statistics self-efficacy.
{"title":"Psychometric Evaluation of the Revised Current Statistics Self-efficacy (CSSE-26) in a Graduate Student Population using Rasch Analysis.","authors":"Pei-Chin Lu, Samantha Estrada, Steven Pulos","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The Current Statistics Self-Efficacy (CSSE) scale, developed by Finney and Schraw (2003), is a 14-item instrument to assess students' statistics self-efficacy. No previous research has used the Rasch measurement models to evaluate the psychometric structure of its scores at the item level, and only a few of them have applied the CSSE in a graduate school setting. A modified 30-item CSSE scale was tested on a graduate student population (N = 179). The Rasch rating scale analysis identified 26 items forming a unidimensional measure. Assumptions of sample-free and test-free measurement were confirmed, showing scores from the CSSE-26 are reliable and valid to assess graduate students' level of statistics self-efficacy. Findings suggest the CSSE-26 could help facilitate professors' understanding and enhancement of students' statistics self-efficacy.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 2","pages":"201-215"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36215375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) was developed in 1998, using true score theory, to measure students' deep approaches (DA) and surface approaches (SA) to learning. Using Rasch analyses, this study aimed to 1) validate the R-SPQ-2F's two-factor structure, and 2) explore whether the full scale (FS), after reverse scoring responses to SA items, could measure learning approach as a unidimensional construct. University students (N = 327) completed an online version of the R-SPQ-2F. The researchers validated the R-SPQ-2F by showing that items on the three rating scales (DA, SA, and FS) had acceptable fit; that both DA and FS, but not SA, showed acceptable targeting; and that all three scales had acceptable reliabilities (0.74-0.79). The DA and SA scales, but not the FS, satisfied the unidimensionality requirement, supporting the claim that student approaches to learning are represented by DA and SA as separate constructs.
{"title":"Rasch Analysis of the Revised Two-Factor Study Process Questionnaire: A Validation Study.","authors":"Vernon Mogol, Yan Chen, Marcus Henning, Andy Wearn, Jennifer Weller, Jill Yielder, Warwich Bagg","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) was developed in 1998 using the true score theory to measure students' deep approaches (DA) and surface approaches (SA) to learning. Using Rasch analyses, this study aimed to 1) validate the R-SPQ-2F's two-factor structure, and 2) explore whether the full scale (FS), after reverse scoring responses to SA items, could measure learning approach as a uni-dimensional construct. University students (N = 327) completed an online version of the R-SPQ-2F. The researchers validated the R-SPQ-2F by showing that items on the three rating scales (DA, SA, and FS) had acceptable fit; both DA and FS, but not SA, showed acceptable targeting function; and all three scales had acceptable reliabilities (0.74 - 0.79). The DA and SA scales, not the FS, satisfied the unidimensionality requirement, supporting the claim that student approaches to learning are represented by DA and SA as separate constructs.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 4","pages":"428-441"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36729347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study examined a measurement model for the construct of consumer patriotism in the context of city-based consumers in Vietnam, a developing country, and the linkage of consumer patriotism with consumer ethnocentrism. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted to assess the measurement model. A mediation test based on a multiple regression procedure was utilised to test the model's hypothesis. Two studies were carried out: first, a preliminary study with a convenience sample of 230 people, and second, a full study with a probability sample of 300 people. Both studies showed an acceptable fit for the measurement model of consumer patriotism. In addition, consumer patriotism was found to mediate the connection between natural patriotism and ethnocentrism for city-based Vietnamese consumers.
{"title":"A Measurement Model of City-Based Consumer Patriotism in Developing Countries: The Case of Vietnam.","authors":"Ngoc Chu Nguyen Mong, Trong Hoang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This study examined a measurement model for the construct of consumer patriotism in the context of city-based consumers in Vietnam, a developing country, and the linkage of consumer patriotism with consumer ethnocentrism. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted to assess the measurement model. A mediator effect test was utilised to test the hypothesis of the model, using a multiple regression procedure. Two studies were carried out, the first a preliminary study with a convenience sample of 230 people and the second a full study with a probability sample of 300 people. Both studies showed that there was an acceptable fit for the measurement model of consumer patriotism. In addition, consumer patriotism was found to be a mediator in the connection of natural patriotism and ethnocentrism for city-based Vietnamese consumers.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 4","pages":"442-459"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36729346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
By treating each examination as a polytomous item and the grade a student achieved in the exam as a score on the item, the partial credit model (PCM) was used to analyse data from examinations in 16 GCSE subjects taken by 16-year-olds in England. These examinations are provided by four different exam boards. By further treating students taking exams that test the same subject but are provided by different exam boards as different subgroups, differential category functioning (DCF) analysis was used to investigate the comparability of standards at specific grades between the exam boards. It was found that, for most grades across the examinations, the magnitude of the DCF effect with respect to exam boards is small for the majority of the subjects studied: the differences between individual boards' grade difficulties and the all-board difficulty, expressed in grade units, are less than one fifth of a grade. The effect of DCF varies between subjects and between grades within the same subject, with higher grades generally more comparable in standards across exam boards than lower grades.
{"title":"Using the Rasch Model to Investigate Inter-board Comparability of Examination Standards in GCSE.","authors":"Qingping He, Michelle Meadows","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>By treating each examination as a polytomous item and a grade that a student achieved in the exam as a score on the item, the partial credit model (PCM) has been used to analyse data from examinations in 16 GCSE subjects taken by 16-year olds in England. These examinations are provided by four different exam boards. By further treating students taking the exams testing the same subject but provided by different exam boards as different subgroups, differential category functioning (DCF) analysis was used to investigate the comparability of standards at specific grades in the examinations between the exam boards. It was found that for most of the grades across the examinations, the magnitude of the DCF effect with respect to exam boards for the majority of the subjects studied is small, with the differences between grade difficulties for individual exam boards and the all-board difficulty in the unit of grade being less than one fifth of a grade. The effect of DCF varies between subjects and between grades within the same subject, with higher grades shown to be generally more comparable in standards than the lower grades between the exam boards.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 2","pages":"129-147"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36215994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When selecting a design for rater-mediated assessments, one important consideration is the number of raters who rate each examinee. In balancing costs against rater coverage, rating designs are often implemented in which each judge rates only a portion of the examinees, resulting in large amounts of missing data. One drawback of these sparse rating designs is the reduced precision of the examinee ability estimates they provide. When increasing the number of raters per examinee is not feasible, another option may be to increase the number of ratings provided by each rater per examinee. This study applies a Rasch model to explore the effect of increasing the number of rating occasions used by raters to judge examinee proficiency. We used a simulation study to approximate a sparse but connected rater network with a sequentially increasing number of repeated ratings per examinee. The generated data were used to explore the influence of repeated ratings on the precision of rater, examinee, and task parameter estimates, as measured by parameter standard errors, the correlation of the sparse-design parameter estimates with the true (generating) values, and the root mean square error of the parameter estimates. Results suggest that increasing the number of rating occasions significantly improves the precision of examinee and rater parameter estimates. Results also suggest that parameter recovery for rater and task estimates is quite robust to reductions in the number of repeated ratings, although examinee parameter estimates are more sensitive to them. Implications for research and practice in the context of rater-mediated assessment designs are discussed.
{"title":"Using Repeated Ratings to Improve Measurement Precision in Incomplete Rating Designs.","authors":"Eli Jones, Stefanie A Wind","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>When selecting a design for rater-mediated assessments, one important consideration is the number of raters who rate each examinee. In balancing costs and rater-coverage, rating designs are often implemented wherein only a portion of the examinees are rated by each judge, resulting in large amounts of missing data. One drawback to these sparse rating designs is the reduced precision of examinee ability estimates they provide. When increasing the number of raters per examinee is not feasible, another option may be to increase the number of ratings provided by each rater per examinee. This study applies a Rasch model to explore the effect of increasing the number of rating occasions used by raters to judge examinee proficiency. We used a simulation study to approximate a sparse but connected rater network with a sequentially increasing number of repeated ratings per examinee. The generated data were used to explore the influence of repeated ratings on the precision of rater, examinee, and task parameter estimates as measured by parameter standard errors, the correlation of sparse parameter estimates to true estimates, and the root mean square error of parameter estimates. Results suggest that increasing the number of rating occasions significantly improves the precision of examinee and rater parameter estimates. Results also suggest that parameter recovery levels of rater and task estimates are quite robust to reductions in the number of repeated ratings, although examinee parameter estimates are more sensitive to them. Implications for research and practice in the context of rater-mediated assessment designs are discussed.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 2","pages":"148-161"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36216474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Previous research has shown that indices obtained from partial credit model (PCM) estimates can detect severity and centrality rater effects, though it remains unknown how rater effect detection is impacted by the missingness inherent in double-scoring rating designs. This simulation study evaluated the impact of missing data on the detection of rater severity and centrality. Data were generated for each rater effect type, varying rater pool quality, rater effect prevalence and magnitude, and extent of missingness. Raters were flagged using rater location as a severity indicator and the standard deviation of rater thresholds as a centrality indicator. Two methods of identifying extreme scores on these indices were compared. Results indicate that both methods yield low Type I and Type II error rates (i.e., incorrectly flagging non-effect raters and failing to flag effect raters, respectively) and that the presence of missing data has negligible impact on the detection of severe and central raters.
{"title":"Detecting Rater Effects under Rating Designs with Varying Levels of Missingness.","authors":"Rose E Stafford, Edward W Wolfe, Jodi M Casablanca, Tian Song","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Previous research has shown that indices obtained from partial credit model (PCM) estimates can detect severity and centrality rater effects, though it remains unknown how rater effect detection is impacted by the missingness inherent in double-scoring rating designs. This simulation study evaluated the impact of missing data on rater severity and centrality detection. Data were generated for each rater effect type, which varied in rater pool quality, rater effect prevalence and magnitude, and extent of missingness. Raters were flagged using rater location as a severity indicator and the standard deviation of rater thresholds a centrality indicator. Two methods of identifying extreme scores on these indices were compared. Results indicate that both methods result in low Type I and Type II error rates (i.e., incorrectly flagging non-effect raters and not flagging effect raters) and that the presence of missing data has negligible impact on the detection of severe and central raters.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 3","pages":"243-257"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36451138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}