Journal of applied measurement: Latest articles

Rasch Model Calibrations with SAS PROC IRT and WINSTEPS.
Pub Date : 2019-01-01
Ki Cole

The WINSTEPS software is widely used for Rasch model calibrations. Recently, SAS/STAT released the PROC IRT procedure for IRT analysis, including Rasch models. The purpose of the study is to compare the performance of PROC IRT with that of WINSTEPS in calibrating dichotomous and polytomous Rasch models, in order to determine whether PROC IRT is a viable alternative. A simulation study was used to compare the two programs in terms of convergence rate, run time, item parameter estimates, and ability estimates across different test lengths and sample sizes. The implications of the results and the features of each program are discussed for research applications and large-scale assessment.
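To make the calibration problem concrete, here is a minimal Python sketch of the dichotomous Rasch model, P(X = 1) = exp(θ − b) / (1 + exp(θ − b)). It simulates a persons-by-items response matrix and estimates one person's ability by Newton-Raphson with item difficulties treated as known. It does not reproduce the estimation algorithm of either WINSTEPS or PROC IRT, and the function names are hypothetical.

```python
import math
import random


def rasch_prob(theta, b):
    """P(X = 1) under the dichotomous Rasch model for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_responses(thetas, difficulties, rng):
    """Simulate a persons-by-items matrix of 0/1 responses."""
    return [
        [1 if rng.random() < rasch_prob(theta, b) else 0 for b in difficulties]
        for theta in thetas
    ]


def estimate_theta(responses, difficulties, max_iter=50, tol=1e-8):
    """ML ability estimate by Newton-Raphson, treating item difficulties as known.

    Returns None for zero or perfect raw scores, which have no finite ML estimate.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        return None
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                      # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d2 logL / d theta2
        step = max(-1.0, min(1.0, gradient / information))  # damp large steps
        theta += step
        if abs(step) < tol:
            break
    return theta
```

At the solution the expected raw score equals the observed raw score, which is the defining property of the Rasch ML estimate; all-correct and all-incorrect strings have no finite estimate, so the sketch returns None for them.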

Journal of applied measurement, 20(1), 27-45.
Citations: 0
Student Perceptions of Grammar Instruction in Iranian Secondary Education: Evaluation of an Instrument using Rasch Measurement Theory.
Pub Date : 2019-01-01
Stefanie A Wind, Behzad Mansouri, Parvaney Yaghoubi Jami

Isolated and integrated grammar instruction are two approaches to grammar teaching that can be implemented within a form-focused instruction (FFI) framework. In both approaches, instructors primarily concentrate on meaning; the difference lies in the timing of instruction on specific language forms. Previous studies have observed that the match between teachers' and learners' beliefs about the effectiveness of instructional approaches is an important component in predicting the success of grammar instruction. In this study, we report on the psychometric properties of a questionnaire designed to measure students' perceptions of isolated and integrated FFI in Iranian secondary schools. The Iranian context is of particular interest for grammar instruction because recent policy reforms emphasize isolated FFI. Using a combination of principal components analysis and Rasch measurement theory techniques, we observed that Iranian students distinguish between the two forms of grammar instruction. Within each approach, we observed significant differences among individual students, as well as differences in how difficult it was for students to endorse different instructional activities related to both isolated and integrated instruction. Together, our findings highlight the importance of examining students' beliefs about the effectiveness of approaches to grammar instruction within different instructional contexts. We discuss implications for research and practice.

Journal of applied measurement, 20(1), 46-65.
Citations: 0
Rasch Analysis of the Teachers' Knowledge and Use of Data and Assessment (tKUDA) Measure.
Pub Date : 2018-01-01
Courtney Donovan

Teachers are expected to use data and assessments to drive their instruction. At the classroom level, this is accomplished through the assessment process. The Teachers' Knowledge and Use of Data and Assessment (tKUDA) measure was created to capture teachers' knowledge and use of this assessment process. This paper explores the measure's utility using Rasch analysis. Evidence of reliability and validity was found for both the knowledge and the use factors. The rating scale was used as expected, and item analyses demonstrated good spread, with a few items identified for future revision. Item difficulties and results are connected back to the literature. The findings support the use of this measure to identify teachers' knowledge and use of data and assessment in classroom practice.
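Rasch reliability evidence of the kind mentioned above is commonly summarised by separation reliability, the share of observed measure variance not attributable to measurement error: R = (SD²_obs − MSE) / SD²_obs, with separation index G = √(true variance) / RMSE, so that R = G² / (1 + G²). The sketch below computes both from person (or item) measures and their standard errors. It assumes the population variance for SD²_obs (software may differ on this detail), and the function name is hypothetical rather than taken from the paper.

```python
import math
from statistics import pvariance


def separation_reliability(measures, standard_errors):
    """Rasch separation reliability and separation index.

    measures: person (or item) measures in logits.
    standard_errors: their standard errors, in the same order.
    """
    observed_var = pvariance(measures)  # population variance (an assumption)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)  # MSE
    true_var = max(observed_var - error_var, 0.0)  # error-corrected variance
    reliability = true_var / observed_var
    separation = math.sqrt(true_var / error_var)  # true SD / RMSE
    return reliability, separation
```

For measures of −1, 0 and 1 logits with every standard error equal to √(1/6), the observed variance is 2/3 and the error variance 1/6, giving R = 0.75 and G = √3.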

Journal of applied measurement, 19(1), 76-92.
Citations: 0
Hierarchical and Higher-Order Factor Structures in the Rasch Tradition: A Didactic.
Pub Date : 2018-01-01
Perman Gochyyev, Mark Wilson

In this paper, we consider hierarchical and higher-order factor models and the relationship between them, and, in particular, we use Rasch models to explore these models. We present these models and their similarities and differences from within the Rasch modeling perspective, and we discuss their use in various settings. One motivation for this work is that certain well-known similarities and differences between the equivalent models in the two-parameter logistic (2PL) approach do not apply in the Rasch modeling tradition. Another motivation is that there is some ambiguity about the potential uses of these models, and we seek to clarify those uses. In recent Item Response Theory (IRT) literature, the estimation of these models has mostly been presented within a Bayesian framework; here we show how to estimate them with traditional maximum likelihood methods. We also show how to re-parameterize these models, which in some cases can improve estimation and convergence. These alternative parameterizations are also useful in "translating" suggestions for the 2PL models to the Rasch tradition, since those suggestions involve the interpretation of item discriminations, which are constrained to equal one in the Rasch tradition. Alternative parameterizations can also be used to clarify the relationships among these models. We discuss the use of these models for modeling multidimensionality and testlet effects, and we compare the interpretation of the resulting solutions with that of the multidimensional Rasch model, a more common approach to accounting for multidimensionality in the Rasch tradition. We demonstrate these models using the partial credit model.
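For orientation, one common dichotomous way of writing a Rasch-family model with a general dimension plus testlet-specific dimensions (the Rasch testlet model) is shown below, where d(i) denotes the testlet containing item i. The paper itself works with the partial credit model and its own parameterizations, so this is an illustration of the idea, not the authors' specification. Because every slope is fixed at 1, the relative strength of the general and specific dimensions is carried by the variances rather than by item discriminations.

```latex
\Pr(X_{ni} = 1 \mid \theta_n, \gamma_{n d(i)})
  = \frac{\exp\left(\theta_n + \gamma_{n d(i)} - \delta_i\right)}
         {1 + \exp\left(\theta_n + \gamma_{n d(i)} - \delta_i\right)},
\qquad
\theta_n \sim N(0, \sigma^2_\theta), \quad
\gamma_{nd} \sim N(0, \sigma^2_{\gamma_d}), \quad
\operatorname{Cov}(\theta_n, \gamma_{nd}) = 0
```

By contrast, the between-item multidimensional Rasch model replaces θ_n + γ_{nd(i)} with a single θ_{nd(i)} and lets the dimensions correlate.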

Journal of applied measurement, 19(4), 338-362.
Citations: 0
Psychometric Evaluation of the Revised Current Statistics Self-efficacy (CSSE-26) in a Graduate Student Population using Rasch Analysis.
Pub Date : 2018-01-01
Pei-Chin Lu, Samantha Estrada, Steven Pulos

The Current Statistics Self-Efficacy (CSSE) scale, developed by Finney and Schraw (2003), is a 14-item instrument for assessing students' statistics self-efficacy. No previous research has used Rasch measurement models to evaluate the psychometric structure of its scores at the item level, and only a few studies have applied the CSSE in a graduate school setting. A modified 30-item CSSE scale was administered to a graduate student population (N = 179). The Rasch rating scale analysis identified 26 items forming a unidimensional measure. Assumptions of sample-free and test-free measurement were confirmed, showing that scores from the CSSE-26 are reliable and valid for assessing graduate students' level of statistics self-efficacy. The findings suggest that the CSSE-26 could help professors understand and enhance students' statistics self-efficacy.
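The rating scale analysis mentioned above rests on Andrich's rating scale model, in which every item shares the same category thresholds τ_1 ... τ_m: P(X = k) ∝ exp(Σ_{j≤k} (θ − δ_i − τ_j)), with an empty sum for k = 0. A minimal sketch of the category probabilities follows; the function and parameter names are illustrative.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities for categories 0..m under the rating scale model.

    theta: person measure; delta: item location;
    taus: thresholds tau_1..tau_m shared by all items.
    """
    log_num = [0.0]  # category 0 has an empty sum
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    top = max(log_num)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Model-expected rating for one person on one item."""
    return sum(k * p for k, p in enumerate(rsm_probs(theta, delta, taus)))
```

When θ − δ equals τ_k, adjacent categories k − 1 and k are equally probable, which is why each τ is read as the point where responses switch from one category to the next.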

Journal of applied measurement, 19(2), 201-215.
Citations: 0
Rasch Analysis of the Revised Two-Factor Study Process Questionnaire: A Validation Study.
Pub Date : 2018-01-01
Vernon Mogol, Yan Chen, Marcus Henning, Andy Wearn, Jennifer Weller, Jill Yielder, Warwich Bagg

The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) was developed in 1998 using true score theory to measure students' deep approaches (DA) and surface approaches (SA) to learning. Using Rasch analyses, this study aimed to 1) validate the R-SPQ-2F's two-factor structure, and 2) explore whether the full scale (FS), after reverse scoring responses to SA items, could measure learning approach as a unidimensional construct. University students (N = 327) completed an online version of the R-SPQ-2F. The researchers validated the R-SPQ-2F by showing that items on the three rating scales (DA, SA, and FS) had acceptable fit; that DA and FS, but not SA, showed acceptable targeting; and that all three scales had acceptable reliabilities (0.74-0.79). The DA and SA scales, but not the FS, satisfied the unidimensionality requirement, supporting the claim that DA and SA represent separate constructs of student approaches to learning.
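The full-scale (FS) analysis depends on reverse scoring the SA responses so that higher scores on every item point toward a deep approach. A minimal sketch, assuming a 1-5 response format and that DA and SA responses arrive as separate lists (both are assumptions for illustration; the function names are hypothetical):

```python
def reverse_score(response, min_cat=1, max_cat=5):
    """Reverse a rating so that min_cat <-> max_cat (e.g., 1 <-> 5 on a 1-5 scale)."""
    if not min_cat <= response <= max_cat:
        raise ValueError(f"response {response} outside {min_cat}-{max_cat}")
    return min_cat + max_cat - response


def full_scale(da_responses, sa_responses, min_cat=1, max_cat=5):
    """Combine DA items as-is with reverse-scored SA items into one FS response vector."""
    reversed_sa = [reverse_score(r, min_cat, max_cat) for r in sa_responses]
    return list(da_responses) + reversed_sa
```

For example, full_scale([4, 5], [1, 2]) yields [4, 5, 5, 4]; the midpoint 3 maps to itself.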

Journal of applied measurement, 19(4), 428-441.
Citations: 0
A Measurement Model of City-Based Consumer Patriotism in Developing Countries: The Case of Vietnam.
Pub Date : 2018-01-01
Ngoc Chu Nguyen Mong, Trong Hoang

This study examined a measurement model for the construct of consumer patriotism among city-based consumers in Vietnam, a developing country, and the link between consumer patriotism and consumer ethnocentrism. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted to assess the measurement model. A mediation test based on multiple regression was used to test the model's hypothesis. Two studies were carried out: the first was a preliminary study with a convenience sample of 230 people, and the second was a full study with a probability sample of 300 people. Both studies showed an acceptable fit for the measurement model of consumer patriotism. In addition, consumer patriotism was found to mediate the relationship between natural patriotism and ethnocentrism among city-based Vietnamese consumers.
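A regression-based mediator test of this kind is often carried out in the Baron and Kenny style: regress the outcome on the predictor (total effect c), the mediator on the predictor (path a), and the outcome on both (direct effect c' and path b). The NumPy sketch below estimates those paths on simulated data. It assumes this classic approach, which the abstract does not confirm, and it omits the significance tests (for example Sobel or bootstrap) that a real analysis would need. Reading x as natural patriotism, m as consumer patriotism and y as ethnocentrism is an illustrative mapping only.

```python
import numpy as np


def ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...] via least squares."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mediation_paths(x, m, y):
    """Regression paths for a single-mediator model X -> M -> Y."""
    c = ols(y, x)[1]              # total effect of X on Y
    a = ols(m, x)[1]              # effect of X on the mediator M
    _, c_prime, b = ols(y, x, m)  # direct effect of X, and effect of M given X
    return {"a": a, "b": b, "c": c, "c_prime": c_prime, "indirect": a * b}


# Illustrative simulated data with full mediation (no direct X -> Y path).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
m = 0.6 * x + rng.normal(size=500)
y = 0.5 * m + rng.normal(size=500)
paths = mediation_paths(x, m, y)
```

With ordinary least squares on the same sample, the identity c = c' + a·b holds exactly, which makes a convenient sanity check on the computed paths.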

Journal of applied measurement, 19(4), 442-459.
Citations: 0
Using the Rasch Model to Investigate Inter-board Comparability of Examination Standards in GCSE.
Pub Date : 2018-01-01
Qingping He, Michelle Meadows

By treating each examination as a polytomous item and the grade a student achieved in the exam as a score on that item, the partial credit model (PCM) was used to analyse data from examinations in 16 GCSE subjects taken by 16-year-olds in England. These examinations are provided by four different exam boards. By further treating students who took exams in the same subject from different exam boards as different subgroups, differential category functioning (DCF) analysis was used to investigate the comparability of standards at specific grades across the exam boards. For most grades and for the majority of the subjects studied, the magnitude of the DCF effect with respect to exam board was found to be small: the difference between a grade's difficulty for an individual exam board and its all-board difficulty was less than one fifth of a grade. The DCF effect varies between subjects and between grades within the same subject, with higher grades generally showing more comparable standards across exam boards than lower grades.
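Under the partial credit model, each examination (item) has its own thresholds δ_i1 ... δ_im between adjacent grades, and P(X = k) ∝ exp(Σ_{j≤k} (θ − δ_ij)). The sketch below computes grade probabilities and, to illustrate what differential category functioning means, compares the probability of reaching a given grade or higher for the same ability under two boards' thresholds. The threshold values are invented for illustration, and this is not the DCF procedure used in the study.

```python
import math


def pcm_probs(theta, thresholds):
    """Category (grade) probabilities 0..m under the partial credit model.

    thresholds[j - 1] is the step difficulty between categories j - 1 and j.
    """
    log_num = [0.0]  # category 0 has an empty sum
    for delta in thresholds:
        log_num.append(log_num[-1] + (theta - delta))
    top = max(log_num)  # stabilise exponentiation
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def prob_at_least(theta, thresholds, k):
    """Probability of achieving category k or higher."""
    return sum(pcm_probs(theta, thresholds)[k:])


# Hypothetical thresholds (logits) for one subject at two exam boards;
# board B's third step is 0.3 logits harder than board A's.
board_a = [-2.0, -1.0, 0.0, 1.0]
board_b = [-2.0, -1.0, 0.3, 1.0]

theta = 0.0
gap = prob_at_least(theta, board_a, 3) - prob_at_least(theta, board_b, 3)
```

A positive gap means that, at the same ability, category 3 or above is harder to reach at board B; with identical thresholds the gap would be zero.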

Journal of applied measurement, 19(2), 129-147.
Citations: 0
Using Repeated Ratings to Improve Measurement Precision in Incomplete Rating Designs.
Pub Date : 2018-01-01
Eli Jones, Stefanie A Wind

When selecting a design for rater-mediated assessments, one important consideration is the number of raters who rate each examinee. To balance costs and rater coverage, rating designs are often implemented in which each judge rates only a portion of the examinees, resulting in large amounts of missing data. One drawback of these sparse rating designs is the reduced precision of the examinee ability estimates they provide. When increasing the number of raters per examinee is not feasible, another option may be to increase the number of ratings each rater provides per examinee. This study applies a Rasch model to explore the effect of increasing the number of rating occasions that raters use to judge examinee proficiency. We used a simulation study to approximate a sparse but connected rater network with a sequentially increasing number of repeated ratings per examinee. The generated data were used to explore the influence of repeated ratings on the precision of rater, examinee, and task parameter estimates, as measured by parameter standard errors, the correlation of sparse parameter estimates with true estimates, and the root mean square error of parameter estimates. Results suggest that increasing the number of rating occasions significantly improves the precision of examinee and rater parameter estimates. Results also suggest that recovery of rater and task parameters is quite robust to reductions in the number of repeated ratings, although examinee parameter estimates are more sensitive to such reductions. Implications for research and practice in the context of rater-mediated assessment designs are discussed.
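The sketch below illustrates two ingredients of such a simulation: a hypothetical incomplete rating design in which each examinee is scored by a few overlapping raters on several occasions, with a check that the rater-examinee network stays connected, and the RMSE and correlation criteria used to judge parameter recovery. The rotating assignment scheme and all names are assumptions for illustration, not the study's design.

```python
import math
from collections import defaultdict


def rotating_design(n_examinees, n_raters, raters_per_examinee=2, occasions=1):
    """Hypothetical incomplete design as (examinee, rater, occasion) triples.

    Examinee e is scored by raters e, e+1, ... (mod n_raters), each on
    `occasions` separate rating occasions.
    """
    return [
        (e, (e + j) % n_raters, occ)
        for e in range(n_examinees)
        for j in range(raters_per_examinee)
        for occ in range(occasions)
    ]


def is_connected(design):
    """True if all examinees and raters are linked through shared ratings."""
    graph = defaultdict(set)
    for e, r, _ in design:
        graph[("examinee", e)].add(("rater", r))
        graph[("rater", r)].add(("examinee", e))
    if not graph:
        return True
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal of the bipartite network
        for neighbour in graph[stack.pop()] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(graph)


def rmse(true_values, estimates):
    """Root mean square error of estimates against generating values."""
    sq = [(est - tru) ** 2 for tru, est in zip(true_values, estimates)]
    return math.sqrt(sum(sq) / len(sq))


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With two overlapping raters per examinee the raters form a chain and the design is connected; with one rater per examinee it splits into disconnected rater groups that cannot be placed on a common scale.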

Journal of applied measurement, 19(2), 148-161.
Citations: 0
Detecting Rater Effects under Rating Designs with Varying Levels of Missingness.
Pub Date : 2018-01-01
Rose E Stafford, Edward W Wolfe, Jodi M Casablanca, Tian Song

Previous research has shown that indices obtained from partial credit model (PCM) estimates can detect rater severity and centrality effects, but it remains unknown how the missing data inherent in double-scoring rating designs affect the detection of these effects. This simulation study evaluated the impact of missing data on the detection of rater severity and centrality. Data were generated separately for each rater effect type, under conditions that varied in rater pool quality, rater effect prevalence and magnitude, and extent of missingness. Raters were flagged using rater location as a severity indicator and the standard deviation of rater thresholds as a centrality indicator. Two methods of identifying extreme values on these indices were compared. Results indicate that both methods yield low Type I and Type II error rates (i.e., rates of incorrectly flagging raters without an effect and of failing to flag raters with an effect), and that missing data have a negligible impact on the detection of severe and central raters.
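The two flagging indices described in this abstract can be sketched as follows, assuming rater-specific PCM estimates in which a larger rater location means greater severity. Under that parameterization, a rater who overuses the middle categories tends to have widely spaced thresholds, so a larger threshold SD points toward centrality. The abstract does not specify the two methods for identifying extreme values; the standardized cutoff used here (`cutoff=1.5`), the function names, and the example estimates are all hypothetical.

```python
from statistics import mean, stdev

def centrality_index(thresholds):
    """SD of one rater's threshold estimates (logits); wider spread suggests centrality."""
    return stdev(thresholds)

def flag_high(values, cutoff=1.5):
    """Hypothetical rule: flag positions whose value lies more than `cutoff`
    sample SDs above the rater-pool mean."""
    m, s = mean(values), stdev(values)
    if s == 0:  # identical values: nothing stands out
        return []
    return [i for i, v in enumerate(values) if (v - m) / s > cutoff]

def flag_raters(locations, thresholds_by_rater, cutoff=1.5):
    """Return (severity_flags, centrality_flags) as lists of rater indices."""
    severity = flag_high(locations, cutoff)
    centrality = flag_high([centrality_index(t) for t in thresholds_by_rater], cutoff)
    return severity, centrality

# Hypothetical estimates for six raters on a four-category scale.
locations = [0.0, 0.1, -0.1, 0.05, -0.05, 1.2]  # rater 5 is much more severe
thresholds = [[-1.0, 0.0, 1.0]] * 6
thresholds[2] = [-2.5, 0.0, 2.5]                 # rater 2 spreads thresholds widely
print(flag_raters(locations, thresholds))        # → ([5], [2])
```

A fixed standardized cutoff is only one possible rule. With the sample SD, a single outlier's standardized value can never exceed (n-1)/sqrt(n), so in small rater pools such a rule may miss genuine effects.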

Journal of applied measurement, Vol. 19, No. 3, pp. 243-257 (2018).
Citations: 0
Copyright © 2023 Book学术 All rights reserved.