Journal of applied measurement: Latest Articles

Bootstrap Estimate of Bias for Intraclass Correlation.
Pub Date : 2020-01-01
Xiaofeng Steven Liu, Kelvin Terrell Pompey

The estimates of intraclass correlations are known to be biased, but there are few analytical ways to assess the amount of bias. The analytical approach requires the normality assumption to estimate bias. Bootstrap requires no such assumption and can, therefore, be used to estimate bias, regardless of the model assumption. We utilize cluster bootstrapping to calculate the bias in estimating the intraclass correlation. A well-known dataset is provided to illustrate the bias estimation in a typical study design of intraclass correlation, and its implications for other study designs are also discussed.

Journal of applied measurement, vol. 21, no. 1, pp. 101-108.

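The bias estimate described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's procedure: it assumes the one-way random-effects ICC(1) with balanced clusters, resamples whole clusters with replacement, and takes the bootstrap bias as the mean of the bootstrap ICCs minus the original estimate. The function names are hypothetical.

```python
import random
from statistics import mean


def icc_oneway(clusters):
    """ICC(1) from a one-way random-effects ANOVA.

    `clusters` is a list of equal-sized lists (balanced design, assumed here).
    """
    n = len(clusters)            # number of clusters
    k = len(clusters[0])         # members per cluster
    cluster_means = [mean(c) for c in clusters]
    grand_mean = mean(cluster_means)  # valid because the design is balanced
    # Between-cluster and within-cluster mean squares
    msb = k * sum((m - grand_mean) ** 2 for m in cluster_means) / (n - 1)
    msw = sum((x - m) ** 2
              for c, m in zip(clusters, cluster_means)
              for x in c) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def cluster_bootstrap_bias(clusters, n_boot=1000, seed=0):
    """Return (original ICC, bootstrap bias estimate).

    Whole clusters are resampled with replacement, so the within-cluster
    dependence is left intact in every bootstrap sample.
    """
    rng = random.Random(seed)
    n = len(clusters)
    estimate = icc_oneway(clusters)
    boot = []
    for _ in range(n_boot):
        resample = [clusters[rng.randrange(n)] for _ in range(n)]
        boot.append(icc_oneway(resample))
    bias = mean(boot) - estimate
    return estimate, bias
```

Subtracting the estimated bias from the original estimate gives a bias-corrected ICC. Other ICC forms (two-way designs, unbalanced clusters) would need a different `icc_oneway`, but the resampling step stays the same.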
Evaluating the Impact of Multidimensionality on Type I and Type II Error Rates using the Q-Index Item Fit Statistic for the Rasch Model.
Pub Date : 2020-01-01 DOI: 10.31219/osf.io/kh7vq
Samantha Estrada
The role of fit statistics in Rasch measurement is simple to state: applied researchers can benefit from the desirable properties of the Rasch model only when the data fit the model. The purpose of the current study was to assess the robustness of the Q-Index (Ostini and Nering, 2006) and to compare its performance with the currently popular fit statistics MSQ Infit, MSQ Outfit, and standardized Infit and Outfit (ZSTDs) under varying conditions of test length, sample size, item difficulty distribution (normal and uniform), and dimensionality, using a Monte Carlo simulation. Type I and Type II error rates were also examined across fit indices. This study gives applied researchers guidelines on the robustness and appropriateness of the Q-Index, which is an alternative to the currently available item fit statistics. The Q-Index was slightly more sensitive to the levels of multidimensionality set in the study, while MSQ Infit, MSQ Outfit, and standardized Infit and Outfit (ZSTDs) failed to identify the multidimensional conditions. The Type I error rate of the Q-Index was lower than that of the other fit indices; however, the Type II error rate was higher than the anticipated beta = .20 across all fit indices.
Journal of applied measurement, vol. 21, no. 4, pp. 496-514.

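For readers unfamiliar with the comparison statistics named in the abstract above, the following Python sketch computes the mean-square (MSQ) Infit and Outfit for a single dichotomous Rasch item. It is an illustration only: person abilities and the item difficulty are treated as known (in practice they are estimated), the function names are hypothetical, and the Q-Index itself is not implemented here.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit MSQ, outfit MSQ) for one dichotomous item.

    responses: 0/1 scores, one per person
    thetas:    person abilities in logits (assumed known for this sketch)
    b:         item difficulty in logits (assumed known for this sketch)
    """
    sq_resid = []   # squared raw residuals (x - p)^2
    variances = []  # model variances p(1 - p)
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        variances.append(p * (1.0 - p))
        sq_resid.append((x - p) ** 2)
    # Outfit: unweighted mean of squared standardized residuals
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted mean square
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit
```

Both statistics have an expected value near 1 when the data fit the model. Outfit reacts more strongly to unexpected responses from persons far from the item's difficulty, because those residuals are divided by small variances.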
Examining the Pre-service School Principals' Impromptu Speech Skills with a Many-Facet Rasch Model.
Pub Date : 2020-01-01
Mingchuan Hsieh, Akihito Kamata

The purpose of this study is to demonstrate an application of the many-facet Rasch model (MFRM) in evaluating the impromptu speech skills of pre-service principals in Taiwan. The findings showed that the speech topics did not differ in difficulty. Among the scoring criteria, time control was the most difficult. Regarding rater gender, female raters gave lower scores than male raters, but there was no statistical evidence of gender-related bias. However, raters differed significantly in severity. The results of this study demonstrate that the MFRM provides a scientific approach to assessment, which can reveal useful diagnostic information from the original ordinal rating scores on impromptu speech.

Journal of applied measurement, vol. 21, no. 3, pp. 282-293.

Assessing Differential Statement Functioning in Polytomous Multidimensional Pairwise Comparison Items.
Pub Date : 2020-01-01
Xue-Lan Qiu

Multidimensional pairwise comparison (MPC) items have been widely used in educational settings to assess career interests, values, and personality while avoiding response bias. In reality, a statement in an MPC item may have different utilities for different groups, which is referred to as differential statement functioning (DSF). Few studies have investigated DSF assessment. Based on a Rasch model for MPC items, this study adapts three methods to detect DSF for polytomous MPC items: the equal-mean-utility (EMU) method, the all-other-statement (AOS) method, and the constant-statement (CS) method. A simulation study was conducted to evaluate parameter recovery as well as the performance of the proposed methods. Results showed that when the test contains DSF statement(s), the CS method, in which one or more DSF-free statements are chosen as an anchor, yields accurate estimates and performs well for DSF assessment. An empirical example of career interest assessment is provided.

Journal of applied measurement, vol. 21, no. 3, pp. 329-346.

A Psychometric Replication of Fan (1998) Item Response Theory and Classical Test Theory: An Empirical Comparison of their Item/Person Statistics.
Pub Date : 2020-01-01
Nicholas Marosszeky, E Arthur Shores, Michael P Jones, Rassoul Sadeghi

Streiner, Norman and Cairney (2015), "Health Measurement Scales: A practical guide to their development and use", now in its fifth edition, is one of the foundational texts of the health outcomes movement. It states that "the differences between scales constructed with IRT and CTT are trivial" (Streiner, Norman and Cairney, 2015, p. 299). This statement is representative of the view that emphasizes the equivalence of True-Score Theory (TST), also known as Classical Test Theory (CTT), and the Rasch Measurement Model (RMM). This view is widely held and has been one factor limiting the application of the RMM in the development of health outcome measures. However, this equivalence view relies heavily on a paper by Fan (1998), which examined the item statistics derived from TST, IRT (Item Response Theory), and the RMM for a large educational dataset. Although Fan's paper has been subject to a number of theoretical and practical criticisms from an RMM perspective, it has not been replicated with a large sample. By replicating and extending Fan (1998), the present paper challenges the finding that item difficulty indexes derived from high- and low-ability samples using TST techniques are invariant. They are not. On the other hand, item locations derived from the RMM have a high degree of invariance. By working through the methods used by Fan (1998), this secondary data analysis also demonstrates that the magnitude of correlation coefficients alone cannot be used to determine the invariance of item difficulty indexes. An investigation into the linearity of the correlations using scatter plots is also required. Finally, an item analysis derived from the item difficulty indexes, which displays a picture of the test as a whole, shows that, for this large sample, the differences between scales constructed with TST and the RMM are not trivial.

Journal of applied measurement, vol. 21, no. 4, pp. 456-480.

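The sample dependence of classical item difficulty discussed in the abstract above can be illustrated with a small Python sketch. It is an illustration under strong assumptions, not a reproduction of the article's analysis: responses are assumed to follow the Rasch model exactly, the ability values and item difficulties are hypothetical, and the classical difficulty index is taken as the expected proportion correct in each group.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_p_values(thetas, difficulties):
    """Classical item difficulty (expected proportion correct) per item for one group."""
    return [sum(rasch_prob(t, b) for t in thetas) / len(thetas)
            for b in difficulties]


# Hypothetical item difficulties (logits) and two hypothetical ability groups
difficulties = [-2.0, 0.0, 2.0]
low_group = [-2.0, -1.0]
high_group = [1.0, 2.0]

p_low = expected_p_values(low_group, difficulties)
p_high = expected_p_values(high_group, difficulties)

# The gap between groups is not constant across items, so the two sets of
# classical difficulty indexes are not related by a simple shift.
gaps = [h - l for h, l in zip(p_high, p_low)]
```

Under these assumptions the same items look much harder in the low-ability group, and the size of the gap depends on where each item sits relative to the two groups. This is why a scatter plot of the two sets of indexes says more than a single correlation coefficient.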
A-priori Weighting of Items with the Rasch Model.
Pub Date : 2020-01-01
David Andrich, Sonia Sappl

Many assessment scales in the social sciences are composed of multiple items that form a subscale structure. They have this structure because more than one aspect of the variable is assessed and more than one item assesses each aspect. Nevertheless, generally, a single measurement is required from the scale. A characteristic of this measurement is that the greater the number of items, and categories within an item, that assess an aspect, the greater its influence on the final measurement. One way to control this influence is to include the desired relative number of items and categories to assess each aspect in the scale. However, there are circumstances where designing the required number of items and categories for each aspect is challenging. This paper shows a method of controlling the influence of the number of items and categories assessing each aspect by a-priori weighting of items at the person measurement stage with the Rasch model.

Journal of applied measurement, vol. 21, no. 3, pp. 243-255.

Alignment of a Language Instrument Scores to CEFR Levels: Methodological and Empirical Considerations.
Pub Date : 2020-01-01
Georgios D Sideridis, Abdulrahman Al-Samrani, Bjorn Norrbom

The purpose of the present report was to assess congruence between a language-based national examination (termed the English placement test, EPT) and the Common European Framework of Reference for Languages (CEFR) levels. To this end, a series of methodological steps was put forth to accumulate evidence that language performance on the EPT instrument can be split into meaningful subgroups based on theoretical considerations (expert judgement on difficulty level and CEFR correspondence) and empirical considerations (i.e., how well these levels and subgroups emerged). Participants were 2642 high school graduates who took the EPT instrument as part of their university entry criteria; for the purposes of the present study, only the structure subscale is presented. Items were classified as reflecting specific CEFR levels, and a person-based analysis attempted to classify individuals sharing the same behavioral patterns. Results using a latent class analysis (LCA) indicated that Pre-A1, A1, A2, B1, and B2 levels were present in the structure domain of language. Across the various methodological approaches, results showed a strong alignment between the EPT structure domain and CEFR guidelines.

Journal of applied measurement, vol. 21, no. 1, pp. 68-90.

Trade-Offs in the Implementation of Observational Ratings Systems.
Pub Date : 2020-01-01
Stephen M Ponisciak, Rob Meyer, Anna Brown, Tracy Schatzberg

A consensus has developed that high-quality teacher evaluation systems require multiple measures. We examine multiple measures from a large urban school district, which has included observational ratings and value-added ratings in its system since 2010. Evaluation systems that do not account for observer severity, classroom context, and other factors may yield different results from systems that do account for these factors. Choosing a simpler system involves a trade-off regarding a system's robustness or defensibility. Using a many-faceted Rasch model (MFRM), we explore rating components like observer, time of year, and subdomain. We find high reliability of the resulting teacher ratings, some impact of adjusting for observer differences and differences between subdomains, and positive correlation with value-added measures. A comprehensive analysis like MFRM should be part of a district's evaluation system, even if only as a robustness check, and districts should examine how observational scores and classroom context are related.

Journal of applied measurement, vol. 21, no. 1, pp. 50-67.

Comparing Causes of Dependency: Shared Latent Trait or Dependence on Observed Response.
Pub Date : 2020-01-01
Christine E DeMars

Accurate parameter estimation in the Rasch model involves the assumption of conditional independence, also termed local independence. Conditional on ability, the responses to items A and B should be independent. Two types of conditional dependence are detailed in this pedagogical piece: trait dependency and response dependency. The bias in difficulty and reliability and the estimates of fit and correlated residuals resulting from these dependencies are compared and contrasted to results from using models that account for the dependency. Contrasts with results from a 2-parameter item response theory model are also briefly noted.

Journal of applied measurement, vol. 21, no. 4, pp. 400-419.

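The conditional (local) independence assumption named in the abstract above can be made concrete with a short Python sketch. It is a minimal illustration with hypothetical function names and parameter values: given ability, the joint probability of two responses is the product of the two Rasch probabilities, yet across persons of different ability the same two responses are positively correlated.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def joint_prob(theta, b_a, b_b, x_a, x_b):
    """P(X_A = x_a, X_B = x_b | theta) under local independence."""
    p_a = rasch_prob(theta, b_a)
    p_b = rasch_prob(theta, b_b)
    return (p_a if x_a else 1.0 - p_a) * (p_b if x_b else 1.0 - p_b)


def marginal_covariance(thetas, b_a, b_b):
    """Covariance of the two responses over an equally weighted set of abilities.

    Uses E[X_A * X_B] = mean of P_A(theta) * P_B(theta), which holds only
    because the responses are independent given theta.
    """
    n = len(thetas)
    p_a = [rasch_prob(t, b_a) for t in thetas]
    p_b = [rasch_prob(t, b_b) for t in thetas]
    e_ab = sum(a * b for a, b in zip(p_a, p_b)) / n
    return e_ab - (sum(p_a) / n) * (sum(p_b) / n)
```

With a single ability value the covariance is zero, while mixing abilities (for example `marginal_covariance([-1.0, 1.0], 0.0, 0.5)`) gives a positive value. Dependence that remains after conditioning on ability is what the trait-dependency and response-dependency cases in the abstract describe.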
Diabetes Distress in Emerging Adults: Refining the Problem Areas in Diabetes-Emerging Adult Version using Rasch Analysis.
Pub Date : 2020-01-01
Katherine Wentzel, Judith A Vessey, Lori Laffel, Larry Ludlow

The emotional burden of living with Type 1 diabetes (T1D) is experienced differently in each life stage. Thus, the measurement of diabetes distress (DD) warrants tailoring to particular developmental stages, specifically emerging adulthood (ages 18-30). The new measure, entitled the Problem Areas in Diabetes-Emerging Adult version (PAID-EA), is intended to be a developmentally embedded measure of DD for use in clinical and research settings. The goal of the present study was to use Rasch psychometric analysis to reduce and refine the PAID-EA. Emerging adults with T1D (n = 194) completed the 30-item online survey. Evaluation of response category functioning, measurement precision, redundancy, unidimensionality, and targeting guided item reduction through iterative revisions. The reduced and refined PAID-EA consists of 25 items and shows promising utility for clinicians and researchers.

Journal of applied measurement, vol. 21, no. 4, pp. 481-495.