
Journal of Applied Measurement: Latest Publications

Ordered Partition Model for Confidence Marking Modeling.
Pub Date : 2017-01-01
Oliver Prosperi

Confidence marking is increasingly used in multiple-choice testing situations, but when the Rasch measurement model is applied to the data, only the binary data are used, discarding the information given by the confidence marking. This study shows how Wilson's ordered partition model (OPM), a member of the Rasch family of models, can be used to model the confidence information. The result is a model in strict relation to the binary Rasch model, since the Rasch ICCs are "split" into a set of curves, each representing a confidence level. The new model provides a set of item parameters that map the probability of being in each confidence level as a function of the test-taker's ability. The study provides a powerful diagnostic tool for assessing item difficulty, overconfidence, and misuse of confidence levels, and for flagging questions that are particularly tricky or create a lot of doubt.
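As a rough illustration only (not the paper's exact parameterization), the Python sketch below computes category probabilities for a Rasch-family model of the ordered-partition type, in which several response categories can share the same score level; summing the categories at a level splits the binary ICC into one curve per confidence level. The `levels` scoring vector, the item parameters, and the four confidence categories are all hypothetical.

```python
import numpy as np

def opm_category_probs(theta, deltas, levels):
    """Rasch-family category probabilities of the ordered-partition type:
    P(X = k | theta) is proportional to exp(levels[k] * theta - deltas[k]).
    Categories that share a score level sum to one curve per level."""
    logits = np.array([lvl * theta - d for lvl, d in zip(levels, deltas)])
    expl = np.exp(logits - logits.max())      # stabilized softmax
    return expl / expl.sum()

# Hypothetical item: category 0 = wrong; categories 1-3 = correct with
# low / medium / high confidence, all scored at level 1 (binary scoring).
levels = [0, 1, 1, 1]
deltas = [0.0, 1.5, 0.8, 0.3]                 # hypothetical category parameters

for theta in (-2.0, 0.0, 2.0):
    p = opm_category_probs(theta, deltas, levels)
    print(f"theta={theta:+.1f}  wrong={p[0]:.2f}  low={p[1]:.2f}  "
          f"med={p[2]:.2f}  high={p[3]:.2f}  P(correct)={p[1:].sum():.2f}")
```

Summing the three "correct" categories recovers a Rasch-like ICC, while the individual curves show how the probability of each confidence level changes with ability.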

{"title":"Ordered Partition Model for Confidence Marking Modeling.","authors":"Oliver Prosperi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Confidence marking is increasingly used in multiple choice testing situations, but when the Rasch measurement model is applied to the data, only the binary data is used, discarding the information given by the confidence marking. This study shows how Wilson's ordered partition model (OPM), a member of the Rasch family of models, can be used to model the confidence information. The result is a model which is in strict relation to the binary Rasch model, since the Rasch ICC's are \"split\" into a set of curves each representing a confidence level. The new model provides a set of item parameters that map the probability of being in each confidence level in relation to the test-taker's ability. The study provides a powerful diagnostic tool to assess item difficulty, overconfidence or misuse of confidence levels but also the fact that a question is particularly tricky or creates a lot of doubt.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 3","pages":"319-359"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35948999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Evaluating Model-Data Fit by Comparing Parametric and Nonparametric Item Response Functions: Application of a Tukey-Hann Procedure.
Pub Date : 2017-01-01
Jeremy Kyle Jennings, George Engelhard

This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The procedure proposed in this paper is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs). A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative smoothing technique. The RISE statistic is used to compare the item response functions and assess model-data fit. A comparison between the residual-based Infit and Outfit statistics and the RISE statistic is also examined. The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for evaluating item fit. Implications for research, theory, and practice related to model-data fit are discussed.
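A rough sketch of the general procedure described, under assumed settings, is given below: empirical proportions correct are computed for ability-ordered groups, smoothed with Tukey's hanning weights to form a TH-IRF, and compared with the parametric Rasch IRF using a RISE-type statistic. The group count, number of smoothing passes, and the item difficulty are placeholder choices, not the paper's.

```python
import numpy as np

def hann_smooth(y, passes=3):
    """Iteratively apply Tukey's 'hanning' weights (0.25, 0.5, 0.25)."""
    y = np.asarray(y, dtype=float)
    for _ in range(passes):
        padded = np.concatenate(([y[0]], y, [y[-1]]))
        y = 0.25 * padded[:-2] + 0.5 * padded[1:-1] + 0.25 * padded[2:]
    return y

def rise(parametric, nonparametric, weights):
    """Root integrated squared error between two IRFs, weighted by the
    relative frequency of examinees at each ability point."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.sqrt(np.sum(w * (np.asarray(parametric) - np.asarray(nonparametric)) ** 2))

rng = np.random.default_rng(0)
theta = rng.normal(size=2000)                            # person ability estimates (simulated)
b = 0.4                                                  # hypothetical Rasch item difficulty
x = rng.binomial(1, 1 / (1 + np.exp(-(theta - b))))      # simulated item responses

# Order persons by ability, form equal-sized groups, take group proportions,
# then smooth the empirical proportions to obtain a TH-IRF.
order = np.argsort(theta)
groups = np.array_split(order, 20)
theta_mid = np.array([theta[g].mean() for g in groups])
p_emp = np.array([x[g].mean() for g in groups])
p_th = hann_smooth(p_emp)

p_rasch = 1 / (1 + np.exp(-(theta_mid - b)))
print("RISE:", rise(p_rasch, p_th, [len(g) for g in groups]))
```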

{"title":"Evaluating Model-Data Fit by Comparing Parametric and Nonparametric Item Response Functions: Application of a Tukey-Hann Procedure.","authors":"Jeremy Kyle Jennings,&nbsp;George Engelhard","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The procedure proposed in this paper is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs). A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative smoothing technique. The RISE statistic is used for comparing the item response functions to assess model-data fit. A comparison between the residual based Infit and Outfit statistics and RISE statistics are also examined. The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for evaluating item fit. Implications for research, theory and practice related to model-data fit are discussed.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 1","pages":"54-66"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34950746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PSM7 and PSM8: Validating Two Problem-solving Measures.
Pub Date : 2017-01-01
Jonathan D Bostic, Toni A Sondergeld, Timothy Folger, Lance Kruse

New mathematics standards were adopted broadly across the United States of America between 2011 and 2013. Problem solving is a central facet of these new standards. Given the new standards and the prominence of mathematical problem solving, there is a need for valid and reliable assessments that measure students' abilities related to those standards. Moreover, Rasch measurement techniques support psychometric analyses during validation studies, effectively measuring students' and items' properties in ways not afforded by true score theory. This manuscript builds upon past research (see Bostic and Sondergeld, 2015a, 2015b) with a validity study of two related problem-solving measures for grades seven and eight. Results from this validation study indicated that the problem-solving measures for grades seven and eight had sufficient evidence to support their use.

{"title":"PSM7 and PSM8: Validating Two Problem-solving Measures.","authors":"Jonathan D Bostic,&nbsp;Toni A Sondergeld,&nbsp;Timothy Folger,&nbsp;Lance Kruse","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>New mathematics standards were adopted broadly across the United States of America between 2011-2013. Problem solving is a central facet of these new standards. Given new standards and the prominence of mathematical problem solving, there is a need for valid and reliable assessments that measure students' abilities related to those standards. Moreover, Rasch measurement techniques support psychometric analyses during validation studies, effectively measuring students' and items' properties in ways not afforded by true score theory. This manuscript builds upon past research (see Bostic and Sondergeld, 2015a, 2015b) with a validity study of two related problem-solving measures for grades seven and eight. Results from this validation study indicated that the problem-solving measures for grades seven and eight had sufficient evidence for their use.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 2","pages":"151-162"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35454485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Psychometric Validation of the 10-item Connor-Davidson Resilience Scale.
Pub Date : 2017-01-01
John Ehrich, Angela Mornane, Tim Powern

Resilience is the personality trait of having positive dispositions that enable individuals to cope with stressful situations. Hence, a reliable resilience scale can provide useful information for understanding and treating individuals suffering from stress and trauma. The 10-item Connor-Davidson Resilience Scale (CD-RISC-10) is a candidate scale. However, very little psychometric research has been conducted on this scale and, moreover, psychometric analyses to date have not been conclusive. To attain further evidence of the scale's psychometric properties, we tested the CD-RISC-10 on 288 adult Education-major students at an Australian university using both traditional (factor analyses) and modern (Rasch) measurement approaches. Factor analyses indicated good psychometric functioning of the scale. However, Rasch modelling revealed evidence of item misfit and multiple dimensions. Optimal performance was achieved after the removal of two misfitting items, indicating a well-functioning 8-item scale.

{"title":"Psychometric Validation of the 10-item Connor-Davidson Resilience Scale.","authors":"John Ehrich,&nbsp;Angela Mornane,&nbsp;Tim Powern","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Resilience is the personality trait of having positive dispositions which enable individuals to cope with stressful situations. Hence, a reliable resilience scale can provide useful information on understanding and treating individuals suffering from stress and trauma. The 10-item Connor-Davidson Resiliance Scale (CD-RISC-10) is a candidate scale. However, very little psychometric research has been conducted on this scale and, moreover, psychometric analyses to date have not been conclusive. To attain further evidence of the scale's psychometric properties, we tested the CD-RISC-10 on 288 adult Education major students at an Australian University using both traditional (factor analyses) and modern (Rasch) measurement approaches. Factor analyses indicated good psychometric functioning of the scale. However, Rasch modelling revealed evidence of item misfit and multiple dimensions. Optimal performance was achieved after the removal of two misfitting items indicating a well-functioning 8-item scale.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 2","pages":"122-136"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35454043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Scale Anchoring with the Rasch Model.
Pub Date : 2017-01-01
Adam E Wyse

Scale anchoring is a method for providing additional meaning to particular scores at different points along a score scale by identifying representative items associated with those scores. These items are then analyzed to write statements of what types of performance can be expected of a person with a particular score, helping test takers and other stakeholders better understand what it means to achieve the different scores. This article provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch model. Specific attention is given to practical considerations and challenges that may be encountered when applying the formulas in different contexts. An illustrative example using data from a medical imaging certification program demonstrates how the formulas can be applied in practice.
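The article's own formulas are not reproduced here, but one common way to operationalize the idea under the Rasch model is sketched below: evaluate each item's correct-response probability at a set of anchor points on the scale and flag items that reach a response-probability criterion at a given anchor point but not at the next lower one. The anchor points, item difficulties, and the 0.80 criterion are hypothetical.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def candidate_anchor_items(difficulties, anchor_points, rp=0.80):
    """Return, for each anchor point, the items whose correct-response
    probability reaches the criterion rp at that point but not at the
    previous (lower) anchor point."""
    anchors = {}
    for i, theta in enumerate(anchor_points):
        meets_here = rasch_prob(theta, difficulties) >= rp
        if i == 0:
            selected = meets_here
        else:
            meets_below = rasch_prob(anchor_points[i - 1], difficulties) >= rp
            selected = meets_here & ~meets_below
        anchors[float(theta)] = np.flatnonzero(selected).tolist()
    return anchors

difficulties = np.array([-1.6, -0.9, -0.2, 0.5, 1.1, 1.8])   # hypothetical item difficulties
anchor_points = np.array([-1.0, 0.0, 1.0, 2.0])              # hypothetical scale points
print(candidate_anchor_items(difficulties, anchor_points))
```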

{"title":"Scale Anchoring with the Rasch Model.","authors":"Adam E Wyse","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Scale anchoring is a method to provide additional meaning to particular scores at different points along a score scale by identifying representative items associated with the particular scores. These items are then analyzed to write statements of what types of performance can be expected of a person with the particular scores to help test takers and other stakeholders better understand what it means to achieve the different scores. This article provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch model. Specific attention is given to practical considerations and challenges that may be encountered when applying the formulas in different contexts. An illustrative example using data from a medical imaging certification program demonstrates how the formulas can be applied in practice.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 1","pages":"43-53"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34950745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Rasch Derived Teachers' Emotions Questionnaire.
Pub Date : 2017-01-01
Kristin L K Koskey, Renee R Mudrey, Wondimu Ahmed

The purpose of this research was to estimate the reliability of the scores produced from, and the validity of the inferences drawn from, the revised 90-item Teachers' Emotion Questionnaire, which consists of three measures: frequency of emotional expressivity, self-efficacy for regulation of emotional expressivity when teaching, and self-efficacy for regulation of context-specific emotional expressivity. A void exists in instruments assessing teachers' regulation and communication of their emotions. One hundred seventeen practicing teachers participated in this study at Time 1 and 46 at Time 2. Rasch rating scale analyses indicated sufficient item and person separation and reliability, and some support for the construct validity of the inferences drawn from the measures. Test-retest reliability of the person estimates was supported for all three measures over a four-week period: r=.592, p<.001, r=.473, p<.01, and r=.641, p<.001, respectively. Concurrent validity of the self-efficacy for regulation of emotional expressivity when teaching measure, assessed against the re-appraisal and suppression sub-scales of the Emotional Regulation Questionnaire (Gross and John, 2003), was supported at Time 1. Modifications to rating scales and future directions for assessing teachers' emotions based on these results are discussed.

{"title":"Rasch Derived Teachers' Emotions Questionnaire.","authors":"Kristin L K Koskey,&nbsp;Renee R Mudrey,&nbsp;Wondimu Ahmed","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The purpose of this research was to estimate the reliability of the scores produced from and validity of the inferences drawn from the revised 90-item Teachers' Emotion Questionnaire consisting of three measures: frequency of emotional expressivity, self-efficacy for regulation of emotional expressivity when teaching, and self-efficacy for regulation of context-specific emotional expressivity. A void exists in an instrument assessing teachers' regulation and communication of their emotions. One-hundred seventeen practicing teachers participated in this study at Time 1 and 46 at Time 2. Rasch rating scale analyses indicated sufficient item and person separation and reliability and some support for the construct validity of the inferences drawn from the measures. Test re-test reliability for the person estimates was supported for all three measures over a four-week period: r=.592, p<.001, r=.473, p<.01, and r=.641, p<.001, respectively. Concurrent validity for the self-efficacy for regulation of emotional expressivity when teaching measure with the re-appraisal and suppression sub-scales on the Emotional Regulation Questionnaire (Gross and John, 2003) was supported at Time 1. Modifications to rating scales and future directions for assessing teachers' emotions based on these results are discussed.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 1","pages":"67-86"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34950748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
I'm scared to go to School! Capturing the Effects of Chronic Daily Fears on Students' Concept of Self.
Pub Date : 2017-01-01
Rense Lange, Cynthia Martinez-Garrido, Alexandre Ventura

Students may experience considerable fear and stress in school settings, and based on Dweck's (2006) notion of "mindset" we hypothesized that fear introduces qualitative changes in students' self-concepts. Hypotheses were tested on 3847 third-grade students from nine Iberoamerican countries (Bolivia, Chile, Colombia, Cuba, Ecuador, Panama, Peru, Spain, and Venezuela), who completed Murillo's (2007) adaptation of Marsh's (1988) SDQ-I. Rasch scaling indicated that the information content of High-Fear students' ratings was more localized across the latent dimension than that of Low-Fear students, and their ratings also showed less cognitive variety. The resulting measurement distortions could be captured via logistic regression over the ratings' residuals. Also, using training and validation samples (with 60% and 40% of all cases, respectively), it proved possible to predict students' fear levels and their gender. We see the present findings as a first step towards implementing an online warning and detection system for signs of bullying among students.
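A minimal sketch of the prediction step described, with simulated placeholder data, is shown below: a logistic regression of fear-group membership on rating residuals, evaluated with a 60/40 train/validation split as in the abstract. The residual matrix, labels, and preprocessing are stand-ins, not the study's actual Rasch residuals.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Placeholder data: one row per student, one column per item residual.
n_students, n_items = 3847, 30
residuals = rng.normal(size=(n_students, n_items))
fear_group = rng.integers(0, 2, size=n_students)       # 0 = Low-Fear, 1 = High-Fear

# 60% training / 40% validation split, as in the abstract.
X_train, X_valid, y_train, y_valid = train_test_split(
    residuals, fear_group, train_size=0.60, random_state=0, stratify=fear_group)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_valid, clf.predict(X_valid)))
```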

{"title":"I'm scared to go to School! Capturing the Effects of Chronic Daily Fears on Students' Concept of Self.","authors":"Rense Lange,&nbsp;Cynthia Martinez-Garrido,&nbsp;Alexandre Ventura","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Students may experience considerable fear and stress in school settings, and based on Dweck's (2006) notion of \"mindset\" we hypothesized that fear introduces qualitative changes in students' self-concepts. Hypotheses were tested on 3847 third-grade students from nine Iberoamerican countries (Bolivia, Chile, Colombia, Cuba, Ecuador, Panama, Peru, Spain, and Venezuela), who completed Murillo's (2007) adaptation of Marsh' (1988) SDQ-I. Rasch scaling indicated that the information-content of High-Fear students' ratings was more localized across the latent dimension than was that of Low-Fear students, and their ratings also showed less cognitive variety. The resulting measurement distortions could be captured via logistic regression over the ratings' residuals. Also, using training and validation samples (with respectively 60 and 40% of all cases), it proved possible to predict students' fear levels and their gender. We see the present findings as a first step towards implementing an online warning and detection system for signs of bullying among students.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 4","pages":"420-433"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35665983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting.
Pub Date : 2017-01-01
Kari J Hodge, Grant B Morgan

Residual-based fit statistics are commonly used as an indication of the extent to which the item response data fit the Rasch model. Fit statistic estimates are influenced by sample size, and rule-of-thumb cutoffs may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' INFIT distributions using this 95% confidence-like interval, an 18 percentage point difference in items classified as acceptable. Forty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule-of-thumb range, whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' OUTFIT distributions, a 13 percentage point difference in items classified as acceptable. When using the rule-of-thumb ranges for fit estimates, the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that using confidence intervals as critical values for fit statistics leads to different model-data fit conclusions than traditional rule-of-thumb critical values.
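A rough sketch of the general approach described follows: observed INFIT/OUTFIT values are compared to 2.5th-97.5th percentile bounds obtained from data sets simulated under the Rasch model. Using known (rather than re-estimated) person and item parameters, and the particular sample sizes, are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def infit_outfit(responses, theta, b):
    """Information-weighted (INFIT) and unweighted (OUTFIT) mean-square
    fit statistics for each item under the dichotomous Rasch model."""
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))   # persons x items
    resid2 = (responses - p) ** 2
    var = p * (1 - p)
    outfit = (resid2 / var).mean(axis=0)
    infit = resid2.sum(axis=0) / var.sum(axis=0)
    return infit, outfit

n_persons, n_items, n_reps = 500, 20, 250
theta = rng.normal(size=n_persons)
b = rng.uniform(-2, 2, size=n_items)
p_true = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))

# "Observed" data (here also simulated) and its fit statistics.
observed = rng.binomial(1, p_true)
infit_obs, outfit_obs = infit_outfit(observed, theta, b)

# Empirical 2.5th-97.5th percentile intervals from replicated data sets.
sim_infit = np.empty((n_reps, n_items))
sim_outfit = np.empty((n_reps, n_items))
for r in range(n_reps):
    sim = rng.binomial(1, p_true)
    sim_infit[r], sim_outfit[r] = infit_outfit(sim, theta, b)

lo, hi = np.percentile(sim_infit, [2.5, 97.5], axis=0)
flagged = (infit_obs < lo) | (infit_obs > hi)
print("items flagged by the simulated INFIT interval:", np.flatnonzero(flagged).tolist())
```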

{"title":"Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting.","authors":"Kari J Hodge,&nbsp;Grant B Morgan","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Residual-based fit statistics are commonly used as an indication of the extent to which the item response data fit the Rash model. Fit statistic estimates are influenced by sample size and rules-of thumb estimates may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentile of the simulated item's INFIT distributions using this 95% confidence-like interval. This is a 18 percentage point difference in items that were classified as acceptable. Fourty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule- of-thumb range. Whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentile of the simulated item's OUTFIT distributions. This is a 13 percentage point difference in items that were classified as acceptable. When using the rule-of- thumb ranges for fit estimates the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that the use of confidence intervals as critical values for fit statistics leads to different model data fit conclusions than traditional rule of thumb critical values.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 4","pages":"383-392"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35666121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Q-Matrix Optimization Based on the Linear Logistic Test Model.
Pub Date : 2017-01-01
Lin Ma, Kelly E Green

This study explored optimization of item-attribute matrices with the linear logistic test model (LLTM; Fischer, 1973), with optimal models explaining more variance in item difficulty due to identified item attributes. Data were 8th-grade mathematics test item responses from two TIMSS 2007 booklets. The study investigated three categories of attributes (content, cognitive process, and comprehensive cognitive process) at two grain levels (larger, smaller) and also compared results with random attribute matrices. The proposed attributes accounted for most of the variance in item difficulty for the two assessment booklets (81% and 65%). The variance explained by the content attributes was very small (13% to 31%), less than the variance explained by the comprehensive cognitive process attributes, which explained much more variance than the content and cognitive process attributes. The variances explained at the two grain levels were similar to each other. However, the attributes did not predict the item difficulties of the two assessment booklets equally.
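As a rough illustration of the LLTM idea, where item difficulty is decomposed into a weighted sum of attribute effects (b_i ≈ Σ_k q_ik·η_k), the sketch below regresses estimated Rasch item difficulties on the columns of a Q-matrix and reports the variance explained. The Q-matrix and difficulties are made up, and ordinary least squares on estimated difficulties is only a rough stand-in for full LLTM estimation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical Q-matrix: 12 items x 3 attributes (1 = item requires the attribute).
Q = rng.integers(0, 2, size=(12, 3)).astype(float)
eta_true = np.array([0.8, -0.4, 1.1])                  # hypothetical attribute effects
b = Q @ eta_true + rng.normal(scale=0.3, size=12)      # "estimated" item difficulties

# Least-squares fit of difficulties on attributes (with an intercept).
X = np.column_stack([np.ones(len(Q)), Q])
coef, *_ = np.linalg.lstsq(X, b, rcond=None)
b_hat = X @ coef

r2 = 1 - np.sum((b - b_hat) ** 2) / np.sum((b - b.mean()) ** 2)
print("attribute effects (intercept first):", np.round(coef, 2))
print("variance in item difficulty explained:", round(r2, 2))
```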

{"title":"Q-Matrix Optimization Based on the Linear Logistic Test Model.","authors":"Lin Ma,&nbsp;Kelly E Green","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This study explored optimization of item-attribute matrices with the linear logistic test model (Fischer, 1973), with optimal models explaining more variance in item difficulty due to identified item attributes. Data were 8th-grade mathematics test item responses of two TIMSS 2007 booklets. The study investigated three categories of attributes (content, cognitive process, and comprehensive cognitive process) at two grain levels (larger, smaller) and also compared results with random attribute matrices. The proposed attributes accounted for most of the variance in item difficulty for two assessment booklets (81% and 65%). The variance explained by the content attributes was very small (13% to 31%), less than variance explained by the comprehensive cognitive process attributes which explained much more variance than the content and cognitive process attributes. The variances explained by the grain level were similar to each other. However, the attributes did not predict the item difficulties of two assessment booklets equally.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 3","pages":"247-267"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35948992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates.
Pub Date : 2016-01-01
Sarah L Thomas, Karen M Schmidt, Monica K Erbacher, Cindy S Bergeman

The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates.
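A minimal sketch of the data manipulation described, masking a fixed proportion of observed responses completely at random, is shown below. The response matrix is a placeholder, and re-estimating the partial credit model on the degraded data is left to whatever PCM software is in use.

```python
import numpy as np

def add_mcar_missingness(responses, prop, rng):
    """Randomly replace `prop` of the observed entries with NaN (MCAR)."""
    degraded = responses.astype(float)
    observed = np.flatnonzero(~np.isnan(degraded))
    n_mask = int(round(prop * observed.size))
    mask = rng.choice(observed, size=n_mask, replace=False)
    degraded.flat[mask] = np.nan
    return degraded

rng = np.random.default_rng(3)
responses = rng.integers(0, 5, size=(307, 10))          # placeholder 0-4 ratings

for prop in (0.20, 0.50, 0.70):
    degraded = add_mcar_missingness(responses, prop, rng)
    print(f"{prop:.0%} induced missingness ->",
          f"{np.isnan(degraded).mean():.1%} of entries missing")
```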

{"title":"What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates.","authors":"Sarah L Thomas, Karen M Schmidt, Monica K Erbacher, Cindy S Bergeman","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates. </p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"17 1","pages":"14-34"},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636626/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141238882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0