Confidence marking is increasingly used in multiple-choice testing, but when the Rasch measurement model is applied to the data, only the binary responses are used and the information carried by the confidence marks is discarded. This study shows how Wilson's ordered partition model (OPM), a member of the Rasch family of models, can be used to model the confidence information. The result is a model that stands in a strict relation to the binary Rasch model, since each Rasch ICC is "split" into a set of curves, one for each confidence level. The new model provides a set of item parameters that map the probability of responding at each confidence level as a function of the test-taker's ability. The study provides a powerful diagnostic tool for assessing item difficulty, detecting overconfidence or misuse of the confidence levels, and flagging questions that are particularly tricky or provoke considerable doubt.
{"title":"Ordered Partition Model for Confidence Marking Modeling.","authors":"Oliver Prosperi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Confidence marking is increasingly used in multiple choice testing situations, but when the Rasch measurement model is applied to the data, only the binary data is used, discarding the information given by the confidence marking. This study shows how Wilson's ordered partition model (OPM), a member of the Rasch family of models, can be used to model the confidence information. The result is a model which is in strict relation to the binary Rasch model, since the Rasch ICC's are \"split\" into a set of curves each representing a confidence level. The new model provides a set of item parameters that map the probability of being in each confidence level in relation to the test-taker's ability. The study provides a powerful diagnostic tool to assess item difficulty, overconfidence or misuse of confidence levels but also the fact that a question is particularly tricky or creates a lot of doubt.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 3","pages":"319-359"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35948999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The proposed procedure is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs). A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative smoothing technique, and the RISE statistic is used to compare the item response functions and assess model-data fit. A comparison between the residual-based Infit and Outfit statistics and the RISE statistic is also examined. The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for evaluating item fit. Implications for research, theory, and practice related to model-data fit are discussed.
{"title":"Evaluating Model-Data Fit by Comparing Parametric and Nonparametric Item Response Functions: Application of a Tukey-Hann Procedure.","authors":"Jeremy Kyle Jennings, George Engelhard","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The procedure proposed in this paper is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs). A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative smoothing technique. The RISE statistic is used for comparing the item response functions to assess model-data fit. A comparison between the residual based Infit and Outfit statistics and RISE statistics are also examined. The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for evaluating item fit. Implications for research, theory and practice related to model-data fit are discussed.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 1","pages":"54-66"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34950746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jonathan D Bostic, Toni A Sondergeld, Timothy Folger, Lance Kruse
New mathematics standards were adopted broadly across the United States of America between 2011 and 2013. Problem solving is a central facet of these new standards. Given the new standards and the prominence of mathematical problem solving, there is a need for valid and reliable assessments that measure students' abilities related to those standards. Moreover, Rasch measurement techniques support psychometric analyses during validation studies, effectively measuring students' and items' properties in ways not afforded by true score theory. This manuscript builds upon past research (see Bostic and Sondergeld, 2015a, 2015b) with a validity study of two related problem-solving measures for grades seven and eight. Results from this validation study indicated that there is sufficient evidence to support the use of both measures.
{"title":"PSM7 and PSM8: Validating Two Problem-solving Measures.","authors":"Jonathan D Bostic, Toni A Sondergeld, Timothy Folger, Lance Kruse","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>New mathematics standards were adopted broadly across the United States of America between 2011-2013. Problem solving is a central facet of these new standards. Given new standards and the prominence of mathematical problem solving, there is a need for valid and reliable assessments that measure students' abilities related to those standards. Moreover, Rasch measurement techniques support psychometric analyses during validation studies, effectively measuring students' and items' properties in ways not afforded by true score theory. This manuscript builds upon past research (see Bostic and Sondergeld, 2015a, 2015b) with a validity study of two related problem-solving measures for grades seven and eight. Results from this validation study indicated that the problem-solving measures for grades seven and eight had sufficient evidence for their use.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 2","pages":"151-162"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35454485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resilience is the personality trait of having positive dispositions that enable individuals to cope with stressful situations. Hence, a reliable resilience scale can provide useful information for understanding and treating individuals suffering from stress and trauma. The 10-item Connor-Davidson Resilience Scale (CD-RISC-10) is a candidate scale. However, very little psychometric research has been conducted on this scale, and the psychometric analyses to date have not been conclusive. To obtain further evidence of the scale's psychometric properties, we tested the CD-RISC-10 on 288 adult students majoring in Education at an Australian university, using both traditional (factor analytic) and modern (Rasch) measurement approaches. Factor analyses indicated good psychometric functioning of the scale. However, Rasch modelling revealed evidence of item misfit and multiple dimensions. Optimal performance was achieved after the removal of two misfitting items, indicating a well-functioning 8-item scale.
{"title":"Psychometric Validation of the 10-item Connor-Davidson Resilience Scale.","authors":"John Ehrich, Angela Mornane, Tim Powern","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Resilience is the personality trait of having positive dispositions which enable individuals to cope with stressful situations. Hence, a reliable resilience scale can provide useful information on understanding and treating individuals suffering from stress and trauma. The 10-item Connor-Davidson Resiliance Scale (CD-RISC-10) is a candidate scale. However, very little psychometric research has been conducted on this scale and, moreover, psychometric analyses to date have not been conclusive. To attain further evidence of the scale's psychometric properties, we tested the CD-RISC-10 on 288 adult Education major students at an Australian University using both traditional (factor analyses) and modern (Rasch) measurement approaches. Factor analyses indicated good psychometric functioning of the scale. However, Rasch modelling revealed evidence of item misfit and multiple dimensions. Optimal performance was achieved after the removal of two misfitting items indicating a well-functioning 8-item scale.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 2","pages":"122-136"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35454043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scale anchoring is a method for giving additional meaning to scores at particular points along a score scale by identifying representative items associated with those scores. These items are then analyzed to write statements describing the types of performance that can be expected of a person with a given score, helping test takers and other stakeholders better understand what it means to achieve the different scores. This article provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch model. Specific attention is given to practical considerations and challenges that may be encountered when applying the formulas in different contexts. An illustrative example using data from a medical imaging certification program demonstrates how the formulas can be applied in practice.
{"title":"Scale Anchoring with the Rasch Model.","authors":"Adam E Wyse","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Scale anchoring is a method to provide additional meaning to particular scores at different points along a score scale by identifying representative items associated with the particular scores. These items are then analyzed to write statements of what types of performance can be expected of a person with the particular scores to help test takers and other stakeholders better understand what it means to achieve the different scores. This article provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch model. Specific attention is given to practical considerations and challenges that may be encountered when applying the formulas in different contexts. An illustrative example using data from a medical imaging certification program demonstrates how the formulas can be applied in practice.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 1","pages":"43-53"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34950745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this research was to estimate the reliability of the scores produced from, and the validity of the inferences drawn from, the revised 90-item Teachers' Emotion Questionnaire, which consists of three measures: frequency of emotional expressivity, self-efficacy for regulation of emotional expressivity when teaching, and self-efficacy for regulation of context-specific emotional expressivity. There is a lack of instruments assessing teachers' regulation and communication of their emotions. One hundred seventeen practicing teachers participated in this study at Time 1 and 46 at Time 2. Rasch rating scale analyses indicated sufficient item and person separation and reliability and some support for the construct validity of the inferences drawn from the measures. Test-retest reliability of the person estimates over a four-week period was supported for all three measures: r=.592, p<.001, r=.473, p<.01, and r=.641, p<.001, respectively. Concurrent validity of the self-efficacy for regulation of emotional expressivity when teaching measure with the reappraisal and suppression subscales of the Emotion Regulation Questionnaire (Gross and John, 2003) was supported at Time 1. Modifications to the rating scales and future directions for assessing teachers' emotions based on these results are discussed.
{"title":"Rasch Derived Teachers' Emotions Questionnaire.","authors":"Kristin L K Koskey, Renee R Mudrey, Wondimu Ahmed","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The purpose of this research was to estimate the reliability of the scores produced from and validity of the inferences drawn from the revised 90-item Teachers' Emotion Questionnaire consisting of three measures: frequency of emotional expressivity, self-efficacy for regulation of emotional expressivity when teaching, and self-efficacy for regulation of context-specific emotional expressivity. A void exists in an instrument assessing teachers' regulation and communication of their emotions. One-hundred seventeen practicing teachers participated in this study at Time 1 and 46 at Time 2. Rasch rating scale analyses indicated sufficient item and person separation and reliability and some support for the construct validity of the inferences drawn from the measures. Test re-test reliability for the person estimates was supported for all three measures over a four-week period: r=.592, p<.001, r=.473, p<.01, and r=.641, p<.001, respectively. Concurrent validity for the self-efficacy for regulation of emotional expressivity when teaching measure with the re-appraisal and suppression sub-scales on the Emotional Regulation Questionnaire (Gross and John, 2003) was supported at Time 1. Modifications to rating scales and future directions for assessing teachers' emotions based on these results are discussed.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 1","pages":"67-86"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34950748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Students may experience considerable fear and stress in school settings, and based on Dweck's (2006) notion of "mindset," we hypothesized that fear introduces qualitative changes in students' self-concepts. Hypotheses were tested on 3847 third-grade students from nine Iberoamerican countries (Bolivia, Chile, Colombia, Cuba, Ecuador, Panama, Peru, Spain, and Venezuela), who completed Murillo's (2007) adaptation of Marsh's (1988) SDQ-I. Rasch scaling indicated that the information content of High-Fear students' ratings was more localized across the latent dimension than that of Low-Fear students, and their ratings also showed less cognitive variety. The resulting measurement distortions could be captured via logistic regression over the ratings' residuals. Also, using training and validation samples (with 60% and 40% of all cases, respectively), it proved possible to predict students' fear levels and their gender. We see the present findings as a first step towards implementing an online warning and detection system for signs of bullying among students.
{"title":"I'm scared to go to School! Capturing the Effects of Chronic Daily Fears on Students' Concept of Self.","authors":"Rense Lange, Cynthia Martinez-Garrido, Alexandre Ventura","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Students may experience considerable fear and stress in school settings, and based on Dweck's (2006) notion of \"mindset\" we hypothesized that fear introduces qualitative changes in students' self-concepts. Hypotheses were tested on 3847 third-grade students from nine Iberoamerican countries (Bolivia, Chile, Colombia, Cuba, Ecuador, Panama, Peru, Spain, and Venezuela), who completed Murillo's (2007) adaptation of Marsh' (1988) SDQ-I. Rasch scaling indicated that the information-content of High-Fear students' ratings was more localized across the latent dimension than was that of Low-Fear students, and their ratings also showed less cognitive variety. The resulting measurement distortions could be captured via logistic regression over the ratings' residuals. Also, using training and validation samples (with respectively 60 and 40% of all cases), it proved possible to predict students' fear levels and their gender. We see the present findings as a first step towards implementing an online warning and detection system for signs of bullying among students.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 4","pages":"420-433"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35665983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Residual-based fit statistics are commonly used as an indication of the extent to which item response data fit the Rasch model. Fit statistic estimates are influenced by sample size, and rule-of-thumb cutoffs may lead to incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' INFIT distributions using this 95% confidence-like interval, an 18-percentage-point difference in items classified as acceptable. Forty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule-of-thumb range, whereas 34% fell within the 2.5th and 97.5th percentiles of the simulated items' OUTFIT distributions, a 13-percentage-point difference in items classified as acceptable. When the rule-of-thumb ranges for fit estimates were used, the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that using confidence intervals as critical values for fit statistics leads to different model-data fit conclusions than traditional rule-of-thumb critical values.
{"title":"Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting.","authors":"Kari J Hodge, Grant B Morgan","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Residual-based fit statistics are commonly used as an indication of the extent to which the item response data fit the Rash model. Fit statistic estimates are influenced by sample size and rules-of thumb estimates may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentile of the simulated item's INFIT distributions using this 95% confidence-like interval. This is a 18 percentage point difference in items that were classified as acceptable. Fourty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule- of-thumb range. Whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentile of the simulated item's OUTFIT distributions. This is a 13 percentage point difference in items that were classified as acceptable. When using the rule-of- thumb ranges for fit estimates the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that the use of confidence intervals as critical values for fit statistics leads to different model data fit conclusions than traditional rule of thumb critical values.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 4","pages":"383-392"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35666121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study explored optimization of item-attribute matrices with the linear logistic test model (Fischer, 1973), where more optimal models explain more of the variance in item difficulty attributable to the identified item attributes. Data were 8th-grade mathematics item responses from two TIMSS 2007 booklets. The study investigated three categories of attributes (content, cognitive process, and comprehensive cognitive process) at two grain levels (larger, smaller) and also compared the results with random attribute matrices. The proposed attributes accounted for most of the variance in item difficulty for the two assessment booklets (81% and 65%). The variance explained by the content attributes alone was small (13% to 31%), and the comprehensive cognitive process attributes explained much more variance than either the content or the cognitive process attributes. The variances explained at the two grain levels were similar. However, the attributes did not predict the item difficulties of the two assessment booklets equally well.
{"title":"Q-Matrix Optimization Based on the Linear Logistic Test Model.","authors":"Lin Ma, Kelly E Green","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This study explored optimization of item-attribute matrices with the linear logistic test model (Fischer, 1973), with optimal models explaining more variance in item difficulty due to identified item attributes. Data were 8th-grade mathematics test item responses of two TIMSS 2007 booklets. The study investigated three categories of attributes (content, cognitive process, and comprehensive cognitive process) at two grain levels (larger, smaller) and also compared results with random attribute matrices. The proposed attributes accounted for most of the variance in item difficulty for two assessment booklets (81% and 65%). The variance explained by the content attributes was very small (13% to 31%), less than variance explained by the comprehensive cognitive process attributes which explained much more variance than the content and cognitive process attributes. The variances explained by the grain level were similar to each other. However, the attributes did not predict the item difficulties of two assessment booklets equally.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"18 3","pages":"247-267"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35948992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarah L Thomas, Karen M Schmidt, Monica K Erbacher, Cindy S Bergeman
The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Beyond the missing data already present, additional missing responses were introduced by randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from the induced missing data increased. In addition, the standard errors of these estimates increased with the level of degradation. Thus, even MCAR missingness damages the quality and precision of PCM estimates.
{"title":"What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates.","authors":"Sarah L Thomas, Karen M Schmidt, Monica K Erbacher, Cindy S Bergeman","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates. </p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"17 1","pages":"14-34"},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636626/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141238882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}