
Journal of Applied Measurement: Latest Articles

A Rasch Model Analysis of the Emotion Regulation Questionnaire.
Pub Date : 2018-01-01
Michael J Ireland, Hong Eng Goh, Ida Marais

The 10-item Emotion Regulation Questionnaire (ERQ) was developed to measure individual differences in the tendency to use two common emotion regulation strategies: cognitive reappraisal and suppression. The current study examined the psychometric properties of the ERQ in a heterogeneous mixed sample of 713 community residents (64.9% female) using the polytomous Rasch model. The results showed that the 10-item ERQ was multidimensional, supporting two distinct factors. The reappraisal and suppression subscales were each found to be unidimensional and to fit the Rasch model. No evidence of local dependence was observed, and the five response categories functioned as intended. Differential item functioning (DIF) was assessed across sub-samples defined by gender, self-reported symptoms of mental illness, regular meditation practice, and age group. There was no evidence that any item functioned differently across these groups. Using Rasch person measures, several meaningful group differences in person location emerged. Younger adults, non-meditators, and those reporting symptoms of mental illness reported less use of reappraisal. Non-meditators also reported greater use of suppression than regular meditators; no age, gender, or symptom-group differences emerged on suppression.
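The abstract relies on the polytomous Rasch model without stating it. As a minimal sketch, assuming the common threshold parameterization in which category x of an item has log-numerator equal to the sum over k ≤ x of (θ - δ - τ_k), the category probabilities for a five-category item (four thresholds) can be computed as below. The function names and threshold values are hypothetical illustrations, not estimates from the study.

```python
import math


def category_probabilities(theta, delta, taus):
    """Category probabilities under the polytomous Rasch model.

    theta: person location (logits); delta: item location (logits);
    taus: threshold offsets tau_1..tau_m, giving m + 1 ordered categories.
    """
    # Log-numerator for category x is sum_{k<=x} (theta - delta - tau_k);
    # category 0 has an empty sum, i.e. 0.
    log_numerators = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        log_numerators.append(running)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Model-expected item score: sum over x of x * P(X = x)."""
    probs = category_probabilities(theta, delta, taus)
    return sum(x * p for x, p in enumerate(probs))


# Hypothetical item with five response categories (four thresholds).
taus = [-1.5, -0.5, 0.5, 1.5]
print([round(p, 3) for p in category_probabilities(0.0, 0.0, taus)])
print(round(expected_score(0.0, 0.0, taus), 3))
```

With symmetric thresholds and θ = δ, the expected score falls at the middle category (2 on 0-4 scoring); with a single threshold the function reduces to the dichotomous Rasch model.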

Journal of Applied Measurement, 19(3), 258-270.
Equating Errors and Scale Drift in Linked-Chain IRT Equating with Mixed-Format Tests.
Pub Date : 2018-01-01
Bo Hu

In linked-chain equating, equating errors may accumulate and cause scale drift. This simulation study extends the investigation of scale drift in linked-chain equating to mixed-format tests. Specifically, it examines how the equating method and the characteristics of the anchor test and of the equating chain affect equating errors and scale drift in IRT true-score equating. To evaluate the equating results, a new method is used to derive the true linking coefficients. The results indicate that the characteristic curve methods produce more accurate and reliable equating results than the moment methods. Although using more anchor items, or an anchor test configuration with more IRT parameters, can lower the variability of equating results, neither helps control equating bias. Additionally, scale drift increases when an equating chain grows longer or when poorly calibrated test forms are added to it. The study highlights the role of calibration precision in evaluating equating results.
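To make the chain mechanics concrete, here is a minimal sketch of one moment method (mean-sigma) and of how linear links compose along a chain, assuming the usual convention that a link (A, B) maps new-form values onto the old-form scale as A·θ + B. The characteristic curve methods (e.g., Haebara, Stocking-Lord) are not shown because they require numerically minimizing differences between characteristic curves. All anchor values and function names below are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_old):
    """Mean-sigma (moment) linking constants from anchor-item difficulties.

    b_new, b_old: estimates for the same anchor items from the new-form and
    old-form calibrations. Returns (A, B) with b_old ~= A * b_new + B.
    """
    A = pstdev(b_old) / pstdev(b_new)  # match the spreads (the n cancels)
    B = mean(b_old) - A * mean(b_new)  # then match the means
    return A, B


def compose(first, second):
    """Chain two links: apply `first`, then `second`; returns (A, B)."""
    A1, B1 = first
    A2, B2 = second
    return A2 * A1, A2 * B1 + B2


def transform(theta, link):
    """Place a value from the link's source scale onto its target scale."""
    A, B = link
    return A * theta + B


# Hypothetical chain: form 3 -> form 2 -> form 1 (the base scale).
link_3_to_2 = mean_sigma([-1.0, 0.0, 1.0], [-0.8, 0.3, 1.4])
link_2_to_1 = mean_sigma([-1.2, 0.1, 1.1], [-1.0, 0.2, 1.3])
link_3_to_1 = compose(link_3_to_2, link_2_to_1)
print(link_3_to_1, transform(0.5, link_3_to_1))
```

Because slopes multiply and intercepts are rescaled at each step, an error in an early link propagates to every later form, which is one way to see why drift can grow as the chain lengthens.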

Journal of Applied Measurement, 19(1), 41-58.
Validation of Response Similarity Analysis for the Detection of Academic Cheating: An Experimental Study.
Pub Date : 2018-01-01
Georgios D Sideridis, Cengiz Zopluoglu

The purpose of the present study was to evaluate various analytical means of detecting academic cheating in an experimental setting. The omega index was evaluated against a gold criterion of academic cheating, namely a discrepancy between a student's scores on two administrations, obtained in an experimental study with real test takers. Participants were 164 elementary school students who took a mathematics exam under strict invigilation, followed by an equivalent mock exam under relaxed invigilation. Based on what would be expected by chance, a discrepant score was defined as a difference of more than 7 responses in either direction (correct or incorrect). Results indicated that the omega index captured more than 39% of the cases that exceeded the conventional plus-or-minus 7 discrepancy criterion. It is concluded that response similarity analysis may be an important tool for detecting academic cheating.
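The omega index itself requires a fitted item response model (typically the nominal response model) and is not reproduced here. The sketch below, with hypothetical function names and made-up cases, shows only the bookkeeping behind the reported result: which students meet the plus-or-minus 7 gold criterion, and what share of them an index also flagged.

```python
def is_discrepant(strict_score, relaxed_score, margin=7):
    """Gold criterion: score change of more than `margin` responses either way."""
    return abs(relaxed_score - strict_score) > margin


def criterion_capture_rate(cases, margin=7):
    """Share of criterion-positive cases that the index also flagged.

    cases: iterable of (strict_score, relaxed_score, flagged_by_index).
    Returns None when no case meets the criterion.
    """
    positives = [flagged for strict, relaxed, flagged in cases
                 if is_discrepant(strict, relaxed, margin)]
    if not positives:
        return None
    # True counts as 1 and False as 0 when summed.
    return sum(positives) / len(positives)


# Made-up cases: (score under strict invigilation, score under relaxed, flagged?)
cases = [(10, 20, True), (10, 19, False), (10, 12, True), (5, 14, False)]
print(criterion_capture_rate(cases))
```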

Journal of Applied Measurement, 19(1), 59-75.
Person-Level Analysis of the Effect of Cognitive Loading by Question Difficulty and Question Time Intensity on Didactic Examination Fluency (Speed-Accuracy Tradeoff).
Pub Date : 2018-01-01
James J Thompson

Fluency may be considered a conjoint measure of work-product quality and speed. It is especially useful for evaluating expertise and/or competence in educational and medical settings. In this paper, didactic exams were used to model fluency. Binned propensity matching on question difficulty and time intensity was used to define a 'load' variable and to construct fluency (sum correct / elapsed response time). The analysis yielded response surfaces describing speed-accuracy tradeoffs. Person-by-load fluency matrices behaved well in Rasch analysis, warranting the definition of a person fluency variable ('skill'). A path model with skill and load as mediators substantially described the fluency data, and the indirect paths through skill and load dominated the direct variable effects. This is supportive evidence that skill and load have stand-alone merit. Therefore, it appears that the constructs of skill, load, and fluency could provide psychometrically defensible descriptors when used in appropriate contexts.
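Fluency as defined above is a simple ratio, so a person-by-load table can be tabulated directly. A minimal sketch, assuming each response has already been assigned to a load bin (the paper's binned propensity matching is not reproduced) and expressing fluency in correct answers per minute, a unit chosen here only for readability; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def fluency_by_load(responses):
    """Fluency = sum correct / elapsed response time, per person and load bin.

    responses: iterable of (person, load_bin, correct, seconds), where
    correct is 0 or 1. Returns {(person, load_bin): correct per minute}.
    """
    correct = defaultdict(int)
    elapsed = defaultdict(float)
    for person, load_bin, is_correct, seconds in responses:
        correct[(person, load_bin)] += is_correct
        elapsed[(person, load_bin)] += seconds
    # Skip cells with no recorded time to avoid dividing by zero.
    return {key: correct[key] / (elapsed[key] / 60.0)
            for key in correct if elapsed[key] > 0}


data = [
    ("p1", "low", 1, 30.0), ("p1", "low", 1, 30.0),
    ("p1", "high", 0, 60.0), ("p1", "high", 1, 60.0),
]
print(fluency_by_load(data))
```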

Journal of Applied Measurement, 19(3), 229-242.
Developing and Validating a Scientific Multi-Text Reading Comprehension Assessment: In the Text Case of the Dispute of whether to Continue the Fourth Nuclear Power Plant Construction in Taiwan.
Pub Date : 2018-01-01
Lin Hsiao-Hui, Yuh-Tsuen Tzeng

This study aimed to advance the Scientific Multi-Text Reading Comprehension Assessment (SMTRCA) by developing a rubric consisting of four subscales: information retrieval, information generalization, information interpretation, and information integration. The assessment tool comprised 11 closed-ended items, 8 open-ended items, and the rubric. Two texts presenting opposing views in the dispute over whether to continue construction of the Fourth Nuclear Power Plant in Taiwan were developed, and 1,535 students in grades 5-9 read the two texts in counterbalanced order and answered the test items. First, Cronbach's alpha values exceeded .9, indicating very good intra-rater consistency, and Kendall's coefficient of concordance for inter-rater reliability was larger than .8, indicating a consistent scoring pattern across raters. Second, a many-facet Rasch measurement analysis showed significant differences in rater severity, yet both severe and lenient raters could effectively distinguish high- from low-ability students. A comparison of the rating scale model and the partial credit model indicated that each rater had a unique rating scale structure: scoring involves human interpretation and evaluation, so machine-like consistency is difficult to reach. However, this is in line with expectations for typical human judgment processes. Third, the Cronbach's alpha coefficients for the full assessment were above .85, indicating that the SMTRCA has high internal consistency. Finally, confirmatory factor analysis showed an acceptable goodness of fit for the SMTRCA. These results suggest that the SMTRCA is a useful tool for measuring multi-text reading comprehension abilities.
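The two reliability statistics reported above can be sketched as follows, assuming a persons-by-items score table for Cronbach's alpha, α = k/(k-1)·(1 - Σσ²_i/σ²_total), and a raters-by-responses table for Kendall's W = 12S / (m²(n³ - n)), where S is the sum of squared deviations of the per-response rank sums from their mean. This version omits the tie correction for W, which matters when rubric scores contain many ties; the function names are illustrative.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per person, one column per item."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """ratings: one row per rater, one column per rated response.

    No correction for ties is applied (an assumption of this sketch).
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```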

Journal of Applied Measurement, 19(3), 320-337.
Psychometric Properties and Differential Item Functioning of a Web-Based Assessment of Children's Social Perspective-Taking.
Pub Date : 2018-01-01
Beyza Aksu Dunya, Clark McKown, Everett V Smith

Social perspective-taking (SPT), which involves the ability to infer others' intentions, is a consequential social-cognitive process. The purpose of this study was to evaluate the psychometric properties of a web-based social perspective-taking assessment (SELweb SPT) designed for children in kindergarten through third grade. Data were collected from two separate samples of children: the first included 3,224 children and the second 4,419. Data were calibrated using the Rasch dichotomous model (Rasch, 1960). Differential item and test functioning were also evaluated across gender and ethnicity groups. Across both samples, we found evidence of consistent item fit, a unidimensional item structure, and adequate item targeting. However, poor targeting at the high and low ends of the ability range suggests that more items are needed to distinguish low- and high-ability respondents. DIF analyses found some significant item-level DIF across gender but no DIF across ethnicity. Comparing person measures calibrated with and without the DIF items showed negligible differential test functioning (DTF) across gender and ethnicity groups in both samples.
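Under the Rasch dichotomous model, P(X = 1) = exp(θ - b) / (1 + exp(θ - b)), and each item contributes information p(1 - p), which peaks when its difficulty b matches the person's ability θ. The sketch below, with hypothetical difficulties and illustrative function names, shows why items clustered mid-scale measure children at the extremes less precisely, which is the targeting concern raised above.

```python
import math


def p_correct(theta, b):
    """Rasch dichotomous model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def information_at(theta, difficulties):
    """Test information at ability theta: sum of item information p * (1 - p)."""
    return sum(p * (1 - p) for p in (p_correct(theta, b) for b in difficulties))


# Hypothetical item difficulties clustered near the middle of the scale.
items = [-1.0, -0.5, 0.0, 0.5, 1.0]
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(information_at(theta, items), 3))
```

Information, and hence measurement precision, is highest where item difficulties are concentrated and falls off for very low or very high abilities.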

Journal of Applied Measurement, 19(1), 93-105.
Development and Calibration of Chemistry Items to Create an Item Bank, using the Rasch Measurement Model.
Pub Date : 2018-01-01
Joseph N Njiru, Joseph T Romanoski

This article describes the development and calibration of items from the 1997-2006 Tertiary Entrance Exams (TEE) in Chemistry, conducted by the Curriculum Council of Western Australia, for the purpose of establishing a Chemistry item bank. Only items that met the strict Rasch measurement criterion of ordered thresholds were included. Item residuals and the chi-square fit of the items were likewise scrutinized. Further, chemistry specialists were employed to ascertain the qualitative properties of the items, particularly the item wording, so as to provide accurate item descriptors. An item bank of 174 items was created. Teachers may now use this item bank accurately in their classrooms to develop Chemistry assessments and/or for classroom diagnostic purposes.
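The inclusion rule above amounts to keeping items whose estimated thresholds increase with category. A minimal sketch, assuming threshold estimates are already available from a Rasch calibration; the item IDs and values are made up, and the residual and chi-square fit checks are not shown.

```python
def has_ordered_thresholds(thresholds):
    """True if threshold estimates increase strictly (dichotomous items pass trivially)."""
    return all(lower < upper for lower, upper in zip(thresholds, thresholds[1:]))


def screen_items(calibrated):
    """Keep only items whose estimated thresholds are in order.

    calibrated: {item_id: [tau_1, ..., tau_m]} from a prior Rasch calibration.
    """
    return {item: taus for item, taus in calibrated.items()
            if has_ordered_thresholds(taus)}


# Hypothetical calibration output; item IDs and values are illustrative only.
calibrated = {
    "TEE1998_Q12": [-1.2, 0.1, 1.4],
    "TEE2001_Q07": [-0.3, -0.9, 1.1],  # disordered: tau_2 < tau_1
    "TEE2004_Q03": [0.4],              # dichotomous item, single threshold
}
print(sorted(screen_items(calibrated)))  # prints ['TEE1998_Q12', 'TEE2004_Q03']
```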

Journal of Applied Measurement, 19(2), 192-200.
The Impact of Missing Values and Single Imputation upon Rasch Analysis Outcomes: A Simulation Study.
Pub Date : 2018-01-01
Carolina Saskia Fellinghauer, Birgit Prodinger, Alan Tennant

Imputation has become common practice owing to the availability of easy-to-use algorithms and software. This study aims to determine whether different imputation strategies are robust to the extent and type of missingness, local item dependence (LID), differential item functioning (DIF), and misfit when conducting a Rasch analysis. Four samples were simulated: one with good metric properties, one with LID, one with DIF, and one with both LID and DIF. Missing values were generated in increasing proportions and were either missing at random (MAR) or missing completely at random (MCAR). Four imputation techniques were applied before Rasch analysis, and the resulting deviations in estimates and quality of fit were compared. The imputation strategies performed well when less than 15% of values were missing. Analyzing the data with the missing values left in place performed best in recovering the statistical estimates. The best strategy when conducting a Rasch analysis is therefore to analyze the data with missing values; if imputation is necessary for some reason, we recommend the expectation-maximization algorithm.
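Part of why analyzing incomplete data is feasible in a Rasch analysis is that the likelihood involves only the observed responses. As a minimal sketch, assuming item difficulties are already known (for example, from a prior calibration), a person's maximum-likelihood ability can be found by Newton-Raphson over the observed items alone. This is not the study's simulation code, and it does not implement the expectation-maximization imputation the authors recommend; the function name is hypothetical.

```python
import math


def estimate_ability(responses, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood Rasch person estimate that skips missing responses.

    responses: list of 1, 0, or None (missing); difficulties: known item
    difficulties in logits. Missing items contribute nothing to the likelihood.
    """
    observed = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    raw = sum(x for x, _ in observed)
    # All-correct or all-wrong strings have no finite maximum-likelihood estimate.
    if not observed or raw == 0 or raw == len(observed):
        raise ValueError("no finite estimate for empty or extreme response strings")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for _, b in observed]
        residual = raw - sum(probs)                   # score-equation residual
        info = sum(p * (1.0 - p) for p in probs)      # Fisher information
        step = max(-1.0, min(1.0, residual / info))   # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta


print(round(estimate_ability([1, 0, 1, None], [-1.0, 0.0, 1.0, 2.0]), 3))
```

The estimate for a response string with a missing item equals the estimate obtained by simply dropping that item, since it never enters the likelihood.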

Journal of Applied Measurement, 19(1), 1-25.
The Impact of Levels of Discrimination on Vertical Equating in the Rasch Model.
Pub Date : 2018-01-01
Stephen N Humphrey

Aligning scales in vertical equating poses a number of challenges for practitioners in contexts such as large-scale testing. This paper examines the impact of high and low item discrimination on the results of vertical equating when the Rasch model is applied. A first simulation study shows that different levels of discrimination introduce systematic error into estimates. A second simulation study shows that, for the purpose of vertical equating, items with high or low discrimination carry information about translation constants that contains systematic error. The impact of differential item discrimination on vertical equating is then examined and illustrated with a real data set from a large-scale testing program, with vertical links between Grade 3 and Grade 5 numeracy tests. Implications of the results are identified for practitioners conducting vertical equating with the Rasch model, including for monitoring progress over time. Implications for other item response models are also discussed.
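In the Rasch model, linking two separately calibrated grades reduces to a single translation constant, commonly the mean difference in link-item difficulties. A minimal sketch with hypothetical values and function names; inspecting the per-item shifts is one informal way to spot items that pull the constant in a systematic direction, in the spirit of the second simulation above, and is not the paper's procedure.

```python
from statistics import mean


def translation_constant(b_lower, b_upper):
    """Mean shift placing upper-grade Rasch estimates on the lower-grade scale.

    b_lower, b_upper: difficulties of the same link items from the separate
    lower-grade and upper-grade calibrations. Returns (constant, per_item_shifts);
    an upper-grade estimate b maps to b + constant on the lower-grade scale.
    """
    shifts = [lo - up for lo, up in zip(b_lower, b_upper)]
    return mean(shifts), shifts


# Hypothetical link items, e.g. shared between a Grade 3 and a Grade 5 form.
constant, shifts = translation_constant([0.4, 1.1, 1.5], [-0.6, 0.0, 0.6])
print(round(constant, 3), [round(s, 3) for s in shifts])
```

If every link item fit the Rasch model, the per-item shifts would differ only by estimation error; a consistent pattern in the shifts would instead suggest systematic error of the kind the abstract attributes to items with unusually high or low discrimination.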

{"title":"The Impact of Levels of Discrimination on Vertical Equating in the Rasch Model.","authors":"Stephen N Humphrey","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Aligning scales in vertical equating carries a number of challenges for practitioners in contexts such as large-scale testing. This paper examines the impact of high and low discrimination on the results of vertical equating when the Rasch model is applied. A simulation study is used to show that different levels of discrimination introduce systematic error into estimates. A second simulation study shows that for the purpose of vertical equating, items with high or low discrimination contain information about translation constants that contains systematic error. The impact of differential item discrimination on vertical equating is examined and subsequently illustrated in terms of a real data set from a large-scale testing program, with vertical links between grade 3 and 5 numeracy tests. Implications of the results for practitioners conducting vertical equating with the Rasch model are identified, including monitoring progress over time. Implications for other item response models are also discussed.</p>","PeriodicalId":73608,"journal":{"name":"Journal of applied measurement","volume":"19 3","pages":"216-228"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36451686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
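Under the Rasch model, two separately calibrated tests are often placed on one scale with a translation constant estimated from link items that appear in both. The sketch below is illustrative only and is not the procedure from the paper. It makes these assumptions:

- It uses mean-mean linking, so the constant is the average difference between the two calibrations of each common item.
- The item labels and logit difficulty estimates are hypothetical.
- The helper names are invented for illustration.

It also adds a leave-one-out check showing how much each link item moves the constant. This is one simple way to spot items that pull the link disproportionately, but it does not by itself diagnose discrimination.

```python
from statistics import mean


def translation_constant(base: dict[str, float], target: dict[str, float]) -> float:
    """Mean-mean translation constant mapping target-scale estimates onto the base scale.

    Only link items calibrated in both tests contribute.
    """
    common = sorted(base.keys() & target.keys())
    if not common:
        raise ValueError("no common link items")
    return mean(base[item] - target[item] for item in common)


def leave_one_out_constants(base: dict[str, float], target: dict[str, float]) -> dict[str, float]:
    """Recompute the constant with each link item dropped in turn."""
    common = sorted(base.keys() & target.keys())
    if len(common) < 2:
        raise ValueError("need at least two link items")
    return {
        dropped: mean(base[i] - target[i] for i in common if i != dropped)
        for dropped in common
    }


def shift_to_base(estimates: dict[str, float], constant: float) -> dict[str, float]:
    """Add the translation constant to every target-scale estimate."""
    return {key: value + constant for key, value in estimates.items()}


# Hypothetical Rasch difficulty estimates (logits) for four link items,
# calibrated separately in a grade 3 test (base) and a grade 5 test (target).
grade3 = {"L1": 0.5, "L2": 1.0, "L3": 1.5, "L4": 2.0}
grade5 = {"L1": -1.0, "L2": -0.5, "L3": 0.0, "L4": 0.0}

constant = translation_constant(grade3, grade5)
print(constant)  # 1.625
print(leave_one_out_constants(grade3, grade5))
print(shift_to_base({"student_a": 0.25}, constant))  # {'student_a': 1.875}
```

In this made-up example, dropping L4 lowers the constant from 1.625 to 1.5 logits, while dropping any other item raises it slightly. Deciding whether such a shift reflects differential discrimination would need the kind of analysis the paper describes.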
The Impact of Differential Item Functioning on the Warwick-Edinburgh Mental Well-Being Scale.
Pub Date : 2018-01-01
Hong Eng Goh, Ida Marais, Michael Ireland

Establishing the internal validity of psychometric instruments is an important research priority, and is especially vital for instruments that are used to collect data to guide public policy decisions. The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) is a well-established and widely used instrument for assessing individual differences in well-being. The current analyses were motivated by concerns that mental well-being items that refer to interpersonal relationships (Items 9 and 12) may operate differently for those in a relationship compared with those not in a relationship. To assess this, the present study used item characteristic curves (ICC) and ANOVA of residuals to scrutinize the differential item functioning (DIF) of the 14 WEMWBS items for participant relationship status (n with partner = 261, n without partner = 210). Items 5, 9, and 12 showed evidence of DIF that affected group mean differences. The DIF in Item 5 ("energy to spare") was unexpected; however, a plausible explanation is discussed. For participants at the same level of mental well-being, those in a relationship scored higher on Items 9 and 12 than those not in a relationship. This suggests these items are sensitive to variance associated with relationship status that is unrelated to well-being. Implications and future research directions are discussed.

Journal of applied measurement, 2018, 19(2): 162-172. Journal Article.
Citations: 0
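The ANOVA-of-residuals approach to DIF compares standardized person-item residuals, (observed - expected) / sqrt(variance), across groups. Rasch software such as RUMM2030 commonly crosses the person factor with class intervals of person location in a two-way ANOVA, where the main effect indicates uniform DIF and the interaction indicates non-uniform DIF.

The sketch below simplifies this to a one-way comparison by relationship status and is not the authors' analysis. It makes these assumptions:

- A polytomous Rasch (partial credit) parameterisation.
- Person locations that are already estimated.
- Responses recoded to 0-4.
- Hypothetical thresholds and responses.
- Illustrative function names.

```python
import math
from statistics import mean


def pcm_probabilities(theta, thresholds):
    """Category probabilities (0..m) under a Rasch partial credit model."""
    # Category k accumulates (theta - tau_j) for j = 1..k; category 0 has 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    top = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def standardized_residual(score, theta, thresholds):
    """(observed - expected) / sqrt(variance) for one person-item response."""
    probs = pcm_probabilities(theta, thresholds)
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum(k * k * p for k, p in enumerate(probs)) - expected ** 2
    return (score - expected) / math.sqrt(variance)


def one_way_f(groups):
    """F statistic for a one-way ANOVA of residuals across groups."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical thresholds (logits) for one 5-category item scored 0-4.
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]

# Hypothetical (person location, item score) pairs: at each location,
# respondents with a partner score one category higher.
responses = {
    "with_partner": [(-1.0, 2), (0.0, 3), (0.5, 3), (1.0, 4)],
    "without_partner": [(-1.0, 1), (0.0, 2), (0.5, 2), (1.0, 3)],
}

residuals = {
    group: [standardized_residual(x, theta, THRESHOLDS) for theta, x in pairs]
    for group, pairs in responses.items()
}
for group, values in residuals.items():
    print(group, round(mean(values), 3))
print("F =", round(one_way_f(residuals), 2))
```

A higher mean residual for one group at matched person locations is the pattern the abstract reports for Items 9 and 12. In practice, the F statistic would be compared with an F distribution on (number of groups - 1, N - number of groups) degrees of freedom.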