The Method for Comprehensive Quality Evaluation of Tests. Part 2

V. Kukharenko, L. Perkhun, N. M. Tovmachenko
Statistika Ukrayini, published 2018-12-17. DOI: 10.31767/SU.4(83)2018.04.09 (https://doi.org/10.31767/SU.4(83)2018.04.09)

Abstract

The article describes the comprehensive evaluation method together with the classical methods of Data Mining and Item Response Theory (IRT). The general method has six steps; this article describes steps 4-6. The fourth step evaluates the reliability of the test. A universal two-step procedure is proposed: the reliability of individual test items (tasks) is assessed with the Kuder-Richardson coefficient of internal consistency, and the reliability of the test as a whole is assessed with the generalizability coefficient. The first coefficient is considered acceptable at 0.7 and above, the second at 0.8 and above. The second coefficient was calculated using two-factor ANOVA without repeated measurements in SPSS. At the fifth step, the quality with which the test under study differentiates students is assessed. The tools selected for this are hierarchical cluster procedures, classification trees, and classification discriminant functions. The calculations were performed in Statistica and SPSS. Three clusters of students with high, medium, and low academic performance were identified, which shows that the test under study does differentiate students. At the sixth and last step, the quality of the test is studied on the basis of the one-parameter Rasch model. The difficulty of a test item and the student's level of mastery of the study material are measured in logits. The analytical expressions for the characteristic curve of a test item and the characteristic curve of a student are given, together with the auxiliary formulas for calculating them. The description is illustrated by a specific example.
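The two reliability coefficients of step 4 can be sketched in a few lines. This is a minimal illustration, not the authors' SPSS procedure: it assumes dichotomous (0/1) item scores, assumes that the Kuder-Richardson coefficient meant is formula 20, and assumes a fully crossed students-by-items design for the ANOVA-based coefficient. The score matrix and the function names `kr20` and `generalizability` are hypothetical.

```python
from statistics import pvariance


def kr20(scores):
    """Kuder-Richardson formula 20 for a students x items matrix of 0/1 scores."""
    n_students, n_items = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    # p_j is the share of students who answered item j correctly, q_j = 1 - p_j.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in scores) / n_students
        pq_sum += p * (1 - p)
    return n_items / (n_items - 1) * (1 - pq_sum / pvariance(totals))


def generalizability(scores):
    """Coefficient from two-way ANOVA without replication (students x items)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_students = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_students - ss_items
    ms_students = ss_students / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    # Share of the between-student mean square that is not residual noise.
    return (ms_students - ms_residual) / ms_students


# Hypothetical data: 6 students (rows) x 5 items (columns).
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(f"KR-20: {kr20(scores):.3f}")
print(f"Generalizability: {generalizability(scores):.3f}")
```

In this simple crossed design with 0/1 scores the two values coincide numerically, so the sketch mainly shows how each quantity is assembled; the thresholds of 0.7 and 0.8 from the abstract would be applied to the results in whatever design the authors actually used.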
It is noted that the students' characteristic curves, built from the Rasch model in MathCAD, clearly divide the students into two groups: strong (positive logit) and weak (negative logit). Recommendations are formulated for interpreting the results obtained for individual test items. In particular, when the characteristic curves of several test items overlap, the duplicate items must be deleted (norm-referenced test) or reworked (criterion-referenced test). The paper does not consider how to determine which item should be deleted or corrected, but indicates that this can be established with the two-parameter Birnbaum model. If the characteristic curves of the test items are not evenly spaced, it is recommended to add test items (norm-referenced test) or to change the duplicate items accordingly (criterion-referenced test) so as to fill the gaps on the abscissa where there are no characteristic curves. As the practical implementation of this technique, the authors envisage the development of a separate plug-in compatible with the Moodle distance learning platform. As the direction for further theoretical research, the authors identify the study of the limits of applicability of the two-parameter and three-parameter Birnbaum models for improving the testing process and the test results of students in distance learning systems.
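The Rasch-based checks of step 6 can be illustrated with a short sketch. It is not the authors' MathCAD worksheet: it assumes the standard one-parameter form P = 1 / (1 + exp(-(theta - beta))) and the usual initial logit estimates (student ability from the share of correct answers, item difficulty from the share of incorrect answers). The score matrix, the function names, and the thresholds `overlap_tol` and `gap_width` are hypothetical choices made for illustration.

```python
import math


def logit(p):
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    return math.log(p / (1 - p))


def rasch_probability(theta, beta):
    """Rasch model: probability that ability theta solves an item of difficulty beta."""
    return 1 / (1 + math.exp(-(theta - beta)))


def initial_logits(scores):
    """Initial ability and difficulty estimates from a students x items 0/1 matrix."""
    n, k = len(scores), len(scores[0])
    abilities = [logit(sum(row) / k) for row in scores]
    difficulties = [-logit(sum(row[j] for row in scores) / n) for j in range(k)]
    return abilities, difficulties


def overlaps_and_gaps(difficulties, overlap_tol=0.1, gap_width=1.0):
    """Flag neighbouring items that nearly coincide or leave a wide gap on the logit axis."""
    order = sorted(range(len(difficulties)), key=lambda j: difficulties[j])
    overlaps, gaps = [], []
    for a, b in zip(order, order[1:]):
        distance = difficulties[b] - difficulties[a]
        if distance <= overlap_tol:
            overlaps.append((a, b))
        elif distance >= gap_width:
            gaps.append((difficulties[a], difficulties[b]))
    return overlaps, gaps


# Hypothetical data: 6 students x 5 items, with no all-correct or all-wrong
# rows or columns (those would give infinite logits).
scores = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
]

abilities, difficulties = initial_logits(scores)
strong = [i for i, theta in enumerate(abilities) if theta > 0]
weak = [i for i, theta in enumerate(abilities) if theta < 0]
overlaps, gaps = overlaps_and_gaps(difficulties)

print("strong students:", strong)
print("weak students:", weak)
print("overlapping item pairs:", overlaps)
print("gaps on the difficulty axis:", gaps)
```

Here the overlapping pairs are candidates for deletion or rework, and each reported gap marks an interval of the abscissa where a new or modified item would be placed; deciding which item of a pair to change is left, as in the paper, to a two-parameter analysis.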