Defining Test-Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument
Timothy D. Folger, Jonathan Bostic, Erin E. Krupa
Educational Measurement: Issues and Practice, 42(3), 22–38. Published June 27, 2023. DOI: 10.1111/emip.12569

Validity is a fundamental consideration in test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test-score interpretation, test-score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in validity and validation conceptualize test-score interpretation, use, and claims. Definitions were developed through multiple iterations of data collection and analysis. By clarifying the language used when conducting validation, this work may make validation more accessible to a broader audience, including but not limited to test developers, test users, and test consumers.
Hierarchical Agglomerative Clustering to Detect Test Collusion on Computer-Based Tests
Soo Jeong Ingrisone, James N. Ingrisone
Educational Measurement: Issues and Practice, 42(3), 39–49. Published June 19, 2023. DOI: 10.1111/emip.12568

There has been growing interest in machine learning (ML) approaches for detecting test collusion as an alternative to traditional methods. Cluster analysis, an unsupervised learning technique, appears especially promising for detecting group collusion. In this study, the effectiveness of hierarchical agglomerative clustering (HAC) for detecting aberrant test takers in computer-based testing (CBT) is explored. Random forest ensembles are used to evaluate the accuracy of the clustering and to identify the features most important for classifying aberrant test takers. Testing data from a certification exam are used. The level of overlap between HAC cluster membership and exact response matches on incorrectly keyed items in the exam preparation material is examined. Integrating HAC as an investigative tool is a promising way to improve the accuracy with which aberrant test takers are classified.
A Probabilistic Filtering Approach to Non-Effortful Responding
Esther Ulitzsch, Benjamin W. Domingue, Radhika Kapoor, Klint Kanopka, Joseph A. Rios
Educational Measurement: Issues and Practice, 42(3), 50–64. Published June 16, 2023. DOI: 10.1111/emip.12567

Common response-time-based approaches for non-effortful response behavior (NRB) in educational achievement tests filter responses that are associated with response times below some threshold. These approaches are, however, limited in that they require a binary decision on whether a response is classified as stemming from NRB, thus ignoring potential classification uncertainty in the resulting parameter estimates. We developed a response-time-based probabilistic filtering procedure that overcomes this limitation. The procedure is rooted in the principles of multiple imputation. Instead of creating multiple plausible replacements of missing data, however, multiple data sets are created that represent plausible filtered response data. We propose two different approaches to filtering models, originating in different research traditions and conceptualizations of response-time-based identification of NRB. The first approach uses Gaussian mixture modeling to identify a response time subcomponent stemming from NRB. Plausible filtered data sets are created based on examinees' posterior probabilities of belonging to the NRB subcomponent. The second approach defines a plausible range of response time thresholds and creates plausible filtered data sets by drawing multiple response time thresholds from the defined range. We illustrate the workings of the proposed procedure, as well as differences between the proposed filtering models, using both simulated data and empirical data from PISA 2018.
Digital Module 32: Understanding and Mitigating the Impact of Low Effort on Common Uses of Test and Survey Scores
James Soland
Educational Measurement: Issues and Practice, 42(2), 75–76. Published June 9, 2023. DOI: 10.1111/emip.12555

Most individuals who take, interpret, design, or score tests are aware that examinees do not always provide full effort when responding to items. However, many such individuals are not aware of how pervasive the issue is, what its consequences are, and how to address it. In this digital ITEMS module, Dr. James Soland helps fill these gaps in the knowledge base. Specifically, the module enumerates how frequently behaviors associated with low effort occur and some of the ways they can distort inferences based on test scores. The module then explains some of the most common approaches for identifying low effort and correcting for it when examining test scores. Brief discussion is also given to how these methods align with, and diverge from, those used to address low respondent effort in self-report contexts. Data and code are provided so that readers can implement some of these methods in their own work.
{"title":"Visualizing Distributions Across Grades","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12558","DOIUrl":"10.1111/emip.12558","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 2","pages":"4"},"PeriodicalIF":2.0,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42774382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ITEMS Corner Update: The Initial Steps in the ITEMS Development Process
Brian C. Leventhal
Educational Measurement: Issues and Practice, 42(2), 74. Published June 9, 2023. DOI: 10.1111/emip.12556

In the previous issue of Educational Measurement: Issues and Practice (EM:IP), I outlined the ten steps to authoring and producing a digital module for the Instructional Topics in Educational Measurement Series (ITEMS). In the current piece, I detail the first three steps: Step 1 (Content Outline), Step 2 (Content Development), and Step 3 (Draft Review). After an in-depth discussion of these three steps, I introduce the newest ITEMS module.

Prior to beginning the ten-step process, ITEMS module development starts with an initial meeting between myself (as editor) and the lead author(s). During this meeting, I discuss the development process in detail, showcasing what a final product looks like from the learners' perspective along with a behind-the-scenes look at what the final product looks like from the editorial perspective. After discussing the end product, the remaining conversation focuses on the ten-step process and the user-friendly templates that guide authors. The conversation concludes once we have agreed on the topic and general scope of the module.

Authors then independently work through a module outline template to refine the scope and sequencing of the module (Step 1). During this step, authors are encouraged to first specify their audience before setting the learning objectives of the module. Once learning objectives are set, authors are tasked with determining the prerequisite knowledge for learners. In the next section of the template, authors outline the content and sequencing of the 4–6 sections of the module. Each section has its own learning objectives that map to the objectives of the module. One of the sections is a learner-focused interactive activity, whether a demonstration of software or a case study relevant to the content of the other sections. Once the outline is completed, the authors receive feedback to ensure adequate sequencing, feasibility of module development (e.g., covering a reasonable amount of content), and appropriateness for the audience. This is an example of the unique nature of ITEMS module development: unlike most other publications, ITEMS module development involves regular communication and feedback from the editor. Once the scope and outline of content have been agreed to, the authors move on to Step 2: Content Development.

For Step 2, authors are provided a slide deck template to assist in developing content consistent with the ITEMS format and brand. Using this slide deck, authors maintain creative flexibility by choosing among many slide layouts, each preprogrammed with consistent font, sizing, and color. Authors create individual slide decks for each section of the module, embedding media (e.g., pictures and figures) wherever necessary to assist learner understanding. At this stage, authors are not expected to record audio or add animations. The primary focus for the authors i…
The Role of Response Style Adjustments in Cross-Country Comparisons—A Case Study Using Data from the PISA 2015 Questionnaire
Esther Ulitzsch, Oliver Lüdtke, Alexander Robitzsch
Educational Measurement: Issues and Practice, 42(3), 65–79. Published May 1, 2023. DOI: 10.1111/emip.12552

Country differences in response styles (RS) may jeopardize cross-country comparability of Likert-type scales. When adjusting for rather than investigating RS is the primary goal, it seems advantageous to impose minimal assumptions on RS structures and leverage information from multiple scales for RS measurement. Using PISA 2015 background questionnaire data, we investigate such an adjustment procedure and explore its impact on cross-country comparisons in contrast to customary analyses and RS adjustments that (a) leave RS unconsidered, (b) incorporate stronger assumptions on RS structure, and/or (c) only use some selected scales for RS measurement. Our findings suggest that not only the decision as to whether to adjust for RS but also how to adjust may heavily impact cross-country comparisons. This concerns both the assumptions on RS structures and the scales employed for RS measurement. Implications for RS adjustments in cross-country comparisons are derived, strongly advocating for taking model uncertainty into account.
Diving Into Students' Transcripts: High School Course-Taking Sequences and Postsecondary Enrollment
Burhan Ogut, Ruhan Circi
Educational Measurement: Issues and Practice, 42(2), 21–31. Published April 23, 2023. DOI: 10.1111/emip.12554

The purpose of this study was to explore high school course-taking sequences and their relationship to college enrollment. Specifically, we implemented sequence analysis to discover common course-taking trajectories in math, science, and English language arts using high school transcript data from a recent nationally representative survey. Through sequence clustering, we reduced the complexity of the sequences and examined representative course-taking sequences. Classification tree, random forest, and multinomial logistic regression analyses were used to explore the relationship between the course sequences students complete and their postsecondary outcomes. Results showed that distinct representative course-taking sequences can be identified for all students as well as for student subgroups. More advanced and complex course-taking sequences were associated with postsecondary enrollment.
Validation as Evaluating Desired and Undesired Effects: Insights From Cross-Classified Mixed Effects Model
Xuejun Ryan Ji, Amery D. Wu
Educational Measurement: Issues and Practice, 42(2), 12–20. Published April 5, 2023. DOI: 10.1111/emip.12553

Measurement specialists have demonstrated that the Cross-Classified Mixed Effects Model (CCMEM) is a flexible framework for evaluating reliability. Reliability can be estimated from the variance components of the test scores. Building on that work, this study extends the CCMEM to the evaluation of validity evidence. Validity is viewed as the coherence among the elements of a measurement system. As such, validity can be evaluated through the fixed and random effects that the test user reasons to be desired or undesired. Using data from the ePIRLS 2016 Reading Assessment, we demonstrate how to obtain evidence for reliability and validity with the CCMEM. We conclude with a discussion of the practicality and benefits of this validation method.