
Educational Measurement-Issues and Practice: Latest Articles

ITEMS Corner Update: Recording Audio and Adding an Editorial Polish to an ITEMS Module
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-09-06; DOI: 10.1111/emip.12573
Brian C. Leventhal

In the first issue of Educational Measurement: Issues and Practice (EM:IP) in 2023, I outlined the 10 steps of the Instructional Topics in Educational Measurement Series (ITEMS) module development process. I then detailed the first three steps in the second issue, and in this issue, I discuss Steps 4–7, focusing on the audio recording process, editorial polish, interactive activities, and learning check development. I devote space to discussing each in detail to give readers and potential authors a better understanding of the behind-the-scenes efforts throughout the ITEMS module development process. Following this discussion, I reiterate a call for module topics and conclude by introducing the latest entry to the ITEMS module library.

Throughout content development (Step 3), authors are encouraged to draft notes or a script for each slide to assist in audio recording. After drafted content is approved by the editorial team, the author begins Step 4: audio recording. No special skills or software are needed to record the audio, and hardware (i.e., a microphone) is provided when necessary. Audio is recorded within PowerPoint, on each slide independently. In this way, the audio for a 20-minute module section is recorded in 1–3-minute segments so that, should re-recording be required, the author does not need to re-record an entire section. This also facilitates smoother transitions throughout each section, leading to a more natural speaking style. Although authors are encouraged to use a script (this is helpful should re-recording be necessary), it is emphasized that the audio should not sound like reading. Rather, the audio should resemble the style of an instructor delivering a professional workshop.

Once the audio recording is complete, the work shifts to the editorial team. During Step 5, the editorial team polishes the module content and audio. On each slide, they clean up the audio by reducing background noise, editing sections of silence, and increasing or decreasing the volume. After audio editing is complete, the editorial team adds slide transitions, object animations, and other stylistic tools to assist learning. For example, transition animations and timing support a smooth continuation of thought and content from slide to slide. Animations are synced with the audio so that bullet points appear when discussed, figures fade in when mentioned, and other content is displayed systematically so as not to overwhelm the learner. Additional stylistic tools and techniques are employed to take advantage of the digital platform. For example, graph elements (e.g., axis labels) are animated in stages, fading into view as they are described in the audio to help focus the learner. Shapes, such as circles or arrows, may also be added to figures to highlight specific elements when emphasized in the audio. To assist with flow and organization, the editorial team may use additional slides or flow charts. For example, if ten slides are divided into three hierarchical topics, a slide with a flow chart showing the three topics may be added between topics to remind the learner of their structure and interconnectedness. This polishing process takes 3–4 weeks to complete.

Once polishing is complete, the editorial team exports the slides to video for the author to review. The author and the editorial team work together to make any necessary adjustments before finalizing. If the module has not yet undergone external review, the videos are then sent out for review.

After the content is finalized, the module polished, and the review complete, the author develops the interactive activity and the section learning checks (Steps 6 and 7 of the process). The activity may be a case study, example data and syntax, or another interactive component that gives learners an opportunity to apply what they have learned throughout the module. For example, if the module focuses on statistical modeling, an example activity may demonstrate syntax in a particular software package. For this type of activity, the author records a video for learners that walks through the syntax and output. The author also develops three to five selected-response items for each section of the module. These learning checks let learners check their understanding of the content before moving on to the next section. The questions are developed in a Word document template, which the editorial team reformats into PowerPoint slides to make the learning checks interactive.

In the final issue of EM:IP in 2023, I will outline the remaining steps of the ITEMS module development process. As a reminder, the goals of this exposition are to (1) familiarize readers, learners, and potential authors with the development process of this atypical publication, (2) publicize the detailed behind-the-scenes work completed by module authors, and (3) attract the interest of potential authors by showcasing a rigorous yet guided development process.

Recently, an NCME task force introduced foundational competencies in educational measurement. In doing so, they highlighted the topics that should form the foundation of academic programs in measurement. These include instrument development, item analysis, reliability and measurement error, validity, sampling, and more. ITEMS is currently seeking authors to develop modules on some of these topics. If you are interested, please contact Brian Leventhal ([email protected]). If you have another topic in mind, please do not hesitate to reach out! We are open to discussing all ideas (e.g., computerized adaptive testing, process data).

Finally, I am pleased to announce the latest entry in the ITEMS digital module library. In Digital Module 33, Dr. Amir Rasooli discusses fairness in classroom assessment: dimensions and tensions. In this five-section module, Dr. Rasooli shares best practices for improving fairness in classroom assessment contexts, drawing on theoretical and empirical research from various countries.
Citations: 0
Item Selection Algorithm Based on Collaborative Filtering for Item Exposure Control
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-08-29; DOI: 10.1111/emip.12578
Yiqin Pan, Oren Livne, James A. Wollack, Sandip Sinharay

In computerized adaptive testing, overexposure of items in the bank is a serious problem and might result in item compromise. We develop an item selection algorithm that utilizes the entire bank well and reduces the overexposure of items. The algorithm is based on collaborative filtering and selects an item in two stages. In the first stage, a set of candidate items whose expected performance matches the examinee's current performance is selected. In the second stage, an item that is approximately matched to the examinee's observed performance is selected from the candidate set. The expected performance of an examinee on an item is predicted by autoencoders. Experiment results show that the proposed algorithm outperforms existing item selection algorithms in terms of item exposure while incurring only a small loss in measurement precision.
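
The two-stage idea in this abstract can be illustrated with a generic filter-then-randomize sketch. This is not the authors' algorithm: the abstract does not give the matching rule, so the tolerance `tol`, the fallback, and the random draw in stage two are all assumptions, and the list `predicted` merely stands in for the autoencoder's predicted performance on each item.

```python
import random


def select_item(predicted, administered, current_performance, tol=0.1, rng=None):
    """Pick the next item index in two stages (hypothetical sketch).

    predicted: assumed per-item predicted success probability for this examinee
    administered: set of item indices already given
    current_performance: examinee's proportion correct so far
    """
    rng = rng or random.Random()
    available = [i for i in range(len(predicted)) if i not in administered]
    if not available:
        raise ValueError("no items left in the bank")

    # Stage 1: keep items whose predicted performance is close to the
    # examinee's current performance (closeness rule is an assumption).
    candidates = [i for i in available
                  if abs(predicted[i] - current_performance) <= tol]
    if not candidates:
        # Assumed fallback: use the single closest available item.
        candidates = [min(available,
                          key=lambda i: abs(predicted[i] - current_performance))]

    # Stage 2: draw one candidate at random, so that near-equivalent items
    # share exposure instead of the single best match always being chosen.
    return rng.choice(candidates)


predicted = [0.20, 0.55, 0.60, 0.90, 0.65]
choice = select_item(predicted, administered={2}, current_performance=0.6,
                     rng=random.Random(0))
print(choice)  # one of the candidate items 1 or 4
```

Drawing at random within the candidate set trades a little precision for a flatter exposure distribution, which is the trade-off the abstract reports.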

Citations: 0
Measurement Efficiency for Technology-Enhanced and Multiple-Choice Items in a K–12 Mathematics Accountability Assessment
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-08-25; DOI: 10.1111/emip.12580
Ozge Ersan, Yufeng Berry

The increasing use of computerization in the testing industry and the need for items potentially measuring higher-order skills have led educational measurement communities to develop technology-enhanced (TE) items and conduct validity studies on the use of TE items. Parallel to this goal, the purpose of this study was to collect validity evidence comparing item information functions, expected information values, and measurement efficiencies (item information per time unit) between multiple-choice (MC) and TE items. The data came from K–12 mathematics large-scale accountability assessments. The study results were mainly interpreted descriptively, and the presence of specific patterns between MC and TE items was examined across grades and depth of knowledge levels. Although many earlier researchers pointed out that TE items were not as efficient as MC items, the results of this study suggest that TE items might provide more information and were as efficient as, or more efficient than, MC items overall.
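
Measurement efficiency as defined here (item information per time unit) can be sketched in a few lines. The abstract does not state which IRT model was used, so the two-parameter logistic (2PL) information function below is an assumption, and the item parameters and response times are made-up values for illustration only.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def measurement_efficiency(theta, a, b, seconds):
    """Item information divided by response time in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return item_information_2pl(theta, a, b) / seconds


# Hypothetical items: the TE item discriminates better but takes longer.
mc_item = {"a": 1.0, "b": 0.0, "seconds": 60.0}
te_item = {"a": 1.6, "b": 0.0, "seconds": 120.0}

for name, item in (("MC", mc_item), ("TE", te_item)):
    info = item_information_2pl(0.0, item["a"], item["b"])
    eff = measurement_efficiency(0.0, **item)
    print(f"{name}: information={info:.3f}, efficiency={eff:.5f} per second")
```

With these invented numbers the TE item yields more information at theta = 0 and remains slightly more efficient despite taking twice as long; different parameters could easily reverse that ordering.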

Citations: 0
Weighing the Value of Complex Growth Estimation Methods to Evaluate Individual Student Response to Instruction
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-08-24; DOI: 10.1111/emip.12579
Ethan R. Van Norman

Sophisticated analytic strategies have been proposed as viable methods to improve the quantification of student improvement and to assist educators in making treatment decisions. The performance of three categories of latent growth modeling techniques (linear, quadratic, and dual change) in capturing growth in oral reading fluency in response to a 12-week structured supplemental reading intervention was compared among 280 grade three students at risk for learning disabilities. Although the most complex approach (dual change) yielded the best model fit indices, there were few practical differences between its predicted values and those from simpler linear models. The article closes with a discussion of the need to carefully weigh the relative benefits and appropriateness of increasingly complex growth modeling strategies for evaluating individual student responses to intervention.

Citations: 0
Does It Matter How the Rigor of High School Coursework Is Measured? Gaps in Coursework Among Students and Across Grades
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-08-23; DOI: 10.1111/emip.12577
Burhan Ogut, Darrick Yee, Ruhan Circi, Nevin Dizdari

Research shows that the intensity of high school course-taking is related to postsecondary outcomes. However, there are various approaches to measuring the intensity of students’ course-taking. This study presents new measures of coursework intensity that rely on differing levels of quantity and quality of coursework. We used these new indices to provide a current description of variations in high school course-taking across grades and student subgroups using a nationally representative dataset, the High School Longitudinal Study of 2009. Results showed that, for measures emphasizing the quality of coursework, the gaps in coursework among underserved students were larger and there was less upward movement in rigor across grades.

Citations: 0
Exploration of Latent Structure in Test Revision and Review Log Data
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-08-14; DOI: 10.1111/emip.12576
Susu Zhang, Anqi Li, Shiyu Wang

In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and instructions. In the current study, we used recently proposed statistical learning methods for sequence data to provide an exploratory analysis of item-level revision and review log data. Based on the revision log data collected from computer-based classroom assessments, common prototypes of revisit and review behavior were identified. The relationship between revision behavior and various item, test, and individual covariates was further explored under a Bayesian multivariate generalized linear mixed model.

Citations: 0
Applying a Mixture Rasch Model-Based Approach to Standard Setting
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-07-17; DOI: 10.1111/emip.12571
Michael R. Peabody, Timothy J. Muckle, Yu Meng

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional standard-setting methods. We found that heterogeneity of the sample is clearly necessary for the mixture Rasch model approach to standard setting to be useful. While possibly not sufficient to determine passing standards on their own, there may be value in these data-driven models for providing additional validity evidence to support decision-making bodies entrusted with establishing cut scores. They may also provide a useful tool for evaluating existing cut scores and determining if they continue to be supported or if a new study is warranted.

Citations: 0
Do Subject Matter Experts’ Judgments of Multiple-Choice Format Suitability Predict Item Quality?
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-07-11; DOI: 10.1111/emip.12570
Rebecca F. Berenbon, Bridget C. McHugh

To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ suitability for a given content standard and the associated item characteristics. Prior to item writing, we surveyed SMEs on MCQ suitability for each content standard. Following field testing, we then used SMEs’ average ratings for each content standard to predict item characteristics for the tests. We analyzed multilevel models predicting item difficulty (p value), discrimination, and nonfunctioning distractor presence. Items were nested within courses and content standards. There was a curvilinear relationship between SMEs’ ratings and item difficulty such that very low MCQ suitability ratings were predictive of easier items. After controlling for item difficulty, items with higher MCQ suitability ratings had higher discrimination and were less likely to have one or more nonfunctioning distractors. This research has practical implications for optimizing test blueprints. Additionally, psychometricians may use these ratings to better prepare for coaching SMEs during item writing.
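
The three item characteristics named in this abstract can be computed with classical item analysis, sketched below. The abstract does not say how discrimination or nonfunctioning distractors were operationalized, so the uncorrected point-biserial correlation and the 5% selection threshold used here are assumptions, and the response data are invented.

```python
from collections import Counter
from statistics import mean, pstdev


def item_stats(choices, key, options, total_scores, nfd_threshold=0.05):
    """Classical statistics for one multiple-choice item (hypothetical helper).

    choices: option selected by each examinee
    key: the correct option
    options: all options offered on the item
    total_scores: each examinee's total test score, in the same order
    """
    n = len(choices)
    correct = [1 if c == key else 0 for c in choices]

    # Difficulty (p value): proportion of examinees answering correctly.
    p_value = sum(correct) / n

    # Discrimination: point-biserial correlation between item score and total.
    sd_item, sd_total = pstdev(correct), pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        discrimination = float("nan")
    else:
        m_item, m_total = mean(correct), mean(total_scores)
        cov = sum((x - m_item) * (y - m_total)
                  for x, y in zip(correct, total_scores)) / n
        discrimination = cov / (sd_item * sd_total)

    # Nonfunctioning distractors: wrong options chosen by fewer than the
    # assumed threshold share of examinees.
    counts = Counter(choices)
    nonfunctioning = [o for o in options
                      if o != key and counts[o] / n < nfd_threshold]
    return p_value, discrimination, nonfunctioning


choices = ["A", "A", "A", "B", "A", "B", "C", "A", "B", "A"]
scores = [9, 8, 7, 4, 8, 3, 2, 9, 5, 7]
p, r, nfd = item_stats(choices, key="A", options="ABCD", total_scores=scores)
print(p, round(r, 2), nfd)
```

In this toy data set option D is never chosen, so it is flagged as nonfunctioning, while the higher-scoring examinees tend to answer correctly, giving a positive discrimination.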

Citations: 0
Defining Test-Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2023-06-27; DOI: 10.1111/emip.12569
Timothy D. Folger, Jonathan Bostic, Erin E. Krupa

Validity is a fundamental consideration of test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test-score interpretation, test-score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in validity and validation conceptualize test-score interpretation, use, and claims. Definitions were developed through multiple iterations of data collection and analysis. By clarifying the language used when conducting validation, validation may be more accessible to a broader audience, including but not limited to test developers, test users, and test consumers.

Citations: 1
Hierarchical Agglomerative Clustering to Detect Test Collusion on Computer-Based Tests
IF 2 Zone 4 Education Q2 Social Sciences Pub Date: 2023-06-19 DOI: 10.1111/emip.12568
Soo Jeong Ingrisone, James N. Ingrisone

There has been growing interest in machine learning (ML) approaches for detecting test collusion as an alternative to traditional methods. Cluster analysis, an unsupervised learning technique, appears especially promising for detecting group collusion. This study explores the effectiveness of hierarchical agglomerative clustering (HAC) for detecting aberrant test takers in computer-based testing (CBT). Random forest ensembles are used to evaluate the accuracy of the clustering and to identify the features that are important for classifying aberrant test takers. Testing data from a certification exam are used. The level of overlap between the exact response matches on incorrectly keyed items in the exam preparation material and the HAC results is examined. Integrating HAC as an investigative tool is a promising way to improve the accuracy of classifying aberrant test takers.
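
For readers unfamiliar with the clustering step, the following is a minimal sketch of hierarchical agglomerative clustering applied to response similarity. It is not the authors' method: the abstract does not specify the features, distance measure, or linkage rule, so the response strings, the Hamming distance, the average-linkage rule, and the 0.2 stopping threshold below are all illustrative assumptions. The random forest evaluation step is not shown.

```python
from itertools import combinations

def hamming(a, b):
    """Proportion of items on which two response strings differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def hac(responses, threshold):
    """Average-linkage agglomerative clustering (illustrative).

    Starts with one cluster per test taker and repeatedly merges the two
    closest clusters, stopping once the closest pair is farther apart
    than `threshold` (a hypothetical cut-off, not from the study).
    """
    clusters = [[i] for i in range(len(responses))]

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        total = sum(hamming(responses[i], responses[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters

# Hypothetical responses to a 10-item multiple-choice test.
responses = [
    "ABCDABCDAB",  # test taker 0
    "ABCDABCDAC",  # test taker 1: differs from 0 on one item
    "BADCBADCBA",  # test taker 2
    "CDABCDABCD",  # test taker 3
    "ABCDABCDAB",  # test taker 4: identical to 0
]

clusters = hac(responses, threshold=0.2)
# Clusters with more than one member are candidates for further review.
suspicious = sorted(sorted(c) for c in clusters if len(c) > 1)
print(suspicious)
```

In this toy data set, test takers 0, 1, and 4 end up in one cluster because their responses are nearly identical, while 2 and 3 remain singletons. A cluster of similar responders is only a flag for follow-up, not evidence of collusion on its own.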

Citations: 0