
Latest Publications in Educational Measurement: Issues and Practice

AI: Can You Help Address This Issue?
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12655
Deborah J. Harris
Linking across test forms or pools of items is necessary to ensure scores that are reported across different administrations are comparable and lead to consistent decisions for examinees whose abilities are the same, but who were administered different items. Most of these linkages consist of equating test forms or scaling calibrated items or pools to be on the same theta scale. The typical methodology to accomplish this linking makes use of common examinees or common items, where common examinees are understood to be groups of examinees of comparable ability, whether obtained through a single group design (where the same examinees are administered multiple assessments) or a random groups design, where random assignment or pseudo-random assignment is done (such as spiraling the test forms, say 1, 2, 3, 4, 5, and distributing them such that every 5th examinee receives the same form). Common item methodology is usually implemented by having identical items in multiple forms and using those items to link across forms or pools. These common items may be scored or unscored in terms of whether they are treated as internal or external anchors (i.e., whether they are contributing to the examinee's score).

There are situations where it is not practical to have either common examinees or common items. Typically, these are high-stakes settings, where the security of the assessment questions would likely be at risk if any were repeated. This would include scenarios where the entire assessment is released after administration to promote transparency. In some countries, a single form of a national test may be administered to all examinees during a single administration time. While in some cases a student who does not do as well as they had hoped may retest the following year, this may be a small sample and these students would not be considered representative of the entire body of test-takers. In addition, it is presumed they would have spent the intervening year studying for the exam, and so they could not really be considered common examinees across years and assessment forms.

Although the decisions (such as university admissions) based on the assessment scores are comparable within the year, because all examinees are administered the same set of items on the same date, it is difficult to monitor trends over time as there is no linkage between forms across years. Although the general populations may be similar (e.g., 2024 secondary school graduates versus 2023 secondary school graduates), there is no evidence that the groups are strictly equivalent across years. Similarly, comparing how examinees perform across years (e.g., highest scores, average raw score, and so on) is challenging as there is no adjustment for fluctuations in form difficulty across years.

There have been variations of both common item and common examinee linking, such as using similar items rather than identical items (including cases where these similar items are clones of one another), and using various types of matching techniques that attempt to approximate common examinees by creating equivalent subgroups across forms. Cloning items, or generating items from templates, has had some success in producing items of the same difficulty. However, whether an item that is a clone of a released item retains its integrity and properties well enough to serve as a linking item requires research. I, along with many others, have been involved in several studies attempting to link for comparability when there are neither common items nor common examinees. This brief section offers a glimpse of some of that research.

Harris and Fang (2015) considered multiple options for comparing assessment scores across years with no common items and no common examinees. Two of the options involved assumptions; the others involved adjustments. In the first, the examinee groups across years were assumed to be equivalent. This was simply an assumption, with no hard evidence to justify it. Once that assumption was made, a random groups equating was conducted. The second approach assumed the test forms were identical in difficulty, again with no evidence that this was actually the case. Because the forms were assumed to be equal in difficulty, equating was unnecessary (recall that equating across test forms only adjusts for small differences in form difficulty; if there is no difference in difficulty, no equating is needed), and scores on the different forms across years could be compared directly. The third option was to form subgroups of examinees in the hope of mimicking equivalent groups taking each form, and then conduct a random groups equating. One subgroup option was created by using the middle 80% of each year's examinee score distribution; another used self-reported information from examinees, such as the courses they had taken and the grades they had earned, to create comparable subgroups across years.

Huh et al. (2016, 2017) expanded on Harris and Fang (2015), again examining alternatives to equating when items cannot be readministered and the examinee groups taking the different test forms cannot be assumed to be equivalent. The authors referred to the methods they studied as "pseudo-equating" because equating methods were applied even though the data assumptions associated with actual equating, such as truly randomly equivalent examinee groups or common items, did not hold. These included the two assumption-based approaches Harris and Fang had examined (assuming the forms were built to the same difficulty specifications, or assuming the two examinee samples were equivalent, with no adjustment made to form comparable subgroups) as well as two approaches that adjusted the examinee groups. The two adjustments used by Harris and Fang were replicated: the middle 80% of each examinee distribution was used, again on the premise that the examinee groups likely differ more in the tails than in the center, and the group distributions were matched on additional information. When a classical test theory equipercentile equating with postsmoothing was used, the score distribution of the subsequent-year group was weighted to match that of the initial group on relevant variables such as self-reported subject grades and extracurricular activities. When an IRT true-score equating method was used, comparable calibration samples were created for the two groups by matching the proportions within each stratum defined by the relevant variables. In general, attempting to match the groups worked better than simply assuming equivalent groups or equivalent form difficulty. Wu et al. (2017) also attempted to create matched sample distributions in two distinct examinee groups, with similar results. Kim and Walker (2021) examined using subgroup weighting to create groups of similar ability, supplemented with a small number of common items.

Propensity score weighting, including variants such as coarsened exact matching, has also been used to create equivalent examinee groups (see, e.g., Cho et al., 2024; Kapoor et al., 2024; Li et al., 2024; Woods et al., 2024, all of whom looked at creating equivalent examinee samples in the context of mode studies, in which one group of examinees tested on a device and the other on paper). A propensity score is "the conditional probability of assignment to a particular treatment given a vector of observed covariates" (Rosenbaum & Rubin, 1983, p. 41). In our scenario, the "treatment" would be testing on one assessment form rather than the other. A key step in implementing propensity score matching is determining which covariates to include. In our scenario, these would be variables that apply to both populations and are appropriately related to the variable of interest, ultimately yielding equivalent examinee samples taking the different test forms and thereby allowing us to appropriately conduct a random groups equating between the two forms. For example, the number of mathematics courses taken, the names of the specific mathematics courses taken, the grades in individual mathematics courses, the overall grade average across mathematics courses, and so on are possible covariates. Decisions need to be made about what data to collect from examinees and at what level of granularity, and whether the data on a variable are self-reported or come from a trusted source (e.g., self-reported course grades versus transcript data) also needs to be considered. How many covariates to include, whether exact matching is required, how to handle missing data, and which matching algorithm to use are further decisions to be made.

Cloning items, generating items from templates, or other ways of finding "matched items" in a subsequent form to substitute for common items with the form one wants to link to have also been studied in various settings. In our scenario, the question is whether the item characteristics that affect the assumptions of the common-item equating method being used are "matched" with those of the item being replaced. Item content and item position should be fairly easy to evaluate. However, item statistics such as IRT parameters and classical difficulty and discrimination would be computed from the responses of the examinees taking the subsequent form, and so cannot be compared directly.

The examinee group is also divided into two subgroups that show plausible differences in composition, in ability, demographics, sample size, and so on, as when two cohorts test in different years (e.g., 2024 secondary school graduates versus 2023 secondary school graduates). The AI's task would be to adjust scores on the odd-item form so that they are comparable to scores on the even-item form. This could be done by estimating the item characteristics of the odd-form items on the even-form scale and conducting an IRT equating to obtain comparable scores, by creating equivalent samples from the subgroups administered the separate forms and running a random groups equating, or by making some other adjustment to the items, the examinees, or a combination of both. Because all examinees were actually administered both the odd and the even items, and because all of the items were administered together and can be calibrated together, there are multiple ways to create a criterion against which the AI solution can be evaluated.
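The covariate-based matching idea sketched above can be illustrated with a small, self-contained example. This is not the procedure used in any of the studies cited; it simply shows, with invented covariates (a count of mathematics courses and a mathematics grade average) and simulated scores, how propensity scores estimated by logistic regression can be turned into weights that make one form's group resemble the other's.

```python
# A minimal sketch, not the procedure from any of the studies cited above:
# estimate propensity scores from hypothetical self-reported covariates and
# use inverse-probability weights to make the form-A group resemble the
# form-B group. All variable names and simulated values are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical covariates: number of math courses taken and a math grade average.
math_courses = rng.integers(1, 7, size=n)
math_gpa = np.clip(rng.normal(3.0, 0.5, size=n), 0, 4)

# Group membership ("treatment") depends on the covariates, so the two groups differ.
p_form_b = 1 / (1 + np.exp(-(0.3 * math_courses + 0.8 * math_gpa - 3.5)))
took_form_b = rng.random(n) < p_form_b

X = np.column_stack([math_courses, math_gpa])
ps = LogisticRegression().fit(X, took_form_b).predict_proba(X)[:, 1]

# Weight form-A examinees toward the form-B group (odds-style ATT weights).
weights = np.where(took_form_b, 1.0, ps / (1 - ps))

# In this toy setup scores depend only on the covariates, not on which form was taken.
score = 20 + 3 * math_courses + 5 * math_gpa + rng.normal(0, 3, size=n)

def weighted_mean(x, w):
    return float(np.sum(w * x) / np.sum(w))

print("Form-B mean score:          ", round(float(score[took_form_b].mean()), 2))
print("Form-A mean score, raw:     ", round(float(score[~took_form_b].mean()), 2))
print("Form-A mean score, weighted:", round(weighted_mean(score[~took_form_b], weights[~took_form_b]), 2))
```

After weighting, the form-A group's score distribution moves toward the form-B group's, which is the precondition for treating the two groups as randomly equivalent in a subsequent equating.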
{"title":"AI: Can You Help Address This Issue?","authors":"Deborah J. Harris","doi":"10.1111/emip.12655","DOIUrl":"https://doi.org/10.1111/emip.12655","url":null,"abstract":"&lt;p&gt;Linking across test forms or pools of items is necessary to ensure scores that are reported across different administrations are comparable and lead to consistent decisions for examinees whose abilities are the same, but who were administered different items. Most of these linkages consist of equating test forms or scaling calibrated items or pools to be on the same theta scale. The typical methodology to accomplish this linking makes use of common examinees or common items, where common examinees are understood to be groups of examinees of comparable ability, whether obtained through a single group (where the same examinees are administered multiple assessments) or a random groups design, where random assignment or pseudo random assignment is done (such as spiraling the test forms, say 1, 2, 3, 4, 5, and distributing them such that every 5th examinee receives the same form). Common item methodology is usually implemented by having identical items in multiple forms and using those items to link across forms or pools. These common items may be scored or unscored in terms of whether they are treated as internal or external anchors (i.e., whether they are contributing to the examinee's score).&lt;/p&gt;&lt;p&gt;There are situations where it is not practical to have either common examinees nor common items. Typically, these are high-stakes settings, where the security of the assessment questions would likely be at risk if any were repeated. This would include scenarios where the entire assessment is released after administration to promote transparency. In some countries, a single form of a national test may be administered to all examinees during a single administration time. While in some cases a student who does not do as well as they had hoped may retest the following year, this may be a small sample and these students would not be considered representative of the entire body of test-takers. In addition, it is presumed they would have spent the intervening year studying for the exam, and so they could not really be considered common examinees across years and assessment forms.&lt;/p&gt;&lt;p&gt;Although the decisions (such as university admissions) based on the assessment scores are comparable within the year, because all examinees are administered the same set of items on the same date, it is difficult to monitor trends over time as there is no linkage between forms across years. Although the general populations may be similar (e.g., 2024 secondary school graduates versus 2023 secondary school graduates), there is no evidence that the groups are strictly equivalent across years. 
Similarly, comparing how examinees perform across years (e.g., highest scores, average raw score, and so on) is challenging as there is no adjustment for yearly fluctuations in form difficulty across years.&lt;/p&gt;&lt;p&gt;There have been variations of both common item and common examinee linking, such as using similar items, rather than identical items, including where perhaps these similar items are","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"9-12"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12655","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evolving Educational Testing to Meet Students’ Needs: Design-in-Real-Time Assessment
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12653
Stephen G. Sireci, Javier Suárez-Álvarez, April L. Zenisky, Maria Elena Oliveri

The goal in personalized assessment is to best fit the needs of each individual test taker, given the assessment purposes. Design-In-Real-Time (DIRTy) assessment reflects the progressive evolution in testing from a single test, to an adaptive test, to an adaptive assessment system. In this article, we lay the foundation for DIRTy assessment and illustrate how it meets the complex needs of each individual learner. The assessment framework incorporates culturally responsive assessment principles, thus making it innovative with respect to both technology and equity. Key aspects are (a) assessment building blocks called “assessment task modules” (ATMs) linked to multiple content standards and skill domains, (b) gathering information on test takers’ characteristics and preferences and using this information to improve their testing experience, and (c) selecting, modifying, and compiling ATMs to create a personalized test that best meets the needs of the testing purpose and individual test taker.
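As a rough illustration of the "assessment task module" idea, the sketch below defines a hypothetical ATM record tagged with content standards, skill domains, and a presentation mode, and assembles a small personalized form from a test taker's recorded preferences. The fields, tags, and greedy selection rule are assumptions for illustration, not the authors' framework.

```python
# Hypothetical sketch of the "assessment task module" (ATM) idea: modules
# tagged with content standards, skill domains, and a presentation mode, plus
# a greedy routine that assembles a small personalized form from a test
# taker's preferences. The fields and selection rule are illustrative only.
from dataclasses import dataclass

@dataclass
class ATM:
    atm_id: str
    standards: set        # content standards the module is linked to
    skills: set           # skill domains the module taps
    presentation: str     # e.g., "text", "audio", "interactive"

@dataclass
class TestTakerProfile:
    preferred_presentation: str
    standards_to_cover: set

def assemble_personalized_form(pool, profile, max_modules=5):
    """Prefer modules in the taker's preferred presentation mode and stop once
    the required standards are covered or the form is full."""
    selected, covered = [], set()
    ranked = sorted(pool, key=lambda m: m.presentation != profile.preferred_presentation)
    for atm in ranked:
        if covered >= profile.standards_to_cover or len(selected) >= max_modules:
            break
        if atm.standards & (profile.standards_to_cover - covered):
            selected.append(atm)
            covered |= atm.standards
    return selected

pool = [
    ATM("A1", {"NBT.1"}, {"number sense"}, "text"),
    ATM("A2", {"NBT.1", "NBT.2"}, {"number sense"}, "interactive"),
    ATM("A3", {"OA.3"}, {"operations"}, "audio"),
]
profile = TestTakerProfile("interactive", {"NBT.1", "NBT.2", "OA.3"})
print([m.atm_id for m in assemble_personalized_form(pool, profile)])  # ['A2', 'A3']
```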

{"title":"Evolving Educational Testing to Meet Students’ Needs: Design-in-Real-Time Assessment","authors":"Stephen G. Sireci,&nbsp;Javier Suárez-Álvarez,&nbsp;April L. Zenisky,&nbsp;Maria Elena Oliveri","doi":"10.1111/emip.12653","DOIUrl":"https://doi.org/10.1111/emip.12653","url":null,"abstract":"<p>The goal in personalized assessment is to best fit the needs of each individual test taker, given the assessment purposes. Design-In-Real-Time (DIRTy) assessment reflects the progressive evolution in testing from a single test, to an adaptive test, to an adaptive assessment <i>system</i>. In this article, we lay the foundation for DIRTy assessment and illustrate how it meets the complex needs of each individual learner. The assessment framework incorporates culturally responsive assessment principles, thus making it innovative with respect to both technology and equity. Key aspects are (a) assessment building blocks called “assessment task modules” (ATMs) linked to multiple content standards and skill domains, (b) gathering information on test takers’ characteristics and preferences and using this information to improve their testing experience, and (c) selecting, modifying, and compiling ATMs to create a personalized test that best meets the needs of the testing purpose and individual test taker.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"112-118"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comparative Analysis of Psychometric Frameworks and Properties of Scores from Autogenerated Test Forms
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12648
Won-Chan Lee, Stella Y. Kim

This paper explores the psychometric properties of scores derived from autogenerated test forms by introducing three conceptual frameworks: Alternate Test Forms, Randomly Parallel Forms, and Approximately Parallel Forms. Each framework provides a distinct perspective on score comparability, definitions of true score and standard error of measurement (SEM), and the necessity of equating. Through a simulation study, we illustrate how these frameworks compare in terms of true scores and SEMs, while also assessing the impact of equating on score comparability across varying levels of form variability. Ultimately, this study seeks to lay the groundwork for implementing scoring practices in large-scale standardized assessments that use autogenerated forms.
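One way to see the "randomly parallel forms" perspective is a small simulation: each form is a random draw of items from a larger domain, an examinee's true score is their expected score over such draws, and the SEM is the spread of their observed scores across draws. The Rasch setup and all values below are illustrative and are not the simulation design used in the paper.

```python
# Minimal simulation sketch of the "randomly parallel forms" view (not the
# paper's study design): each form is a random draw of items from a domain,
# the true score is the expected score over draws, and the SEM is the spread
# of observed scores across draws. All values are invented.
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_domain_items, form_length, n_forms = 500, 400, 40, 50

theta = rng.normal(0, 1, n_examinees)                          # abilities
b = rng.normal(0, 1, n_domain_items)                           # Rasch item difficulties
p_correct = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))   # examinee-by-item probabilities

form_scores = np.empty((n_examinees, n_forms))
for f in range(n_forms):
    items = rng.choice(n_domain_items, size=form_length, replace=False)
    form_scores[:, f] = rng.binomial(1, p_correct[:, items]).sum(axis=1)

true_score = form_scores.mean(axis=1)              # expected score over random forms
sem_per_person = form_scores.std(axis=1, ddof=1)   # person-level SEM across forms
print("Average SEM across examinees:", round(float(sem_per_person.mean()), 2))
```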

{"title":"Comparative Analysis of Psychometric Frameworks and Properties of Scores from Autogenerated Test Forms","authors":"Won-Chan Lee,&nbsp;Stella Y. Kim","doi":"10.1111/emip.12648","DOIUrl":"https://doi.org/10.1111/emip.12648","url":null,"abstract":"<p>This paper explores the psychometric properties of scores derived from autogenerated test forms by introducing three conceptual frameworks: Alternate Test Forms, Randomly Parallel Forms, and Approximately Parallel Forms. Each framework provides a distinct perspective on score comparability, definitions of true score and standard error of measurement (SEM), and the necessity of equating. Through a simulation study, we illustrate how these frameworks compare in terms of true scores and SEMs, while also assessing the impact of equating on score comparability across varying levels of form variability. Ultimately, this study seeks to lay the groundwork for implementing scoring practices in large-scale standardized assessments that use autogenerated forms.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"13-23"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12648","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Linking Unlinkable Tests: A Step Forward
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12638
Silvia Testa, Renato Miceli

Random Equating (RE) and Heuristic Approach (HA) are two linking procedures that may be used to compare the scores of individuals in two tests that measure the same latent trait, in conditions where there are no common items or individuals. In this study, RE—that may only be used when the individuals taking the two tests come from the same population—was used as a benchmark for evaluating HA, which, in contrast, does not require any distributional assumptions. The comparison was based on both simulated and empirical data. Simulations showed that HA was good at reproducing the link shift connecting the difficulty parameters of the two sets of items, performing similarly to RE under the condition of slight violation of the distributional assumption. Empirical results showed satisfactory correspondence between the estimates of item and person parameters obtained via the two procedures.
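The paper's Heuristic Approach is not reproduced here, but the notion of a "link shift" between two sets of difficulty parameters can be shown with a toy mean-mean adjustment, under the strong assumption (made true by construction in the sketch) that the two item sets are drawn from pools with the same average difficulty.

```python
# Illustrative sketch only; the paper's Heuristic Approach is not reproduced.
# It shows the generic notion of a "link shift" between two Rasch difficulty
# scales, recovered by a mean-mean adjustment, under the strong assumption
# (true by construction here) that both item sets come from pools with the
# same average difficulty.
import numpy as np

rng = np.random.default_rng(2)
true_shift = 0.45                                        # Form Y's scale is offset from Form X's
b_x = rng.normal(0.0, 1.0, 60)                           # Form X item difficulties
b_y_own_scale = rng.normal(0.0, 1.0, 60) - true_shift    # Form Y difficulties on their own scale

# Mean-mean linking: estimate the shift as the difference in mean difficulty.
estimated_shift = b_x.mean() - b_y_own_scale.mean()
b_y_linked = b_y_own_scale + estimated_shift             # Form Y difficulties placed on the X scale
print(f"True shift {true_shift:.2f}, estimated shift {estimated_shift:.2f}")
```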

{"title":"Linking Unlinkable Tests: A Step Forward","authors":"Silvia Testa,&nbsp;Renato Miceli,&nbsp;Renato Miceli","doi":"10.1111/emip.12638","DOIUrl":"https://doi.org/10.1111/emip.12638","url":null,"abstract":"<p>Random Equating (RE) and Heuristic Approach (HA) are two linking procedures that may be used to compare the scores of individuals in two tests that measure the same latent trait, in conditions where there are no common items or individuals. In this study, RE—that may only be used when the individuals taking the two tests come from the same population—was used as a benchmark for evaluating HA, which, in contrast, does not require any distributional assumptions. The comparison was based on both simulated and empirical data. Simulations showed that HA was good at reproducing the link shift connecting the difficulty parameters of the two sets of items, performing similarly to RE under the condition of slight violation of the distributional assumption. Empirical results showed satisfactory correspondence between the estimates of item and person parameters obtained via the two procedures.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"66-72"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
From Mandated to Test-Optional College Admissions Testing: Where Do We Go from Here?
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12649
Kyndra V. Middleton, Comfort H. Omonkhodion, Ernest Y. Amoateng, Lucy O. Okam, Daniela Cardoza, Alexis Oakley
{"title":"From Mandated to Test-Optional College Admissions Testing: Where Do We Go from Here?","authors":"Kyndra V. Middleton,&nbsp;Comfort H. Omonkhodion,&nbsp;Ernest Y. Amoateng,&nbsp;Lucy O. Okam,&nbsp;Daniela Cardoza,&nbsp;Alexis Oakley","doi":"10.1111/emip.12649","DOIUrl":"https://doi.org/10.1111/emip.12649","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"33-37"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Investigating Approaches to Controlling Item Position Effects in Computerized Adaptive Tests
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-10-27 | DOI: 10.1111/emip.12637
Ye Ma, Deborah J. Harris

Item position effect (IPE) refers to situations where an item performs differently when it is administered in different positions on a test. The majority of previous research studies have focused on investigating IPE under linear testing. There is a lack of IPE research under adaptive testing. In addition, the existence of IPE might violate Item Response Theory (IRT)’s item parameter invariance assumption, which facilitates applications of IRT in various psychometric tasks such as computerized adaptive testing (CAT). Ignoring IPE might lead to issues such as inaccurate ability estimation in CAT. This article extends research on IPE by proposing and evaluating approaches to controlling position effects under an item-level computerized adaptive test via a simulation study. The results show that adjusting IPE via a pretesting design (approach 3) or a pool design (approach 4) results in better ability estimation accuracy compared to no adjustment (baseline approach) and item-level adjustment (approach 2). Practical implications of each approach as well as future research directions are discussed as well.
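A common way to formalize an item position effect, and to see why it threatens parameter invariance, is to let an item's effective difficulty drift with its administered position. The 2PL sketch below uses made-up parameter values and an assumed linear drift; it is not one of the adjustment approaches evaluated in the article.

```python
# Hedged sketch, not the article's simulation design: represent an item
# position effect as a difficulty shift that grows with the item's
# administered position. Scoring with the pretest parameters then uses the
# wrong response probabilities. All parameter values are made up.
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1 / (1 + np.exp(-a * (theta - b)))

theta, a, b_calibrated = 0.0, 1.2, -0.2   # parameters calibrated at an early position
position = 35                             # position where the item is actually administered
drift_per_position = 0.01                 # assumed linear "fatigue" effect on difficulty

b_effective = b_calibrated + drift_per_position * position
print("P(correct) assuming parameter invariance:", round(p_2pl(theta, a, b_calibrated), 3))
print("P(correct) with the position effect:     ", round(p_2pl(theta, a, b_effective), 3))
```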

{"title":"Investigating Approaches to Controlling Item Position Effects in Computerized Adaptive Tests","authors":"Ye Ma,&nbsp;Deborah J. Harris","doi":"10.1111/emip.12637","DOIUrl":"https://doi.org/10.1111/emip.12637","url":null,"abstract":"<p>Item position effect (IPE) refers to situations where an item performs differently when it is administered in different positions on a test. The majority of previous research studies have focused on investigating IPE under linear testing. There is a lack of IPE research under adaptive testing. In addition, the existence of IPE might violate Item Response Theory (IRT)’s item parameter invariance assumption, which facilitates applications of IRT in various psychometric tasks such as computerized adaptive testing (CAT). Ignoring IPE might lead to issues such as inaccurate ability estimation in CAT. This article extends research on IPE by proposing and evaluating approaches to controlling position effects under an item-level computerized adaptive test via a simulation study. The results show that adjusting IPE via a pretesting design (approach 3) or a pool design (approach 4) results in better ability estimation accuracy compared to no adjustment (baseline approach) and item-level adjustment (approach 2). Practical implications of each approach as well as future research directions are discussed as well.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"44-54"},"PeriodicalIF":2.7,"publicationDate":"2024-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Digital Module 36: Applying Intersectionality Theory to Educational Measurement
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-10-09 | DOI: 10.1111/emip.12622
Michael Russell

Module Abstract

Over the past decade, interest in applying Intersectionality Theory to quantitative analyses has grown. This module examines key concepts that form the foundation of Intersectionality Theory and considers challenges and opportunities these concepts present for quantitative methods. Two examples are presented to demonstrate how an intersectional approach to quantitative analyses differs from a traditional single-axis approach. The first example employs a linear regression technique to examine the efficacy of an educational intervention and to explore whether efficacy differs among subgroups of students. The second example compares findings when a differential item function analysis is conducted in a single-axis manner versus an intersectional lens. The module ends by exploring key considerations analysts and psychometricians encounter when applying Intersectionality Theory to a quantitative analysis.
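The contrast between a single-axis and an intersectional regression specification can be sketched with simulated data: the first model lets the intervention effect vary by gender and by race separately, while the second lets it vary across gender-by-race subgroups jointly. The data, variable names, and effect sizes below are invented for illustration and are not the module's worked examples.

```python
# Illustrative contrast, not the module's worked examples: a single-axis model
# with separate treatment-by-gender and treatment-by-race interactions versus
# an intersectional model with treatment-by-(gender x race) interactions.
# The simulated data and effect sizes are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1200
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "gender": rng.choice(["F", "M"], n),
    "race": rng.choice(["A", "B"], n),
})
# Built-in subgroup-specific effect: the intervention helps the F-B subgroup most.
effect = np.where((df["gender"] == "F") & (df["race"] == "B"), 8.0, 2.0)
df["score"] = 50 + effect * df["treated"] + rng.normal(0, 5, n)

single_axis = smf.ols("score ~ treated * gender + treated * race", data=df).fit()
intersectional = smf.ols("score ~ treated * C(gender):C(race)", data=df).fit()

print(single_axis.params.filter(like="treated"))     # one effect modifier per axis
print(intersectional.params.filter(like="treated"))  # effect modifiers per intersectional subgroup
```

The single-axis fit spreads the F-B subgroup's larger benefit across the separate gender and race interactions, while the intersectional fit isolates it in the corresponding subgroup term.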

{"title":"Digital Module 36: Applying Intersectionality Theory to Educational Measurement","authors":"Michael Russell","doi":"10.1111/emip.12622","DOIUrl":"https://doi.org/10.1111/emip.12622","url":null,"abstract":"<div>\u0000 \u0000 <section>\u0000 \u0000 <h3> Module Abstract</h3>\u0000 \u0000 <p>Over the past decade, interest in applying Intersectionality Theory to quantitative analyses has grown. This module examines key concepts that form the foundation of Intersectionality Theory and considers challenges and opportunities these concepts present for quantitative methods. Two examples are presented to demonstrate how an intersectional approach to quantitative analyses differs from a traditional single-axis approach. The first example employs a linear regression technique to examine the efficacy of an educational intervention and to explore whether efficacy differs among subgroups of students. The second example compares findings when a differential item function analysis is conducted in a single-axis manner versus an intersectional lens. The module ends by exploring key considerations analysts and psychometricians encounter when applying Intersectionality Theory to a quantitative analysis.</p>\u0000 </section>\u0000 </div>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 3","pages":"106-108"},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12622","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142404751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Demystifying Adequate Growth Percentiles
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-10-09 | DOI: 10.1111/emip.12635
Katherine E. Castellano, Daniel F. McCaffrey, Joseph A. Martineau

Growth-to-standard models evaluate student growth against the growth needed to reach a future standard or target of interest, such as proficiency. A common growth-to-standard model involves comparing the popular Student Growth Percentile (SGP) to Adequate Growth Percentiles (AGPs). AGPs follow from an involved process based on fitting a series of nonlinear quantile regression models to longitudinal student test score data. This paper demystifies AGPs by deriving them in the more familiar linear regression framework. It further shows that unlike SGPs, AGPs and on-track classifications based on AGPs are strongly related to status. Lastly, AGPs are evaluated in terms of their classification accuracy. An empirical study and analytic derivations reveal AGPs can be problematic indicators of students’ future performance with previously not proficient students being more likely incorrectly flagged as not on-track and previously proficient students as on track. These classification errors have equity implications at the individual and school levels.
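A stripped-down version of the growth-percentile idea: regress current scores on a prior score at several quantiles, then locate the highest fitted conditional quantile at or below a student's observed score. Operational SGPs and AGPs use nonlinear quantile regression on multiple prior scores, so the linear, single-prior sketch below (with simulated data) is only a conceptual illustration.

```python
# Simplified sketch, not the operational SGP/AGP methodology (which fits
# nonlinear quantile regressions on several prior scores): fit a few linear
# conditional quantiles and locate where a student's observed score falls.
# Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 3000
prior = rng.normal(500, 40, n)
current = 100 + 0.85 * prior + rng.normal(0, 25, n)
df = pd.DataFrame({"prior": prior, "current": current})

fits = {q: smf.quantreg("current ~ prior", df).fit(q=q) for q in (0.05, 0.25, 0.50, 0.75, 0.95)}

student = pd.DataFrame({"prior": [480.0]})
preds = {q: float(np.asarray(m.predict(student))[0]) for q, m in fits.items()}
observed = 520.0

at_or_below = [q for q, v in sorted(preds.items()) if v <= observed]
print("Conditional quantile predictions:", {q: round(v, 1) for q, v in preds.items()})
print("Observed score", observed, "falls at or above quantile:",
      at_or_below[-1] if at_or_below else "below the lowest fitted quantile")
```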

{"title":"Demystifying Adequate Growth Percentiles","authors":"Katherine E. Castellano,&nbsp;Daniel F. McCaffrey,&nbsp;Joseph A. Martineau","doi":"10.1111/emip.12635","DOIUrl":"https://doi.org/10.1111/emip.12635","url":null,"abstract":"<p>Growth-to-standard models evaluate student growth against the growth needed to reach a future standard or target of interest, such as proficiency. A common growth-to-standard model involves comparing the popular Student Growth Percentile (SGP) to Adequate Growth Percentiles (AGPs). AGPs follow from an involved process based on fitting a series of nonlinear quantile regression models to longitudinal student test score data. This paper demystifies AGPs by deriving them in the more familiar linear regression framework. It further shows that unlike SGPs, AGPs and on-track classifications based on AGPs are strongly related to status. Lastly, AGPs are evaluated in terms of their classification accuracy. An empirical study and analytic derivations reveal AGPs can be problematic indicators of students’ future performance with previously not proficient students being more likely incorrectly flagged as not on-track and previously proficient students as on track. These classification errors have equity implications at the individual and school levels.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"31-43"},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On the Cover: Gendered Trajectories of Digital Literacy Development: Insights from a Longitudinal Cohort Study
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-10-09 | DOI: 10.1111/emip.12625
Yuan-Ling Liaw
{"title":"On the Cover: Gendered Trajectories of Digital Literacy Development: Insights from a Longitudinal Cohort Study","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12625","DOIUrl":"https://doi.org/10.1111/emip.12625","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 3","pages":"6"},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12625","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142404750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Issue Cover
IF 2.7 | CAS Q4 (Education) | JCR Q1 (Education & Educational Research) | Pub Date: 2024-10-09 | DOI: 10.1111/emip.12564
{"title":"Issue Cover","authors":"","doi":"10.1111/emip.12564","DOIUrl":"https://doi.org/10.1111/emip.12564","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 3","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12564","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142404260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0