
Educational Measurement-Issues and Practice: Latest Articles

Weighting Content Specifications for the National Medical Licensing Examination via Group Analytic Hierarchy Process
IF 2.7 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-07-19 | DOI: 10.1111/emip.12620
Xiaomei Hong, Zhehan Jiang, Hanyu Liu, Fen Cai

Job and practice analysis is a commonly used method for determining examination content specifications. However, difficulties arise when many domains are present, as mainstream approaches do not fully adhere to the essence of the weighting process, namely a “comparison-evaluation-decision” framework for assigning percentage values to the content. Stemming from the principle of comparing multiple criteria for making decisions, the Analytic Hierarchy Process (AHP) provides an appropriate solution that circumvents this obstacle. We propose using an extended version of AHP called Group AHP (GAHP) to weight content specifications for standardized medical education assessment. Specifically, GAHP is integrated with the Delphi method and is expected to aid exam developers in integrating feedback from diverse experienced physicians when determining content specifications for the National Medical Licensing Examination (NMLE) in China. The complete flow of the proposed approach is demonstrated in this study with an application to the NMLE.
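To make the "comparison-evaluation-decision" idea concrete, here is a minimal sketch of how pairwise comparisons can be turned into percentage weights. It is not the authors' procedure: the abstract does not say how judgments are aggregated, so the element-wise geometric mean across experts and the row geometric-mean approximation to the priority vector are assumptions, and the expert matrices and the three content domains are hypothetical.

```python
import math

def aggregate_judgments(matrices):
    """Element-wise geometric mean of several experts' pairwise matrices.

    Assumption: geometric-mean aggregation, a common choice in group AHP;
    the abstract does not specify the aggregation rule.
    """
    n = len(matrices[0])
    k = len(matrices)
    return [
        [math.prod(m[i][j] for m in matrices) ** (1.0 / k) for j in range(n)]
        for i in range(n)
    ]

def priority_weights(matrix):
    """Row geometric-mean approximation to the AHP priority vector."""
    n = len(matrix)
    row_gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(row_gm)
    return [g / total for g in row_gm]

# Two hypothetical experts comparing three hypothetical content domains.
# Entry [i][j] says how much more important domain i is than domain j.
expert_a = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
expert_b = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]

group = aggregate_judgments([expert_a, expert_b])
weights = priority_weights(group)

# Express the weights as percentages of the exam blueprint.
percentages = [round(100 * w, 1) for w in weights]
print(percentages)
```

The weights sum to 1 by construction, so they can be read directly as blueprint percentages. A fuller treatment would also check each expert's consistency ratio before aggregating.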

Educational Measurement-Issues and Practice, Volume 44, Issue 1, pp. 7-17.
Citations: 0
Improving Instructional Decision-Making Using Diagnostic Classification Models
IF 2.7 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-06-25 | DOI: 10.1111/emip.12619
W. Jake Thompson, Amy K. Clark

In recent years, educators, administrators, policymakers, and measurement experts have called for assessments that support educators in making better instructional decisions. One promising approach to measurement to support instructional decision-making is diagnostic classification models (DCMs). DCMs are flexible psychometric models that facilitate fine-grained reporting on skills that students have mastered. In this article, we describe how DCMs can be leveraged to support better decision-making. We first provide a high-level overview of DCMs. We then describe different methods for reporting results from DCM-based assessments that support decision-making for different stakeholder groups. We close with a discussion of considerations for implementing DCMs in an operational setting, including how they can inform decision-making at state and local levels, and share future directions for research.

Educational Measurement-Issues and Practice, Volume 43, Issue 4, pp. 146-156.
Citations: 0
Item Response Theory Models for Polytomous Multidimensional Forced-Choice Items to Measure Construct Differentiation
IF 2.7 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-06-10 | DOI: 10.1111/emip.12621
Xuelan Qiu, Jimmy de la Torre, You-Gan Wang, Jinran Wu

Multidimensional forced-choice (MFC) items have been found to be useful for reducing response biases in personality assessments. However, conventional scoring methods for MFC items result in ipsative data, hindering wider application of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed, the majority of which are for MFC items with binary responses. However, MFC items with polytomous responses are more informative and have many applications. This paper develops a polytomous Rasch ipsative model (pRIM) that can deal with ipsative data and yield estimates that measure construct differentiation, a latent trait that describes the degree to which the personality constructs (e.g., interests) are distinguished from one another. The pRIM and its simpler form are applied to a career interests assessment containing four-category MFC items, and the measures of interest differentiation are used for both intra- and interpersonal comparisons. Simulations are conducted to examine the recovery of the parameters under various conditions. The results show that the parameters of the pRIM can be well recovered, particularly when a complete linking design and a large sample are used. The implications and application of the pRIM in personality assessment using MFC items are discussed.
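The problem with ipsative data mentioned above can be shown with a small sketch. It assumes a simple rank-based scoring rule, which is one conventional scheme and not necessarily the one the authors have in mind. The respondents, blocks, and trait indices are hypothetical.

```python
def rank_scores(blocks, n_traits):
    """Score forced-choice blocks by rank.

    Each block lists trait indices from most to least preferred; the top
    choice in a block of size k earns k - 1 points, the last earns 0.
    """
    scores = [0] * n_traits
    for block in blocks:
        for position, trait in enumerate(block):
            scores[trait] += len(block) - 1 - position
    return scores

# Two hypothetical respondents answering the same two three-trait blocks.
respondent_1 = [[0, 1, 2], [2, 0, 1]]
respondent_2 = [[2, 1, 0], [1, 2, 0]]

s1 = rank_scores(respondent_1, 3)
s2 = rank_scores(respondent_2, 3)

# Every respondent ends up with the same total, so a high score on one
# trait forces lower scores on the others.
print(s1, sum(s1))
print(s2, sum(s2))
```

Because the totals are fixed, the scores describe the relative ordering of traits within a person and do not support the usual between-person comparisons. This is the limitation that IRT models for MFC items aim to address.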

Educational Measurement-Issues and Practice, Volume 43, Issue 4, pp. 157-168.
Citations: 0
On the Cover: Predicted Racial-Ethnic Composition of Educational Measurement Publications
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-05-20 | DOI: 10.1111/emip.12610
Yuan-Ling Liaw

 

Educational Measurement-Issues and Practice, Volume 43, Issue 2, pp. 4-5.
Citations: 0
Issue Cover
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-05-20 | DOI: 10.1111/emip.12562
Educational Measurement-Issues and Practice, Volume 43, Issue 2. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12562
Citations: 0
Blending Strategic Expertise and Technology: A Case Study for Practice Analysis
IF 2.7 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-05-20 | DOI: 10.1111/emip.12607
Bharati B. Belwalkar, Matthew Schultz, Christina Curnow, J. Carl Setzer

There is a growing integration of technology in the workplace (World Economic Forum), and with it, organizations are increasingly relying on advanced technological approaches to improve their human capital processes and stay relevant and competitive in complex environments. All professions must keep up with this transition and begin integrating technology into their tools and processes. This paper centers on how advanced technological approaches (such as natural language processing (NLP) and data mining) have complemented a traditional practice analysis of the accounting profession. We also discuss the strategic selection and use of subject-matter experts (SMEs) for more efficient practice analysis. The authors adopted a triangulation process: gathering information from traditional practice analysis, using selected SMEs, and confirming findings with a novel NLP-based approach. These methods collectively contributed to the revision of the Uniform CPA Exam blueprint and to an understanding of accounting trends.

Educational Measurement-Issues and Practice, Volume 43, Issue 3, pp. 85-94.
Citations: 0
2023 NCME Presidential Address: Some Musings on Comparable Scores
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-05-12 | DOI: 10.1111/emip.12609
Deborah J. Harris

This article is based on my 2023 NCME Presidential Address, where I talked a bit about my journey into the profession, and more substantively about comparable scores. Specifically, I discussed some of the different ways ‘comparable scores’ are defined, highlighted some areas I think we as a profession need to pay more attention to when considering score comparability, and emphasized that comparability in this context is a matter of degree which varies according to the decisions we plan to make on particular scores.

Educational Measurement-Issues and Practice, Volume 43, Issue 2, pp. 6-15. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12609
Citations: 0
Examining Gender Differences in TIMSS 2019 Using a Multiple-Group Hierarchical Speed-Accuracy-Revisits Model
IF 2.7 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-04-24 | DOI: 10.1111/emip.12606
Dihao Leng, Ummugul Bezirhan, Lale Khorramdel, Bethany Fishbein, Matthias von Davier

This study capitalizes on response and process data from the computer-based TIMSS 2019 Problem Solving and Inquiry tasks to investigate gender differences in test-taking behaviors and their association with mathematics achievement at the eighth grade. Specifically, a recently proposed hierarchical speed-accuracy-revisits (SAR) model was adapted to multiple country-by-gender groups to examine the extent to which mathematics ability, response speed, revisit propensity, and the relationship among them differ between boys and girls. Results across 10 countries showed that boys responded to items faster on average than girls, and there was greater variation in boys’ response speed across students. A mixture distribution of revisit propensity was found for all country-by-gender groups. Both genders had moderate to strong negative correlations between mathematics ability and response speed, supporting the speed-accuracy tradeoff pattern reported in the literature. Results are discussed in the context of low-stakes assessments and in relation to the utility of the multiple-group SAR model.

Educational Measurement-Issues and Practice, Volume 43, Issue 3, pp. 64-75. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12606
Citations: 0
Guesses and Slips as Proficiency-Related Phenomena and Impacts on Parameter Invariance
IF 2.7 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-04-08 | DOI: 10.1111/emip.12605
Xiangyi Liao, Daniel M Bolt

Traditional approaches to the modeling of multiple-choice item response data (e.g., 3PL, 4PL models) emphasize slips and guesses as random events. In this paper, an item response model is presented that characterizes both disjunctively interacting guessing and conjunctively interacting slipping processes as proficiency-related phenomena. We show how evidence for this perspective is seen in the systematic form of invariance violations for item slip and guess parameters under four-parameter IRT models when compared across populations of different mean proficiency levels. Specifically, higher proficiency populations tend to show higher guess and lower slip probabilities than lower proficiency populations. The results undermine the use of traditional models for IRT applications that require invariance and would suggest greater attention to alternatives.
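For readers less familiar with the traditional models the abstract contrasts against, the standard four-parameter logistic (4PL) response function can be sketched as follows. This is the textbook form, in which guessing and slipping enter as fixed item-level asymptotes; it is not the proficiency-related model the authors propose. The item parameter values are hypothetical.

```python
import math

def p_correct_4pl(theta, a, b, c, d):
    """Probability of a correct response under the standard 4PL model.

    a: discrimination, b: difficulty,
    c: lower asymptote (guessing),
    d: upper asymptote (1 minus the slip probability).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters chosen only for illustration.
item = dict(a=1.2, b=0.0, c=0.2, d=0.95)

for theta in (-3.0, 0.0, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct_4pl(theta, **item):.3f}")
```

In this form `c` and `d` do not depend on `theta` or on the population being tested, which is the invariance assumption the abstract reports as being violated when populations of different mean proficiency are compared.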

Educational Measurement-Issues and Practice, Volume 43, Issue 3, pp. 76-84. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12605
Citations: 0
Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI
IF 2 | Zone 4 (Education) | Q1 Education & Educational Research | Pub Date: 2024-04-04 | DOI: 10.1111/emip.12602
Jiangang Hao, Alina A. von Davier, Victoria Yaneva, Susan Lottridge, Matthias von Davier, Deborah J. Harris

The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely, these innovations raise significant concerns regarding validity, reliability, transparency, fairness, equity, and test security, necessitating careful thinking when applying them in assessments. In this article, we discuss the impacts and implications of LLMs and generative AI on critical dimensions of assessment with example use cases and call for a community effort to equip assessment professionals with the needed AI literacy to harness the potential effectively.

Educational Measurement-Issues and Practice, Volume 43, Issue 2, pp. 16-29.
Citations: 0