
Journal of Educational Measurement: Latest Publications

Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-07-05 | DOI: 10.1111/jedm.12339
Rae Yeong Kim, Yun Joo Yoo

In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH) as a multistage testing framework for CDMs. In CD-MST-PH, multiple testlets can be constructed from separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or on-the-fly approaches. Moreover, testlets are offered sequentially and adaptively, improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. The results of the simulation study show that CD-MST-PH can improve test accuracy and efficiency relative to a conventional test without adaptive stages.
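The abstract does not reproduce the item information measure or the module assembly method, so the following Python sketch is only a hypothetical stand-in: it computes a simple KL-type per-attribute discrimination index under a DINA parameterization and groups items into modules by the attribute group where their information concentrates. The Q-matrix, guessing and slipping parameters, and the grouping rule are made-up illustrations, not the authors' CD-MST-PH procedure.

```python
import numpy as np

def dina_response_prob(alpha, q_row, guess, slip):
    """P(correct) under a DINA-type model for one item and one attribute profile."""
    eta = np.all(alpha >= q_row)   # has the profile mastered all required attributes?
    return (1.0 - slip) if eta else guess

def attribute_kl_info(q_row, guess, slip, k, n_attrs):
    """Toy per-attribute discrimination index: KL divergence between the item's response
    distributions for profiles that do vs. do not master attribute k, holding the other
    required attributes at mastery."""
    alpha_master = np.ones(n_attrs, dtype=int)
    alpha_nonmaster = alpha_master.copy()
    alpha_nonmaster[k] = 0
    p1 = dina_response_prob(alpha_master, q_row, guess, slip)
    p0 = dina_response_prob(alpha_nonmaster, q_row, guess, slip)
    # KL divergence between two Bernoulli distributions
    return p1 * np.log(p1 / p0) + (1 - p1) * np.log((1 - p1) / (1 - p0))

# Tiny illustration: 4 items, 3 attributes, with attribute groups {0, 1} and {2}.
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
guess = np.array([0.15, 0.20, 0.10, 0.25])
slip = np.array([0.10, 0.15, 0.20, 0.10])

info = np.array([[attribute_kl_info(Q[j], guess[j], slip[j], k, Q.shape[1]) if Q[j, k] else 0.0
                  for k in range(Q.shape[1])]
                 for j in range(Q.shape[0])])
print(np.round(info, 3))

# A module anchored at attribute group {0, 1} would draw items whose information
# concentrates on those attributes (items 0-2 here); the group {2} module gets item 3.
module_A = [j for j in range(Q.shape[0]) if info[j, :2].sum() > info[j, 2]]
print("module anchored at attributes {0, 1}:", module_A)
```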

Citations: 1
Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-20 | DOI: 10.1111/jedm.12338
Seohee Park, Kyung Yong Kim, Won-Chan Lee

Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on their classification consistency and accuracy. Accordingly, this study introduces an approach to estimating classification consistency and accuracy indices for multiple measures under four possible decision rules: (1) complementary, (2) conjunctive, (3) compensatory, and (4) pairwise combinations of the three. The current study uses the IRT-recursive-based approach with the simple-structure multidimensional IRT model (SS-MIRT) to estimate classification consistency and accuracy for multiple measures. Theoretical formulations of the four decision rules with a binary decision (Pass/Fail) are presented. The estimation procedures are illustrated using an empirical data example based on SS-MIRT. In addition, this study applies the estimation procedures to the unidimensional IRT (UIRT) context, considering that UIRT is more widely used in practice. This application shows that the proposed classification consistency and accuracy procedure could be used with a UIRT model for individual measures as an alternative to SS-MIRT.
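The four decision rules are only named in the abstract; as a rough sketch of how Pass/Fail decisions across multiple measures might be combined, the hypothetical Python snippet below simulates an overall decision under complementary, conjunctive, and (a crude weighted-count version of) compensatory rules, and derives a per-examinee consistency index from the resulting pass probability. The per-measure pass probabilities, weights, and cut score are invented values, and this is not the authors' IRT-recursive procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-measure probabilities that one examinee earns a "Pass" decision,
# e.g., summaries of the posterior distributions of the SS-MIRT dimension scores.
p_pass = np.array([0.80, 0.65, 0.90])   # three content-domain measures
weights = np.array([0.4, 0.3, 0.3])     # weights for the compensatory composite

def overall_pass_probability(p_pass, rule, weights=None, cut=0.5, n_rep=100_000):
    """Monte Carlo probability of an overall Pass under a given decision rule."""
    draws = (rng.random((n_rep, p_pass.size)) < p_pass).astype(float)
    if rule == "complementary":      # pass if at least one measure is passed
        overall = draws.max(axis=1) == 1.0
    elif rule == "conjunctive":      # pass only if every measure is passed
        overall = draws.min(axis=1) == 1.0
    elif rule == "compensatory":     # pass if a weighted combination clears a cut score
        overall = draws @ weights >= cut
    else:
        raise ValueError(f"unknown rule: {rule}")
    return overall.mean()

for rule in ("complementary", "conjunctive", "compensatory"):
    p = overall_pass_probability(p_pass, rule, weights)
    # Per-examinee classification consistency: probability that two independent
    # replications of the whole procedure yield the same Pass/Fail decision.
    consistency = p ** 2 + (1 - p) ** 2
    print(f"{rule:14s}  P(pass) = {p:.3f}   consistency = {consistency:.3f}")
```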

Citations: 1
Latent Space Model for Process Data
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-12 | DOI: 10.1111/jedm.12337
Yi Chen, Jingru Zhang, Yi Yang, Young-Sun Lee

The development of human-computer interactive items in educational assessments provides opportunities to extract useful process information about problem solving. However, the complex, intensive, and noisy nature of process data makes it challenging to model with traditional psychometric methods. Social network methods have been applied to visualize and analyze process data. Nonetheless, research on the statistical modeling of process information using social network methods is still limited. This article explored the application of the latent space model (LSM) for analyzing process data in educational assessment. The adjacency matrix of transitions between actions was created based on the weighted and directed network of action sequences and related auxiliary information. Then, the adjacency matrix was modeled with LSM to identify lower-dimensional latent positions of actions. Three applications based on the results from LSM were introduced: action clustering, error analysis, and performance measurement. The simulation study showed that LSM can cluster actions from the same problem-solving strategy and measure students' performance by comparing their action sequences with the optimal strategy. Finally, we analyzed empirical data from PISA 2012 as a real case scenario to illustrate how to use LSM.
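As a minimal illustration of the data structure the article builds on, the sketch below assembles the weighted, directed transition (adjacency) matrix from a handful of made-up action sequences; the action labels are hypothetical, and fitting the latent space model itself is only indicated in a comment rather than implemented.

```python
import numpy as np

# Hypothetical action sequences from a problem-solving item (labels are invented).
sequences = [
    ["start", "open_menu", "set_filter", "apply", "submit"],
    ["start", "set_filter", "apply", "apply", "submit"],
    ["start", "open_menu", "open_menu", "apply", "submit"],
]

# Index the distinct actions and accumulate a weighted, directed adjacency matrix:
# entry (i, j) counts how often action j immediately follows action i across examinees.
actions = sorted({a for seq in sequences for a in seq})
idx = {a: i for i, a in enumerate(actions)}
A = np.zeros((len(actions), len(actions)))
for seq in sequences:
    for src, dst in zip(seq[:-1], seq[1:]):
        A[idx[src], idx[dst]] += 1

print(actions)
print(A)

# A latent space model would then place each action at a low-dimensional position z_i
# such that the log-rate of a transition i -> j decreases with the distance ||z_i - z_j||;
# estimating those positions (e.g., by MCMC or variational inference) is beyond this sketch.
```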

Citations: 1
Optimizing Implementation of Artificial-Intelligence-Based Automated Scoring: An Evidence Centered Design Approach for Designing Assessments for AI-based Scoring
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-12 | DOI: 10.1111/jedm.12332
Kadriye Ercikan, Daniel F. McCaffrey

Artificial-intelligence-based automated scoring is often an afterthought, considered only after assessments have been developed, which limits the possibilities for implementing automated scoring solutions. In this article, we provide a review of artificial intelligence (AI)-based methodologies for scoring in educational assessments. We then propose an evidence-centered design framework for developing assessments that aligns conceptualization, scoring, and ultimate assessment interpretation and use with the advantages and limitations of AI-based scoring in mind. We provide recommendations for defining construct, task, and evidence models to guide task and assessment design that optimize the development and implementation of AI-based automated scoring of constructed-response items and support the validity of inferences from and uses of scores.

Citations: 2
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment: A Discussion and Look Forward
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-09 | DOI: 10.1111/jedm.12330
David W. Dorsey, Hillary R. Michaels

In this concluding article of the special issue, we provide an overall discussion and point to emerging trends in AI that might shape our approach to validity and to building validity arguments.

Citations: 1
Validity Arguments for AI-Based Automated Scores: Essay Scoring as an Illustration
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-08 | DOI: 10.1111/jedm.12333
Steve Ferrara, Saed Qunbar

In this article, we argue that automated scoring engines should be transparent and construct relevant, at least to the extent that is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without admitting some features that may not be easily explained and understood and may not be obviously and directly relevant to the target assessment construct. We address the current limitations on evidence and validity arguments for scores from automated scoring engines from the points of view of the Standards for Educational and Psychological Testing (i.e., construct relevance, construct representation, and fairness) and emerging principles in artificial intelligence (e.g., explainable AI, an examinee's right to explanations, and principled AI). We illustrate these concepts and arguments for automated essay scores.

Citations: 4
Psychometric Methods to Evaluate Measurement and Algorithmic Bias in Automated Scoring
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-01 | DOI: 10.1111/jedm.12335
Matthew S. Johnson, Xiang Liu, Daniel F. McCaffrey

With the increasing use of automated scores in operational testing settings comes the need to understand the ways in which they can yield biased and unfair results. In this paper, we provide a brief survey of some of the ways in which the predictive methods used in automated scoring can lead to biased, and thus unfair, automated scores. After providing definitions of fairness from machine learning and a psychometric framework to study them, we demonstrate how modeling decisions, such as omitting variables, using proxy measures or confounded variables, and even the choice of optimization criterion in estimation, can lead to biased and unfair automated scores. We then introduce two simple methods for evaluating bias, evaluate their statistical properties through simulation, and apply them to an item from a large-scale reading assessment.
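The two evaluation methods are not described in the abstract, so the sketch below shows only one generic, hypothetical flavor of such a check (not necessarily either of the authors' methods): comparing automated and human scores within subgroups via standardized mean residuals. The data are simulated, and the group labels and effect size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: human scores, automated scores, and a subgroup indicator.
# The engine is made to under-score group B slightly, so the check should flag it.
n = 2000
group = rng.choice(["A", "B"], size=n)
human = rng.normal(3.0, 1.0, size=n)
auto = human + rng.normal(0.0, 0.5, size=n) - 0.3 * (group == "B")

def subgroup_residual_bias(human, auto, group):
    """Mean automated-minus-human residual per group, in human-score SD units.
    Values near zero for every group are consistent with no subgroup-level scoring bias."""
    resid = auto - human
    sd = human.std(ddof=1)
    return {g: float(resid[group == g].mean() / sd) for g in np.unique(group)}

print(subgroup_residual_bias(human, auto, group))   # group B should show a negative value
```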

Citations: 4
Toward Argument-Based Fairness with an Application to AI-Enhanced Educational Assessments
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-06-01 | DOI: 10.1111/jedm.12334
A. Corinne Huggins-Manley, Brandon M. Booth, Sidney K. D'Mello

The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible approach to fairness arguments, one that occurs outside of and complementary to validity arguments, is required to address many of the views on fairness that assessment stakeholders may hold. Accordingly, we focus this manuscript on two contributions: (a) introducing the argument-based fairness approach to complement argument-based validity for both traditional and artificial intelligence (AI)-enhanced assessments and (b) applying it in an illustrative AI assessment of perceived hireability in automated video interviews used to prescreen job candidates. We conclude with recommendations for further advancing argument-based fairness approaches.

Citations: 4
Linking and Comparability across Conditions of Measurement: Established Frameworks and Proposed Updates
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-05-30 | DOI: 10.1111/jedm.12322
Tim Moses

One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling, or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different examinees, or for tests administered in different modes and data collection designs. This article considers how previously proposed linking frameworks might be updated to address more recent testing situations. The first section summarizes the definitions and frameworks described in previous test linking discussions. Additional sections consider some sources of more disparate approaches to test development and administration, as well as their implications for test linking. Possibilities for reflecting these features in an expanded test linking framework are proposed that encourage limited comparability, such as comparability that is restricted to subgroups or to the conditions of a linking study when a linking is produced, or comparability within, but not across, tests or test forms when an empirical linking based on examinee data is not produced. The implications of an updated framework of previously established linking approaches are further described in a final discussion.
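The article treats linking at the level of frameworks rather than formulas; purely as a reminder of what the simplest member of the linking family looks like, the hypothetical sketch below performs linear linking under a random-groups design by matching the means and standard deviations of two simulated score distributions. The score distributions and form labels are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical raw scores from two test forms taken by randomly equivalent groups.
scores_x = rng.normal(52, 9, size=1500)    # new form X
scores_y = rng.normal(55, 10, size=1500)   # reference form Y

def linear_linking(x, y):
    """Linear linking: map form-X scores onto the form-Y scale by matching the
    first two moments (mean and standard deviation) of the score distributions."""
    slope = y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return lambda s: slope * s + intercept

to_y_scale = linear_linking(scores_x, scores_y)
print(round(to_y_scale(52.0), 2))   # a form-X score of 52 expressed on the form-Y scale
```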

Citations: 4
Introduction to the Special Issue Maintaining Score Comparability: Recent Challenges and Some Possible Solutions
IF 1.3 | CAS Tier 4, Psychology | Q1 Psychology | Pub Date: 2022-05-26 | DOI: 10.1111/jedm.12323
Tim Moses, Gautam Puhan
{"title":"Introduction to the Special Issue Maintaining Score Comparability: Recent Challenges and Some Possible Solutions","authors":"Tim Moses,&nbsp;Gautam Puhan","doi":"10.1111/jedm.12323","DOIUrl":"10.1111/jedm.12323","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2022-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49471640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1