Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests

IF 1.6 4区心理学 Q3 PSYCHOLOGY, APPLIED Journal of Educational Measurement Pub Date : 2023-06-09 DOI:10.1111/jedm.12372

Benjamin R. Shear

{"title":"Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests","authors":"Benjamin R. Shear","doi":"10.1111/jedm.12372","DOIUrl":null,"url":null,"abstract":"<p>Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item format by gender differences. On average, male students answer multiple-choice items correctly relatively more often and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and were larger on average in reading than math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias but is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 4","pages":"676-696"},"PeriodicalIF":1.6000,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Measurement","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jedm.12372","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PSYCHOLOGY, APPLIED","Score":null,"Total":0}

引用次数: 0

Abstract

Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item format by gender differences. On average, male students answer multiple-choice items correctly relatively more often and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and were larger on average in reading than math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias but is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

测试项目格式中的性别偏见：来自PISA 2009、2012和2015年数学和阅读测试的证据

大规模标准化考试通常用于衡量学生的整体成绩和学生分组。这些用途假设测试提供了跨学生亚组结果的可比测量，但先前的研究表明，跨性别群体的分数比较可能因使用的测试项目类型而变得复杂。本文提供的证据表明，在参加2009年、2012年和2015年PISA数学和阅读测试的美国15岁学生的全国代表性样本中，性别差异的项目格式是一致的。平均而言，男学生答对多项选择题的频率相对较高，女学生答对构念题的频率相对较高。这些模式在另外34个参与PISA的司法管辖区是一致的，尽管格式差异的大小各不相同，阅读的平均差异大于数学。格式差异的平均幅度不足以在旨在检测测试偏差的常规差异项目功能分析中进行标记，但足以对基于跨性别群体得分比较的推断的有效性提出质疑。研究人员和其他测试用户应该考虑到测试项目的格式，特别是在比较不同性别群体的分数时。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Educational Measurement Multiple-

CiteScore

2.30

自引率

7.70%

发文量

期刊介绍： The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings, as well as be of interest to measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.

期刊最新文献

Evaluating General-Purpose Multimodal AI for Q-Matrix Generation from Math Items: A Cognitive Diagnostic Modeling Exploration AI and Measurement Concerns: Dealing with Imbalanced Data in Autoscoring Correction to “Using GPT-4 to Augment Imbalanced Data for Automatic Scoring” Issue Information Issue Information