DIF investigations across groups of gender and academic background in a large-scale high-stakes language test

Studies in Language Assessment · IF 0.1 · Q4 (Linguistics) · Pub Date: 2015-01-01 · DOI: 10.58379/rshg8366
Xia-li Song, Liying Cheng, D. Klinger
Citations: 12

Abstract

High-stakes pre-entry language testing is the predominant tool used to measure test takers’ proficiency for admission to higher education in China. Given the important role of these tests, there is heated discussion about how to ensure test fairness for different groups of test takers. This study examined the fairness of the Graduate School Entrance English Examination (GSEEE), which is used to decide whether more than one million test takers can enter master’s programs in China. Using SIBTEST and content analysis, the study investigated differential item functioning (DIF) and the presence of potential bias on the GSEEE with respect to gender and academic background. Results showed that a large percentage of the GSEEE items did not provide reliable results for distinguishing good from poor performers. A number of items and item bundles exhibited DIF and differential bundle functioning (DBF), and three test reviewers identified a range of factors, such as motivation and learning styles, that potentially contributed to group performance differences. However, no consistent evidence was found that these flagged items/texts exhibited bias. While systematic bias may not have been detected, the results revealed poor test reliability, and the study highlighted an urgent need to improve test quality and to clarify the purpose of the test. DIF issues may be revisited once test quality has been improved.
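As a rough illustration of the DIF methodology the abstract refers to, the sketch below implements a simplified SIBTEST-style statistic: examinees are matched on their total score over the remaining items, and the stratum-weighted difference in proportion-correct on the studied item between the reference and focal groups (e.g., male/female, or different academic backgrounds) estimates a DIF effect size. This is a minimal sketch on made-up data, not the authors' actual procedure; SIBTEST proper also applies a regression correction to the matching scores, which is omitted here.

```python
from collections import defaultdict

def sibtest_beta(responses, groups, studied_item):
    """Simplified SIBTEST-style DIF effect estimate.

    responses: list of 0/1 item-score lists, one per examinee.
    groups: parallel list of 'ref' (reference) or 'foc' (focal) labels.
    studied_item: index of the item being checked for DIF.
    """
    # Matching score = total score on all items except the studied one.
    strata = defaultdict(lambda: {"ref": [], "foc": []})
    for resp, g in zip(responses, groups):
        match = sum(resp) - resp[studied_item]
        strata[match][g].append(resp[studied_item])

    beta, n_total = 0.0, 0
    for cell in strata.values():
        ref, foc = cell["ref"], cell["foc"]
        if ref and foc:  # use only strata containing both groups
            n_k = len(ref) + len(foc)
            # Weighted gap in proportion-correct at this matching score.
            beta += n_k * (sum(ref) / len(ref) - sum(foc) / len(foc))
            n_total += n_k
    return beta / n_total if n_total else 0.0
```

With equally able examinees, a value of beta near zero suggests no DIF on the studied item; a large positive (or negative) value suggests the item favours the reference (or focal) group after matching on ability.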