Technology-based assessments: Novel approaches to testing in organizational, psychological, and educational settings
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2143173
Christopher D. Nye
New technology-based assessments are increasingly being administered to individuals, and research evaluating their implications and utility for psychological assessment has grown accordingly. Given the importance of this topic, the International Journal of Testing solicited papers addressing the use of technology for assessments in organizational, psychological, or educational settings. The purpose of inviting these papers was to promote research on this topic and to address important issues related to the development and use of high-quality, technology-based assessments.
{"title":"Technology-based assessments: Novel approaches to testing in organizational, psychological, and educational settings","authors":"Christopher D. Nye","doi":"10.1080/15305058.2022.2143173","DOIUrl":"https://doi.org/10.1080/15305058.2022.2143173","url":null,"abstract":"each to individuals in research evaluating their implications and utility for psychological assessment. the of on topic, the Journal of solicited addressing the use of technology for assessments in organizational, psychological, or educational settings. purpose of inviting these was to promote research on this topic and to address important issues related to the development and use of high-qual-ity,","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"213 - 215"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45818419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A psychometric view of technology-based assessments
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2070757
Gloria Liou, Cavan V. Bonner, L. Tay
Abstract With the advent of big data and advances in technology, psychological assessments have become increasingly sophisticated and complex. Nevertheless, traditional psychometric issues concerning the validity, reliability, and measurement bias of such assessments remain fundamental in determining whether score inferences of human attributes are appropriate. We focus on three technological advances—the use of organic data for psychological assessments, the application of machine learning algorithms, and adaptive and gamified assessments—and review how the concepts of validity, reliability, and measurement bias may apply in particular ways within those areas. This provides direction for researchers and practitioners to advance the rigor of technology-based assessments from a psychometric perspective.
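As one concrete illustration of the traditional psychometric checks the abstract above refers to, the short Python sketch below computes Cronbach's alpha for a simulated item-score matrix such as one scored from a gamified assessment. The data, item count, and scoring rule are illustrative assumptions, not material from the article.

```python
# A minimal sketch of one classic reliability check, Cronbach's alpha, applied to a
# simulated item-response matrix. Purely illustrative; not code from the article.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: (n_examinees, n_items) matrix of item scores."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return n_items / (n_items - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
true_score = rng.normal(size=(500, 1))                       # latent trait, 500 examinees
items = true_score + rng.normal(scale=1.0, size=(500, 8))    # 8 noisy indicators
print(f"alpha = {cronbach_alpha(items):.3f}")
```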
{"title":"A psychometric view of technology-based assessments","authors":"Gloria Liou, Cavan V. Bonner, L. Tay","doi":"10.1080/15305058.2022.2070757","DOIUrl":"https://doi.org/10.1080/15305058.2022.2070757","url":null,"abstract":"Abstract With the advent of big data and advances in technology, psychological assessments have become increasingly sophisticated and complex. Nevertheless, traditional psychometric issues concerning the validity, reliability, and measurement bias of such assessments remain fundamental in determining whether score inferences of human attributes are appropriate. We focus on three technological advances—the use of organic data for psychological assessments, the application of machine learning algorithms, and adaptive and gamified assessments—and review how the concepts of validity, reliability, and measurement bias may apply in particular ways within those areas. This provides direction for researchers and practitioners to advance the rigor of technology-based assessments from a psychometric perspective.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"216 - 242"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45801075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining patterns of omitted responses in a large-scale English language proficiency test
Pub Date: 2022-05-12 | DOI: 10.1080/15305058.2022.2070756
Merve Sarac, E. Loken
Abstract This study is an exploratory analysis of examinee behavior in a large-scale language proficiency test. Despite a number-right scoring system with no penalty for guessing, we found that 16% of examinees omitted at least one answer and that women were more likely than men to omit answers. Item-response theory analyses treating the omitted responses as missing rather than wrong showed that examinees had underperformed by skipping the answers, with a greater underperformance among more able participants. An analysis of omitted answer patterns showed that reading passage items were most likely to be omitted, and that native language-translation items were least likely to be omitted. We hypothesized that since reading passage items were most tempting to skip, then among examinees who did answer every question there might be a tendency to guess at these items. Using cluster analyses, we found that underperformance on the reading items was more likely than underperformance on the non-reading passage items. In large-scale operational tests, examinees must know the optimal strategy for taking the test. Test developers must also understand how examinee behavior might impact the validity of score interpretations.
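The contrast the abstract above draws between scoring omitted answers as wrong and treating them as missing can be sketched with a simple Rasch ability estimate. The item difficulties, response pattern, and estimation routine below are hypothetical illustrations, not the authors' analysis.

```python
# A hedged sketch contrasting two treatments of omitted answers under a Rasch model
# with known item difficulties: omits dropped as missing versus omits scored as wrong.
import numpy as np

def rasch_theta(responses, difficulties, n_iter=20):
    """ML ability estimate; responses may contain np.nan for omitted items."""
    mask = ~np.isnan(responses)
    x, b = responses[mask], difficulties[mask]
    theta = 0.0
    for _ in range(n_iter):                      # Newton-Raphson on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        grad = np.sum(x - p)
        hess = -np.sum(p * (1 - p))
        theta -= grad / hess
    return theta

b = np.linspace(-2, 2, 10)                       # 10 items of increasing difficulty
resp = np.array([1, 1, 1, 1, 1, 0, np.nan, np.nan, 0, 0], dtype=float)

as_missing = rasch_theta(resp, b)                # omitted items dropped
as_wrong = rasch_theta(np.nan_to_num(resp), b)   # omitted items scored 0
print(f"theta (omits missing) = {as_missing:.2f}, theta (omits wrong) = {as_wrong:.2f}")
```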
{"title":"Examining patterns of omitted responses in a large-scale English language proficiency test","authors":"Merve Sarac, E. Loken","doi":"10.1080/15305058.2022.2070756","DOIUrl":"https://doi.org/10.1080/15305058.2022.2070756","url":null,"abstract":"Abstract This study is an exploratory analysis of examinee behavior in a large-scale language proficiency test. Despite a number-right scoring system with no penalty for guessing, we found that 16% of examinees omitted at least one answer and that women were more likely than men to omit answers. Item-response theory analyses treating the omitted responses as missing rather than wrong showed that examinees had underperformed by skipping the answers, with a greater underperformance among more able participants. An analysis of omitted answer patterns showed that reading passage items were most likely to be omitted, and that native language-translation items were least likely to be omitted. We hypothesized that since reading passage items were most tempting to skip, then among examinees who did answer every question there might be a tendency to guess at these items. Using cluster analyses, we found that underperformance on the reading items was more likely than underperformance on the non-reading passage items. In large-scale operational tests, examinees must know the optimal strategy for taking the test. Test developers must also understand how examinee behavior might impact the validity of score interpretations.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"56 - 72"},"PeriodicalIF":1.7,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41620786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using item response theory to understand the effects of scale contextualization: An illustration using decision making style scales
Pub Date: 2022-03-15 | DOI: 10.1080/15305058.2022.2047692
Nathaniel M. Voss, Cassandra Chlevin-Thiele, Christopher J. Lake, Chi-Leigh Q. Warren
Abstract The goal of this study was to extend research on scale contextualization (i.e., frame-of-reference effect) to the decision making styles construct, compare the effects of contextualization across three unique decision style scales, and examine the consequences of scale contextualization within an item response theory framework. Based on a mixed experimental design, data gathered from 661 university students indicated that contextualized scales yielded higher predictive validity, occasionally possessed psychometric properties better than the original measures, and that the effects of contextualization are somewhat scale-specific. These findings provide important insights for researchers and practitioners seeking to modify and adapt existing scales.
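A minimal simulated sketch of the kind of predictive-validity comparison reported above: scores from a general versus a contextualized (frame-of-reference framed) scale are correlated with an external criterion. The effect sizes, variable names, and data below are assumptions for illustration only, not the study's design.

```python
# A hypothetical predictive-validity comparison between a general and a contextualized
# scale score; all quantities are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 661                                          # sample size matching the abstract
context_trait = rng.normal(size=n)               # trait as expressed in the target context
general_trait = 0.7 * context_trait + 0.7 * rng.normal(size=n)   # broader, noisier trait
criterion = 0.5 * context_trait + rng.normal(size=n)

general_scale = general_trait + 0.5 * rng.normal(size=n)         # observed scale scores
contextual_scale = context_trait + 0.5 * rng.normal(size=n)

r_general = np.corrcoef(general_scale, criterion)[0, 1]
r_contextual = np.corrcoef(contextual_scale, criterion)[0, 1]
print(f"validity: general r = {r_general:.2f}, contextualized r = {r_contextual:.2f}")
```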
{"title":"Using item response theory to understand the effects of scale contextualization: An illustration using decision making style scales","authors":"Nathaniel M. Voss, Cassandra Chlevin-Thiele, Christopher J. Lake, Chi-Leigh Q. Warren","doi":"10.1080/15305058.2022.2047692","DOIUrl":"https://doi.org/10.1080/15305058.2022.2047692","url":null,"abstract":"Abstract The goal of this study was to extend research on scale contextualization (i.e., frame-of-reference effect) to the decision making styles construct, compare the effects of contextualization across three unique decision style scales, and examine the consequences of scale contextualization within an item response theory framework. Based on a mixed experimental design, data gathered from 661 university students indicated that contextualized scales yielded higher predictive validity, occasionally possessed psychometric properties better than the original measures, and that the effects of contextualization are somewhat scale-specific. These findings provide important insights for researchers and practitioners seeking to modify and adapt existing scales.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"34 - 55"},"PeriodicalIF":1.7,"publicationDate":"2022-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45802961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The development and validation of the Resilience Index
Pub Date: 2022-02-25 | DOI: 10.1080/15305058.2022.2036162
M. van Wyk, G. Lipinska, M. Henry, T. K. Phillips, P. E. van der Walt
Abstract Resilience comprises various neurobiological, developmental, and psychosocial components. However, existing measures lack certain critical components, while having limited utility in low-to-middle-income settings. We aimed to develop a reliable and valid measure of resilience encompassing a broad range of components and that can be used across different income settings. We also set out to develop empirical cutoff scores for low, moderate, and high resilience. Results from 686 participants revealed the emergence of three components: positive affect (α = 0.879), early-life stability (α = 0.879), and stress mastery (α = 0.683). Convergent and incremental validity was confirmed using an existing resilience measure as the benchmark. Concurrent validity was also confirmed with significant negative correlations with measures of depression, anxiety, posttraumatic stress disorder, and sleep disruption. Finally, we successfully determined cutoff scores for low, moderate, and high resilience. Results confirm that the Resilience Index is a reliable and valid measure that can be utilized in both high- and low-to-middle-income settings.
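One common way to derive empirical low, moderate, and high cutoffs is a tercile split of total scores; the sketch below illustrates that idea on simulated data. The Resilience Index authors' actual cutoff procedure is not described in the abstract and may well differ.

```python
# A hedged sketch of tercile-based cutoffs for low / moderate / high bands.
# Scores are simulated; this is not the Resilience Index scoring procedure.
import numpy as np

rng = np.random.default_rng(2)
totals = rng.normal(loc=60, scale=12, size=686)   # simulated totals, n = 686 as above

low_cut, high_cut = np.quantile(totals, [1 / 3, 2 / 3])
bands = np.select(
    [totals < low_cut, totals < high_cut],        # conditions checked in order
    ["low", "moderate"],
    default="high",
)
print(f"cutoffs: low < {low_cut:.1f} <= moderate < {high_cut:.1f} <= high")
print({b: int((bands == b).sum()) for b in ("low", "moderate", "high")})
```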
{"title":"The development and validation of the Resilience Index","authors":"M. van Wyk, G. Lipinska, M. Henry, T. K. Phillips, P. E. van der Walt","doi":"10.1080/15305058.2022.2036162","DOIUrl":"https://doi.org/10.1080/15305058.2022.2036162","url":null,"abstract":"Abstract Resilience comprises various neurobiological, developmental, and psychosocial components. However, existing measures lack certain critical components, while having limited utility in low-to-middle-income settings. We aimed to develop a reliable and valid measure of resilience encompassing a broad range of components and that can be used across different income settings. We also set out to develop empirical cutoff scores for low, moderate, and high resilience. Results from 686 participants revealed the emergence of three components: positive affect (α = 0.879), early-life stability (α = 0.879), and stress mastery (α = 0.683). Convergent and incremental validity was confirmed using an existing resilience measure as the benchmark. Concurrent validity was also confirmed with significant negative correlations with measures of depression, anxiety, posttraumatic stress disorder, and sleep disruption. Finally, we successfully determined cutoff scores for low, moderate, and high resilience. Results confirm that the Resilience Index is a reliable and valid measure that can be utilized in both high- and low-to-middle-income settings.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"185 - 211"},"PeriodicalIF":1.7,"publicationDate":"2022-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45130282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating group differences in online reading comprehension: The impact of item properties
Pub Date: 2022-02-25 | DOI: 10.1080/15305058.2022.2044821
H. Bulut, O. Bulut, Serkan Arıkan
Abstract This study examined group differences in online reading comprehension (ORC) using student data from the 2016 administration of the Progress in International Reading Literacy Study (ePIRLS). An explanatory item response modeling approach was used to explore the effects of item properties (i.e., item format, text complexity, and cognitive complexity), student characteristics (i.e., gender and language groups), and their interactions on dichotomous and polytomous item responses. The results showed that female students outperform male students in ORC tasks and that the achievement difference between female and male students appears to change as text complexity increases. Similarly, the cognitive complexity of the items seems to play a significant role in explaining the gender gap in ORC performance. Students who never (or sometimes) speak the test language at home particularly struggled with answering ORC tasks. The achievement gap between students who always (or almost always) speak the test language at home and those who never (or sometimes) speak the test language at home was larger for constructed-response items and items with higher cognitive complexity. Overall, the findings suggest that item properties could help understand performance differences between gender and language groups in ORC assessments.
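The explanatory item response modeling idea above can be approximated, in simplified form, as a logistic regression of dichotomous responses on an item property, a student covariate, and their interaction in long format. The sketch below uses simulated data and omits the random person effects a full explanatory IRT model would include; it is not the authors' model.

```python
# A simplified stand-in for explanatory IRT: logistic regression of item responses on
# an item property, a person covariate, and their interaction. Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
n_students, n_items = 300, 20
female = np.repeat(rng.integers(0, 2, n_students), n_items)      # person covariate
complexity = np.tile(rng.uniform(0, 1, n_items), n_students)     # item property
ability = np.repeat(rng.normal(size=n_students), n_items)

logit = 0.4 * female - 1.0 * complexity - 0.5 * female * complexity + ability
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([np.ones_like(complexity), female, complexity, female * complexity])
beta = np.zeros(X.shape[1])
for _ in range(25):                                              # Newton-Raphson (IRLS)
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))

print(dict(zip(["intercept", "female", "complexity", "female_x_complexity"], beta.round(2))))
```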
{"title":"Evaluating group differences in online reading comprehension: The impact of item properties","authors":"H. Bulut, O. Bulut, Serkan Arıkan","doi":"10.1080/15305058.2022.2044821","DOIUrl":"https://doi.org/10.1080/15305058.2022.2044821","url":null,"abstract":"Abstract This study examined group differences in online reading comprehension (ORC) using student data from the 2016 administration of the Progress in International Reading Literacy Study (ePIRLS). An explanatory item response modeling approach was used to explore the effects of item properties (i.e., item format, text complexity, and cognitive complexity), student characteristics (i.e., gender and language groups), and their interactions on dichotomous and polytomous item responses. The results showed that female students outperform male students in ORC tasks and that the achievement difference between female and male students appears to change text complexity increases. Similarly, the cognitive complexity of the items seems to play a significant role in explaining the gender gap in ORC performance. Students who never (or sometimes) speak the test language at home particularly struggled with answering ORC tasks. The achievement gap between students who always (or almost always) speak the test language at home and those who never (or sometimes) speak the test language at home was larger for constructed-response items and items with higher cognitive complexity. Overall, the findings suggest that item properties could help understand performance differences between gender and language groups in ORC assessments.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"10 - 33"},"PeriodicalIF":1.7,"publicationDate":"2022-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42236674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An investigation of item, examinee, and country correlates of rapid guessing in PISA
Pub Date: 2022-02-09 | DOI: 10.1080/15305058.2022.2036161
Joseph A. Rios, J. Soland
Abstract The objective of the present study was to investigate item-, examinee-, and country-level correlates of rapid guessing (RG) in the context of the 2018 PISA science assessment. Analyzing data from 267,148 examinees across 71 countries showed that over 50% of examinees engaged in RG on an average proportion of one in 10 items. Descriptive differences were noted between countries on the mean number of RG responses per examinee with discrepancies as large as 500%. Country-level differences in the odds of engaging in RG were associated with mean performance and regional membership. Furthermore, based on a two-level cross-classified hierarchical linear model, both item- and examinee-level correlates were found to moderate the likelihood of RG. Specifically, the inclusion of items with multimedia content was associated with a decrease in RG. A number of demographic and attitudinal examinee-level variables were also significant moderators, including sex, linguistic background, SES, and self-rated reading comprehension, motivation mastery, and fear of failure. The findings from this study imply that select subgroup comparisons within and across nations may be biased by differential test-taking effort. To mitigate RG in international assessments, future test developers may look to leverage technology-enhanced items.
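A common way to operationalize rapid guessing is a normative response-time threshold, for example 10% of each item's median response time (often called NT10). The sketch below flags rapid guesses on simulated response times; Rios and Soland's actual identification procedure in the PISA data may differ.

```python
# A hedged illustration of an NT10-style rapid-guessing flag on simulated response times.
import numpy as np

rng = np.random.default_rng(4)
n_examinees, n_items = 1000, 10
rt = rng.lognormal(mean=3.0, sigma=0.5, size=(n_examinees, n_items))   # seconds
rapid_rows = rng.random(n_examinees) < 0.2                             # some examinees rush
rt[rapid_rows, :3] = rng.uniform(0.5, 2.0, size=(rapid_rows.sum(), 3)) # very fast responses

thresholds = 0.10 * np.median(rt, axis=0)         # one NT10 threshold per item
rg_flags = rt < thresholds                        # True where a response is a rapid guess
per_examinee = rg_flags.mean(axis=1)              # proportion of RG responses per examinee

print(f"examinees with any RG: {(per_examinee > 0).mean():.1%}")
print(f"mean RG proportion among those examinees: {per_examinee[per_examinee > 0].mean():.2f}")
```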
{"title":"An investigation of item, examinee, and country correlates of rapid guessing in PISA","authors":"Joseph A. Rios, J. Soland","doi":"10.1080/15305058.2022.2036161","DOIUrl":"https://doi.org/10.1080/15305058.2022.2036161","url":null,"abstract":"Abstract The objective of the present study was to investigate item-, examinee-, and country-level correlates of rapid guessing (RG) in the context of the 2018 PISA science assessment. Analyzing data from 267,148 examinees across 71 countries showed that over 50% of examinees engaged in RG on an average proportion of one in 10 items. Descriptive differences were noted between countries on the mean number of RG responses per examinee with discrepancies as large as 500%. Country-level differences in the odds of engaging in RG were associated with mean performance and regional membership. Furthermore, based on a two-level cross-classified hierarchical linear model, both item- and examinee-level correlates were found to moderate the likelihood of RG. Specifically, the inclusion of items with multimedia content was associated with a decrease in RG. A number of demographic and attitudinal examinee-level variables were also significant moderators, including sex, linguistic background, SES, and self-rated reading comprehension, motivation mastery, and fear of failure. The findings from this study imply that select subgroup comparisons within and across nations may be biased by differential test-taking effort. To mitigate RG in international assessments, future test developers may look to leverage technology-enhanced items.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"57 3","pages":"154 - 184"},"PeriodicalIF":1.7,"publicationDate":"2022-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41309048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dropping the GRE, keeping the GRE, or GRE-optional admissions? Considering tradeoffs and fairness
Pub Date: 2022-01-02 | DOI: 10.1080/15305058.2021.2019750
Daniel A. Newman, Chen Tang, Q. Song, Serena Wee
Abstract In considering whether to retain the GRE in graduate school admissions, admissions committees often pursue two objectives: (a) performance in graduate school (e.g., admitting individuals who will perform better in classes and research), and (b) diversity/fairness (e.g., equal selection rates between demographic groups). Drawing upon HR research (adverse impact research), we address four issues in using the GRE. First, we review the tension created between two robust findings: (a) validity of the GRE for predicting graduate school performance (rooted in the principle of standardization and a half-century of educational and psychometric research), and (b) the achievement gap in test scores between demographic groups (rooted in several centuries of systemic racism). This empirical tension can often produce a local diversity-performance tradeoff for admissions committees. Second, we use Pareto-optimal tradeoff curves to formalize potential diversity-performance tradeoffs, guiding how much weight to assign the GRE in admissions. Whether dropping the GRE produces suboptimal admissions depends upon one’s relative valuation of diversity versus performance. Third, we review three distinct notions of test fairness—equality, test equity, and performance equity—which have differing implications for dropping the GRE. Finally, we consider test fairness under GRE-optional admissions, noting the missing data problem when GRE is an incomplete variable. Supplemental data for this article is available online at
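The diversity-performance tradeoff the abstract describes can be illustrated by sweeping the weight given to a GRE-like predictor in a two-predictor composite and tracking the composite's criterion correlation alongside the selection-rate ratio between two groups. The subgroup gaps, validities, and top-20% admission rule below are illustrative assumptions, not the article's Pareto-optimal analysis.

```python
# A rough, simulated sketch of a diversity-performance tradeoff sweep over predictor weights.
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
group = rng.integers(0, 2, n)                     # 0 = majority, 1 = minority group
gre = rng.normal(size=n) - 0.8 * group            # assumed subgroup gap on the test
gpa = rng.normal(size=n) - 0.2 * group            # smaller assumed gap on the alternative
performance = 0.5 * gre + 0.3 * gpa + rng.normal(size=n)

for w in (0.0, 0.25, 0.5, 0.75, 1.0):             # weight on the GRE-like predictor
    composite = w * gre + (1 - w) * gpa
    admitted = composite >= np.quantile(composite, 0.80)
    validity = np.corrcoef(composite, performance)[0, 1]
    ai_ratio = admitted[group == 1].mean() / admitted[group == 0].mean()
    print(f"w={w:.2f}  validity={validity:.2f}  selection-rate ratio={ai_ratio:.2f}")
```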
{"title":"Dropping the GRE, keeping the GRE, or GRE-optional admissions? Considering tradeoffs and fairness","authors":"Daniel A. Newman, Chen Tang, Q. Song, Serena Wee","doi":"10.1080/15305058.2021.2019750","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019750","url":null,"abstract":"Abstract In considering whether to retain the GRE in graduate school admissions, admissions committees often pursue two objectives: (a) performance in graduate school (e.g., admitting individuals who will perform better in classes and research), and (b) diversity/fairness (e.g., equal selection rates between demographic groups). Drawing upon HR research (adverse impact research), we address four issues in using the GRE. First, we review the tension created between two robust findings: (a) validity of the GRE for predicting graduate school performance (rooted in the principle of standardization and a half-century of educational and psychometric research), and (b) the achievement gap in test scores between demographic groups (rooted in several centuries of systemic racism). This empirical tension can often produce a local diversity-performance tradeoff for admissions committees. Second, we use Pareto-optimal tradeoff curves to formalize potential diversity-performance tradeoffs, guiding how much weight to assign the GRE in admissions. Whether dropping the GRE produces suboptimal admissions depends upon one’s relative valuation of diversity versus performance. Third, we review three distinct notions of test fairness—equality, test equity, and performance equity—which have differing implications for dropping the GRE. Finally, we consider test fairness under GRE-optional admissions, noting the missing data problem when GRE is an incomplete variable. Supplemental data for this article is available online at","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"43 - 71"},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41543647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Is there bias in alternatives to standardized tests? An investigation into letters of recommendation
Pub Date: 2022-01-02 | DOI: 10.1080/15305058.2021.2019751
Dev K. Dalal, Jason G. Randall, Ho Kwan Cheung, Brandon Gorman, Sylvia G. Roch, K. Williams
Abstract Individuals concerned with subgroup differences on standardized tests suggest replacing these tests with holistic evaluations of unstructured application materials, such as letters of recommendation (LORs), which they posit show less bias. We empirically investigate this proposition that LORs are bias-free, and argue that LORs might actually invite systematic race and gender subgroup differences in the content and evaluation of LORs. We text-analyzed over 37,000 LORs submitted on behalf of over 10,000 graduate school applicants. Results showed that LOR content does differ across applicants. Furthermore, we see some systematic gender, race, and gender-race intersection differences in LOR content. Content of LORs also systematically differed between degree programs (S.T.E.M. vs. non-S.T.E.M.) and degree sought (doctoral vs. master's). Finally, LOR content alone did not predict an appreciable amount of variance in offers of admission (the first barrier to increasing diversity and inclusion in graduate programs). Our results, combined with past research on LOR content bias, highlight concerns that LORs can be biased against marginalized groups. We conclude with suggestions for reducing potential bias in LORs and for increasing diversity in graduate programs.
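A toy sketch of dictionary-based text analysis in the spirit of the study above: counting how often words from two illustrative categories appear in each letter. The word lists and sample letters are hypothetical, and the study itself analyzed over 37,000 actual letters with different tooling.

```python
# A toy dictionary-based content count over hypothetical letter texts.
import re
from collections import Counter

CATEGORIES = {
    "ability": {"brilliant", "talented", "gifted", "insightful"},
    "effort": {"hardworking", "diligent", "dependable", "organized"},
}

letters = [
    "She is a brilliant and insightful researcher, truly gifted in the lab.",
    "He is hardworking, dependable, and well organized in every project.",
]

for i, text in enumerate(letters, start=1):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in tokens:
        for category, vocab in CATEGORIES.items():
            if word in vocab:
                counts[category] += 1
    rates = {c: counts[c] / len(tokens) for c in CATEGORIES}   # category rate per letter
    print(f"letter {i}: {rates}")
```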
{"title":"Is there bias in alternatives to standardized tests? An investigation into letters of recommendation","authors":"Dev K. Dalal, Jason G. Randall, Ho Kwan Cheung, Brandon Gorman, Sylvia G. Roch, K. Williams","doi":"10.1080/15305058.2021.2019751","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019751","url":null,"abstract":"Abstract Individuals concerned with subgroup differences on standardized tests suggest replacing these tests with holistic evaluations of unstructured application materials, such as letters of recommendation (LORs), which they posit show less bias. We empirically investigate this proposition that LORs are bias-free, and argue that LORs might actually invite systematic, race and gender subgroup differences in the content and evaluation of LORs. We text analyzed over 37,000 LORs submitted on behalf of over 10,000 graduate school applicants. Results showed that LOR content does differ across applicants. Furthermore, we see some systematic gender, race, and gender-race intersection differences in LOR content. Content of LORs also systematically differed between degree programs (S.T.E.M. vs. non-S.T.E.M.) and degree sought (doctoral vs. masters). Finally, LOR content alone did not predict an appreciable amount of variance in offers of admission (the first barrier to increasing diversity and inclusion in graduate programs). Our results, combined with past research on LOR content bias, highlight concerns that LORs can be biased against marginalized groups. We conclude with suggestions for reducing potential bias in LOR and for increasing diversity in graduate programs.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"21 - 42"},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42116647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction to International Journal of Testing special issue on equity and fairness in testing and assessment in school admissions
Pub Date: 2022-01-02 | DOI: 10.1080/15305058.2021.2019753
S. E. Woo, B. Wille, S. Sireci
Across the globe, educational tests are used for admissions decisions to competitive colleges, universities, high schools, and other programs. The high stakes associated with these tests have important consequences for students, and performance on them can determine whether students reach their academic and career aspirations. For this reason, their use is both widespread and contentious. Recently, the debates over the use of standardized tests in college and graduate admissions have increased, due in large part to concerns about score disparities resulting in disparate admissions outcomes. The International Journal of Testing has published many examples of criticisms and research with respect to admissions testing around the world, including in Chile (Ramirez et al., 2020), Israel (Rapp & Allalouf, 2003), Saudi Arabia (Tsaousis et al., 2018), Sweden (Wiberg & von Davier, 2017), and the United States (e.g., Talento-Miller, 2008). In the U.S. and Chile, public outcry against disparate outcomes for certain groups of students has ushered in changes in admissions testing programs and the policies associated with them (Koljatic et al., 2021). In the U.S., several colleges and universities have suspended the SAT and ACT requirements for their applicants, which generated a number of heated discussions both within and outside academia. The use of the Graduate Record Examinations (GREs) in graduate admissions is also being hotly debated for similar reasons, and a number of graduate programs in the U.S. have opted to remove the GRE requirement from their admissions criteria.
{"title":"Introduction to International Journal of Testing special issue on equity and fairness in testing and assessment in school admissions","authors":"S. E. Woo, B. Wille, S. Sireci","doi":"10.1080/15305058.2021.2019753","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019753","url":null,"abstract":"Across the globe, educational tests are used for admissions decisions to competitive colleges, universities, high schools, and other programs. The high stakes associated with these tests have important consequences for students, and performance on them can determine whether students reach their academic and career aspirations. For this reason, their use is both widespread and contentious. Recently, the debates over the use of standardized tests in college and graduate admissions has increased, due in large part to concerns about score disparities resulting in disparate admissions outcomes. The International Journal of Testing has published many examples of criticisms and research with respect to admissions testing around the world, including in Chile (Ramirez et al., 2020), Israel (Rapp & Allalouf, 2003), Saudi Arabia (Tsaousis et al., 2018), Sweden (Wiberg & von Davier, 2017), and the United States (e.g., TalentoMiller, 2008). In the U.S. and Chile, public outcry against disparate outcomes for certain groups of students have marshaled in changes in admissions testing programs and the policies associated with them (Koljatic et al., 2021). In the U.S., several colleges and universities have suspended the SAT and ACT requirements for their applicants, which generated a number of heated discussions both within and outside academia. The use of the Graduate Record Examinations (GREs) in graduate admissions is also being hotly debated for similar reasons, and a number of graduate programs in the U.S. have opted to remove the GRE requirement from https://doi.org/10.1080/15305058.2021.2019753","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"1 - 4"},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48665709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}