
International Journal of Testing: Latest Publications

Generating reading comprehension items using automated processes
IF 1.7 Q1 Social Sciences Pub Date: 2022-10-02 DOI: 10.1080/15305058.2022.2070755
Jinnie Shin, Mark J. Gierl
Abstract Over the last five years, tremendous strides have been made in advancing the AIG methodology required to produce items in diverse content areas. However, the one content area where enormous problems remain unsolved is language arts, generally, and reading comprehension, more specifically. While reading comprehension test items can be created using many different item formats, fill-in-the-blank remains one of the most common when the goal is to measure inferential knowledge. Currently, the item development process used to create fill-in-the-blank reading comprehension items is time-consuming and expensive. Hence, the purpose of the study is to introduce a new systematic method for generating fill-in-the-blank reading comprehension items using an item modeling approach. We describe the use of different unsupervised learning methods that can be paired with natural language processing techniques to identify the salient item models within existing texts. To demonstrate the capacity of our method, 1,013 test items were generated from 100 input texts taken from fill-in-the-blank reading comprehension items used on a high-stakes college entrance exam in South Korea. Our validation results indicated that the generated items produced higher semantic similarities between the item options while depicting little to no syntactic differences with the traditionally written test items.
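The validation step described in the abstract compares semantic similarity between item options. A minimal sketch of that idea, using bag-of-words cosine similarity as a stand-in for the authors' actual NLP pipeline (the option texts below are hypothetical):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two option strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Pairwise similarities across one generated item's options: a higher
# mean suggests the distractors are semantically close to the key.
options = [
    "the committee approved the proposal",
    "the committee rejected the proposal",
    "the committee postponed the proposal",
]
sims = [cosine_similarity(options[i], options[j])
        for i in range(len(options)) for j in range(i + 1, len(options))]
mean_similarity = sum(sims) / len(sims)
```

In practice one would substitute contextual embeddings for the raw word counts, but the comparison logic is the same.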
Citations: 0
Investigating the writing performance of educationally at-risk examinees using technology
IF 1.7 Q1 Social Sciences Pub Date: 2022-10-02 DOI: 10.1080/15305058.2022.2050734
Mo Zhang, S. Sinharay
Abstract This article demonstrates how recent advances in technology allow fine-grained analyses of candidate-produced essays, thus providing a deeper insight on writing performance. We examined how essay features, automatically extracted using natural language processing and keystroke logging techniques, can predict various performance measures using data from a large-scale and high-stakes assessment for awarding high-school equivalency diploma. The features that are the most predictive of writing proficiency and broader academic success were identified and interpreted. The suggested methodology promises to be practically useful because it has the potential to point to specific writing skills that are important for improving essay writing and academic performance for educationally at-risk adult populations like the one considered in this article.
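A rough illustration of the feature-extraction idea: surface NLP features plus a keystroke-log feature feed a simple regression. The specific features and the 2-second pause threshold are illustrative assumptions, not the article's feature set:

```python
def extract_features(essay: str, pause_times_s: list[float]) -> dict:
    """Toy essay features: NLP-style surface features plus a
    keystroke-log feature (count of long inter-key pauses)."""
    words = essay.split()
    return {
        "word_count": len(words),
        "mean_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "long_pauses": sum(1 for p in pause_times_s if p > 2.0),
    }

def ols_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """One-predictor least squares (score ~ feature): slope, intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

The article's models predict several performance measures from many such features at once; the sketch keeps only the core feature-to-score mapping.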
Citations: 0
Technology-based assessments: Novel approaches to testing in organizational, psychological, and educational settings
IF 1.7 Q1 Social Sciences Pub Date: 2022-10-02 DOI: 10.1080/15305058.2022.2143173
Christopher D. Nye
This editorial introduces the special issue. In light of growing interest in the topic, the International Journal of Testing solicited papers addressing the use of technology for assessments in organizational, psychological, or educational settings, evaluating their implications and utility for psychological assessment. The purpose of inviting these papers was to promote research on this topic and to address important issues related to the development and use of high-quality technology-based assessments.
Citations: 0
A psychometric view of technology-based assessments
IF 1.7 Q1 Social Sciences Pub Date: 2022-10-02 DOI: 10.1080/15305058.2022.2070757
Gloria Liou, Cavan V. Bonner, L. Tay
Abstract With the advent of big data and advances in technology, psychological assessments have become increasingly sophisticated and complex. Nevertheless, traditional psychometric issues concerning the validity, reliability, and measurement bias of such assessments remain fundamental in determining whether score inferences of human attributes are appropriate. We focus on three technological advances—the use of organic data for psychological assessments, the application of machine learning algorithms, and adaptive and gamified assessments—and review how the concepts of validity, reliability, and measurement bias may apply in particular ways within those areas. This provides direction for researchers and practitioners to advance the rigor of technology-based assessments from a psychometric perspective.
Citations: 0
Examining patterns of omitted responses in a large-scale English language proficiency test
IF 1.7 Q1 Social Sciences Pub Date: 2022-05-12 DOI: 10.1080/15305058.2022.2070756
Merve Sarac, E. Loken
Abstract This study is an exploratory analysis of examinee behavior in a large-scale language proficiency test. Despite a number-right scoring system with no penalty for guessing, we found that 16% of examinees omitted at least one answer and that women were more likely than men to omit answers. Item-response theory analyses treating the omitted responses as missing rather than wrong showed that examinees had underperformed by skipping the answers, with a greater underperformance among more able participants. An analysis of omitted answer patterns showed that reading passage items were most likely to be omitted, and that native language-translation items were least likely to be omitted. We hypothesized that since reading passage items were most tempting to skip, then among examinees who did answer every question there might be a tendency to guess at these items. Using cluster analyses, we found that underperformance on the reading items was more likely than underperformance on the non-reading passage items. In large-scale operational tests, examinees must know the optimal strategy for taking the test. Test developers must also understand how examinee behavior might impact the validity of score interpretations.
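The scoring contrast at the heart of the analysis, treating an omission as missing versus as wrong, can be sketched under a Rasch model. The responses and difficulties below are made up, and the article's actual IRT model may differ:

```python
import math

def rasch_theta(responses, difficulties, iters=25):
    """Newton-Raphson ML ability estimate under the Rasch model.
    Responses are coded 1/0; None means omitted and is skipped
    (i.e., treated as missing rather than wrong)."""
    pairs = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    theta = 0.0
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(theta - b))) for _, b in pairs]
        grad = sum(x for x, _ in pairs) - sum(ps)      # score function
        hess = -sum(p * (1.0 - p) for p in ps)         # always negative
        theta -= grad / hess
    return theta

responses = [1, 0, 1, None, 1]          # one omitted answer
difficulties = [-1.0, 0.0, 1.0, 0.0, 0.5]

theta_missing = rasch_theta(responses, difficulties)
theta_wrong = rasch_theta([0 if r is None else r for r in responses],
                          difficulties)
# Scoring the omission as wrong yields a lower ability estimate.
```

This mirrors the study's finding that examinees who skip answers underperform relative to what a missing-data treatment of the same responses would imply.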
Citations: 1
Using item response theory to understand the effects of scale contextualization: An illustration using decision making style scales
IF 1.7 Q1 Social Sciences Pub Date: 2022-03-15 DOI: 10.1080/15305058.2022.2047692
Nathaniel M. Voss, Cassandra Chlevin-Thiele, Christopher J. Lake, Chi-Leigh Q. Warren
Abstract The goal of this study was to extend research on scale contextualization (i.e., frame-of-reference effect) to the decision making styles construct, compare the effects of contextualization across three unique decision style scales, and examine the consequences of scale contextualization within an item response theory framework. Based on a mixed experimental design, data gathered from 661 university students indicated that contextualized scales yielded higher predictive validity, occasionally possessed psychometric properties better than the original measures, and that the effects of contextualization are somewhat scale-specific. These findings provide important insights for researchers and practitioners seeking to modify and adapt existing scales.
Citations: 1
The development and validation of the Resilience Index
IF 1.7 Q1 Social Sciences Pub Date: 2022-02-25 DOI: 10.1080/15305058.2022.2036162
M. van Wyk, G. Lipinska, M. Henry, T. K. Phillips, P. E. van der Walt
Abstract Resilience comprises various neurobiological, developmental, and psychosocial components. However, existing measures lack certain critical components, while having limited utility in low-to-middle-income settings. We aimed to develop a reliable and valid measure of resilience encompassing a broad range of components and that can be used across different income settings. We also set out to develop empirical cutoff scores for low, moderate, and high resilience. Results from 686 participants revealed the emergence of three components: positive affect (α = 0.879), early-life stability (α = 0.879), and stress mastery (α = 0.683). Convergent and incremental validity was confirmed using an existing resilience measure as the benchmark. Concurrent validity was also confirmed with significant negative correlations with measures of depression, anxiety, posttraumatic stress disorder, and sleep disruption. Finally, we successfully determined cutoff scores for low, moderate, and high resilience. Results confirm that the Resilience Index is a reliable and valid measure that can be utilized in both high- and low-to-middle-income settings.
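The reported component reliabilities (α = 0.879, 0.879, 0.683) are Cronbach's alpha values. The coefficient can be computed from a persons-by-items score matrix as below; the tiny matrix is illustrative only:

```python
def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a persons x items matrix of item scores:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(scores[0])                       # number of items
    def var(xs):                             # sample variance (n - 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = sum(var([row[j] for row in scores]) for j in range(k))
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Three respondents, two items that covary positively.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha rises as items covary more strongly relative to their individual variances; perfectly parallel items give alpha = 1.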
Citations: 1
Evaluating group differences in online reading comprehension: The impact of item properties
IF 1.7 Q1 Social Sciences Pub Date: 2022-02-25 DOI: 10.1080/15305058.2022.2044821
H. Bulut, O. Bulut, Serkan Arıkan
Abstract This study examined group differences in online reading comprehension (ORC) using student data from the 2016 administration of the Progress in International Reading Literacy Study (ePIRLS). An explanatory item response modeling approach was used to explore the effects of item properties (i.e., item format, text complexity, and cognitive complexity), student characteristics (i.e., gender and language groups), and their interactions on dichotomous and polytomous item responses. The results showed that female students outperform male students in ORC tasks and that the achievement difference between female and male students appears to change as text complexity increases. Similarly, the cognitive complexity of the items seems to play a significant role in explaining the gender gap in ORC performance. Students who never (or sometimes) speak the test language at home particularly struggled with answering ORC tasks. The achievement gap between students who always (or almost always) speak the test language at home and those who never (or sometimes) speak the test language at home was larger for constructed-response items and items with higher cognitive complexity. Overall, the findings suggest that item properties could help understand performance differences between gender and language groups in ORC assessments.
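The explanatory item response modeling idea, decomposing item difficulty into item-property effects, can be sketched in linear-logistic (LLTM-style) form. The property weights below are invented for illustration and are not the article's estimates:

```python
import math

# Hypothetical effects of item properties on difficulty (in logits).
PROPERTY_EFFECTS = {
    "constructed_response": 0.6,
    "high_text_complexity": 0.4,
    "high_cognitive_complexity": 0.5,
}

def item_difficulty(properties: list[str]) -> float:
    """LLTM-style difficulty: a sum of item-property effects."""
    return sum(PROPERTY_EFFECTS[p] for p in properties)

def p_correct(theta: float, properties: list[str]) -> float:
    """Rasch probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - item_difficulty(properties))))

# The same ability gap translates into different probability gaps
# depending on item properties (here it shrinks on the harder bundle,
# since both groups sit below that bundle's difficulty).
hard = ["constructed_response", "high_cognitive_complexity"]
gap_easy = p_correct(0.5, []) - p_correct(-0.5, [])
gap_hard = p_correct(0.5, hard) - p_correct(-0.5, hard)
```

Fitting such a model to response data is what lets group differences be traced back to properties like item format and cognitive complexity.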
Citations: 3
An investigation of item, examinee, and country correlates of rapid guessing in PISA
IF 1.7 Q1 Social Sciences Pub Date: 2022-02-09 DOI: 10.1080/15305058.2022.2036161
Joseph A. Rios, J. Soland
Abstract The objective of the present study was to investigate item-, examinee-, and country-level correlates of rapid guessing (RG) in the context of the 2018 PISA science assessment. Analyzing data from 267,148 examinees across 71 countries showed that over 50% of examinees engaged in RG on an average proportion of one in 10 items. Descriptive differences were noted between countries on the mean number of RG responses per examinee with discrepancies as large as 500%. Country-level differences in the odds of engaging in RG were associated with mean performance and regional membership. Furthermore, based on a two-level cross-classified hierarchical linear model, both item- and examinee-level correlates were found to moderate the likelihood of RG. Specifically, the inclusion of items with multimedia content was associated with a decrease in RG. A number of demographic and attitudinal examinee-level variables were also significant moderators, including sex, linguistic background, SES, and self-rated reading comprehension, motivation mastery, and fear of failure. The findings from this study imply that select subgroup comparisons within and across nations may be biased by differential test-taking effort. To mitigate RG in international assessments, future test developers may look to leverage technology-enhanced items.
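Rapid guessing is typically flagged by comparing response times against a time threshold. A minimal sketch with one fixed threshold follows; the 3-second cutoff and the response times are illustrative, and operational thresholds are usually set per item:

```python
def flag_rapid_guesses(response_times_s: list[float],
                       threshold_s: float = 3.0) -> list[bool]:
    """Flag a response as a rapid guess when it arrives faster than
    the threshold (a simple fixed-threshold RG criterion)."""
    return [t < threshold_s for t in response_times_s]

# One examinee's response times (seconds) across 10 items.
times = [1.2, 14.5, 2.8, 30.1, 0.9, 22.4, 8.8, 3.5, 2.1, 12.0]
flags = flag_rapid_guesses(times)
rg_proportion = sum(flags) / len(flags)   # share of items rapid-guessed
```

Aggregating such flags over examinees and items yields the kinds of examinee- and country-level RG rates the study analyzes.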
Citations: 6
Dropping the GRE, keeping the GRE, or GRE-optional admissions? Considering tradeoffs and fairness
IF 1.7 Q1 Social Sciences Pub Date: 2022-01-02 DOI: 10.1080/15305058.2021.2019750
Daniel A. Newman, Chen Tang, Q. Song, Serena Wee
Abstract In considering whether to retain the GRE in graduate school admissions, admissions committees often pursue two objectives: (a) performance in graduate school (e.g., admitting individuals who will perform better in classes and research), and (b) diversity/fairness (e.g., equal selection rates between demographic groups). Drawing upon HR research (adverse impact research), we address four issues in using the GRE. First, we review the tension created between two robust findings: (a) validity of the GRE for predicting graduate school performance (rooted in the principle of standardization and a half-century of educational and psychometric research), and (b) the achievement gap in test scores between demographic groups (rooted in several centuries of systemic racism). This empirical tension can often produce a local diversity-performance tradeoff for admissions committees. Second, we use Pareto-optimal tradeoff curves to formalize potential diversity-performance tradeoffs, guiding how much weight to assign the GRE in admissions. Whether dropping the GRE produces suboptimal admissions depends upon one’s relative valuation of diversity versus performance. Third, we review three distinct notions of test fairness—equality, test equity, and performance equity—which have differing implications for dropping the GRE. Finally, we consider test fairness under GRE-optional admissions, noting the missing data problem when GRE is an incomplete variable. Supplemental data for this article is available online at
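The diversity-performance tradeoff can be traced by sweeping the weight placed on a GRE-like predictor in a selection composite and recording, at each weight, the mean composite of admits and an adverse-impact ratio between groups. All applicant data below are synthetic, and the article's Pareto-optimal analysis is considerably more sophisticated:

```python
# (gre, gpa, group) — synthetic applicants; admit 4 of 8.
APPLICANTS = [
    (165, 3.9, "A"), (160, 3.5, "A"), (158, 3.8, "A"), (170, 3.2, "A"),
    (150, 3.9, "B"), (152, 3.7, "B"), (148, 3.95, "B"), (155, 3.4, "B"),
]

def standardize(xs):
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / s for x in xs]

def tradeoff(w: float, k: int = 4):
    """Admit top-k on w*GRE + (1-w)*GPA (both standardized); return
    (mean composite of admits, adverse-impact selection-rate ratio)."""
    gre_z = standardize([a[0] for a in APPLICANTS])
    gpa_z = standardize([a[1] for a in APPLICANTS])
    comp = [w * g + (1 - w) * p for g, p in zip(gre_z, gpa_z)]
    admits = sorted(range(len(APPLICANTS)), key=lambda i: -comp[i])[:k]
    rate = {g: sum(1 for i in admits if APPLICANTS[i][2] == g) / 4
            for g in ("A", "B")}        # 4 applicants per group
    ai_ratio = min(rate.values()) / max(rate.values())
    return sum(comp[i] for i in admits) / k, ai_ratio

# Sweep the GRE weight to trace the tradeoff curve.
curve = [(w, *tradeoff(w)) for w in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

In this toy data, weighting the GRE-like predictor fully admits only group A (adverse-impact ratio 0), while weighting GPA fully admits both groups equally; intermediate weights trace the tradeoff an admissions committee must value for itself.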
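The abstract above describes using Pareto-optimal tradeoff curves to formalize the diversity–performance tradeoff when weighting the GRE in admissions. The sketch below is a minimal, hypothetical illustration of that idea on synthetic data: it sweeps the weight placed on a GRE-like score in a two-predictor composite, scores each weight on (a) mean latent performance of admits and (b) the adverse-impact ratio between two simulated groups, and keeps the Pareto-nondominated points. All variable names, distributions, and parameters are illustrative assumptions, not the authors' dataset or exact method.

```python
# Hypothetical sketch of a Pareto diversity-performance tradeoff curve.
# Synthetic data only; not the authors' dataset or procedure.
import random

random.seed(0)

# Synthetic applicant pool: two demographic groups with a mean gap on the
# GRE-like predictor (an assumed gap of 0.5 SD, for illustration only).
applicants = []
for i in range(2000):
    group = 0 if i < 1400 else 1
    gap = 0.0 if group == 0 else -0.5
    gre = random.gauss(gap, 1.0)                          # standardized GRE-like score
    gpa = random.gauss(0.3 * gre, 1.0)                    # GPA-like score, partly correlated
    perf = 0.5 * gre + 0.5 * gpa + random.gauss(0, 1.0)   # latent graduate performance
    applicants.append((group, gre, gpa, perf))

def evaluate(w, select_ratio=0.2):
    """Admit the top fraction by composite w*GRE + (1-w)*GPA; return
    (mean latent performance of admits, adverse-impact ratio across groups)."""
    scored = sorted(applicants, key=lambda a: w * a[1] + (1 - w) * a[2], reverse=True)
    k = int(len(scored) * select_ratio)
    admits = scored[:k]
    mean_perf = sum(a[3] for a in admits) / k
    rates = []
    for g in (0, 1):
        n_g = sum(1 for a in applicants if a[0] == g)
        rates.append(sum(1 for a in admits if a[0] == g) / n_g)
    ai_ratio = min(rates) / max(rates) if max(rates) > 0 else 0.0
    return mean_perf, ai_ratio

# Sweep the GRE weight from 0 to 1 and keep Pareto-nondominated points:
# a point is dominated if another weight is at least as good on both
# objectives and strictly better on one.
points = [(w / 10, *evaluate(w / 10)) for w in range(11)]
pareto = []
for p in points:
    dominated = any(q[1] >= p[1] and q[2] >= p[2] and (q[1] > p[1] or q[2] > p[2])
                    for q in points)
    if not dominated:
        pareto.append(p)

for w, perf, ai in pareto:
    print(f"GRE weight {w:.1f}: mean performance {perf:.2f}, AI ratio {ai:.2f}")
```

Reading the resulting curve shows the tradeoff directly: weights that maximize predicted performance and weights that maximize the adverse-impact ratio typically differ, so "how much weight to give the GRE" becomes a choice of position along the frontier rather than a single right answer.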
Cited: 5
Journal
International Journal of Testing
Book学术 provides a free academic resource search service to help scholars in China and abroad retrieve Chinese- and English-language literature, and aims to deliver the most convenient, high-quality service experience.
Copyright © 2023 Book学术 All rights reserved.
京公网安备 11010802042870号 京ICP备2023020795号-1