
International Journal of Testing: Latest Articles

Is there bias in alternatives to standardized tests? An investigation into letters of recommendation
IF 1.7 Q1 Social Sciences Pub Date : 2022-01-02 DOI: 10.1080/15305058.2021.2019751
Dev K. Dalal, Jason G. Randall, Ho Kwan Cheung, Brandon Gorman, Sylvia G. Roch, K. Williams
Abstract Individuals concerned with subgroup differences on standardized tests suggest replacing these tests with holistic evaluations of unstructured application materials, such as letters of recommendation (LORs), which they posit show less bias. We empirically investigate the proposition that LORs are bias-free, and argue that LORs might actually invite systematic race and gender subgroup differences in their content and evaluation. We text-analyzed over 37,000 LORs submitted on behalf of over 10,000 graduate school applicants. Results showed that LOR content does differ across applicants. Furthermore, we see some systematic gender, race, and gender-race intersection differences in LOR content. Content of LORs also systematically differed between degree programs (S.T.E.M. vs. non-S.T.E.M.) and degree sought (doctoral vs. master's). Finally, LOR content alone did not predict an appreciable amount of variance in offers of admission (the first barrier to increasing diversity and inclusion in graduate programs). Our results, combined with past research on LOR content bias, highlight concerns that LORs can be biased against marginalized groups. We conclude with suggestions for reducing potential bias in LORs and for increasing diversity in graduate programs.
Citations: 2
Introduction to International Journal of Testing special issue on equity and fairness in testing and assessment in school admissions
IF 1.7 Q1 Social Sciences Pub Date : 2022-01-02 DOI: 10.1080/15305058.2021.2019753
S. E. Woo, B. Wille, S. Sireci
Across the globe, educational tests are used for admissions decisions to competitive colleges, universities, high schools, and other programs. The high stakes associated with these tests have important consequences for students, and performance on them can determine whether students reach their academic and career aspirations. For this reason, their use is both widespread and contentious. Recently, the debates over the use of standardized tests in college and graduate admissions have increased, due in large part to concerns about score disparities resulting in disparate admissions outcomes. The International Journal of Testing has published many examples of criticisms and research with respect to admissions testing around the world, including in Chile (Ramirez et al., 2020), Israel (Rapp & Allalouf, 2003), Saudi Arabia (Tsaousis et al., 2018), Sweden (Wiberg & von Davier, 2017), and the United States (e.g., Talento-Miller, 2008). In the U.S. and Chile, public outcry against disparate outcomes for certain groups of students has ushered in changes in admissions testing programs and the policies associated with them (Koljatic et al., 2021). In the U.S., several colleges and universities have suspended the SAT and ACT requirements for their applicants, which generated a number of heated discussions both within and outside academia. The use of the Graduate Record Examinations (GREs) in graduate admissions is also being hotly debated for similar reasons, and a number of graduate programs in the U.S. have opted to remove the GRE requirement from https://doi.org/10.1080/15305058.2021.2019753
Citations: 1
Using third-party evaluations to assess socioemotional skills in graduate and professional school admissions
IF 1.7 Q1 Social Sciences Pub Date : 2022-01-02 DOI: 10.1080/15305058.2021.2019748
David Klieger, Jennifer L. Bochenek, Chelsea Ezzo, Steven Holtzman, Frederick Cline, Margarita Olivera-Aguilar
Abstract Consideration of socioemotional skills in admissions potentially can increase representation of racial and ethnic minorities and women in graduate and professional education as well as identify candidates more likely to succeed in graduate and professional school. Research on one such assessment, the ETS Personal Potential Index (PPI), showed that the PPI produced much smaller racial/ethnic-gender group mean score differences than undergraduate grade point average (UGPA) and the Graduate Record Examinations (GRE) did. Across levels of institutional selectivity, the PPI can promote racial/ethnic and gender diversity in graduate and professional school in ways that UGPA and GRE scores do not. Predictive validity analyses showed that for doctoral STEM programs the PPI dimensions of (1) Planning and Organization and (2) Communication Skills positively predict school grade point average as well as a lower risk of academic probation, a determinant of degree progress, both alone and incrementally over UGPA and GRE scores. Supplemental data for this article is available online at https://doi.org/10.1080/15305058.2021.2019748 .
Citations: 1
Test efficacy: Refocusing validation from college exams to candidates
IF 1.7 Q1 Social Sciences Pub Date : 2022-01-02 DOI: 10.1080/15305058.2021.2019752
Alvaro J. Arce, M. J. Young
Abstract The paper argues that contemporary test validity theory places the consequences of testing on the lives of all college applicants at the back of the test validation argument. It introduces the notion of test efficacy as a process to gather evidence on claims on consequences of testing on all college applicants that can be traced back to validity. The paper proposes a test efficacy framework to evaluate test efficacy claims on the impact of admission examinations on all college applicants (not just those attaining the admission standard).
Citations: 1
Using personal statements in college admissions: An investigation of gender bias and the effects of increased structure
IF 1.7 Q1 Social Sciences Pub Date : 2021-12-15 DOI: 10.1080/15305058.2021.2019749
Susan Niessen, Marvin Neumann
Abstract Personal statements are among the most commonly used instruments in college admissions procedures. Yet, little research on their reliability, validity, and fairness exists. The first aim of this paper was to investigate hypotheses about adverse impact and underprediction for female applicants, which could result from lower tendencies to use agentic language compared to male applicants. Second, we examined if rating personal statements in a more structured manner would increase reliability and validity. Using personal statements (250 words) from a large cohort of applicants to an undergraduate psychology program at a Dutch University, we found no evidence for adverse impact for female applicants or more agentic language use by male applicants, and no relationship between agentic language use and personal statement ratings. In contrast, we found that personal statements of female applicants were rated slightly more positively than those of males. Exploratory analyses suggest that female applicants’ better writing skills might explain this difference. A more structured approach to rating personal statements yielded higher, but still only ‘moderate’ inter-rater reliability, and virtually identical, negligible predictive validity for first year GPA and dropout.
Citations: 2
Metacognitive skills inventory (MSI): development and validation
IF 1.7 Q1 Social Sciences Pub Date : 2021-10-02 DOI: 10.1080/15305058.2021.1986051
Haja Hameed, Reena Cheruvalath
Abstract Metacognitive skills help to control and regulate negative thoughts, emotions, beliefs and sad memories. The objective of the study was to develop and validate an inventory, the Metacognitive Skills Inventory (MSI), to assess the variance in adopting metacognitive strategies between those who have depressive symptoms and those who do not. Two studies were carried out among Indian youth (Study 1: N = 269, mean age = 21.1; Study 2: N = 745, mean age = 20.9). Participants completed the MSI as well as measures of depression and negative emotions. Item response theory (IRT) analysis, exploratory factor analysis (EFA), and confirmatory factor analysis (CFA) were carried out for the scale development. The analyses derived a meaningful four-factor structure for a 12-item MSI: (i) navigation of negative thoughts by adopting metacognitive strategies, (ii) channelizing negative emotions constructively, (iii) recognizing ruminative tendencies, and (iv) knowledge of strengths and weaknesses in regulating emotions. After validation in clinical samples, the MSI could be used to identify patient-specific metacognitive skills in people with depressive symptoms that need to be improved during Metacognitive Therapy (MCT).
Citations: 1
Cross-country comparability of a social-emotional skills assessment designed for youth in low-resource environments
IF 1.7 Q1 Social Sciences Pub Date : 2021-10-02 DOI: 10.1080/15305058.2021.1995867
Nina Menezes Cunha, Andres Martinez, P. Kyllonen, Sarah Gates
Abstract We evaluate the measurement invariance of a 48-item instrument designed to measure general social and emotional skills of youth in low-resource environments. We refer to the skills measured as positive self-concept, negative self-concept, higher order thinking skills, and social and communication skills. These skills are often associated with economic development and can be used to evaluate programs designed to enhance economic development. Our evaluation is based on a sample of 1,794 in-school and out-of-school youth from Uganda and Guatemala’s Western Highlands. We conduct the analyses using a multiple-group confirmatory factor analysis approach, splitting the sample by country, gender, and socio-economic status (high vs. low). Overall, our analysis points to strong invariance for all four measures across the different groups being compared. These findings contribute to the validity of the instrument as a tool for better understanding youth in diverse, developing economies.
Citations: 1
Examining severity and centrality effects in TestDaF writing and speaking assessments: An extended Bayesian many-facet Rasch analysis
IF 1.7 Q1 Social Sciences Pub Date : 2021-09-03 DOI: 10.1080/15305058.2021.1963260
T. Eckes, K. Jin
Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.
Citations: 6
Exploring task features that predict psychometric quality of test items: the case for the Dutch driving theory exam
IF 1.7 Q1 Social Sciences Pub Date : 2021-06-15 DOI: 10.1080/15305058.2021.1916506
E. Roelofs, Wilco H M Emons, Angela J. Verschoor
Abstract This study reports on an Evidence-Centered Design (ECD) project in the Netherlands involving the theory exam for prospective car drivers. In particular, we illustrate how cognitive load theory, task analysis, response process models, and explanatory item response theory can be used to systematically develop and refine task models. Based on a cognitive model for driving, 353 existing items involving rules of priority at intersections were coded on intrinsic task features and task presentation features. Hierarchical regression analyses were carried out to determine the contribution of task features to item difficulty and item discrimination. A substantial proportion of the total variance in both item difficulty and item discrimination parameters could be explained by intrinsic task features, including rules and signs (25%, 18.6%) and task-intersection features (13.4%, 14.1%), and a smaller proportion by item presentation features (3.5%, 7.1%). It is concluded that the systematic approach of discerning task features and determining the impact on item parameters has added value as an ECD tool for evaluating existing assessments that are planned to be innovated. The paper concludes with a discussion of practical implications.
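The hierarchical regression described in this abstract can be illustrated with a minimal sketch: predictor blocks are entered in a fixed order, and each block is credited with the gain in R² it produces. Everything below is an assumption made for illustration, not the authors' code or data: the feature codes are random numbers, and the block names, the number of features per block, and the helper names `r_squared` and `incremental_r2` are hypothetical. Only the item count (353) comes from the abstract. If the paired percentages in the abstract are read as (difficulty, discrimination), the block shares would sum to 25 + 13.4 + 3.5 = 41.9% for difficulty and 18.6 + 14.1 + 7.1 = 39.8% for discrimination.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r2(blocks, y):
    """Enter predictor blocks in order; return the R^2 gain of each block."""
    gains = {}
    entered = []
    previous = 0.0
    for name, block in blocks:
        entered.append(block)
        current = r_squared(np.column_stack(entered), y)
        gains[name] = current - previous
        previous = current
    return gains

# Hypothetical data: 353 items with made-up feature codes in three blocks.
rng = np.random.default_rng(0)
n_items = 353
rules = rng.normal(size=(n_items, 3))         # stand-in for rules-and-signs codes
intersection = rng.normal(size=(n_items, 2))  # stand-in for intersection codes
presentation = rng.normal(size=(n_items, 2))  # stand-in for presentation codes

# Simulated item difficulty; the weights are arbitrary, not estimates.
difficulty = (rules @ np.array([0.6, 0.4, 0.3])
              + intersection @ np.array([0.4, 0.3])
              + presentation @ np.array([0.2, 0.1])
              + rng.normal(size=n_items))

gains = incremental_r2(
    [("rules_and_signs", rules),
     ("intersection", intersection),
     ("presentation", presentation)],
    difficulty,
)
total_r2 = sum(gains.values())

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print(f"total: {total_r2:.3f}")
```

Because each block is credited only with what it adds beyond the blocks entered before it, the shares depend on the order of entry; the order used here is arbitrary.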
摘要本研究报告了荷兰的一个以证据为中心的设计(ECD)项目,该项目涉及未来汽车驾驶员的理论考试。特别是,我们说明了如何使用认知负荷理论、任务分析、反应过程模型和解释性项目反应理论来系统地开发和完善任务模型。基于驾驶认知模型,对353个涉及十字路口优先规则的现有项目进行了内在任务特征和任务呈现特征编码。进行了层次回归分析,以确定任务特征对项目难度和项目辨别的贡献。项目难度和项目辨别参数的差异很大一部分可以由内在任务特征来解释,包括规则和符号(25%,18.6%)、任务交叉特征(13.4%,14.1%),以及占总方差的较小比例的项目呈现特征(3.5%,7.1%)。结论是,识别任务特征和确定对项目参数的影响的系统方法作为评估计划创新的现有评估的ECD工具具有附加值。论文最后讨论了实际意义。
Citations: 1
Validating theoretical assumptions about reading with cognitive diagnosis models
IF 1.7 Q1 Social Sciences Pub Date : 2021-06-15 DOI: 10.1080/15305058.2021.1931238
A. George, A. Robitzsch
Abstract Modern large-scale studies such as the Progress in International Reading Literacy Study (PIRLS) report students' reading competence not only on a global reading scale but also at the level of reading subskills. However, the number of subskills and the dependencies between them are frequently debated. In this study, different theoretical assumptions regarding the subskills describing the reading competence “acquiring and using information” in PIRLS are deduced from accompanying official materials. The different assumptions are then translated into empirical cognitive diagnosis models (CDMs). By evaluating and comparing the CDMs in terms of empirical fit criteria in each country participating in PIRLS 2016, the underlying theoretical assumptions are validated. Results show that in all but one country, a model proposing four reading subskills with no order between the subskills shows the best fit. This selected model could be simplified to facilitate practical derivations such as the evaluation of skill classes and the analysis of learning paths.
Citations: 8