International Journal of Testing: Latest Articles

Examining provision and sufficiency of testing accommodations for English learners
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2021-02-06 DOI: 10.1080/15305058.2021.1884872
S. Roschmann, S. Witmer, Martin A. Volker
Abstract Accommodations are commonly provided to address language-related barriers students may experience during testing. Research on the validity of scores from accommodated test administrations remains somewhat inconclusive. The current study investigated item response patterns to understand whether accommodations, as used in practice among English learners (ELs) in the United States, allow for comparable measurement between ELs and non-ELs. Results indicated that although significant differences are evident in overall test scores for ELs and non-ELs, only minimal measurement concerns were evident. Very few items displayed moderate or large differential item functioning (DIF); no tests showed small, medium, or large differential test functioning. The current study adds to existing literature on measurement comparability and accommodation research on ELs; implications for practice are provided.
International Journal of Testing, vol. 21, pp. 32-55.
Citations: 1
Goal orientation in job search: Psychometric characteristics and construct validation across job search contexts
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2021-02-03 DOI: 10.1080/15305058.2021.1884871
Emmanuel Affum-Osei, H. Mensah, S. K. Forkuoh, Eric Adom Asante
Abstract The purpose of this study was to examine the psychometric properties of the goal orientation (GO) scale across job search contexts to facilitate its use in large and varied search settings. A sample of 720 job seekers in Ghana, comprising job losers and new entrants, completed the survey. Confirmatory factor analysis supported the three-factor theoretical structure (Learning goal, Performance-prove goal, and Performance-avoid goal orientations) in both the new-entrant and job-loser samples. Invariance tests supported measurement equivalence across job search contexts and genders. Furthermore, the GO dimensions correlated differently with several cognitive self-regulation criterion variables (employment commitment, self-control, learning from failure, and strategy awareness), thus providing evidence of convergent and discriminant validity. Overall, the study provides additional support for using the job search GO measure across different job search contexts.
International Journal of Testing, vol. 21, pp. 1-31.
Citations: 1
Survey mode and data quality: Careless responding across three modes in cross-cultural contexts
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-12-01 DOI: 10.1080/15305058.2021.2019747
Zoe Magraw‐Mickelson, Harry Wang, M. Gollwitzer
Abstract Much psychological research depends on participants’ diligence in filling out materials such as surveys. However, not all participants are motivated to respond attentively, which leads to unintended issues with data quality, known as careless responding. Our question is: how do different modes of data collection—paper/pencil, computer/web-based, and smartphone—affect participants’ diligence vs. “careless responding” tendencies and, thus, data quality? Results from prior studies suggest that different data collection modes produce a comparable prevalence of careless responding tendencies. However, as technology develops and data are collected with increasingly diversified populations, this question needs to be readdressed and taken further. The present research examined the effect of survey mode on careless responding in a repeated-measures design with data from three different samples. First, in a sample of working adults from China, we found that participants were slightly more careless when completing computer/web-based survey materials than in paper/pencil mode. Next, in a German student sample, participants were slightly more careless when completing the paper/pencil mode compared to the smartphone mode. Finally, in a sample of Chinese-speaking students, we found no difference between modes. Overall, in a meta-analysis of the findings, we found minimal difference between modes across cultures. Theoretical and practical implications are discussed.
International Journal of Testing, vol. 22, pp. 121-153.
Citations: 3
Cognitive diagnosis models and automated test assembly: an approach incorporating response times
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-10-23 DOI: 10.1080/15305058.2020.1828427
M. Finkelman, J. de la Torre, Jeremy Karp
Abstract Cognitive diagnosis models (CDMs) have been studied as a means of providing detailed diagnostic information about the skills that have been mastered, and the skills that have not, by examinees. Prior research has examined the use of automated test assembly (ATA) alongside CDMs; however, no previous study has investigated how to perform ATA when a CDM is employed and the total amount of time taken by the test must be controlled. The purpose of the current research was to develop an ATA procedure to select tests that are highly informative while simultaneously satisfying constraints on key parameters related to the total-time distribution. In a simulation study, the procedure successfully selected tests that met these dual goals.
International Journal of Testing, vol. 20, pp. 299-320.
Citations: 1
Coaching β in admission test performance: a study of group differences
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-07-31 DOI: 10.1080/15305058.2020.1786833
Anely Ramírez, Mladen Koljatic, Mónica Silva
Abstract The study addresses the association between coaching practices and university admission test performance in Chile. Estimates of coaching effects are reported for test-takers from the private and public school systems. Our results indicate that coaching is associated with variations in test scores. The estimated magnitude of coaching appears to vary by subject area, type of coaching strategy and type of high school attended.
International Journal of Testing, vol. 20, pp. 253-273.
Citations: 1
Examining the simultaneous change in emotions during a test: relations with expended effort and test performance
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-07-24 DOI: 10.1080/15305058.2020.1786834
S. Finney, B. Perkins, Paulius Satkus
Abstract Using a sample of 497 college students, we measured test-taking emotions (anger, worry, pride, enjoyment) after the first third, second third, and last third of a low-stakes cognitive test of sociocultural knowledge. We examined the simultaneous change in emotions and whether change in emotions predicted subsequent test-taking effort and test performance. Latent growth models indicated that, on average, enjoyment and anger increased, whereas pride and worry decreased during the test. There was significant variability in individual change about these averages. Positive correlations were observed between change in worry and anger and change in pride and enjoyment. Structural equation models indicated that all initial emotions and gains in pride during the test influenced subsequent effort, whereas initial worry, anger and enjoyment, change in pride and enjoyment, and effort influenced test scores. The findings emphasize the importance of assessing change in emotions and the mediation mechanism of effort when modeling test performance.
International Journal of Testing, vol. 20, pp. 274-298.
Citations: 9
Identifying Misfitting Achievement Estimates in Performance Assessments: An Illustration Using Rasch and Mokken Scale Analyses
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-07-02 DOI: 10.1080/15305058.2019.1673758
A. Walker, Stefanie A. Wind
Researchers apply individual person fit analyses as a procedure for checking model-data fit for individual test-takers. When a test-taker misfits, it means that the inferences from their test score regarding what they know and can do may not be accurate. One problem in applying individual person fit procedures in practice is the question of how much misfit it takes to make the test score an untrustworthy estimate of achievement. In this paper, we argue that if a person’s responses generally follow a monotonic pattern, the resulting test score is “good enough” to be interpreted and used. We present an approach that applies statistical procedures from the Rasch and Mokken measurement perspectives to examine individual person fit based on this good enough criterion in real data from a performance assessment. We discuss how these perspectives may facilitate thinking about applying individual person fit procedures in practice.
International Journal of Testing, vol. 20, pp. 231-251.
Citations: 6
Investigating Technology-Enhanced Item Formats Using Cognitive and Item Response Theory Approaches
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1648270
J. Moon, S. Sinharay, M. Keehner, Irvin R. Katz
The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of answer options. The results of an item response theory analysis were consistent with the hypothesized cognitive processes in alternative item formats. The findings suggest that seemingly subtle variations of item design could substantially affect test-taker cognition and psychometric outcomes, emphasizing the need for investigating item format effects at a fine-grained level.
International Journal of Testing, vol. 20, pp. 122-145.
Citations: 3
Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1635604
S. Morris, Mike Bass, Elizabeth Howard, R. Neapolitan
The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the Patient-Reported Outcomes Measurement Information System Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule was found to outperform the SE stopping rule overall, particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.
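The contrast between the two stopping rules described in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: a two-parameter logistic (2PL) item model rather than the polytomous models typically used with patient-reported outcome banks, a prior information of 1.0 so that the SE is defined before any item is given, and a single hypothetical tuning parameter `min_reduction`. It is a simplified illustration of the idea, not the authors' PSER algorithm.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(total_information):
    """Standard error of the trait estimate given accumulated information."""
    return 1.0 / math.sqrt(total_information)


def should_stop(theta, administered, remaining,
                se_threshold=0.3, min_reduction=0.01, prior_information=1.0):
    """Return (stop, reason) for a CAT at the current trait estimate theta.

    administered / remaining are lists of (a, b) item parameters.
    se_threshold, min_reduction and prior_information are illustrative
    (hypothetical) values, not taken from the source.
    """
    total = prior_information + sum(
        item_information(theta, a, b) for a, b in administered)
    se_now = standard_error(total)

    # SE rule: the estimate is already precise enough.
    if se_now < se_threshold:
        return True, "se"
    if not remaining:
        return True, "bank_exhausted"

    # PSER-style rule: predict the SE after the most informative remaining
    # item and stop if the predicted gain in precision is too small.
    best = max(item_information(theta, a, b) for a, b in remaining)
    se_next = standard_error(total + best)
    if se_now - se_next < min_reduction:
        return True, "pser"
    return False, "continue"
```

For a respondent at theta = 2.0 facing a bank whose items all target the low end of the continuum, `should_stop(2.0, [(1.5, -1.0), (1.5, -0.5)], [(1.5, -1.5)])` stops with reason `"pser"` even though the SE is far above the threshold, which is the situation the abstract describes. Raising or lowering `min_reduction` trades accuracy against test length.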
International Journal of Testing, vol. 20, pp. 146-168.
Citations: 5
True or False? Keying Direction and Acquiescence Influence the Validity of Socio-Emotional Skills Items in Predicting High School Achievement
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1673398
Ricardo Primi, Filip De Fruyt, Daniel Santos, Stephen Antonoplis, O. John
What type of items, keyed positively or negatively, makes social-emotional skill or personality scales more valid? The present study examines the criterion validities of true- and false-keyed items before and after correction for acquiescence. The sample included 12,987 children and adolescents from 425 schools in the State of São Paulo, Brazil (ages 11–18, attending grades 6–12). They answered a computerized 162-item questionnaire measuring 18 facets grouped into five broad domains of social-emotional skills: Open-mindedness (O), Conscientious Self-Management (C), Engaging with others (E), Amity (A), and Negative-Emotion Regulation (N). All facet scales were fully balanced (3 true-keyed and 3 false-keyed items per facet). Criterion validity coefficients of scales composed of only true-keyed items were compared with those of scales composed of only false-keyed items. The criterion measure was a standardized achievement test of language and math ability. We found that coefficients were almost twice as large for false-keyed scales as for true-keyed scales. After correcting for acquiescence, the coefficients became more similar. Acquiescence suppresses the criterion validity of unbalanced scales composed of true-keyed items. We conclude that balanced scales with pairs of true- and false-keyed items perform better in terms of internal structure and predictive validity.
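To make the idea of correcting for acquiescence concrete, the sketch below uses one common approach, within-person centering on a balanced scale. The abstract does not say which correction the authors applied, so the method, the function names, and the 1-to-5 response format are all assumptions made for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def acquiescence_index(true_keyed, false_keyed):
    """Average raw agreement across true- and false-keyed items.

    Responses are raw (not reverse-scored). On a balanced scale, content
    cancels out across the two keying directions, so what remains reflects
    the respondent's general tendency to agree.
    """
    return (mean(true_keyed) + mean(false_keyed)) / 2


def corrected_score(true_keyed, false_keyed):
    """Scale score after subtracting the respondent's acquiescence index.

    Each raw response is centered on the index; false-keyed items are then
    reversed by flipping the sign of the centered value.
    """
    acq = acquiescence_index(true_keyed, false_keyed)
    centered = [x - acq for x in true_keyed] + [acq - x for x in false_keyed]
    return mean(centered)
```

A respondent who answers 5 to every item has a raw score of 5 on a scale built only from the true-keyed items, yet `corrected_score([5, 5, 5], [5, 5, 5])` is 0.0, because the uniform agreement carries no information about the trait. This is the kind of distortion that can affect unbalanced, true-keyed scales.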
International Journal of Testing, vol. 20, no. 1, pp. 97-121. Publication date: 2020-04-02. https://doi.org/10.1080/15305058.2019.1673398
Citations: 8