
International Journal of Testing: Latest Articles

The Recovery of Correlation Between Latent Abilities Using Compensatory and Noncompensatory Multidimensional IRT Models
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-04-02 DOI: 10.1080/15305058.2019.1692212
Yanyan Fu, Tyler Strachan, E. Ip, John T. Willse, Shyh-Huei Chen, Terry A. Ackerman
This research examined correlation estimates between latent abilities when using two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that, for all models and conditions, the latent correlation was recovered best when the test consisted entirely of simple-structure items. When a test measured weakly discriminated dimensions, the latent correlation became harder to recover. Results also showed that increasing the sample size, increasing the test length, or using simpler models (i.e., two-parameter logistic rather than three-parameter logistic, compensatory rather than noncompensatory) could improve the recovery of the latent correlation.
International Journal of Testing, vol. 20, no. 1, pp. 169-186.
Citations: 1
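The compensatory/noncompensatory distinction in the Fu et al. abstract above can be made concrete with a small sketch. It assumes the common two-parameter logistic forms (a weighted sum of abilities inside one logistic for the compensatory model, a product of per-dimension logistics for the noncompensatory model); the abstract does not give the exact parameterization, and the function names and item parameters below are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def compensatory_2pl(theta, a, d):
    """Assumed compensatory form: P = logistic(sum_k a_k * theta_k + d).

    A high ability on one dimension can offset a low ability on another.
    """
    return logistic(sum(ak * tk for ak, tk in zip(a, theta)) + d)


def noncompensatory_2pl(theta, a, b):
    """Assumed noncompensatory form: P = prod_k logistic(a_k * (theta_k - b_k)).

    Every dimension must be mastered; a weak dimension caps the probability.
    """
    p = 1.0
    for ak, tk, bk in zip(a, theta, b):
        p *= logistic(ak * (tk - bk))
    return p


# Hypothetical examinee: strong on dimension 1, weak on dimension 2.
theta = (2.0, -2.0)
a = (1.0, 1.0)

p_comp = compensatory_2pl(theta, a, d=0.0)
p_noncomp = noncompensatory_2pl(theta, a, b=(0.0, 0.0))
print(f"compensatory: {p_comp:.3f}, noncompensatory: {p_noncomp:.3f}")
```

Under these assumed forms the strong dimension fully offsets the weak one in the compensatory model, while the noncompensatory probability stays low because the weak dimension dominates the product.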
Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-02-12 DOI: 10.1080/15305058.2020.1720216
Xiuyan Guo, Pui‐wa Lei
Little research has been done on the effects of peer raters’ quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters’ qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment where training and motivation interventions were manipulated, 24 classes with 838 high school students were randomly assigned to study conditions. Inter-rater error, intra-rater error and criterion error indices for peer ratings on four selected essays were analyzed using hierarchical linear models. Results indicated that peer raters’ content knowledge, previous rating experience, and rating motivation were associated with rating errors. This study also found some significant interactions between peer raters’ quality characteristics. Implications for in-person and online peer assessments as well as future directions are discussed.
International Journal of Testing, vol. 20, no. 1, pp. 206-230.
Citations: 3
The Relationship between Response-Time Effort and Accuracy in PISA Science Multiple Choice Items
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-01-10 DOI: 10.1080/15305058.2019.1706529
M. Michaelides, M. Ivanova, C. Nicolaou
The study examined the relationship between examinees’ test-taking effort and their accuracy rate on items from the PISA 2015 assessment. The 10% normative threshold method was applied to Science multiple-choice items in the Cyprus sample to detect rapid guessing behavior. Results showed that the extent of rapid guessing across simple and complex multiple-choice items was on average less than 6% per item. Rapid guessers were identified, and for most items their accuracy was lower than that of students engaging in solution-based behavior. Examinees with higher overall performance on the test items tended to engage in less rapid guessing than their lower-performing peers. Overall, this empirical investigation presents original evidence on test-taking effort as measured by response time in PISA items and tests propositions of Wise’s (2017) Test-Taking Theory.
International Journal of Testing, vol. 20, no. 1, pp. 187-205.
Citations: 12
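The 10% normative threshold method named in the Michaelides et al. abstract above can be sketched as follows. This is a minimal illustration, assuming the usual reading of the method: an item's threshold is 10% of the mean response time on that item, and responses faster than the threshold are flagged as rapid guesses. The 10-second cap is an assumption taken from common practice, not from the abstract, and the function names and timing data are hypothetical.

```python
from statistics import mean


def nt_threshold(response_times, fraction=0.10, cap_seconds=10.0):
    """Per-item threshold: a fraction of the mean response time.

    cap_seconds is an assumed upper bound; pass float("inf") to disable it.
    """
    return min(fraction * mean(response_times), cap_seconds)


def flag_rapid_guesses(response_times, fraction=0.10, cap_seconds=10.0):
    """Return True for each response faster than the item's threshold."""
    threshold = nt_threshold(response_times, fraction, cap_seconds)
    return [rt < threshold for rt in response_times]


# Hypothetical response times (seconds) of five examinees on one item.
times = [2.0, 45.0, 60.0, 38.0, 55.0]
threshold = nt_threshold(times)
flags = flag_rapid_guesses(times)
print(threshold, flags)
```

Comparing accuracy between flagged and unflagged responses, item by item, is then a matter of grouping the scored responses by these flags.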
Log Data Analysis with ANFIS: A Fuzzy Neural Network Approach
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-01-02 DOI: 10.1080/15305058.2018.1551225
Ying Cui, Qi Guo, Jacqueline P. Leighton, Man-Wai Chu
This study explores the use of the Adaptive Neuro-Fuzzy Inference System (ANFIS), a neuro-fuzzy approach, to analyze the log data of technology-based assessments in order to extract relevant features of student problem-solving processes and to develop and refine a set of fuzzy logic rules that could be used to interpret student performance. The log data that record student response processes while solving a science simulation task were analyzed with ANFIS. Results indicate that the ANFIS analysis could generate and refine a set of fuzzy rules that shed light on how students solve the simulation task. We conclude the article by discussing the advantages of combining human judgments with the learning capacity of ANFIS for log data analysis and by outlining the limitations of the current study and areas of future research.
International Journal of Testing, vol. 20, no. 1, pp. 78-96.
Citations: 4
Engineering a Twenty-First Century Reading Comprehension Assessment System Utilizing Scenario-Based Assessment Techniques
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-01-02 DOI: 10.1080/15305058.2018.1551224
J. Sabatini, T. O’Reilly, Jonathan P. Weeks, Zuowei Wang
The construct of reading comprehension has changed significantly in the twenty-first century; however, some test designs have not evolved sufficiently to capture these changes. Specifically, the nature of literacy sources and skills required has changed (wrought primarily by widespread use of digital technologies). Modern theories of comprehension and discourse processes have been developed to accommodate these changes, and the learning sciences have followed suit. These influences have significant implications for how we think about the development of comprehension proficiency across grades. In this paper, we describe a theoretically driven, developmentally sensitive assessment system based on a scenario-based assessment paradigm, and present evidence for its feasibility and psychometric soundness.
International Journal of Testing, vol. 20, no. 1, pp. 1-23.
Citations: 21
The (Non)Impact of Differential Test Taker Engagement on Aggregated Scores
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-01-02 DOI: 10.1080/15305058.2019.1605999
S. Wise, J. Soland, Y. Bo
Disengaged test taking tends to be most prevalent with low-stakes tests. This has led to questions about the validity of aggregated scores from large-scale international assessments such as PISA and TIMSS, as previous research has found a meaningful correlation between the mean engagement and mean performance of countries. The current study, using data from the computer-based version of the PISA-Based Test for Schools, examined the distortive effects of differential engagement on aggregated school-level scores. The results showed that, although there was considerable differential engagement among schools, the school means were highly stable due to two factors. First, any distortive effects of disengagement in a school were diluted by a high proportion of the students exhibiting no non-effortful behavior. Second, and most interestingly, disengagement produced both positive and negative distortion of individual student scores, which tended to cancel out much of the net distortive effect on the school’s mean.
International Journal of Testing, vol. 20, no. 1, pp. 57-77.
Citations: 26
Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information.
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2020-01-01 Epub Date: 2019-07-16
Scott B Morris, Michael Bass, Elizabeth Howard, Richard E Neapolitan

The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached, and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the PROMIS Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule was found to outperform the SE stopping rule overall and particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.

International Journal of Testing, vol. 20, no. 2, pp. 146-168.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518406/pdf/nihms-1534260.pdf
Citations: 0
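The contrast between the SE rule and a PSER-style rule in the Morris et al. abstract above can be sketched in a few lines. This is a simplified illustration, not the published algorithm: it assumes 2PL items, approximates the SE as the inverse square root of the accumulated information, adds an assumed prior information of 1 (a standard normal prior), and uses a single hypothetical tuning parameter, `min_reduction`. The abstract does not specify the actual PSER parameters, so all names and values here are assumptions.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def should_stop(theta, administered, remaining,
                se_threshold=0.3, min_reduction=0.01, prior_info=1.0):
    """Decide whether to end the test at the current ability estimate.

    Stops when the SE rule is met, when the bank is exhausted, or when the
    best remaining item is predicted to reduce the SE by less than
    min_reduction (the simplified PSER-style condition).
    """
    info = prior_info + sum(item_information(theta, a, b)
                            for a, b in administered)
    se = 1.0 / math.sqrt(info)
    if se < se_threshold:
        return True   # classic SE stopping rule
    if not remaining:
        return True   # nothing left to administer
    best = max(item_information(theta, a, b) for a, b in remaining)
    predicted_se = 1.0 / math.sqrt(info + best)
    return (se - predicted_se) < min_reduction


# Hypothetical bank whose items all target the low end of the trait.
administered = [(1.5, -1.0), (1.5, -0.5)]
remaining = [(1.5, -1.5), (1.5, -2.0)]

stop_untargeted = should_stop(2.0, administered, remaining)
stop_targeted = should_stop(-1.0, administered, remaining)
print(stop_untargeted, stop_targeted)
```

For the examinee at theta = 2.0 the remaining items carry almost no information, so the sketch stops early even though the SE threshold has not been reached; for the examinee at theta = -1.0 the next item still reduces the SE noticeably, so testing continues.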
ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2019-10-02 DOI: 10.1080/15305058.2019.1631024
M. Oliveri
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the fair and valid assessment of linguistically or culturally diverse populations. They are meant to apply to most, if not all, aspects of the development, administration, scoring, and use of assessments, and are intended to supplement other existing professional standards or guidelines for testing and assessment. That is, these guidelines focus on the types of adaptations and considerations to use when developing, reviewing, and interpreting items and test scores from tests administered to linguistically or culturally diverse populations. Other guidelines such as the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) or Guidelines for Best Practice in Cross-Cultural Surveys (Survey Research Center, 2016) may also be relevant to testing linguistically and culturally diverse populations.
International Journal of Testing, vol. 19, no. 1, pp. 301-336.
Citations: 30
Migration Background in PISA’s Measure of Social Belonging: Using a Diffractive Lens to Interpret Multi-Method DIF Studies
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2019-07-16 DOI: 10.1080/15305058.2019.1632316
Nathan D. Roberson, B. Zumbo
This paper investigates measurement invariance as it relates to migration background using the Program for International Student Assessment measure of social belonging. We explore how the use of two measurement invariance techniques, the alignment method in conjunction with logistic regression, provides insights into differential item functioning in the case of multiple group comparisons. Social belonging is a central human need, and we argue that immigration background is an important factor when considering how an individual interacts with a survey or items about belonging. Overall results from both the alignment method and ordinal logistic regression, interpreted through a diffractive lens, suggest that it is inappropriate to treat peoples of four different immigration backgrounds within the countries analyzed as exchangeable groups.
International Journal of Testing, vol. 19, no. 1, pp. 363-389.
Citations: 7
Dynamic Multistage Testing: A Highly Efficient and Regulated Adaptive Testing Method
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2019-07-03 DOI: 10.1080/15305058.2019.1621871
Xiao Luo, Xinrui Wang
This study introduced dynamic multistage testing (dy-MST) as an improvement to existing adaptive testing methods. dy-MST combines the advantages of computerized adaptive testing (CAT) and computerized adaptive multistage testing (ca-MST) to create a highly efficient and regulated adaptive testing method. In the test construction phase, multistage panels are assembled using design principles and assembly techniques similar to those of ca-MST. In the administration phase, items are adaptively administered from a dynamic interim pool. A large-scale simulation study was conducted to evaluate the merits of dy-MST. It found that dy-MST significantly reduced test length while maintaining classification accuracy identical to that of the full-length tests and meeting all content requirements effectively. Psychometrically, the testing efficiency of dy-MST was comparable to that of CAT. Operationally, dy-MST allows for holistic pre-administration management of test content directly at the test level. Thus, dy-MST is deemed appropriate for delivering adaptive tests with high efficiency and well-controlled content.
Citations: 2