
International Journal of Testing: Latest Publications

Migration Background in PISA’s Measure of Social Belonging: Using a Diffractive Lens to Interpret Multi-Method DIF Studies
IF 1.7 Q1 Social Sciences Pub Date: 2019-07-16 DOI: 10.1080/15305058.2019.1632316
Nathan D. Roberson, B. Zumbo
This paper investigates measurement invariance as it relates to migration background, using the Program for International Student Assessment (PISA) measure of social belonging. We explore how two measurement invariance techniques, the alignment method used in conjunction with ordinal logistic regression, provide insights into differential item functioning in the case of multiple group comparisons. Social belonging is a central human need, and we argue that immigration background is an important factor when considering how an individual interacts with a survey or items about belonging. Overall results from both the alignment method and ordinal logistic regression, interpreted through a diffractive lens, suggest that it is inappropriate to treat people of four different immigration backgrounds within the countries analyzed as exchangeable groups.
Citations: 7
Dynamic Multistage Testing: A Highly Efficient and Regulated Adaptive Testing Method
IF 1.7 Q1 Social Sciences Pub Date: 2019-07-03 DOI: 10.1080/15305058.2019.1621871
Xiao Luo, Xinrui Wang
This study introduced dynamic multistage testing (dy-MST) as an improvement over existing adaptive testing methods. dy-MST combines the advantages of computerized adaptive testing (CAT) and computerized adaptive multistage testing (ca-MST) to create a highly efficient and regulated adaptive testing method. In the test construction phase, multistage panels are assembled using design principles and assembly techniques similar to those of ca-MST. In the administration phase, items are adaptively administered from a dynamic interim pool. A large-scale simulation study was conducted to evaluate the merits of dy-MST, and it found that dy-MST significantly reduced test length while maintaining classification accuracy identical to that of the full-length tests and meeting all content requirements effectively. Psychometrically, the testing efficiency of dy-MST was comparable to that of CAT. Operationally, dy-MST allows for holistic pre-administration management of test content directly at the test level. Thus, dy-MST is deemed appropriate for delivering adaptive tests with high efficiency and well-controlled content.
Citations: 2
A Comparison of Methods for Detecting Examinee Preknowledge of Items
IF 1.7 Q1 Social Sciences Pub Date: 2019-07-03 DOI: 10.1080/15305058.2019.1610886
Xi Wang, Yang Liu, F. Robin, Hongwen Guo
In an on-demand testing program, some items are repeatedly used across test administrations. This poses a risk to test security. In this study, we considered a scenario wherein a test was divided into two subsets: one consisting of secure items and the other consisting of possibly compromised items. In a simulation study of multistage adaptive testing, we used three methods to detect item preknowledge: a predictive checking method (PCM), a likelihood ratio test (LRT), and an adapted Kullback–Leibler divergence (KLD-A) test. We manipulated four factors: the proportion of compromised items, the stage of adaptive testing at which preknowledge was present, item-parameter estimation error, and the information contained in secure items. The type I error results indicated that the LRT and PCM methods are favored over the KLD-A method because the KLD-A can exhibit substantially inflated type I error rates under many conditions. In regard to power, the LRT and PCM methods displayed a wide range of results, generally from 0.2 to 0.8, depending on the amount of preknowledge and the stage of adaptive testing at which the preknowledge was present.
Citations: 7
Diagnostic Classification Models: Recent Developments, Practical Issues, and Prospects
IF 1.7 Q1 Social Sciences Pub Date: 2019-05-02 DOI: 10.1080/15305058.2019.1588278
Hamdollah Ravand, Purya Baghaei
More than three decades after their introduction, diagnostic classification models (DCMs) do not seem to have been implemented in educational systems for the purposes for which they were devised. Most DCM research is either methodological, aimed at model development and refinement, or involves retrofitting DCMs to existing nondiagnostic tests, in the latter case basically for model demonstration or construct identification. DCMs have rarely been used to develop diagnostic assessments from the start with the purpose of identifying individuals’ strengths and weaknesses (referred to as true applications in this study). In this article, we give an introduction to DCMs and their latest developments, along with guidelines on how to employ DCMs to develop a diagnostic test or to retrofit a nondiagnostic assessment. Finally, we enumerate the reasons why we believe DCMs have not become fully operational in educational systems and offer some advice to make their advent smooth and quick.
Citations: 30
Leveraging Evidence-Centered Design to Develop Assessments of Computational Thinking Practices
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2018.1543311
E. Snow, Daisy W. Rutstein, Satabdi Basu, M. Bienkowski, H. Everson
Computational thinking is a core skill in computer science that has become a focus of instruction in primary and secondary education worldwide. Since 2010, researchers have leveraged Evidence-Centered Design (ECD) methods to develop measures of students’ Computational Thinking (CT) practices. This article describes how ECD was used to develop CT assessments for primary students in Hong Kong and secondary students in the United States. We demonstrate how leveraging ECD yields a principled design for developing assessments of hard-to-assess constructs and, as part of the process, creates reusable artifacts—design patterns and task templates—that inform the design of other, related assessments. Leveraging ECD, as described in this article, represents a principled approach to measuring students’ computational thinking practices, and situates the approach in emerging computational thinking curricula and programs to emphasize the links between curricula and assessment design.
Citations: 13
Using Performance Tasks within Simulated Environments to Assess Teachers’ Ability to Engage in Coordinated, Accumulated, and Dynamic (CAD) Competencies
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2018.1551223
Jamie N. Mikeska, Heather Howell, C. Straub
The demand for assessments of competencies that require complex human interaction is steadily growing as we move toward a focus on twenty-first century skills. As assessment designers aim to address this demand, we argue for the importance of a common language to understand and attend to the key challenges implicated in designing task situations to assess such competencies. We offer the descriptors coordinated, accumulated, and dynamic (CAD) as a way of understanding the nature of these competencies and the considerations involved in measuring them. We use an example performance task designed to measure teacher competency in leading an argumentation-focused discussion in elementary science to illustrate what we mean by the coordinated, accumulated, and dynamic nature of this construct and the challenges assessment designers face when developing performance tasks to measure this construct. Our work is unique in that we designed these performance tasks to be deployed within a digital simulated classroom environment that includes simulated students controlled by a human agent, known as the simulation specialist. We illustrate what we mean by these three descriptors and discuss how we addressed various considerations in our task design to assess elementary science teachers’ ability to facilitate argumentation-focused discussions.
Citations: 13
Introduction to “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills’” Special Issue
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2019.1608551
M. Oliveri, R. Mislevy
We are pleased to introduce this special issue of the International Journal of Testing (IJT), on the theme “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills.’” Our call elicited manuscripts related to evidence-based models or tools that facilitate the scalability of the design, development, and implementation of new forms of assessment. The articles sought to address topics beyond familiar tools and processes, such as automated scoring, in order to consider issues focusing on assessment architecture and assessment engineering models, with simulated learning and performance contexts, new item types, and steps taken to ensure reliability and validity. The issue’s aims are to enrich our understanding of what has worked well, why, and lessons learned, in order to strengthen future conceptualization and design of next-generation assessments (NGAs). We received a number of submissions, which do just that. The five pieces that constitute this issue were selected not only for their individual contributions but also because collectively, they illustrate broader principles and complement each other in their emphases. The articles illustrate lessons learned in current applications and provide insights to guide implementation in future extensions. Next, we offer thoughts on the challenges and opportunities stated in the call and the role of principled frameworks for the design of NGAs. A good place to begin a discussion of assessment design is Messick’s (1994) three-sentence description of the backbone of the underlying assessment argument:
Citations: 6
Use of Evidence-Centered Design to Develop Learning Maps-Based Assessments
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2018.1543310
A. Clark, Meagan Karvonen
Evidence-based approaches to assessment design, development, and administration provide a strong foundation for an assessment’s validity argument but can be time consuming, resource intensive, and complex to implement. This article describes an evidence-based approach used for one assessment that addresses these challenges. Evidence-centered design principles were applied to create a task template to support test development for a new, instructionally embedded, large-scale alternate assessment system used for accountability purposes in 18 US states. Example evidence from the validity argument is presented to evaluate the effectiveness of the template as an evidence-based method for test development. Lessons learned, including strengths and challenges, are shared to inform test-development efforts for other programs.
Citations: 7
Application of Ontologies for Assessing Collaborative Problem Solving Skills
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2019.1573823
Jessica Andrews-Todd, Deirdre Kerr
Collaborative problem solving (CPS) has been deemed a critical twenty-first century competency for a variety of contexts. However, less attention has been given to work aimed at the assessment and acquisition of such capabilities. Recently, large-scale efforts have been devoted to assessing CPS skills, but there are no agreed-upon guiding principles for assessment of this complex construct, particularly for assessment in digital performance situations. There are notable challenges in conceptualizing the complex construct and extracting evidence of CPS skills from large streams of data in digital contexts such as games and simulations. In the current paper, we discuss how the in-task assessment framework (I-TAF), a framework informed by evidence-centered design, can provide guiding principles for the assessment of CPS in these contexts. We give specific attention to one aspect of I-TAF, ontologies, and describe how they can be used to instantiate the student model in evidence-centered design, which lays out what we wish to measure in a principled way. We further discuss how ontologies can serve as an anchor representation for other components of assessment such as scoring rubrics, evidence identification, and task design.
Citations: 19
Evaluating a Technology-Based Assessment (TBA) to Measure Teachers’ Action-Related and Reflective Skills
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2019.1586377
O. Zlatkin‐Troitschanskaia, Christiane Kuhn, S. Brückner, Jacqueline P. Leighton
Teaching performance can be assessed validly only if the assessment involves an appropriate, authentic representation of real-life teaching practices. Different skills interact in coordinating teachers’ actions in different classroom situations. Based on the evidence-centered design model, we developed a technology-based assessment framework that enables differentiation between two essential teaching actions: action-related skills and reflective skills. Action-related skills are necessary to handle specific subject-related situations during instruction. Reflective skills are necessary to prepare and evaluate specific situations in pre- and postinstructional phases. In this article, we present the newly developed technology-based assessment to validly measure teaching performance, and we discuss validity evidence from cognitive interviews with teachers (novices and experts) using the think-aloud method, which indicates that the test takers’ respective mental processes when solving action-related skills tasks are consistent with the theoretically assumed knowledge and skill components and depend on the different levels of teaching expertise.
Citations: 11