
International Journal of Testing: Latest Publications

Migration Background in PISA’s Measure of Social Belonging: Using a Diffractive Lens to Interpret Multi-Method DIF Studies
IF 1.7 Q1 Social Sciences Pub Date: 2019-07-16 DOI: 10.1080/15305058.2019.1632316
Nathan D. Roberson, B. Zumbo
This paper investigates measurement invariance as it relates to migration background, using the Program for International Student Assessment (PISA) measure of social belonging. We explore how two measurement invariance techniques, the alignment method used in conjunction with ordinal logistic regression, provide insights into differential item functioning in the case of multiple group comparisons. Social belonging is a central human need, and we argue that immigration background is an important factor when considering how an individual interacts with a survey or items about belonging. Overall results from both the alignment method and ordinal logistic regression, interpreted through a diffractive lens, suggest that it is inappropriate to treat people of four different immigration backgrounds within the countries analyzed as exchangeable groups.
Citations: 7
Dynamic Multistage Testing: A Highly Efficient and Regulated Adaptive Testing Method
IF 1.7 Q1 Social Sciences Pub Date: 2019-07-03 DOI: 10.1080/15305058.2019.1621871
Xiao Luo, Xinrui Wang
This study introduced dynamic multistage testing (dy-MST) as an improvement over existing adaptive testing methods. dy-MST combines the advantages of computerized adaptive testing (CAT) and computerized adaptive multistage testing (ca-MST) to create a highly efficient and regulated adaptive testing method. In the test construction phase, multistage panels are assembled using design principles and assembly techniques similar to those of ca-MST. In the administration phase, items are adaptively administered from a dynamic interim pool. A large-scale simulation study was conducted to evaluate the merits of dy-MST, and it found that dy-MST significantly reduced test length while maintaining classification accuracy identical to that of the full-length tests and meeting all content requirements effectively. Psychometrically, the testing efficiency of dy-MST was comparable to that of CAT. Operationally, dy-MST allows for holistic pre-administration management of test content directly at the test level. Thus, dy-MST is deemed appropriate for delivering adaptive tests with high efficiency and well-controlled content.
Citations: 2
A Comparison of Methods for Detecting Examinee Preknowledge of Items
IF 1.7 Q1 Social Sciences Pub Date: 2019-07-03 DOI: 10.1080/15305058.2019.1610886
Xi Wang, Yang Liu, F. Robin, Hongwen Guo
In an on-demand testing program, some items are repeatedly used across test administrations. This poses a risk to test security. In this study, we considered a scenario wherein a test was divided into two subsets: one consisting of secure items and the other consisting of possibly compromised items. In a simulation study of multistage adaptive testing, we used three methods to detect item preknowledge: a predictive checking method (PCM), a likelihood ratio test (LRT), and an adapted Kullback–Leibler divergence (KLD-A) test. We manipulated four factors: the proportion of compromised items, the stage of adaptive testing at which preknowledge was present, item-parameter estimation error, and the information contained in secure items. The type I error results indicated that the LRT and PCM methods are favored over the KLD-A method because the KLD-A can exhibit substantially inflated type I error rates under many conditions. In regard to power, the LRT and PCM methods displayed a wide range of results, generally from 0.2 to 0.8, depending on the amount of preknowledge and the stage of adaptive testing at which the preknowledge was present.
Citations: 7
Diagnostic Classification Models: Recent Developments, Practical Issues, and Prospects
IF 1.7 Q1 Social Sciences Pub Date: 2019-05-02 DOI: 10.1080/15305058.2019.1588278
Hamdollah Ravand, Purya Baghaei
More than three decades after their introduction, diagnostic classification models (DCMs) do not seem to have been implemented in educational systems for the purposes for which they were devised. Most DCM research is either methodological, aimed at model development and refinement, or involves retrofitting DCMs to existing nondiagnostic tests, in the latter case basically for model demonstration or construct identification. DCMs have rarely been used to develop diagnostic assessments from the start with the purpose of identifying individuals’ strengths and weaknesses (referred to as true applications in this study). In this article, we give an introduction to DCMs and their latest developments, along with guidelines on how to employ DCMs to develop a diagnostic test or to retrofit a nondiagnostic assessment. Finally, we enumerate the reasons why we believe DCMs have not become fully operational in educational systems and offer some advice to make their advent smooth and quick.
Citations: 30
Leveraging Evidence-Centered Design to Develop Assessments of Computational Thinking Practices
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2018.1543311
E. Snow, Daisy W. Rutstein, Satabdi Basu, M. Bienkowski, H. Everson
Computational thinking is a core skill in computer science that has become a focus of instruction in primary and secondary education worldwide. Since 2010, researchers have leveraged Evidence-Centered Design (ECD) methods to develop measures of students’ Computational Thinking (CT) practices. This article describes how ECD was used to develop CT assessments for primary students in Hong Kong and secondary students in the United States. We demonstrate how leveraging ECD yields a principled design for developing assessments of hard-to-assess constructs and, as part of the process, creates reusable artifacts—design patterns and task templates—that inform the design of other, related assessments. Leveraging ECD, as described in this article, represents a principled approach to measuring students’ computational thinking practices, and situates the approach in emerging computational thinking curricula and programs to emphasize the links between curricula and assessment design.
Citations: 13
Using Performance Tasks within Simulated Environments to Assess Teachers’ Ability to Engage in Coordinated, Accumulated, and Dynamic (CAD) Competencies
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2018.1551223
Jamie N. Mikeska, Heather Howell, C. Straub
The demand for assessments of competencies that require complex human interaction is steadily growing as we move toward a focus on twenty-first century skills. As assessment designers aim to address this demand, we argue for the importance of a common language to understand and attend to the key challenges implicated in designing task situations to assess such competencies. We offer the descriptors coordinated, accumulated, and dynamic (CAD) as a way of understanding the nature of these competencies and the considerations involved in measuring them. We use an example performance task designed to measure teacher competency in leading an argumentation-focused discussion in elementary science to illustrate what we mean by the coordinated, accumulated, and dynamic nature of this construct and the challenges assessment designers face when developing performance tasks to measure this construct. Our work is unique in that we designed these performance tasks to be deployed within a digital simulated classroom environment that includes simulated students controlled by a human agent, known as the simulation specialist. We illustrate what we mean by these three descriptors and discuss how we addressed various considerations in our task design to assess elementary science teachers’ ability to facilitate argumentation-focused discussions.
Citations: 13
Introduction to “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills’” Special Issue
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2019.1608551
M. Oliveri, R. Mislevy
We are pleased to introduce this special issue of the International Journal of Testing (IJT), on the theme “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills.’” Our call elicited manuscripts related to evidence-based models or tools that facilitate the scalability of the design, development, and implementation of new forms of assessment. The articles sought to address topics beyond familiar tools and processes, such as automated scoring, in order to consider issues focusing on assessment architecture and assessment engineering models, with simulated learning and performance contexts, new item types, and steps taken to ensure reliability and validity. The issue’s aims are to enrich our understanding of what has worked well, why, and lessons learned, in order to strengthen future conceptualization and design of next-generation assessments (NGAs). We received a number of submissions, which do just that. The five pieces that constitute this issue were selected not only for their individual contributions but also because collectively, they illustrate broader principles and complement each other in their emphases. The articles illustrate lessons learned in current applications and provide insights to guide implementation in future extensions. Next, we offer thoughts on the challenges and opportunities stated in the call and the role of principled frameworks for the design of NGAs. A good place to begin a discussion of assessment design is Messick’s (1994) three-sentence description of the backbone of the underlying assessment argument:
Citations: 6
Use of Evidence-Centered Design to Develop Learning Maps-Based Assessments
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2018.1543310
A. Clark, Meagan Karvonen
Evidence-based approaches to assessment design, development, and administration provide a strong foundation for an assessment’s validity argument but can be time consuming, resource intensive, and complex to implement. This article describes an evidence-based approach used for one assessment that addresses these challenges. Evidence-centered design principles were applied to create a task template to support test development for a new, instructionally embedded, large-scale alternate assessment system used for accountability purposes in 18 US states. Example evidence from the validity argument is presented to evaluate the effectiveness of the template as an evidence-based method for test development. Lessons learned, including strengths and challenges, are shared to inform test-development efforts for other programs.
Citations: 7
Application of Ontologies for Assessing Collaborative Problem Solving Skills
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2019.1573823
Jessica Andrews-Todd, Deirdre Kerr
Collaborative problem solving (CPS) has been deemed a critical twenty-first century competency for a variety of contexts. However, less attention has been given to work aimed at the assessment and acquisition of such capabilities. Recently, large-scale efforts have been devoted to assessing CPS skills, but there are no agreed-upon guiding principles for assessment of this complex construct, particularly for assessment in digital performance situations. There are notable challenges in conceptualizing the complex construct and extracting evidence of CPS skills from large streams of data in digital contexts such as games and simulations. In the current paper, we discuss how the in-task assessment framework (I-TAF), a framework informed by evidence-centered design, can provide guiding principles for the assessment of CPS in these contexts. We give specific attention to one aspect of I-TAF, ontologies, and describe how they can be used to instantiate the student model in evidence-centered design, which lays out what we wish to measure in a principled way. We further discuss how ontologies can serve as an anchor representation for other components of assessment such as scoring rubrics, evidence identification, and task design.
Citations: 19
Evaluating a Technology-Based Assessment (TBA) to Measure Teachers’ Action-Related and Reflective Skills
IF 1.7 Q1 Social Sciences Pub Date: 2019-04-03 DOI: 10.1080/15305058.2019.1586377
O. Zlatkin‐Troitschanskaia, Christiane Kuhn, S. Brückner, Jacqueline P. Leighton
Teaching performance can be assessed validly only if the assessment involves an appropriate, authentic representation of real-life teaching practices. Different skills interact in coordinating teachers’ actions in different classroom situations. Based on the evidence-centered design model, we developed a technology-based assessment framework that enables differentiation between two essential teaching actions: action-related skills and reflective skills. Action-related skills are necessary to handle specific subject-related situations during instruction. Reflective skills are necessary to prepare and evaluate specific situations in pre- and postinstructional phases. In this article, we present the newly developed technology-based assessment to validly measure teaching performance, and we discuss validity evidence from cognitive interviews with teachers (novices and experts) using the think-aloud method, which indicates that the test takers’ respective mental processes when solving action-related skills tasks are consistent with the theoretically assumed knowledge and skill components and depend on the different levels of teaching expertise.
Citations: 11