A Comparison of Methods for Detecting Examinee Preknowledge of Items
Pub Date: 2019-07-03 | DOI: 10.1080/15305058.2019.1610886
Xi Wang, Yang Liu, F. Robin, Hongwen Guo
In an on-demand testing program, some items are repeatedly used across test administrations, which poses a risk to test security. In this study, we considered a scenario in which a test was divided into two subsets: one consisting of secure items and the other of possibly compromised items. In a simulation study of multistage adaptive testing, we used three methods to detect item preknowledge: a predictive checking method (PCM), a likelihood ratio test (LRT), and an adapted Kullback–Leibler divergence (KLD-A) test. We manipulated four factors: the proportion of compromised items, the stage of adaptive testing at which preknowledge was present, item-parameter estimation error, and the information contained in the secure items. The type I error results favored the LRT and PCM methods over the KLD-A method, because the KLD-A showed substantially inflated type I error rates under many conditions. With regard to power, the LRT and PCM methods displayed a wide range of results, generally from 0.2 to 0.8, depending on the amount of preknowledge and the stage of adaptive testing at which the preknowledge was present.
{"title":"A Comparison of Methods for Detecting Examinee Preknowledge of Items","authors":"Xi Wang, Yang Liu, F. Robin, Hongwen Guo","doi":"10.1080/15305058.2019.1610886","DOIUrl":"https://doi.org/10.1080/15305058.2019.1610886","url":null,"abstract":"In an on-demand testing program, some items are repeatedly used across test administrations. This poses a risk to test security. In this study, we considered a scenario wherein a test was divided into two subsets: one consisting of secure items and the other consisting of possibly compromised items. In a simulation study of multistage adaptive testing, we used three methods to detect item preknowledge: a predictive checking method (PCM), a likelihood ratio test (LRT), and an adapted Kullback–Leibler divergence (KLD-A) test. We manipulated four factors: the proportion of compromised items, the stage of adaptive testing at which preknowledge was present, item-parameter estimation error, and the information contained in secure items. The type I error results indicated that the LRT and PCM methods are favored over the KLD-A method because the KLD-A can experience large inflated type I error in many conditions. In regard to power, the LRT and PCM methods displayed a wide range of results, generally from 0.2 to 0.8, depending on the amount of preknowledge and the stage of adaptive testing at which the preknowledge was present.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"207 - 226"},"PeriodicalIF":1.7,"publicationDate":"2019-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1610886","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45510975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diagnostic Classification Models: Recent Developments, Practical Issues, and Prospects
Pub Date: 2019-05-02 | DOI: 10.1080/15305058.2019.1588278
Hamdollah Ravand, Purya Baghaei
More than three decades after their introduction, diagnostic classification models (DCMs) do not seem to have been implemented in educational systems for the purposes for which they were devised. Most DCM research is either methodological, aimed at model development and refinement, or involves retrofitting DCMs to existing nondiagnostic tests, in the latter case mainly for model demonstration or construct identification. DCMs have rarely been used to develop a diagnostic assessment from the outset with the purpose of identifying individuals’ strengths and weaknesses (referred to as true applications in this study). In this article, we give an introduction to DCMs and their latest developments, along with guidelines on how to employ DCMs to develop a diagnostic test or to retrofit a DCM to a nondiagnostic assessment. Finally, we enumerate the reasons why we believe DCMs have not become fully operational in educational systems and offer suggestions to make their adoption smoother and quicker.
{"title":"Diagnostic Classification Models: Recent Developments, Practical Issues, and Prospects","authors":"Hamdollah Ravand, Purya Baghaei","doi":"10.1080/15305058.2019.1588278","DOIUrl":"https://doi.org/10.1080/15305058.2019.1588278","url":null,"abstract":"More than three decades after their introduction, diagnostic classification models (DCM) do not seem to have been implemented in educational systems for the purposes they were devised. Most DCM research is either methodological for model development and refinement or retrofitting to existing nondiagnostic tests and, in the latter case, basically for model demonstration or constructs identification. DCMs have rarely been used to develop diagnostic assessment right from the start with the purpose of identifying individuals’ strengths and weaknesses (referred to as true applications in this study). In this article, we give an introduction to DCMs and their latest developments along with guidelines on how to proceed to employ DCMs to develop a diagnostic test or retrofit to a nondiagnostic assessment. Finally, we enumerate the reasons why we believe DCMs have not become fully operational in educational systems and suggest some advice to make their advent smooth and quick.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"20 1","pages":"24 - 56"},"PeriodicalIF":1.7,"publicationDate":"2019-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1588278","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44424368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Evidence-Centered Design to Develop Assessments of Computational Thinking Practices
Pub Date: 2019-04-03 | DOI: 10.1080/15305058.2018.1543311
E. Snow, Daisy W. Rutstein, Satabdi Basu, M. Bienkowski, H. Everson
Computational thinking is a core skill in computer science that has become a focus of instruction in primary and secondary education worldwide. Since 2010, researchers have leveraged Evidence-Centered Design (ECD) methods to develop measures of students’ Computational Thinking (CT) practices. This article describes how ECD was used to develop CT assessments for primary students in Hong Kong and secondary students in the United States. We demonstrate how leveraging ECD yields a principled design for developing assessments of hard-to-assess constructs and, as part of the process, creates reusable artifacts—design patterns and task templates—that inform the design of other, related assessments. Leveraging ECD, as described in this article, represents a principled approach to measuring students’ computational thinking practices, and situates the approach in emerging computational thinking curricula and programs to emphasize the links between curricula and assessment design.
{"title":"Leveraging Evidence-Centered Design to Develop Assessments of Computational Thinking Practices","authors":"E. Snow, Daisy W. Rutstein, Satabdi Basu, M. Bienkowski, H. Everson","doi":"10.1080/15305058.2018.1543311","DOIUrl":"https://doi.org/10.1080/15305058.2018.1543311","url":null,"abstract":"Computational thinking is a core skill in computer science that has become a focus of instruction in primary and secondary education worldwide. Since 2010, researchers have leveraged Evidence-Centered Design (ECD) methods to develop measures of students’ Computational Thinking (CT) practices. This article describes how ECD was used to develop CT assessments for primary students in Hong Kong and secondary students in the United States. We demonstrate how leveraging ECD yields a principled design for developing assessments of hard-to-assess constructs and, as part of the process, creates reusable artifacts—design patterns and task templates—that inform the design of other, related assessments. Leveraging ECD, as described in this article, represents a principled approach to measuring students’ computational thinking practices, and situates the approach in emerging computational thinking curricula and programs to emphasize the links between curricula and assessment design.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"103 - 127"},"PeriodicalIF":1.7,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1543311","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41919857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Performance Tasks within Simulated Environments to Assess Teachers’ Ability to Engage in Coordinated, Accumulated, and Dynamic (CAD) Competencies
Pub Date: 2019-04-03 | DOI: 10.1080/15305058.2018.1551223
Jamie N. Mikeska, Heather Howell, C. Straub
The demand for assessments of competencies that require complex human interaction is steadily growing as we move toward a focus on twenty-first century skills. As assessment designers aim to address this demand, we argue for the importance of a common language to understand and attend to the key challenges implicated in designing task situations to assess such competencies. We offer the descriptors coordinated, accumulated, and dynamic (CAD) as a way of understanding the nature of these competencies and the considerations involved in measuring them. We use an example performance task designed to measure teacher competency in leading an argumentation-focused discussion in elementary science to illustrate what we mean by the coordinated, accumulated, and dynamic nature of this construct and the challenges assessment designers face when developing performance tasks to measure this construct. Our work is unique in that we designed these performance tasks to be deployed within a digital simulated classroom environment that includes simulated students controlled by a human agent, known as the simulation specialist. We illustrate what we mean by these three descriptors and discuss how we addressed various considerations in our task design to assess elementary science teachers’ ability to facilitate argumentation-focused discussions.
{"title":"Using Performance Tasks within Simulated Environments to Assess Teachers’ Ability to Engage in Coordinated, Accumulated, and Dynamic (CAD) Competencies","authors":"Jamie N. Mikeska, Heather Howell, C. Straub","doi":"10.1080/15305058.2018.1551223","DOIUrl":"https://doi.org/10.1080/15305058.2018.1551223","url":null,"abstract":"The demand for assessments of competencies that require complex human interaction is steadily growing as we move toward a focus on twenty-first century skills. As assessment designers aim to address this demand, we argue for the importance of a common language to understand and attend to the key challenges implicated in designing task situations to assess such competencies. We offer the descriptors coordinated, accumulated, and dynamic (CAD) as a way of understanding the nature of these competencies and the considerations involved in measuring them. We use an example performance task designed to measure teacher competency in leading an argumentation-focused discussion in elementary science to illustrate what we mean by the coordinated, accumulated, and dynamic nature of this construct and the challenges assessment designers face when developing performance tasks to measure this construct. Our work is unique in that we designed these performance tasks to be deployed within a digital simulated classroom environment that includes simulated students controlled by a human agent, known as the simulation specialist. We illustrate what we mean by these three descriptors and discuss how we addressed various considerations in our task design to assess elementary science teachers’ ability to facilitate argumentation-focused discussions.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"128 - 147"},"PeriodicalIF":1.7,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1551223","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45227236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction to “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills’” Special Issue
Pub Date: 2019-04-03 | DOI: 10.1080/15305058.2019.1608551
M. Oliveri, R. Mislevy
We are pleased to introduce this special issue of the International Journal of Testing (IJT), on the theme “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills.’” Our call elicited manuscripts related to evidence-based models or tools that facilitate the scalability of the design, development, and implementation of new forms of assessment. The articles sought to address topics beyond familiar tools and processes, such as automated scoring, in order to consider issues focusing on assessment architecture and assessment engineering models, with simulated learning and performance contexts, new item types, and steps taken to ensure reliability and validity. The issue’s aims are to enrich our understanding of what has worked well, why, and lessons learned, in order to strengthen future conceptualization and design of next-generation assessments (NGAs). We received a number of submissions, which do just that. The five pieces that constitute this issue were selected not only for their individual contributions but also because collectively, they illustrate broader principles and complement each other in their emphases. The articles illustrate lessons learned in current applications and provide insights to guide implementation in future extensions. Next, we offer thoughts on the challenges and opportunities stated in the call and the role of principled frameworks for the design of NGAs. A good place to begin a discussion of assessment design is Messick’s (1994) three-sentence description of the backbone of the underlying assessment argument:
{"title":"Introduction to “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21st Century Skills’” Special Issue","authors":"M. Oliveri, R. Mislevy","doi":"10.1080/15305058.2019.1608551","DOIUrl":"https://doi.org/10.1080/15305058.2019.1608551","url":null,"abstract":"We are pleased to introduce this special issue of the International Journal of Testing (IJT), on the theme “Challenges and Opportunities in the Design of ‘Next-Generation Assessments of 21 Century Skills.’” Our call elicited manuscripts related to evidence-based models or tools that facilitate the scalability of the design, development, and implementation of new forms of assessment. The articles sought to address topics beyond familiar tools and processes, such as automated scoring, in order to consider issues focusing on assessment architecture and assessment engineering models, with simulated learning and performance contexts, new item types, and steps taken to ensure reliability and validity. The issue’s aims are to enrich our understanding of what has worked well, why, and lessons learned, in order to strengthen future conceptualization and design of next-generation assessments (NGAs). We received a number of submissions, which do just that. The five pieces that constitute this issue were selected not only for their individual contributions but also because collectively, they illustrate broader principles and complement each other in their emphases. The articles illustrate lessons learned in current applications and provide insights to guide implementation in future extensions. Next, we offer thoughts on the challenges and opportunities stated in the call and the role of principled frameworks for the design of NGAs. A good place to begin a discussion of assessment design is Messick’s (1994) three-sentence description of the backbone of the underlying assessment argument:","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"102 - 97"},"PeriodicalIF":1.7,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1608551","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46381175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Use of Evidence-Centered Design to Develop Learning Maps-Based Assessments
Pub Date: 2019-04-03 | DOI: 10.1080/15305058.2018.1543310
A. Clark, Meagan Karvonen
Evidence-based approaches to assessment design, development, and administration provide a strong foundation for an assessment’s validity argument but can be time consuming, resource intensive, and complex to implement. This article describes an evidence-based approach used for one assessment that addresses these challenges. Evidence-centered design principles were applied to create a task template to support test development for a new, instructionally embedded, large-scale alternate assessment system used for accountability purposes in 18 US states. Example evidence from the validity argument is presented to evaluate the effectiveness of the template as an evidence-based method for test development. Lessons learned, including strengths and challenges, are shared to inform test-development efforts for other programs.
{"title":"Use of Evidence-Centered Design to Develop Learning Maps-Based Assessments","authors":"A. Clark, Meagan Karvonen","doi":"10.1080/15305058.2018.1543310","DOIUrl":"https://doi.org/10.1080/15305058.2018.1543310","url":null,"abstract":"Evidence-based approaches to assessment design, development, and administration provide a strong foundation for an assessment’s validity argument but can be time consuming, resource intensive, and complex to implement. This article describes an evidence-based approach used for one assessment that addresses these challenges. Evidence-centered design principles were applied to create a task template to support test development for a new, instructionally embedded, large-scale alternate assessment system used for accountability purposes in 18 US states. Example evidence from the validity argument is presented to evaluate the effectiveness of the template as an evidence-based method for test development. Lessons learned, including strengths and challenges, are shared to inform test-development efforts for other programs.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"188 - 205"},"PeriodicalIF":1.7,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1543310","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41489319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of Ontologies for Assessing Collaborative Problem Solving Skills
Pub Date: 2019-04-03 | DOI: 10.1080/15305058.2019.1573823
Jessica Andrews-Todd, Deirdre Kerr
Collaborative problem solving (CPS) has been deemed a critical twenty-first century competency for a variety of contexts. However, less attention has been given to work aimed at the assessment and acquisition of such capabilities. Recently, large-scale efforts have been devoted to assessing CPS skills, but there are no agreed-upon guiding principles for the assessment of this complex construct, particularly in digital performance situations. There are notable challenges in conceptualizing the construct and in extracting evidence of CPS skills from the large streams of data generated in digital contexts such as games and simulations. In this paper, we discuss how the in-task assessment framework (I-TAF), a framework informed by evidence-centered design, can provide guiding principles for the assessment of CPS in these contexts. We give specific attention to one aspect of I-TAF, ontologies, and describe how they can be used to instantiate the student model in evidence-centered design, which lays out what we wish to measure in a principled way. We further discuss how ontologies can serve as an anchor representation for other components of assessment, such as scoring rubrics, evidence identification, and task design.
{"title":"Application of Ontologies for Assessing Collaborative Problem Solving Skills","authors":"Jessica Andrews-Todd, Deirdre Kerr","doi":"10.1080/15305058.2019.1573823","DOIUrl":"https://doi.org/10.1080/15305058.2019.1573823","url":null,"abstract":"Abstract Collaborative problem solving (CPS) has been deemed a critical twenty-first century competency for a variety of contexts. However, less attention has been given to work aimed at the assessment and acquisition of such capabilities. Recently large scale efforts have been devoted toward assessing CPS skills, but there are no agreed upon guiding principles for assessment of this complex construct, particularly for assessment in digital performance situations. There are notable challenges in conceptualizing the complex construct and extracting evidence of CPS skills from large streams of data in digital contexts such as games and simulations. In the current paper, we discuss how the in-task assessment framework (I-TAF), a framework informed by evidence-centered design, can provide guiding principles for the assessment of CPS in these contexts. We give specific attention to one aspect of I-TAF, ontologies, and describe how they can be used to instantiate the student model in evidence-centered design which lays out what we wish to measure in a principled way. We further discuss how ontologies can serve as an anchor representation for other components of assessment such as scoring rubrics, evidence identification, and task design.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"172 - 187"},"PeriodicalIF":1.7,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1573823","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46252638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating a Technology-Based Assessment (TBA) to Measure Teachers’ Action-Related and Reflective Skills
Pub Date: 2019-04-03 | DOI: 10.1080/15305058.2019.1586377
O. Zlatkin‐Troitschanskaia, Christiane Kuhn, S. Brückner, Jacqueline P. Leighton
Teaching performance can be assessed validly only if the assessment involves an appropriate, authentic representation of real-life teaching practices. Different skills interact in coordinating teachers’ actions in different classroom situations. Based on the evidence-centered design model, we developed a technology-based assessment framework that differentiates between two essential facets of teaching: action-related skills and reflective skills. Action-related skills are necessary for handling specific subject-related situations during instruction. Reflective skills are necessary for preparing and evaluating specific situations in the pre- and postinstructional phases. In this article, we present the newly developed technology-based assessment for validly measuring teaching performance, and we discuss validity evidence from cognitive interviews with novice and expert teachers using the think-aloud method. This evidence indicates that test takers’ mental processes when solving action-related skill tasks are consistent with the theoretically assumed knowledge and skill components and depend on their level of teaching expertise.
{"title":"Evaluating a Technology-Based Assessment (TBA) to Measure Teachers’ Action-Related and Reflective Skills","authors":"O. Zlatkin‐Troitschanskaia, Christiane Kuhn, S. Brückner, Jacqueline P. Leighton","doi":"10.1080/15305058.2019.1586377","DOIUrl":"https://doi.org/10.1080/15305058.2019.1586377","url":null,"abstract":"Teaching performance can be assessed validly only if the assessment involves an appropriate, authentic representation of real-life teaching practices. Different skills interact in coordinating teachers’ actions in different classroom situations. Based on the evidence-centered design model, we developed a technology-based assessment framework that enables differentiation between two essential teaching actions: action-related skills and reflective skills. Action-related skills are necessary to handle specific subject-related situations during instruction. Reflective skills are necessary to prepare and evaluate specific situations in pre- and postinstructional phases. In this article, we present the newly developed technology-based assessment to validly measure teaching performance, and we discuss validity evidence from cognitive interviews with teachers (novices and experts) using the think-aloud method, which indicates that the test takers’ respective mental processes when solving action-related skills tasks are consistent with the theoretically assumed knowledge and skill components and depend on the different levels of teaching expertise.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"148 - 171"},"PeriodicalIF":1.7,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1586377","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49063537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Evidence-Centered Design to Support the Development of Culturally and Linguistically Sensitive Collaborative Problem-Solving Assessments
Pub Date: 2019-01-29 | DOI: 10.1080/15305058.2018.1543308
M. Oliveri, René Lawless, R. Mislevy
Collaborative problem solving (CPS) ranks among the top five most critical skills necessary for college graduates to meet workforce demands (Hart Research Associates, 2015). It is also deemed a critical skill for educational success (Beaver, 2013). It thus deserves more prominence in the suite of courses and subjects assessed in K-16. Such inclusion, however, requires improvements in the conceptualization, design, and analysis of CPS assessment, which challenges us to think about assessing these skills differently from the current focus on assessing individuals’ substantive knowledge. In this article, we discuss an Evidence-Centered Design approach to assessing CPS in a culturally and linguistically diverse educational environment. We demonstrate ways of taking a sociocognitive perspective to conceptualize and model possible linguistic and/or cultural differences between populations at key stages of assessment development, including assessment conceptualization and design, to help reduce construct-irrelevant differences when assessing complex constructs with diverse populations.
{"title":"Using Evidence-Centered Design to Support the Development of Culturally and Linguistically Sensitive Collaborative Problem-Solving Assessments","authors":"M. Oliveri, René Lawless, R. Mislevy","doi":"10.1080/15305058.2018.1543308","DOIUrl":"https://doi.org/10.1080/15305058.2018.1543308","url":null,"abstract":"Collaborative problem solving (CPS) ranks among the top five most critical skills necessary for college graduates to meet workforce demands (Hart Research Associates, 2015). It is also deemed a critical skill for educational success (Beaver, 2013). It thus deserves more prominence in the suite of courses and subjects assessed in K-16. Such inclusion, however, presents the need for improvements in the conceptualization, design, and analysis of CPS, which challenges us to think differently about assessing the skills than the current focus given to assessing individuals’ substantive knowledge. In this article, we discuss an Evidence-Centered Design approach to assess CPS in a culturally and linguistically diverse educational environment. We demonstrate ways to consider a sociocognitive perspective to conceptualize and model possible linguistic and/or cultural differences between populations along key stages of assessment development including assessment conceptualization and design to help reduce possible construct-irrelevant differences when assessing complex constructs with diverse populations.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"270 - 300"},"PeriodicalIF":1.7,"publicationDate":"2019-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1543308","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44350922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessment of University Students’ Critical Thinking: Next Generation Performance Assessment
Pub Date: 2019-01-24 | DOI: 10.1080/15305058.2018.1543309
R. Shavelson, O. Zlatkin‐Troitschanskaia, K. Beck, Susanne Schmidt, Julián P. Mariño
Following employers’ criticisms and recent societal developments, policymakers and educators have called for students to develop a range of generic skills, such as critical thinking (“twenty-first century skills”). So far, such skills have typically been assessed through student self-reports or multiple-choice tests. An alternative approach is criterion-sampling measurement, which leads to performance assessments built around “criterion” tasks drawn from the real-world situations in which students are being educated, both within and across academic or professional domains. One current project, iPAL (the international Performance Assessment of Learning), consolidates previous research and focuses on the next generation of performance assessments. In this paper, we present iPAL’s assessment framework and show how it guides the development of such performance assessments, exemplify these assessments with a concrete task, and provide preliminary evidence of reliability and validity, which allows us to draw initial implications for further test design and development.
{"title":"Assessment of University Students’ Critical Thinking: Next Generation Performance Assessment","authors":"R. Shavelson, O. Zlatkin‐Troitschanskaia, K. Beck, Susanne Schmidt, Julián P. Mariño","doi":"10.1080/15305058.2018.1543309","DOIUrl":"https://doi.org/10.1080/15305058.2018.1543309","url":null,"abstract":"Following employers’ criticisms and recent societal developments, policymakers and educators have called for students to develop a range of generic skills such as critical thinking (“twenty-first century skills”). So far, such skills have typically been assessed by student self-reports or with multiple-choice tests. An alternative approach is criterion-sampling measurement. This approach leads to developing performance assessments using “criterion” tasks, which are drawn from real-world situations in which students are being educated, both within and across academic or professional domains. One current project, iPAL (The international Performance Assessment of Learning), consolidates previous research and focuses on the next generation performance assessments. In this paper, we present iPAL’s assessment framework and show how it guides the development of such performance assessments, exemplify these assessments with a concrete task, and provide preliminary evidence of its reliability and validity, which allows us to draw initial implications for further test design and development.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"19 1","pages":"337 - 362"},"PeriodicalIF":1.7,"publicationDate":"2019-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1543309","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48194695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}