Measuring Students' Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation
Pub Date: 2020-01-01 | Epub Date: 2020-05-07 | DOI: 10.1080/10627197.2020.1756253
Kavita L Seeratan, Kevin W McElhaney, Jessica Mislevy, Raymond McGhee, Dylan Conger, Mark C Long
We describe the conceptualization, design, development, validation, and testing of a summative instrument that measures high school students' ability to analyze and evaluate data, construct scientific explanations, and formulate scientific arguments in biology and chemistry disciplinary contexts. Data from 1,405 students were analyzed to evaluate the properties of the instrument. Student measurement separation reliability was 0.71, with items showing satisfactory fit to the Partial Credit Model. The use of the Evidence-Centered Design framework during the design and development process provided a strong foundation for the validity argument, and additional validation evidence was also gathered. The strengths of the instrument lie in its relatively brief administration time and a unique approach that integrates science practice and disciplinary knowledge while simultaneously seeking to decouple their measurement. This research models how to design assessments that align with the National Research Council's framework and informs the design of Next Generation Science Standards-aligned assessments.
Educational Assessment, 25(2), 112–135.
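As a rough illustration of the kind of reliability statistic reported above, the sketch below computes a Rasch-style person separation reliability from ability estimates and their standard errors. The sample, the ability estimates, and the standard-error values are placeholders, and this is not the authors' estimation procedure; it is only a minimal Python rendering of the standard formula (estimated true variance divided by observed variance of the ability estimates).

```python
import numpy as np

# Hypothetical Rasch/PCM person ability estimates and their standard errors
# (placeholders; the article's actual estimates are not reproduced here).
theta = np.random.default_rng(0).normal(0.0, 1.0, size=1405)   # person ability estimates
se = np.full_like(theta, 0.55)                                  # conditional standard errors

observed_var = theta.var(ddof=1)       # variance of the ability estimates
error_var = np.mean(se ** 2)           # mean squared standard error
true_var = observed_var - error_var    # estimated "true" variance

separation_reliability = true_var / observed_var
print(f"person separation reliability ~ {separation_reliability:.2f}")
```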
Examining the Use and Construct Fidelity of Technology-Enhanced Items Employed by K-12 Testing Programs
Pub Date: 2019-09-27 | DOI: 10.1080/10627197.2019.1670055
M. Russell, Sebastian Moncaleano
ABSTRACT Over the past decade, large-scale testing programs have employed technology-enhanced items (TEIs) to improve the fidelity with which an item measures a targeted construct. This paper presents findings from a review of released TEIs employed by large-scale testing programs worldwide. Analyses examine the prevalence with which different types of TEIs are employed and the content areas and grade levels in which they appear. The analyses apply the Technology-Enhanced Item Utility Framework to examine the fidelity with which current TEIs represent targeted constructs. The most common type of TEI employed by testing programs is a drag-and-drop response interaction. Approximately 40% of the TEIs examined provide a high level of construct fidelity, while an approximately equal proportion provide low construct fidelity. Finally, a large portion of drag-and-drop items are of low fidelity, while other TEI types provide moderate or high fidelity.
Educational Assessment, 24, 286–304.
Toward a Teacher Professional Learning Continuum in Assessment for Learning
Pub Date: 2019-09-26 | DOI: 10.1080/10627197.2019.1670056
Christopher DeLuca, Allison E. A. Chapman-Chin, D. Klinger
ABSTRACT Over the past 15 years, assessment for learning (AfL) has emerged as a key area of teacher practice, with policy mandates around the world supporting teachers’ implementation of the underlying components of this pedagogical approach. While procedural and selective implementation of AfL strategies has been observed within research (i.e., implementing the letter of AfL), promoting a spirit of AfL appears far more challenging. There is a critical need to better understand how teachers develop AfL capacity within their practice to effectively cultivate a spirit of AfL in their classrooms. The purpose of this study was to describe a learning continuum for teachers’ implementation of AfL, based on data from 88 teachers. Specifically, interview and observational data were analyzed to describe five developmental stages demarcating shifts in teachers’ conceptual understandings and enacted AfL practices. The resulting learning continuum provides an empirical foundation for responsive teacher education that facilitates teachers’ continued learning toward more meaningful AfL implementation.
Educational Assessment, 24, 267–285.
Measuring Reading Strategy Use
D. Arya, Anthony Clairmont, Daniel Katz, A. Maul
Pub Date: 2019-09-11 | DOI: 10.35542/osf.io/f6vu9
ABSTRACT This study describes the development and validation of a multidimensional measure of preadolescent and adolescent readers’ abilities to apply the reading comprehension strategies necessary for understanding challenging academic texts. The Strategy Use Measure (SUM) was designed to be pedagogically informative for the increasingly multilingual student population in the U.S. in grades 6 through 8. The SUM aims to measure four areas of knowledge and skill that are widely purported to support the use of reading strategies: (a) morphological awareness, (b) knowledge of cognates, (c) the ability to relate micro- and macro-ideas within a text, and (d) the ability to use intra- and inter-sentential context clues to define unfamiliar words. The test was developed through a principled, iterative approach to instrument development, employing Rasch models and qualitative investigations to test hypotheses related to the instrument’s validity. Findings suggest promising evidence for the validity and fairness of this multidimensional measure.
Educational Assessment, 25, 5–30.
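The study fits multidimensional Rasch models; as a heavily simplified, hypothetical illustration only, the sketch below computes first-pass log-odds approximations of item difficulties and person abilities for a dichotomous Rasch-type analysis. The response matrix is simulated, and the method is a back-of-envelope approximation, not the modeling used in the article.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 0/1 response matrix: 200 students x 20 dichotomous strategy-use items.
responses = (rng.random((200, 20)) < 0.6).astype(int)

# First-pass item difficulties: centered log-odds of failure on each item.
p_item = responses.mean(axis=0).clip(0.01, 0.99)
item_difficulty = np.log((1 - p_item) / p_item)
item_difficulty -= item_difficulty.mean()          # center at 0 logits

# First-pass person abilities: log-odds of each raw score (extremes clipped).
raw = responses.sum(axis=1)
L = responses.shape[1]
p_person = (raw / L).clip(0.5 / L, 1 - 0.5 / L)
person_ability = np.log(p_person / (1 - p_person))

print(item_difficulty.round(2))
print(person_ability[:10].round(2))
```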
Does It Matter if Examinee Motivation Is Measured before or after a Low-Stakes Test? A Moderated Mediation Analysis
Pub Date: 2019-08-26 | DOI: 10.1080/10627197.2019.1645591
Aaron J. Myers, S. Finney
ABSTRACT The indirect effect of perceived test importance on test performance via examinee effort is often modeled using importance and effort scores measured after test completion, which does not align with their theoretical temporal ordering. These retrospectively measured scores may be influenced by examinees’ test performance. To investigate the impact of timing of measurement, college students were randomly assigned to one of three conditions: (a) importance and effort were measured retrospectively, (b) importance and effort were measured retrospectively and importance was measured prospectively, and (c) importance and effort were measured both retrospectively and prospectively. The unstandardized indirect effect was invariant across conditions when modeling prospective and retrospective scores. Priming examinees via prospectively measuring importance and effort did not affect the interrelations among performance and retrospective importance and effort (i.e., the indirect effect was invariant), but it did lead to higher average test performance. Thus, priming may provide a low-cost intervention for increasing test performance.
Educational Assessment, 26, 1–19.
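For readers unfamiliar with the indirect-effect estimate referenced above, the sketch below shows one conventional way to compute an unstandardized indirect effect (from importance through effort to performance) from two least-squares regressions, with a percentile bootstrap interval. The variable names and simulated data are hypothetical stand-ins, and this sketch omits the moderation component of the authors' moderated mediation model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Simulated stand-ins for the constructs (not the study's data).
importance = rng.normal(size=n)
effort = 0.5 * importance + rng.normal(size=n)
performance = 0.4 * effort + 0.1 * importance + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def indirect_effect(imp, eff, perf):
    a = ols(eff, imp)[1]                              # importance -> effort
    b = ols(perf, np.column_stack([eff, imp]))[1]     # effort -> performance, controlling importance
    return a * b

est = indirect_effect(importance, effort, performance)

# Percentile bootstrap interval around the unstandardized indirect effect.
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boots.append(indirect_effect(importance[idx], effort[idx], performance[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"indirect effect = {est:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```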
Predictive Validity of SES Measures for Student Achievement
Pub Date: 2019-08-26 | DOI: 10.1080/10627197.2019.1645590
Jihyun Lee, Yang Zhang, L. Stankov
ABSTRACT This study aims to identify which socio-economic status (SES) variables have the best predictive validity for academic achievement, based on the international data sets of the Programme for International Student Assessment (PISA) in 2012, 2009, 2006, and 2003. From among 10 SES measures, two composite variables, the index of economic, social and cultural status (ESCS) and home possessions (HOMEPOS), showed superior predictive power for student achievement. Their pan-cultural correlations with PISA 2012 mathematics achievement were r = .40 and r = .36, respectively. Parental occupation status (r = .33) outperformed all other single measures of SES, including parental education (r = .29). Only two SES variables (i.e., family wealth and home possessions) showed non-linear relationships with academic achievement. We conclude with practical implications and recommendations for using SES measures as predictors of student achievement in educational research and point to the importance of theoretical alignment between SES measures and the particular issues to be addressed.
Educational Assessment, 24, 305–326.
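A minimal sketch of the two kinds of checks described above: zero-order correlations of SES indicators with achievement, and a crude non-linearity check that compares R^2 with and without a squared term. The data frame columns are hypothetical stand-ins, not the PISA variables or the authors' analysis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 2000
# Simulated placeholders for a few SES indicators and a mathematics score.
df = pd.DataFrame({
    "escs": rng.normal(size=n),
    "homepos": rng.normal(size=n),
    "parent_occ": rng.normal(size=n),
    "parent_educ": rng.normal(size=n),
})
df["math"] = 0.4 * df["escs"] + 0.3 * df["homepos"] + rng.normal(size=n)

# Zero-order correlations of each SES measure with achievement.
print(df.corr()["math"].drop("math").round(2))

# Crude non-linearity check: does adding a squared term raise R^2 noticeably?
def r_squared(y, X):
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

x, y = df["homepos"].to_numpy(), df["math"].to_numpy()
print(f"R^2 linear: {r_squared(y, x):.3f}, "
      f"with squared term: {r_squared(y, np.column_stack([x, x ** 2])):.3f}")
```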
Do Students Rapidly Guess Repeatedly over Time? A Longitudinal Analysis of Student Test Disengagement, Background, and Attitudes
Pub Date: 2019-08-26 | DOI: 10.1080/10627197.2019.1645592
J. Soland, Megan Kuhfeld
ABSTRACT Considerable research has examined the use of rapid guessing measures to identify disengaged item responses. However, little is known about students who rapidly guess over the course of several tests. In this study, we use achievement test data from six administrations over three years to investigate whether rapid guessing is a stable, trait-like behavior or is determined mostly by situational variables. Additionally, we examine whether rapid guessing over the course of several tests is associated with certain psychological and background measures. We find that rapid guessing tends to be more state-like compared to academic achievement scores, which are fairly stable. Further, we show that repeated rapid guessing is strongly associated with students’ academic self-efficacy and self-management scores. These findings have implications for detecting rapid guessing and intervening to reduce its effect on observed achievement test scores.
Educational Assessment, 24, 327–342.
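One common heuristic in the rapid-guessing literature flags a response as a rapid guess when its response time falls below a normative threshold (for example, 10% of the item's median time), then summarizes each student's rate of rapid guessing per administration. The sketch below illustrates that idea on simulated data; the threshold rule, column names, and data are assumptions, not the authors' operationalization.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Hypothetical long-format log: one row per student x administration x item,
# with response time in seconds.
log = pd.DataFrame({
    "student": np.repeat(np.arange(300), 6 * 40),
    "admin": np.tile(np.repeat(np.arange(6), 40), 300),
    "item": np.tile(np.arange(40), 300 * 6),
    "rt": rng.gamma(shape=2.0, scale=15.0, size=300 * 6 * 40),
})

# Flag a response as a rapid guess when its time is below 10% of the
# item's median response time (a "normative threshold" heuristic).
threshold = 0.10 * log.groupby("item")["rt"].transform("median")
log["rapid"] = log["rt"] < threshold

# Per-student rapid-guessing rate on each administration.
rates = log.groupby(["student", "admin"])["rapid"].mean().unstack("admin")

# Stability across administrations: low correlations would suggest the
# behavior is more state-like than trait-like.
print(rates.corr().round(2))
```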
Intentional Professional Learning Design: Models, Tools, and the Synergies They Produce Supporting Teacher Growth
Pub Date: 2019-08-01 | DOI: 10.1080/10627197.2020.1766961
V. Mills, C. Harrison
ABSTRACT The need and desire to understand and adopt formative assessment practices remain high on the agenda at all levels of educational systems around the world. To advance teachers’ use of formative assessment, research attention also needs to be paid to (a) understanding the challenges teachers face when asked to utilize formative assessment practices in subject-specific content areas and (b) developing appropriate and sufficiently powerful professional learning designs that can enable change for teachers. To begin addressing these needs, this paper offers a close examination of an intentionally designed professional learning (PL) series that helps middle and high school Algebra I teachers understand the formative assessment process and then track and advance their classroom practice. The professional learning design in this case is based on a collaborative and formative approach to classroom practice and teacher change with high school mathematics teachers. Together, the PL model and tools provide a formative framework that bridges the theory-practice divide, enabling teachers to conceptualize and then plan for, reflect on, and revise the ways in which new formative assessment practices are implemented in their classrooms. Through an analysis of the affordances and constraints of the PL design in practice, this paper provides insights into how discipline-specific professional learning can be better developed and supported throughout the teacher growth process.
Educational Assessment, 25, 331–354.
Validity Evidence Supporting Use of Anchoring Vignettes to Measure Teaching Practice
Pub Date: 2019-05-27 | DOI: 10.1080/10627197.2019.1615374
J. Kaufman, J. Engberg, L. Hamilton, Kun Yuan, H. Hill
ABSTRACT High-quality measures of instructional practice are essential for research and evaluation of innovative instructional policies and programs. However, existing measures have generally proven inadequate because of cost and validity issues. This paper addresses two potential drawbacks of survey self-report measures: variation in teachers’ interpretation of response scales and their interpretation of survey questions. To address these drawbacks, researchers tested the use of “anchoring vignettes” in teacher surveys to capture information about teaching practice, and they gathered validity evidence regarding their use as a tool for adjusting teachers’ survey self-reports about their instructional practices, whether for research purposes or potentially to inform professional development. Data from 65 teachers in grades 4-9 responding to our survey suggested that vignette adjustments were reliable and valid for some instructional practices more than others. For some instructional practices, researchers found significant and high correlations between teachers’ vignette-adjusted survey self-ratings and previous observation ratings of teachers’ instruction, including ratings from several widely used observation rubrics. These results suggest that anchoring vignettes may provide an efficient, cost-effective method for gathering data on teachers’ instruction.
Educational Assessment, 24, 155–188.
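For context, one simple nonparametric adjustment described in the anchoring-vignettes literature recodes a respondent's self-rating relative to that same respondent's ratings of ordered benchmark vignettes. The sketch below illustrates that recoding scheme under assumed inputs; it is not necessarily the adjustment procedure used in this article.

```python
def vignette_adjusted_rating(self_rating, vignette_ratings):
    """Recode a self-rating relative to the respondent's own vignette ratings.

    vignette_ratings must be ordered from the least to the most intensive
    vignette. Returns a value on a 1..(2 * len(vignette_ratings) + 1) scale:
    odd values fall between adjacent vignettes, even values tie a vignette.
    """
    c = 1
    for z in vignette_ratings:
        if self_rating < z:
            return c
        if self_rating == z:
            return c + 1
        c += 2
    return c

# A teacher rates two benchmark vignettes 3 and 5 on the same response scale.
print(vignette_adjusted_rating(4, [3, 5]))   # -> 3: own rating sits between the two anchors
print(vignette_adjusted_rating(2, [3, 5]))   # -> 1: own rating falls below the weaker anchor
```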
Patterns of Solution Behavior across Items in Low-Stakes Assessments
Pub Date: 2019-05-26 | DOI: 10.1080/10627197.2019.1615373
D. Pastor, Thai Q. Ong, S. Strickman
ABSTRACT The trustworthiness of low-stakes assessment results largely depends on examinee effort, which can be measured by the amount of time examinees devote to items using solution behavior (SB) indices. Because SB indices are calculated for each item, they can be used to understand how examinee motivation changes across items within a test. Latent class analysis (LCA) was used with the SB indices from three low-stakes assessments to explore patterns of solution behavior across items. Across tests, the favored models consisted of two classes, with Class 1 characterized by high and consistent solution behavior (>90% of examinees) and Class 2 by lower and less consistent solution behavior (<10% of examinees). Additional analyses provided supportive validity evidence for the two-class solution with notable differences between classes in self-reported effort, test scores, gender composition, and testing context. Although results were generally similar across the three assessments, striking differences were found in the nature of the solution behavior pattern for Class 2 and the ability of item characteristics to explain the pattern. The variability in the results suggests motivational changes across items may be unique to aspects of the testing situation (e.g., content of the assessment) for less motivated examinees.
Educational Assessment, 24, 189–212.
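As an illustration of the kind of model referenced above, the sketch below fits a two-class latent class model to binary solution-behavior indicators with a small EM routine. The data are simulated to mimic a large engaged class and a small disengaged class; this is a didactic sketch, not the software or specification used in the study.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical 0/1 solution-behavior matrix: 1000 examinees x 25 items,
# with 90% of examinees mostly engaged and 10% much less so.
engaged = rng.random(1000) < 0.9
p_true = np.where(engaged[:, None], 0.97, 0.55)
sb = (rng.random((1000, 25)) < p_true).astype(int)

def two_class_lca(x, n_iter=200, seed=0):
    """EM for a two-class latent class model with conditionally independent binary items."""
    rng = np.random.default_rng(seed)
    n, j = x.shape
    pi = np.array([0.5, 0.5])                  # class proportions
    p = rng.uniform(0.3, 0.7, size=(2, j))     # item-response probabilities per class
    for _ in range(n_iter):
        # E-step: posterior class membership per examinee (log scale for stability).
        log_lik = (x[:, None, :] * np.log(p) + (1 - x[:, None, :]) * np.log(1 - p)).sum(axis=2)
        log_post = np.log(pi) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class proportions and item probabilities.
        pi = post.mean(axis=0)
        p = (post.T @ x) / post.sum(axis=0)[:, None]
        p = p.clip(1e-4, 1 - 1e-4)
    return pi, p, post

pi, p, post = two_class_lca(sb)
print("class proportions:", pi.round(2))
print("mean solution-behavior probability by class:", p.mean(axis=1).round(2))
```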