Over our field's 100-year-plus history, standardization has been a central assumption in test theory and practice. The concept's justification turns on leveling the playing field by presenting all examinees with putatively equivalent experiences. Until relatively recently, our field accepted that justification almost without question. In this article, I present a case for standardization's antithesis, personalization. Interestingly, personalized assessment has important precedents within the measurement community. As intriguing are some of the divergent ways in which personalization might be realized in practice. Those ways, however, suggest a host of serious issues. Despite those issues, both moral obligation and survival imperative counsel persistence in trying to personalize assessment.
The Computer-based Case Simulations (CCS) component of the United States Medical Licensing Examination (USMLE) Step 3 was developed to assess the decision-making and patient-management skills of physicians. Process data can provide deep insights into examinees' behavioral processes while completing the CCS assessment task. In this paper, we used process data to evaluate the impact of shortening the allotted time limit by rescoring the CCS cases from process data extracted at timestamps representing different percentages of the original allotted case time. Both examinees' performance and the correlation between the original and newly generated scores tended to decrease as the timestamp condition became stricter. The impact of shortening the allotted time limit was only marginally associated with case difficulty but strongly dependent on a case's time intensity under the original time setting.
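The rescoring logic described above can be illustrated with a minimal sketch. The actual CCS scoring algorithm is not public, so the sketch below substitutes a toy score (the sum of credits for actions taken before a cutoff); the log structure, function names, and data are all hypothetical, intended only to show how truncating process data at a fraction of the time limit and correlating the truncated scores with full-time scores might look in code.

```python
# Illustrative sketch only: the real CCS scoring procedure is not public.
# Assumes each examinee's process log is a list of (timestamp_in_minutes,
# action_credit) pairs; a case is scored as the sum of credits for actions
# taken before a cutoff. All names and numbers are hypothetical.
import numpy as np

def score_at_fraction(log, time_limit, fraction):
    """Score one case using only actions logged before fraction * time_limit."""
    cutoff = fraction * time_limit
    return sum(credit for t, credit in log if t <= cutoff)

def truncation_impact(logs, time_limit, fractions=(1.0, 0.75, 0.5)):
    """Correlate full-time scores with scores regenerated at earlier timestamps."""
    full = np.array([score_at_fraction(log, time_limit, 1.0) for log in logs])
    results = {}
    for f in fractions:
        truncated = np.array([score_at_fraction(log, time_limit, f) for log in logs])
        results[f] = np.corrcoef(full, truncated)[0, 1]
    return results

# Example: three examinees on a hypothetical 20-minute case
logs = [
    [(2.0, 1.0), (8.5, 0.5), (18.0, 1.0)],
    [(1.0, 0.5), (12.0, 1.0), (19.5, 0.5)],
    [(3.0, 1.0), (6.0, 1.0), (16.0, -0.5)],  # a late action can carry negative credit
]
print(truncation_impact(logs, time_limit=20.0))
```

As the fraction shrinks, more late actions are excluded, so truncated scores drift away from full-time scores; this is the mechanism behind the decreasing correlations the abstract reports.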
This contribution to the EM:IP Special Issue on The Past, Present, and Future of Educational Measurement concentrates on the present and the future, and hence on the goal of improving education. A review of meta-analytic results showed that the largest effect sizes were associated with the actual use of formative assessment in classroom settings, that is, with classroom assessment rather than large-scale assessment. The paper describes micro assessment, which focuses on in-classroom forms of measurement, and then extends this approach beyond the classroom to summative end-of-semester tests (the macro level). It then describes how the two can be combined, using a construct map as the basis for developing and using assessments that span both levels within the BEAR Assessment System (BAS). Throughout, the approach is illustrated with an elementary school program designed to teach students geometry. Finally, a conclusion summarizes the discussion and looks to the future, where a meso level of use involves end-of-unit tests.
Vertical scales are frequently developed using common item nonequivalent group linking. In this design, one can use upper-grade, lower-grade, or mixed-grade common items to estimate the linking constants that underlie the absolute measurement of growth. Using the Rasch model and a dataset from Curriculum Associates' i-Ready Diagnostic in math in grades 3–7, we demonstrate how grade-to-grade mean differences in mathematics proficiency appear much larger when upper-grade linking items are used instead of lower-grade items, with linkings based on a mixture of items falling in between. We then consider salient properties of the three calibrated scales, including the invariance of the different common-item sets across student grades and the presence of item difficulty reversals. These exploratory analyses suggest that upper-grade common items in vertical scaling are more subject to threats to score comparability across grades, even though these items also tend to imply the most growth.
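To make the linking step concrete, here is a minimal sketch, not Curriculum Associates' actual procedure. Under separate Rasch calibrations of adjacent grades, a mean-shift linking constant that places the upper grade on the lower grade's scale can be taken as the average difference in the common items' difficulty estimates across the two calibrations. The item IDs and difficulty values below are hypothetical, chosen so that upper-grade common items imply a larger shift than lower-grade items, with a mixed set in between, mirroring the pattern the abstract reports.

```python
# A minimal sketch under a mean-shift Rasch linking assumption; all item IDs
# and difficulty estimates (in logits) are hypothetical.
import numpy as np

def rasch_linking_constant(b_lower, b_upper, common_ids):
    """Mean difference in common-item difficulties across the two separate
    calibrations: add this to upper-grade parameters to express them on the
    lower-grade scale."""
    diffs = [b_lower[i] - b_upper[i] for i in common_ids]
    return float(np.mean(diffs))

# Hypothetical separate-calibration difficulty estimates
b_grade3 = {"L1": -0.4, "L2": 0.1, "U1": 1.3, "U2": 1.8}   # grade-3 calibration
b_grade4 = {"L1": -1.2, "L2": -0.8, "U1": 0.1, "U2": 0.4}  # grade-4 calibration

for label, ids in [("lower-grade", ["L1", "L2"]),
                   ("upper-grade", ["U1", "U2"]),
                   ("mixed-grade", ["L1", "U2"])]:
    k = rasch_linking_constant(b_grade3, b_grade4, ids)
    print(f"{label} common items imply a growth shift of {k:.2f} logits")
```

Because the growth implied by the vertical scale is exactly this shift, any lack of invariance in a common-item set propagates directly into the estimated grade-to-grade differences.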
This article describes the remarkable development of methods and models supporting educational measurement, alongside a much slower evolution of theory about how and what students learn and how educational measurement can best support that learning. Told from the perspective of someone who has lived through many of these changes, the article provides background on these developments and insights into challenges and opportunities for future development.
Educational measurement is a social science that requires both qualitative and quantitative competencies. Qualitative competencies in educational measurement include developing and applying theories of learning, designing instruments, and identifying the social, cultural, historical, and political contexts of measurement. Quantitative competencies include statistical inference, computational fluency, and psychometric modeling. I review 12 commentaries authored by past presidents of the National Council on Measurement in Education (NCME), published in a special issue that prompted them to reflect on the past, present, and future of educational measurement. I explain how a perspective that attends to both qualitative and quantitative competencies yields common themes across the commentaries. These include the appeal and challenge of personalization, the necessity of contextualization, and the value of communication and collaboration. I conclude that elevating both the qualitative and quantitative competencies underlying educational measurement provides a clearer sense of how NCME can advance its mission, "to advance theory and applications of educational measurement to benefit society."
Access to admission tests was greatly restricted during the COVID-19 pandemic, resulting in widespread adoption of test-optional policies by colleges and universities. Many institutions adopted such policies on an interim or trial basis, while many others signaled that the change would be long term. Several Ivy League institutions and selective public flagship universities have since returned to requiring test scores from all applicants, citing their own research indicating that both diversity and the academic success of applicants are best served by including test scores in the admissions process. This paper reviews recent research on the impact of test-optional policies on applicants' score-sending behaviors and on differential outcomes in college. Ultimately, test-optional policies are neither the panacea for diversity that proponents suggested nor the cause of the decay in academic outcomes that opponents forecast, but they do have consequences, which colleges will need to weigh going forward.

