In the spring of 2021, just one year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about the pandemic's impacts on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the principal threats to making valid inferences about pandemic impacts on student learning from such data: measurement artifacts affecting the comparability of scores, secular trends, and changes in the tested population. The article compares three statistical approaches (the Fair Trend, baseline student growth percentiles, and multiple regression with demographic covariates) that can support more valid inferences about student learning during the pandemic and in other scenarios in which the tested population changes over time. All three approaches lead to similar inferences about statewide student performance but can lead to very different inferences about student subgroups. Results show that statistically controlling for prepandemic demographic differences can reverse conclusions about which groups were most affected by the pandemic and, in turn, decisions about prioritizing resources.
{"title":"Causal Inference and COVID: Contrasting Methods for Evaluating Pandemic Impacts Using State Assessments","authors":"Benjamin R. Shear","doi":"10.1111/emip.12540","DOIUrl":"10.1111/emip.12540","url":null,"abstract":"<p>In the spring of 2021, just 1 year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about impacts of the pandemic on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to making valid inferences about student learning to study pandemic impacts using state assessment data: measurement artifacts affecting the comparability of scores, secular trends, and changes in the tested population. The article compares three statistical approaches (the Fair Trend, baseline student growth percentiles, and multiple regression with demographic covariates) that can support more valid inferences about student learning during the pandemic and in other scenarios in which the tested population changes over time. All three approaches lead to similar inferences about statewide student performance but can lead to very different inferences about student subgroups. Results show that controlling statistically for prepandemic demographic differences can reverse the conclusions about groups most affected by the pandemic and decisions about prioritizing resources.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2023-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44143301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, machine learning (ML) techniques have received growing attention for detecting aberrant test-taking behaviors because of their advantages over traditional data forensics methods. Defining “True Test Cheaters” is challenging, however. Unlike other fraud detection tasks, such as flagging forged bank checks or credit card fraud, testing organizations often lack physical evidence identifying “True Test Cheaters” with which to train ML models. This study proposes a statistically defensible method of labeling “True Test Cheaters” in the data, demonstrates the effectiveness of ML approaches for identifying irregular statistical patterns in exam data, and establishes an analytical framework for evaluating and conducting real-time ML-based test data forensics. Classification accuracy and false negative/positive results are evaluated across different supervised ML techniques. The reliability and feasibility of using this approach operationally for an IT certification exam are evaluated using real data.
{"title":"Machine Learning–Based Profiling in Test Cheating Detection","authors":"Huijuan Meng, Ye Ma","doi":"10.1111/emip.12541","DOIUrl":"10.1111/emip.12541","url":null,"abstract":"<p>In recent years, machine learning (ML) techniques have received more attention in detecting aberrant test-taking behaviors due to advantages when compared to traditional data forensics methods. However, defining “True Test Cheaters” is challenging—different than other fraud detection tasks such as flagging forged bank checks or credit card frauds, testing organizations are often lack of physical evidences to identify “True Test Cheaters” to train ML models. This study proposed a statistically defensible method of labeling “True Test Cheaters” in the data, demonstrated the effectiveness of using ML approaches to identify irregular statistical patterns in exam data, and established an analytical framework for evaluating and conducting real-time ML-based test data forensics. Classification accuracy and false negative/positive results are evaluated across different supervised-ML techniques. The reliability and feasibility of operationally using this approach for an IT certification exam are evaluated using real data.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48429707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Preschool Early Numeracy Skills Test–Brief Version (PENS-B) is a measure of early numeracy skills, developed and mainly used in the United States. The purpose of this study was to examine the factorial validity and measurement invariance across gender of the PENS-B in the Greek educational context. The PENS-B was administered to 906 preschool children (473 boys, 433 girls) randomly selected from 84 kindergarten classrooms. Unidimensional and multidimensional 2PL item response theory analyses, using cross-validation procedures, were used to analyze the data. Results showed that responses to 20 items can be adequately explained by a two-dimensional model (Numbering Relations and Arithmetic Operations). Differential item functioning procedures did not detect any gender bias. The Numbering Relations dimension comprises 16 items, which assess low levels of that latent trait, while the remaining four items capture average levels of Arithmetic Operations. Total information curves revealed that both dimensions measure with precision only a small area of their underlying latent trait.
{"title":"Psychometric Evaluation of the Preschool Early Numeracy Skills Test–Brief Version Within the Item Response Theory Framework","authors":"Nikolaos Tsigilis, Katerina Krousorati, Athanasios Gregoriadis, Vasilis Grammatikopoulos","doi":"10.1111/emip.12536","DOIUrl":"10.1111/emip.12536","url":null,"abstract":"<p>The Preschool Early Numeracy Skills Test–Brief Version (PENS-B) is a measure of early numeracy skills, developed and mainly used in the United States. The purpose of this study was to examine the factorial validity and measurement invariance across gender of PENS-B in the Greek educational context. PENS-B was administered to 906 preschool children (473 boys, 433 girls), randomly selected from 84 kindergarten classrooms. A 2PL unidimensional and multidimensional item response theory analysis, using cross-validation procedures, were used to analyze the data. Results showed that responses to 20 items can be adequately explained by a two-dimensional model (Numbering Relations and Arithmetic Operations). Application of differential item functioning procedures did not detect any gender bias. Numeracy Relation comprises 16 items, which assess low levels of this latent trait. On the other hand, four items capture average levels of Arithmetic Operations. Total information curves revealed that both dimensions measure with precision only a small area of their underlying latent trait.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12536","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47069473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on Automated Essay Scoring has become increasingly important because it provides a method for evaluating students’ written responses at scale. Scalable scoring methods are needed as students migrate to online learning environments, resulting in large numbers of written-response assessments to evaluate. The purpose of this study is to describe and evaluate three active learning methods that can minimize the number of essays human raters must score while still providing the data needed to train a modern Automated Essay Scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These methods were used to select essays from the Automated Student Assessment Prize competition, which were then classified using a scoring model trained with a Bidirectional Encoder Representations from Transformers (BERT) language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The methods differed in efficiency under different sample size allocations, but overall all three were highly efficient and produced classifications similar to one another.
{"title":"Using Active Learning Methods to Strategically Select Essays for Automated Scoring","authors":"Tahereh Firoozi, Hamid Mohammadi, Mark J. Gierl","doi":"10.1111/emip.12537","DOIUrl":"https://doi.org/10.1111/emip.12537","url":null,"abstract":"<p>Research on Automated Essay Scoring has become increasing important because it serves as a method for evaluating students’ written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods that can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern Automated Essay Scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included in the Automated Student Assessment Prize competition that were then classified using a scoring model that was trained with the bidirectional encoder representations from a transformer language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50147865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As announced in the previous issue of Educational Measurement: Issues and Practice, the ITEMS portal is now hosted on the NCME website. This shift has many benefits. The modules are now easier for the NCME membership to access: members can navigate to the portal via the link under the Resources tab on the ribbon at the top of each page of the website. Rather than going to an external site with a separate login, all ITEMS modules are now available under the NCME brand directly on the primary site. The modules can be found at:
https://www.ncme.org/itemsportal
Hosting on the NCME website also allows more editorial control over the modules. New modules have an updated form with interactive features built into the browsing experience on the NCME website. Each module begins with a video abstract introducing the objectives learners can expect to achieve by completing the module, as well as an introduction of the authors. The content of each module is broken into sections, each built around two to four section-specific learning objectives. For each section, authors develop a content video and interactive learning checks: multiple-choice items designed to check for understanding. An interactive activity lets learners apply what they have learned in the module. Finally, the slides, sample data sets, example syntax, and other useful resources are available for download.
Since its launch in September 2022, the ITEMS portal has experienced considerable traffic. In the 30 days between September 12 and October 11, the portal amassed just under 1,000 unique page views, with Figure 1 showing the daily traffic. At the same time, the original ITEMS portal has remained active and continues to amass many more views. We plan to shut down the original ITEMS portal in the near future, so it is important that links to ITEMS modules on the original portal be updated to the NCME website URLs. Linking to new modules is simple: all digital modules share the same domain and path structure and may be linked using the following URL template, replacing ## with the two-digit digital ITEMS module number: https://www.ncme.org/itemsportal/digital-modules/dm##.
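For example, the template can be filled in programmatically; the short sketch below uses module 30 purely as an illustration:

```python
# Build a digital ITEMS module link from the URL template (module 30 as an example).
module_number = 30
url = f"https://www.ncme.org/itemsportal/digital-modules/dm{module_number:02d}"
print(url)  # https://www.ncme.org/itemsportal/digital-modules/dm30
```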
I am thrilled to announce the second module in the new format on the NCME website: Digital Module #30, Validity and Educational Testing: Purposes and Uses of Educational Tests, authored by Jennifer Lewis and Steve Sireci. In this five-part module, Lewis and Sireci discuss the purposes and uses of educational tests, the basic concepts of validity theory, the five sources of validity evidence, and how to document a “validity argument.” The module outlines definitions conceptually and provides concrete examples from K–12 testing, but it will be of use to anyone involved in testing or measurement.
We have several exciting ITEMS modules in development. There are still opportunities to author…
{"title":"ITEMS Corner Update: High Traffic to the ITEMS Portal on the NCME Website","authors":"Brian C. Leventhal","doi":"10.1111/emip.12532","DOIUrl":"10.1111/emip.12532","url":null,"abstract":"<p>As announced in the previous issue of <i>Educational Measurement: Issues and Practice</i>, the ITEMS portal is now hosted on the NCME website. This shift has many benefits. The modules are now easier to access for the NCME membership. Members can navigate to the portal via the link under the resources tab found on the ribbon at the top of each page on the website. Rather than having to go to an external site with a unique log in, all ITEMS modules are now available under the NCME brand directly on the primary site. The modules can be found:</p><p>https://www.ncme.org/itemsportal</p><p>Being hosted on the NCME website also allows more editorial control of the modules. New modules have an updated form with interactive features built into the browsing experience on the NCME website. Each module begins with a video abstract introducing the objectives learners can expect to achieve by completing the module, as well as an introduction of the authors. The content of the module is broken down into sections, each built around two to four section-specific learning objectives. For each section, authors develop a video of content and interactive learning checks, which are multiple choice items designed to check for understanding. There is an interactive activity for the learner to apply what they have learned in the module. Finally, the slides, sample data sets, example syntax, and other useful resources are available for download.</p><p>Since its launch in September 2022, the ITEMS portal has experienced considerable traffic. In the 30 days between September 12 and October 11, the ITEMS portal amassed just under 1,000 unique page views, with Figure 1 showcasing the daily traffic. At the same time, the original ITEMS portal has continued to remain active, amassing many more views. We are planning on shutting down the original ITEMS portal in the near future. It is important that links to ITEMS modules on the original portal be updated to the URL for the NCME website. Linking to new modules is simple. All modules have the same domain name, top-level domain, and path. All digital modules may be linked using the following URL template, replacing ## with the two-digit digital ITEMS module number: https://www.ncme.org/itemsportal/digital-modules/dm##.</p><p>I am thrilled to announce the second module of the new format on the NCME website. Jennifer Lewis and Steve Sireci author <i>Digital Module #30 Validity and Educational Testing: Purposes and Uses of Educational Tests</i>. In this five-part module, Lewis and Sireci discuss the purposes and uses of educational tests, the basic concepts of validity theory, the five sources of validity evidence, and how to document a “validity argument.” The module contains content that outlines definitions conceptually and provides concrete examples in K–12 testing but will be of use to anyone involved in testing or measurement.</p><p>We have several exciting ITEMS modules in development. 
There are still opportunities to autho","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12532","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48219216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}