AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning
James Sharpnack, Phoebe Mulcaire, Klinton Bicknell, Geoff LaFlair, Kevin Yancey
arXiv - STAT - Applications, 2024-09-13, arXiv:2409.08823

Item response theory (IRT) is a class of interpretable factor models that are widely used in computerized adaptive tests (CATs), such as language proficiency tests. Traditionally, these models are fitted with parametric mixed-effects models for the probability that a test taker answers a test item (i.e., question) correctly. Neural-net extensions of these models, such as BERT-IRT, require specialized architectures and parameter tuning. We propose a multistage fitting procedure that is compatible with out-of-the-box Automated Machine Learning (AutoML) tools. It is based on a Monte Carlo EM (MCEM) outer loop with a two-stage inner loop, which first trains a non-parametric, AutoML-grade model on item features and then fits an item-specific parametric model. This greatly accelerates the modeling workflow for scoring tests. We demonstrate the procedure's effectiveness by applying it to the Duolingo English Test, a high-stakes online English proficiency test. We show that the resulting model is typically better calibrated, achieves better predictive performance, and yields more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models such as BERT-IRT). Along the way, we provide a brief survey of machine learning methods for calibrating item parameters for CATs.
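The abstract describes the procedure only at a high level. The sketch below illustrates what an MCEM outer loop with a two-stage inner loop could look like, under assumptions not stated in the abstract: a 2PL item parameterization P(correct) = sigmoid(a_j * (theta - b_j)), a grid-based Monte Carlo E-step, and scikit-learn gradient boosting standing in for an out-of-the-box AutoML model. All names (sample_abilities, fit_inner_loop, mcem_calibrate) are hypothetical; this is not the authors' implementation.

```python
import numpy as np
from scipy.special import expit
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)


def sample_abilities(responses, item_params, n_draws=20):
    """E-step (Monte Carlo): draw plausible abilities per test taker from a coarse
    grid posterior under the current item parameters.
    Assumed 2PL model: P(correct | theta, item j) = sigmoid(a_j * (theta - b_j))."""
    grid = np.linspace(-4.0, 4.0, 81)
    a, b = item_params["a"], item_params["b"]  # discrimination, difficulty per item
    draws = np.empty((responses.shape[0], n_draws))
    for i, resp in enumerate(responses):
        seen = ~np.isnan(resp)  # items this test taker actually answered
        p = expit(a[seen][None, :] * (grid[:, None] - b[seen][None, :]))
        p = np.clip(p, 1e-6, 1.0 - 1e-6)
        loglik = (resp[seen] * np.log(p) + (1.0 - resp[seen]) * np.log(1.0 - p)).sum(axis=1)
        post = np.exp(loglik - loglik.max())
        post /= post.sum()
        draws[i] = rng.choice(grid, size=n_draws, p=post)
    return draws


def fit_inner_loop(responses, item_features, theta_draws):
    """Two-stage M-step: (1) a non-parametric, feature-based grade model;
    (2) an item-specific parametric (2PL-style) refit of each item's curve."""
    rows, labels, items = [], [], []
    n_takers, n_items = responses.shape
    for i in range(n_takers):
        for j in range(n_items):
            if np.isnan(responses[i, j]):
                continue
            for theta in theta_draws[i]:
                rows.append(np.concatenate(([theta], item_features[j])))
                labels.append(responses[i, j])
                items.append(j)
    X, y, item_idx = np.asarray(rows), np.asarray(labels), np.asarray(items)

    # Stage 1: non-parametric model of correctness from (ability, item features).
    # An AutoML toolkit could be dropped in here; gradient boosting is a stand-in.
    grade_model = GradientBoostingClassifier().fit(X, y)
    p_hat = grade_model.predict_proba(X)[:, 1]

    # Stage 2: per item, fit a logistic curve in theta to the stage-1 probabilities
    # (soft targets handled by weighting duplicated 0/1 labels with 1-p_hat / p_hat).
    a = np.ones(n_items)
    b = np.zeros(n_items)
    for j in range(n_items):
        sel = item_idx == j
        if not sel.any():
            continue
        theta_col = X[sel, :1]
        Xj = np.vstack([theta_col, theta_col])
        yj = np.concatenate([np.zeros(sel.sum()), np.ones(sel.sum())])
        wj = np.concatenate([1.0 - p_hat[sel], p_hat[sel]])
        lr = LogisticRegression().fit(Xj, yj, sample_weight=wj)
        slope = lr.coef_[0, 0]
        if slope != 0.0:
            a[j] = slope
            b[j] = -lr.intercept_[0] / slope
    return {"a": a, "b": b}


def mcem_calibrate(responses, item_features, n_outer=5):
    """Outer MCEM loop: alternate ability sampling (E) and the two-stage refit (M)."""
    n_items = responses.shape[1]
    item_params = {"a": np.ones(n_items), "b": np.zeros(n_items)}
    for _ in range(n_outer):
        theta_draws = sample_abilities(responses, item_params)
        item_params = fit_inner_loop(responses, item_features, theta_draws)
    return item_params
```

A hypothetical call would be `mcem_calibrate(responses, item_features)`, where `responses` is a (test takers x items) array with NaN for unadministered items and `item_features` is an (items x features) array; the fitted discriminations and difficulties could then be used to score a CAT session.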