Using Multiple Maximum Exposure Rates in Computerized Adaptive Testing
Kylie Gorney, Mark D. Reckase
Journal of Educational Measurement, 62(2), 360-379. Published 2025-04-16. DOI: 10.1111/jedm.12436

In computerized adaptive testing, item exposure control methods are often used to provide a more balanced usage of the item pool. Many of the most popular methods, including the restricted method (Revuelta and Ponsoda), use a single maximum exposure rate to limit the proportion of times that each item is administered. However, Barrada et al. showed that by using multiple maximum exposure rates, it is possible to obtain an even more balanced usage of the item pool. Therefore, in this paper, we develop four extensions of the restricted method that involve the use of multiple maximum exposure rates. A detailed simulation study reveals that (a) all four of the new methods improve item pool utilization and (b) three of the new methods also improve measurement accuracy. Taken together, these results are highly encouraging, as they reveal that it is possible to improve both types of outcomes simultaneously.
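The core idea of the restricted method can be sketched as follows: an item is eligible for selection only while its empirical exposure rate stays below a ceiling. This is an illustrative reconstruction, not the authors' code; the function names and the NumPy-broadcast handling of per-item ceilings (which loosely mimics the "multiple maximum exposure rates" idea) are assumptions for the sketch.

```python
import numpy as np

def restricted_select(info, admin_counts, tests_started, r_max):
    """Pick the most informative item whose exposure rate is under its cap.

    info         : Fisher information of each item at the current ability estimate
    admin_counts : times each item has been administered so far
    tests_started: number of tests started so far
    r_max        : exposure ceiling; a scalar (single rate) or an array
                   of per-item ceilings (multiple maximum exposure rates)
    """
    rates = admin_counts / max(tests_started, 1)
    eligible = rates < r_max          # broadcasts over scalar or per-item caps
    if not eligible.any():            # fall back if every item is capped
        eligible = np.ones_like(rates, dtype=bool)
    masked = np.where(eligible, info, -np.inf)
    return int(np.argmax(masked))
```

With a single ceiling, an over-exposed but highly informative item is skipped in favor of the next-best eligible item; with an array of ceilings, different items can be capped at different rates.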
Theory-Driven IRT Modeling of Vocabulary Development: Matthew Effects and the Case for Unipolar IRT
Qi (Helen) Huang, Daniel M. Bolt, Xiangyi Liao
Journal of Educational Measurement, 62(2), 199-224. Published 2025-04-11. DOI: 10.1111/jedm.12433

Item response theory (IRT) encompasses a broader class of measurement models than is commonly appreciated by practitioners in educational measurement. For measures of vocabulary and its development, we show how psychological theory might in certain instances support unipolar IRT modeling as a superior alternative to the more traditional bipolar IRT models fit in practice. Although corresponding model choices make unipolar IRT statistically equivalent to bipolar IRT, adopting the unipolar approach substantially alters the resulting metric for proficiency. This shift can have substantial implications for educational research and practices that depend heavily on interval-level score interpretations. As an example, we illustrate through simulation how the perspective of unipolar IRT may account for inconsistencies seen across empirical studies in the observation (or lack thereof) of Matthew effects in reading/vocabulary development (i.e., growth being positively correlated with baseline proficiency), despite theoretical expectations for their presence. Additionally, a unipolar measurement perspective can reflect the anticipated diversification of vocabulary as proficiency level increases. Implications of unipolar IRT representations for constructing tests of vocabulary proficiency and evaluating measurement error are discussed.
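The statistical equivalence with a changed proficiency metric can be illustrated with a log-logistic unipolar model of the kind associated with Lucke's work (an assumption here; the paper's specific models may differ): substituting xi = exp(theta) turns the bipolar 2PL into a unipolar item response function defined on (0, infinity), so the fit is identical while the proficiency scale is exponentially stretched.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Bipolar 2PL item response function on the usual (-inf, inf) metric."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_loglogistic(xi, alpha, delta):
    """Log-logistic unipolar IRF on the positive metric, xi > 0."""
    return delta * xi**alpha / (1.0 + delta * xi**alpha)

theta = np.linspace(-3, 3, 13)
a, b = 1.2, 0.5
xi = np.exp(theta)                 # change of metric: xi = exp(theta)
alpha, delta = a, np.exp(-a * b)   # matching parameterization

# Identical probabilities, so identical fit -- but the proficiency
# metric (theta vs. xi) differs, which is what drives the interval-level
# interpretation issues the abstract describes.
assert np.allclose(p_2pl(theta, a, b), p_loglogistic(xi, alpha, delta))
```

Because equal steps on theta become multiplicative steps on xi, growth that looks uniform on one metric can look proficiency-correlated (a Matthew effect) on the other.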
Another Look at Yen's Q3: Is .2 an Appropriate Cut-Off?
Kelsey Nason, Christine DeMars
Journal of Educational Measurement, 62(2), 345-359. Published 2025-04-04. DOI: 10.1111/jedm.12432

This study examined the widely used threshold of .2 for Yen's Q3, an index for detecting violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off yielded meaningful bias in estimates. Practical implications and limitations are discussed.
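For reference, Yen's Q3 for an item pair is the correlation, across examinees, of the residuals between observed responses and model-implied probabilities evaluated at each examinee's ability estimate. A minimal sketch (the function name is ours, and a real application would first fit an IRT model to obtain the probabilities):

```python
import numpy as np

def yen_q3(responses, probs):
    """Yen's Q3 matrix: pairwise correlations of IRT residuals.

    responses : (n_examinees, n_items) 0/1 response matrix
    probs     : model-implied P(correct) of the same shape, evaluated
                at each examinee's estimated ability
    Returns the (n_items, n_items) matrix of residual correlations;
    large off-diagonal values flag possible local dependence.
    """
    resid = responses - probs          # deviation of each response from the model
    return np.corrcoef(resid, rowvar=False)
```

Under local independence the off-diagonal Q3 values hover near zero (slightly negative in finite samples when abilities are estimated), which is what makes the choice of a practical cut-off such as .2 consequential.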