
Latest Publications in Educational Measurement: Issues and Practice

Issue Cover
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2025-02-17 DOI: 10.1111/emip.12611
Citations: 0
Digital Module 37: Introduction to Item Response Tree (IRTree) Models
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2025-02-17 DOI: 10.1111/emip.12665
Nana Kim, Jiayi Deng, Yun Leng Wong

Module Abstract

Item response tree (IRTree) models, an item response modeling approach that incorporates a tree structure, have become a popular method for many applications in measurement. IRTree models characterize the underlying response processes using a decision tree structure, where the internal decision outcome at each node is parameterized with an item response theory (IRT) model. Such models provide a flexible way of investigating and modeling underlying response processes, which can be useful for examining sources of individual differences in measurement and addressing measurement issues that traditional IRT models cannot deal with. In this module, we discuss the conceptual framework of IRTree models and demonstrate examples of their applications in the context of both cognitive and noncognitive assessments. We also introduce some possible extensions of the model and provide a demonstration of an example data analysis in R.
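The module's worked example uses R, but the core idea of the tree decomposition can be sketched in a few lines: each observed response is recoded into binary pseudo-items, one per internal node of the tree, and each pseudo-item is then fit with a standard IRT model. The sketch below (illustrative only, not the module's code) uses a common two-node IRTree for a 4-point Likert item, with a direction node and an extremity node.

```python
# Recode a 4-point Likert response (1 = strongly disagree .. 4 = strongly agree)
# into binary pseudo-items for a two-node IRTree:
#   node 1 (direction): disagree (0) vs. agree (1)
#   node 2 (extremity): mild (0) vs. extreme (1)
def irtree_recode(response):
    """Map a 1-4 response to (direction, extremity) pseudo-items."""
    if response not in (1, 2, 3, 4):
        raise ValueError("response must be 1-4")
    direction = 1 if response >= 3 else 0
    extremity = 1 if response in (1, 4) else 0
    return direction, extremity

# Each pseudo-item would then be parameterized with an IRT model (e.g., 2PL),
# typically with a separate latent trait per node.
print([irtree_recode(r) for r in (1, 2, 3, 4)])
# → [(0, 1), (0, 0), (1, 0), (1, 1)]
```

The per-node traits are what let IRTree models separate substantive agreement from response styles such as extremity.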

Citations: 0
On the Cover: Unraveling Reading Recognition Trajectories: Classifying Student Development through Growth Mixture Modeling
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2025-02-17 DOI: 10.1111/emip.12667
Yuan-Ling Liaw
<p>The cover of this issue features “<i>Unraveling Reading Recognition Trajectories: Classifying Student Development through Growth Mixture Modeling</i>” by Xingyao Xiao and Sophia Rabe-Hesketh from the University of California, Berkeley. Using advanced Bayesian growth mixture modeling, their research examines how reading recognition develops between ages 6 and 14, identifying three distinct patterns of growth. This study provides a detailed and nuanced understanding of how students’ reading abilities progress over time.</p><p>Xiao and Rabe-Hesketh illustrated their findings using a multiplot visualization. It combines model-implied class-specific mean trajectories, a shaded 50% mid-range, and box-plots of observed reading scores, effectively highlighting the variability in reading progress among different learner groups. By juxtaposing observed data with model predictions, the visualization clearly depicts diverse growth patterns. Additionally, it emphasizes the variance and covariance of random effects, offering valuable insights often overlooked in similar analyses.</p><p>The three-class model described by Xiao and Rabe-Hesketh effectively explains different patterns of student growth. The first group, termed the “Early Bloomers,” comprises about 14% of the population who start with strong reading abilities and steadily improve. By age six, they show high reading scores and greater variability in growth trajectories compared to other groups. Xiao and Rabe-Hesketh note, “These students exhibit greater variability in growth curves at age six, with an 88% likelihood for those deviating 2 standard deviations below or above the mean to stray from the average growth rate.” This highlights their potential for early reading success.</p><p>The “Rapid Catch-Up Learners” represent 35% of students, starting with lower scores but progressing rapidly to often surpass Early Bloomers by adolescence. 
Xiao and Rabe-Hesketh explain, “Though showing minimal heterogeneity in growth trajectories at age 6, these paths diverge due to a positive correlation between intercepts and slope. Those with trajectories 2 standard deviations above or below the mean at age 6 possess an 81% likelihood of deviating from the average growth rate.” This group highlights the potential of slower starters to excel with targeted support.</p><p>Lastly, the “Steady Progressors” start with the lowest average scores at age six but show steady, consistent growth over time. By age 14, their scores begin to overlap with those of other groups, despite maintaining an initial gap. “These students are projected to deviate 605% more from the mean at age 14 than at age 6, approximately seven times as much.” Representing a majority of students, this group highlights the importance of persistence and gradual progress.</p><p>Through their research, Xiao and Rabe-Hesketh define the diverse trajectories of reading development. Whether a student's growth is rapid, steady, or gradual, every trajectory deserves recognition and encouragement. By addressing each learner's unique needs, educators can better support these diverse learning paths and create equitable opportunities for all students to succeed and thrive.</p><p>For more details or inquiries about this visualization, contact Xingyao Xiao at [email protected]. We also invite you to contribute to a future issue through the EM:IP Cover Graphic/Data Visualization Competition. Please share your ideas or questions by emailing Yuan-Ling Liaw at [email protected]. We look forward to hearing from you!</p>
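The class-specific trajectories Xiao and Rabe-Hesketh visualize come from a growth mixture model: each latent class has its own mean intercept and slope, with person-level random effects whose variances and covariance drive the fanning-out of individual curves. The simulation sketch below illustrates that structure; all class names echo the article, but every numeric parameter here is invented for illustration, not taken from the authors' estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
ages = np.arange(6, 15)  # ages 6 through 14

# Illustrative (NOT the authors') class parameters:
# (mean intercept at age 6, mean slope, SD intercept, SD slope, corr, class share)
classes = {
    "Early Bloomers":          (60.0, 4.0, 8.0, 1.5, 0.2, 0.14),
    "Rapid Catch-Up Learners": (40.0, 6.0, 3.0, 1.0, 0.6, 0.35),
    "Steady Progressors":      (30.0, 3.5, 4.0, 0.8, 0.1, 0.51),
}

def simulate_class(mu_i, mu_s, sd_i, sd_s, rho, n):
    """Draw n (intercept, slope) pairs and return n trajectories over `ages`."""
    cov = [[sd_i**2, rho * sd_i * sd_s], [rho * sd_i * sd_s, sd_s**2]]
    b = rng.multivariate_normal([mu_i, mu_s], cov, size=n)
    t = ages - ages[0]
    return b[:, [0]] + b[:, [1]] * t  # y_ij = intercept_i + slope_i * (age - 6)

for name, (mi, ms, si, ss, rho, share) in classes.items():
    y = simulate_class(mi, ms, si, ss, rho, int(1000 * share))
    print(f"{name}: mean at 6 = {y[:, 0].mean():.1f}, mean at 14 = {y[:, -1].mean():.1f}")
```

A positive intercept-slope correlation, as quoted for the Rapid Catch-Up Learners, makes higher starters grow faster, which is exactly what produces diverging paths from a nearly homogeneous start.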
Citations: 0
ITEMS Corner: Next Chapter of ITEMS
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2025-02-17 DOI: 10.1111/emip.12666
Stella Y. Kim
Citations: 0
Issue Cover
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2025-01-05 DOI: 10.1111/emip.12566
Citations: 0
On the Cover: The Increasing Impact of EM:IP
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2025-01-05 DOI: 10.1111/emip.12657
Yuan-Ling Liaw

The cover of this issue featured “The Increasing Impact of EM:IP” by Zhongmin Cui, the journal's editor. Cui elaborated on the significance of the impact factor for Educational Measurement: Issues and Practice (EM:IP), one of the most widely recognized metrics for evaluating a journal's influence and prestige. The impact factor, which measures how frequently a journal's articles are cited over a specific period, serves as a critical tool for researchers, institutions, and funding bodies in assessing the relevance and significance of published work.

Cui noted the challenges in measuring a journal's influence, stating, “As measurement professionals, we are well aware of the difficulties in quantifying almost anything, including the impact of a journal. However, even imperfect metrics, if carefully designed, can provide valuable insights for users making informed decisions.”

He cited EM:IP’s latest journal impact factor of 2.7 (Wiley, 2024), which was calculated based on citations from the previous two years. Acknowledging that this figure might not seem substantial, Cui emphasized that it represents a significant milestone in the journal's history. “The visualization we created illustrates a steady, consistent upward trend in EM:IP’s impact factor over the past decade. This growth reflects our ongoing commitment to publishing high-quality, impactful research that resonates with both scholars and practitioners,” he added.
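The two-year impact factor Cui describes has a simple arithmetic definition: citations received in a given year to articles the journal published in the two preceding years, divided by the number of citable items from those two years. A minimal sketch, with purely illustrative counts (not Clarivate's actual figures for EM:IP):

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Two-year journal impact factor for year Y:
    citations received in Y to items published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical numbers chosen only to reproduce a 2.7:
# e.g., 189 citations in 2023 to 70 articles published in 2021-2022.
print(round(impact_factor(189, 70), 1))  # → 2.7
```

Because both the citation window and the item window are short, year-to-year movement in the metric can reflect a handful of highly cited articles as much as a broad trend.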

Cui also stressed the growing influence of EM:IP in the field of educational and psychological measurement. He credited this achievement to the dedication of the authors, the insights of the reviewers, and the ongoing support of the readers. “Everyone's contributions have been crucial to our success, and we are excited to continue our mission to advance knowledge and foster scholarly discourse in the years ahead,” he said with gratitude.

The visualization was created using Python, following guidelines established by Setzer and Cui (2022). “One special feature of the graph is the use of the journal's color scheme, which enhances visual harmony, particularly for the cover design,” Cui explained. The data used to calculate the impact factor was sourced from Clarivate (https://clarivate.com/). For those interested in learning more about this data visualization, Zhongmin Cui can be contacted at [email protected].

We also invite you to participate in the annual EM:IP Cover Graphic/Data Visualization Competition. Details for the 2025 competition can be found in this issue. Your entry could be featured on the cover of a future issue! We're eager to receive your feedback and submissions. Please share your thoughts or questions by emailing Yuan-Ling Liaw at [email protected].

Citations: 0
Current Psychometric Models and Some Uses of Technology in Educational Testing
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2024-12-27 DOI: 10.1111/emip.12644
Robert L. Brennan

This paper addresses some issues concerning the use of current psychometric models for current (and possibly future) technology-based educational testing (as well as most licensure and certification testing). The intent here is to provide a relatively simple overview that addresses important issues, with little explicit intent to argue strenuously for or against the particular uses of technology discussed here.

Citations: 0
Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2024-12-19 DOI: 10.1111/emip.12663
Guher Gorgun, Okan Bulut

Automatic item generation can supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of large-language models, specifically Llama 3-8B, for evaluating automatically generated cloze items. The trained large-language model was able to accurately filter out the majority of good and bad items. Evaluating items automatically with instruction-tuned LLMs may help educators and test developers understand item quality in an efficient and scalable manner. The item evaluation process with LLMs may also act as an intermediate step between item creation and field testing, reducing the cost and time associated with multiple rounds of revision.
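The workflow the abstract describes, generate an item, ask an instruction-tuned model for a quality verdict, and gate field testing on that verdict, can be sketched generically. The study fine-tuned Llama 3-8B; in the sketch below, `ask_llm`, the prompt wording, and the item fields are all placeholders for whatever model call and item schema a deployment actually uses.

```python
# Hedged sketch of LLM-based item quality control. `ask_llm` stands in for
# any instruction-tuned model call (the study used a tuned Llama 3-8B).
PROMPT = (
    "You are reviewing an automatically generated cloze item.\n"
    "Item: {stem}\nKey: {key}\nDistractors: {distractors}\n"
    "Answer with exactly one word: GOOD or BAD."
)

def evaluate_item(item, ask_llm):
    """Return True if the model judges the item acceptable for field testing."""
    prompt = PROMPT.format(stem=item["stem"], key=item["key"],
                           distractors=", ".join(item["distractors"]))
    verdict = ask_llm(prompt).strip().upper()
    return verdict.startswith("GOOD")

# Usage with a stub in place of a real model call:
stub = lambda prompt: "GOOD" if "capital" in prompt else "BAD"
item = {"stem": "Paris is the capital of ____.", "key": "France",
        "distractors": ["Spain", "Italy"]}
print(evaluate_item(item, stub))  # → True
```

Constraining the model to a one-word verdict keeps the output trivially parseable, which matters when thousands of generated items must be screened before any reach human review.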

Citations: 0
Still Interested in Multidimensional Item Response Theory Modeling? Here Are Some Thoughts on How to Make It Work in Practice
IF 2.7 CAS Tier 4 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2024-12-18 DOI: 10.1111/emip.12645
Terry A. Ackerman, Richard M. Luecht
<p>Given tremendous improvements over the past three to four decades in the computational methods and computer technologies needed to estimate the parameters for higher dimensionality models (Cai, <span>2010a, 2010b</span>, <span>2017</span>), we might expect that MIRT would by now be a widely used array of models and psychometric software tools being used operationally in many educational assessment settings. Perhaps one of the few areas where MIRT has helped practitioners is in the area of understanding Differential Item Functioning (DIF) (Ackerman & Ma, <span>2024</span>; Camilli, <span>1992</span>; Shealy & Stout, <span>1993</span>). Nevertheless, the expectation has not been met nor do there seem to be many operational initiatives to change the <i>status quo</i>.</p><p>Some research psychometricians might lament the lack of large-scale applications of MIRT in the field of educational assessment. However, the simple fact is that MIRT has not lived up to its early expectations nor its potential due to several barriers. Following a discussion of test purpose and metric design issues in the next section, we will examine some of the barriers associated with these topics and provide suggestions for overcoming or completely avoiding them.</p><p>Tests developed for one purpose are rarely of much utility for another purpose. For example, professional certification and licensure tests designed to optimize pass-fail classifications are often not very useful for reporting scores across a large proficiency range—at least not unless the tests are extremely long. Summative, and most interim assessments used in K–12 education, are usually designed to produce reliable total-test scores. 
The resulting scale scores are summarized as descriptive statistical aggregations of scale scores or other functions of the scores, such as classifying students into ordered achievement levels (e.g., Below Basic, Basic, Proficient, Advanced) or modeling student growth in a subject area as part of an educational accountability system. Some commercially available online “interim” assessments provide limited progress-oriented scores and subscores from on-demand tests. However, the defensible formative utility of most interim assessments remains limited because test development and psychometric analytics follow the summative assessment test design and development paradigm: focusing on maintaining vertically aligned or equated, unidimensional score scales (e.g., a K–12 math scale).</p><p>The requisite test design and development frameworks for summative tests focus on the relationships between the item responses and the total test score scale (e.g., maximizing item-total score correlations and the conditional reliability within prioritized regions of that score scale).</p><p>Applying MIRT models to most summative or interim assessments makes little sense. The problem is that we continue to allow policymakers to make claims about score interpretations that are not supported.</p>
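For readers less familiar with the models Ackerman and Luecht discuss, the workhorse compensatory MIRT item response function is a multidimensional extension of the 2PL, where abilities enter only through a linear combination, so strength on one dimension can offset weakness on another. A minimal sketch (parameter values are illustrative):

```python
import math

def mirt_2pl(theta, a, d):
    """Compensatory multidimensional 2PL:
    P(X=1 | theta) = 1 / (1 + exp(-(a . theta + d))).
    Abilities enter only through the linear combination a . theta,
    so a high theta on one dimension can compensate for a low one."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

# An item loading on two dimensions (a = discriminations, d = intercept):
print(round(mirt_2pl(theta=(0.0, 0.0), a=(1.2, 0.8), d=0.0), 2))   # → 0.5
print(round(mirt_2pl(theta=(1.0, -1.0), a=(1.2, 0.8), d=0.0), 2))  # → 0.6
```

The second call shows the compensatory property directly: an examinee one SD above the mean on dimension 1 and one SD below on dimension 2 still answers correctly more often than not, which is part of why unidimensional reporting scales sit so uneasily on multidimensional response data.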
Citations: 0
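The abstract above notes that summative test development maximizes item-total score correlations. As a minimal sketch of what that statistic is, not the authors' method, the snippet below simulates a unidimensional (Rasch-like) 0/1 response matrix and computes each item's corrected item-total correlation, i.e., the item score correlated with the total score excluding that item. All sample sizes and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scored responses: 500 examinees x 10 items, driven by a
# single latent trait (Rasch-like generating model; values illustrative).
theta = rng.normal(size=(500, 1))          # examinee proficiency
b = np.linspace(-1.5, 1.5, 10)             # item difficulties
p = 1 / (1 + np.exp(-(theta - b)))         # correct-response probabilities
x = (rng.random(p.shape) < p).astype(int)  # 0/1 item scores

total = x.sum(axis=1)
for j in range(x.shape[1]):
    rest = total - x[:, j]                 # total score excluding item j
    r = np.corrcoef(x[:, j], rest)[0, 1]   # corrected item-total correlation
    print(f"item {j + 1}: r = {r:.2f}")
```

Under a unidimensional generating model, all of these correlations come out positive, which is exactly the pattern summative item analysis selects for; items that break the pattern are typically flagged or revised.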
Personalizing Assessment: Dream or Nightmare?
IF 2.7 Zone 4 Education Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date : 2024-12-04 DOI: 10.1111/emip.12652
Randy E. Bennett

Over our field's 100-year-plus history, standardization has been a central assumption in test theory and practice. The concept's justification turns on leveling the playing field by presenting all examinees with putatively equivalent experiences. Until relatively recently, our field has accepted that justification almost without question. In this article, I present a case for standardization's antithesis, personalization. Interestingly, personalized assessment has important precedents within the measurement community. As intriguing are some of the divergent ways in which personalization might be realized in practice. Those ways, however, suggest a host of serious issues. Despite those issues, both moral obligation and survival imperative counsel persistence in trying to personalize assessment.

{"title":"Personalizing Assessment: Dream or Nightmare?","authors":"Randy E. Bennett","doi":"10.1111/emip.12652","DOIUrl":"https://doi.org/10.1111/emip.12652","url":null,"abstract":"<p>Over our field's 100-year-plus history, standardization has been a central assumption in test theory and practice. The concept's justification turns on leveling the playing field by presenting all examinees with putatively equivalent experiences. Until relatively recently, our field has accepted that justification almost without question. In this article, I present a case for standardization's antithesis, personalization. Interestingly, personalized assessment has important precedents within the measurement community. As intriguing are some of the divergent ways in which personalization might be realized in practice. Those ways, however, suggest a host of serious issues. Despite those issues, both moral obligation and survival imperative counsel persistence in trying to personalize assessment.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"119-125"},"PeriodicalIF":2.7,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0