Measurement-Interdisciplinary Research and Perspectives: Latest Articles

Exploring Rater Accuracy Using Unfolding Models Combined with Topic Models: Incorporating Supervised Latent Dirichlet Allocation
IF 1 Q2 Mathematics Pub Date : 2022-01-02 DOI: 10.1080/15366367.2021.1915094
Jordan M. Wheeler, G. Engelhard, Jue Wang
ABSTRACT Objectively scoring constructed-response items on educational assessments has long been a challenge due to the use of human raters. Even well-trained raters using a rubric can assess essays inaccurately. Unfolding models measure raters' scoring accuracy by capturing the discrepancy between criterion and operational ratings, placing essays on an unfolding continuum with an ideal-point location. Essay unfolding locations indicate how difficult it is for raters to score an essay accurately. This study aims to explore a substantive interpretation of the unfolding scale based on a supervised Latent Dirichlet Allocation (sLDA) model. We investigate the relationship between latent topics extracted using sLDA and unfolding locations with a sample of essays (n = 100) obtained from an integrated writing assessment. Results show that (a) three latent topics moderately explain (r² = 0.561) essay locations defined by the unfolding scale and (b) failing to use and/or cite the source articles led to essays that are difficult to score accurately.
Citations: 1
Now in JMP® Pro: Structural Equation Modeling
IF 1 Q2 Mathematics Pub Date : 2022-01-02 DOI: 10.1080/15366367.2022.2014446
Citations: 0
Using SAS PROC IRT for Multidimensional Item Response Theory Analysis
IF 1 Q2 Mathematics Pub Date : 2022-01-02 DOI: 10.1080/15366367.2021.1976090
Ki Cole, Insu Paek
ABSTRACT Statistical Analysis Software (SAS) is a widely used tool for data management and analysis across a variety of fields. The procedure for item response theory (PROC IRT) performs unidimensional and multidimensional item response theory (IRT) analysis for dichotomous and polytomous data. This review summarizes the features of PROC IRT specifically for multidimensional data, with examples provided for simple structure data, complex structure data, and bifactor data. Instructive examples for dichotomous data (using the Rasch and 2-parameter logistic models) and polytomous data (using the graded response model) are given. Explanations of the syntax are also presented.
Citations: 1
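The dichotomous models named in the abstract above can be illustrated with a minimal sketch of the multidimensional 2-parameter logistic item response function in slope-intercept form. This is not PROC IRT syntax and makes no claim about how SAS parameterizes the model internally; the function name `m2pl_probability` and all parameter values are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def m2pl_probability(theta: Sequence[float],
                     slopes: Sequence[float],
                     intercept: float) -> float:
    """Probability of a correct response under a multidimensional 2PL model.

    P(correct) = 1 / (1 + exp(-(sum_k slopes[k] * theta[k] + intercept)))
    """
    if len(theta) != len(slopes):
        raise ValueError("theta and slopes must have the same length")
    z = sum(a * t for a, t in zip(slopes, theta)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


# Complex-structure item: loads on both dimensions (hypothetical values).
p_complex = m2pl_probability(theta=[0.5, -0.3], slopes=[1.2, 0.8], intercept=0.2)

# Simple-structure item: a zero slope on the second dimension means the
# response probability does not depend on the second ability.
p_simple = m2pl_probability(theta=[0.5, -0.3], slopes=[1.0, 0.0], intercept=0.0)

# The unidimensional Rasch model is the special case with one dimension,
# slope 1, and intercept equal to minus the item difficulty b.
p_rasch = m2pl_probability(theta=[1.0], slopes=[1.0], intercept=-1.0)

print(p_complex, p_simple, p_rasch)
```

In this sketch, "simple structure" versus "complex structure" reduces to which entries of `slopes` are fixed at zero for a given item.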
A Comparison of Common IRT Model-selection Methods with Mixed-Format Tests
IF 1 Q2 Mathematics Pub Date : 2021-10-02 DOI: 10.1080/15366367.2021.1878779
Yong Luo
ABSTRACT To date, only frequentist model-selection methods have been studied with mixed-format data in the context of IRT model selection, and it is unknown how popular Bayesian model-selection methods such as DIC, WAIC, and LOO perform. In this study, we present the results of a comprehensive simulation study that compared the performance of eight model-selection methods in selecting the correct combination of IRT models with mixed-format data. Findings of the simulation study indicate that DIC, WAIC, and LOO had excellent statistical power to choose the correct IRT model combination. They performed comparably to LRT, slightly better than AIC, and considerably better than BIC, AICc, and SABIC. In addition, the performance of the three Bayesian methods was more stable than that of AIC and LRT regardless of the sample size and ability distribution. The eight model-selection methods were applied to a real dataset for demonstration purposes.
Citations: 1
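For readers unfamiliar with the frequentist criteria named in the abstract above, the following sketch computes AIC, AICc, BIC, and SABIC from a fitted model's log-likelihood using their standard textbook definitions. It is not the study's code; the function name `information_criteria` and the input values are hypothetical. The Bayesian criteria (DIC, WAIC, LOO) need posterior draws and are not covered here.

```python
import math


def information_criteria(log_lik: float, k: int, n: int) -> dict:
    """Standard information criteria; smaller values indicate a preferred model.

    log_lik: maximized log-likelihood of the fitted model
    k:       number of estimated parameters
    n:       sample size (number of examinees)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    deviance = -2.0 * log_lik
    aic = deviance + 2 * k
    return {
        "AIC": aic,
        # Small-sample correction added to AIC.
        "AICc": aic + (2 * k * (k + 1)) / (n - k - 1),
        "BIC": deviance + k * math.log(n),
        # Sample-size-adjusted BIC replaces n with (n + 2) / 24.
        "SABIC": deviance + k * math.log((n + 2) / 24),
    }


# Hypothetical fit: log-likelihood -1000 with 10 parameters and 500 examinees.
crit = information_criteria(log_lik=-1000.0, k=10, n=500)
for name, value in crit.items():
    print(f"{name}: {value:.2f}")
```

To compare candidate IRT model combinations, one would compute these values for each fitted combination and pick the smallest value under the chosen criterion.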
Now in JMP® Pro: Structural Equation Modeling
IF 1 Q2 Mathematics Pub Date : 2021-10-02 DOI: 10.1080/15366367.2021.1982169
Citations: 0
Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics
IF 1 Q2 Mathematics Pub Date : 2021-10-02 DOI: 10.1080/15366367.2021.1940667
David Torres Irribarra
Citations: 0
Resources for Identifying Measurement Instruments for Social Science Research
IF 1 Q2 Mathematics Pub Date : 2021-10-02 DOI: 10.1080/15366367.2021.1950486
R. Schumacker, Stefanie A. Wind, Lauren F. Holmes
ABSTRACT A variety of resources are available from which researchers can identify measurement instruments, including peer-reviewed journal articles, collections of technical information about published instruments, and electronic databases that are sponsored by universities, testing organizations, and other groups. Although these resources are widespread, many researchers are not aware of them. We provide a brief overview of several selected resources that researchers can use to identify measurement instruments for social science research.
Citations: 0
Evaluating Six Approaches to Handling Zero-Frequency Scores under Equipercentile Equating
IF 1 Q2 Mathematics Pub Date : 2021-10-02 DOI: 10.1080/15366367.2020.1855034
Ting Sun, S. Y. Kim
ABSTRACT In many large testing programs, equipercentile equating has been widely used under a random groups design to adjust test difficulty between forms. However, one thorny issue occurs with equipercentile equating when a particular score has no observed frequency. The purpose of this study is to suggest and evaluate six potential methods in equipercentile equating when an observed-score distribution involves zero-frequency scores. A simulation study involving two test lengths (30 and 50 items), five sample sizes (100, 500, 1000, 3000, and 5000), and two levels of similarity in score distributions between two forms was conducted to assess these methods in terms of equating accuracy. Results revealed that presmoothing was the most accurate method in estimating the equipercentile equating relationship when the population distributions for two forms differ with respect to the form of score distributions. When the populations have a similar score distribution, the presmoothing method was also found to be the most accurate method with longer tests (50 items). Furthermore, the performance of these methods does not vary as a function of the number of zero-frequency scores. This study informs practitioners of approaches for handling zero-frequency scores in equipercentile equating that lead to more accurate equating results.
Citations: 0
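The zero-frequency issue described in the abstract above can be seen in a small sketch of the conventional percentile rank function for integer scores, where each score is treated as uniformly spread over the interval half a point either side of it. This is an illustration of the problem only, not any of the six methods the study evaluates; the function name `percentile_rank` and the frequency table are hypothetical.

```python
import math
from typing import Sequence


def percentile_rank(x: float, freqs: Sequence[int]) -> float:
    """Percentile rank of a (possibly non-integer) score x.

    freqs[j] is the observed frequency of integer score j. With x* the
    integer closest to x, the rank is
        100 * (F(x* - 1) + (x - x* + 0.5) * f(x*)),
    where f and F are the relative and cumulative relative frequencies.
    """
    n = sum(freqs)
    x_star = int(math.floor(x + 0.5))
    x_star = max(0, min(x_star, len(freqs) - 1))  # keep index in range
    below = sum(freqs[:x_star]) / n
    at = freqs[x_star] / n
    return 100.0 * (below + (x - x_star + 0.5) * at)


# Hypothetical 4-point score scale in which score 1 was never observed.
freqs = [2, 0, 3, 5]

# Across the whole interval around the zero-frequency score, the percentile
# rank does not change, so its inverse (needed for equating) is not unique.
print(percentile_rank(0.6, freqs), percentile_rank(1.4, freqs))
print(percentile_rank(2.0, freqs))
```

Because the function is flat over the interval belonging to the unobserved score, some rule is needed to pick a single equated score there, which is the situation the six approaches are meant to handle.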
The 2013-15 Decline in NAEP Mathematics: What it Teaches Us about NAEP and the Common Core
IF 1 Q2 Mathematics Pub Date : 2021-10-02 DOI: 10.1080/15366367.2021.1873062
Gregory Camilli
ABSTRACT After 25 years with small to moderate gains in performance in mathematics, scores on the National Assessment of Educational Progress (NAEP) main assessment declined between 2013 and 2015 in Grades 4 and 8. Previous research has suggested the decline may be linked to the implementation of the Common Core State Standards (CCSS). In this article, the decline in the NAEP composite score is shown to be driven primarily by losses in the content strands of Geometry and of Data Analysis, Statistics, and Probability. A gain in fractions achievement is also evident in an item-level examination of the NAEP results, but not in reported NAEP scores. These effects are discussed with respect to the CCSS, the rationale for evaluating national progress, and a potential redesign of the NAEP assessment.
Citations: 0
An Investigation of Item Calibration Methods in Multistage Testing
IF 1 Q2 Mathematics Pub Date : 2021-07-03 DOI: 10.1080/15366367.2021.1878778
L. Cai, Anthony D. Albano, L. Roussos
ABSTRACT Multistage testing (MST), an adaptive test delivery mode that involves algorithmic selection of predefined item modules rather than individual items, offers a practical alternative to linear and fully computerized adaptive testing. However, interactions across stages between item modules and examinee groups can lead to challenges in item calibration with MST. This study used simulated data based on an operational program to investigate the performance of four item calibration methods under a 1–3 MST design. Conditions included routing module length, routing rule, and sample size. Calibration methods were evaluated based on item and person parameter recovery and classification accuracy. Results indicated that calibration with fixed common item parameters and concurrent calibration assuming a single ability distribution similarly outperformed both separate calibration with linking and concurrent calibration with the multiple-group procedure.
Citations: 2