Reviewing the Test Reviews: Quality Judgments and Reviewer Agreements in the Mental Measurements Yearbook
Pub Date: 2021-02-25 | DOI: 10.1080/08957347.2021.1890742 | Applied Measurement in Education, 34, 75–84
T. Hogan, Marissa DeStefano, Caitlin Gilby, Dana C. Kosman, Joshua Peri
ABSTRACT Buros’ Mental Measurements Yearbook (MMY) has provided professional reviews of commercially published psychological and educational tests for over 80 years. It serves as a kind of conscience for the testing industry. For a random sample of 50 entries in the 19th MMY (a total of 100 separate reviews), this study determined the level of qualitative judgment rendered by reviewers and the consistency of those independent reviewers in rendering judgments. Judgments of quality were distributed almost uniformly from very good to very bad across the 100 reviews. Agreement among reviewers for a given test was positive but relatively weak. We explore implications of the results and suggest follow-up investigations.
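The abstract does not state which agreement index the authors computed. Purely as an illustration of how pairwise agreement between two independent reviewers might be summarized (a sketch using invented data and an assumed rating scale, not the article's method), a rank correlation and an exact-agreement rate could be computed from ratings on a common ordinal quality scale:

```python
# Hypothetical example: summarizing agreement between two reviewers' quality ratings.
# The 1-5 scale (1 = very bad ... 5 = very good) and the data are invented for illustration;
# the article's actual coding scheme and agreement index are not given in the abstract.
import numpy as np
from scipy.stats import spearmanr

reviewer_a = np.array([5, 4, 2, 1, 3, 4, 2, 5, 1, 3])   # one rating per test entry
reviewer_b = np.array([4, 4, 3, 2, 2, 5, 1, 4, 2, 2])

rho, p_value = spearmanr(reviewer_a, reviewer_b)
exact = np.mean(reviewer_a == reviewer_b)                # cruder index: exact agreement rate
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f}); exact agreement = {exact:.0%}")
```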
{"title":"Reviewing the Test Reviews: Quality Judgments and Reviewer Agreements in the Mental Measurements Yearbook","authors":"T. Hogan, Marissa DeStefano, Caitlin Gilby, Dana C. Kosman, Joshua Peri","doi":"10.1080/08957347.2021.1890742","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890742","url":null,"abstract":"ABSTRACT Buros’ Mental Measurements Yearbook (MMY) has provided professional reviews of commercially published psychological and educational tests for over 80 years. It serves as a kind of conscience for the testing industry. For a random sample of 50 entries in the 19th MMY (a total of 100 separate reviews) this study determined the level of qualitative judgment rendered by reviewers and the consistency of those independent reviewers in rendering judgments. Judgments of quality distributed themselves almost uniformly from very good to very bad across the 100 reviews. Agreement among reviewers for a given test was positive but relatively weak. We explore implications of the results and suggest follow-up investigations.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"75 - 84"},"PeriodicalIF":1.5,"publicationDate":"2021-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1890742","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47432659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Think Alouds: Informing Scholarship and Broadening Partnerships through Assessment
Pub Date: 2021-01-02 | DOI: 10.1080/08957347.2020.1835914 | Applied Measurement in Education, 34, 1–9
J. Bostic
ABSTRACT Think alouds are valuable tools for academicians, test developers, and practitioners as they provide a unique window into a respondent’s thinking during an assessment. The purpose of this special issue is to highlight novel ways to use think alouds as a means to gather evidence about respondents’ thinking. An intended outcome from this special issue is that readers may better understand think alouds and feel better equipped to use them in practical and research settings.
{"title":"Think Alouds: Informing Scholarship and Broadening Partnerships through Assessment","authors":"J. Bostic","doi":"10.1080/08957347.2020.1835914","DOIUrl":"https://doi.org/10.1080/08957347.2020.1835914","url":null,"abstract":"ABSTRACT Think alouds are valuable tools for academicians, test developers, and practitioners as they provide a unique window into a respondent’s thinking during an assessment. The purpose of this special issue is to highlight novel ways to use think alouds as a means to gather evidence about respondents’ thinking. An intended outcome from this special issue is that readers may better understand think alouds and feel better equipped to use them in practical and research settings.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"13 1","pages":"1 - 9"},"PeriodicalIF":1.5,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1835914","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41258007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Formative Assessment of Computational Thinking: Cognitive and Metacognitive Processes
Pub Date: 2021-01-02 | DOI: 10.1080/08957347.2020.1835912 | Applied Measurement in Education, 34, 27–45
Sarah M. Bonner, Peggy P. Chen, Kristi Jones, Brandon Milonovich
ABSTRACT We describe the use of think alouds to examine substantive processes involved in performance on a formative assessment of computational thinking (CT) designed to support self-regulated learning (SRL). Our task design model included three phases of work on a computational thinking problem: forethought, performance, and reflection. The cognitive processes of seven students who reported their thinking during all three phases were analyzed. Ratings of artifacts of code indicated the computational thinking problem was moderately difficult to solve (M = 15, SD = 5) on a scale of 0 to 21 points. Profiles were created to illustrate length and sequence of different types of cognitive processes during the think-aloud. Results provide construct validity evidence for the tasks as formative assessments of CT, elucidate the way learners at different levels of skill use SRL, shed light on the nature of computational thinking, and point out areas for improvement in assessment design.
{"title":"Formative Assessment of Computational Thinking: Cognitive and Metacognitive Processes","authors":"Sarah M. Bonner, Peggy P. Chen, Kristi Jones, Brandon Milonovich","doi":"10.1080/08957347.2020.1835912","DOIUrl":"https://doi.org/10.1080/08957347.2020.1835912","url":null,"abstract":"ABSTRACT We describe the use of think alouds to examine substantive processes involved in performance on a formative assessment of computational thinking (CT) designed to support self-regulated learning (SRL). Our task design model included three phases of work on a computational thinking problem: forethought, performance, and reflection. The cognitive processes of seven students who reported their thinking during all three phases were analyzed. Ratings of artifacts of code indicated the computational thinking problem was moderately difficult to solve (M = 15, SD = 5) on a scale of 0 to 21 points. Profiles were created to illustrate length and sequence of different types of cognitive processes during the think-aloud. Results provide construct validity evidence for the tasks as formative assessments of CT, elucidate the way learners at different levels of skill use SRL, shed light on the nature of computational thinking, and point out areas for improvement in assessment design.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"18 2","pages":"27 - 45"},"PeriodicalIF":1.5,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1835912","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41299261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Think-Alouds for Response Process Evidence of Teacher Attentiveness
Pub Date: 2021-01-02 | DOI: 10.1080/08957347.2020.1835910 | Applied Measurement in Education, 34, 10–26
Ya Mo, Michele B. Carney, L. Cavey, Tatia Totorica
ABSTRACT There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop selected-response assessment items. Through analyses of think-aloud interview data, this study examines the alignment between participant responses to, and scores arising from, the two item types. The interview protocol was administered to 12 mathematics teachers and teacher candidates who were first presented a constructed-response version of an item followed by the selected-response version of the same item stem. Our analyses focus on the alignment between responses and scores for eight item stems across the two item types and the identification of items in need of modification. The results have the potential to influence the way test developers generate and use response process evidence to support or refute the assumptions inherent in a particular score interpretation and use.
{"title":"Using Think-Alouds for Response Process Evidence of Teacher Attentiveness","authors":"Ya Mo, Michele B. Carney, L. Cavey, Tatia Totorica","doi":"10.1080/08957347.2020.1835910","DOIUrl":"https://doi.org/10.1080/08957347.2020.1835910","url":null,"abstract":"ABSTRACT There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop selected-response assessment items. Through analyses of think-aloud interview data, this study examines the alignment between participant responses to, and scores arising from, the two item types. The interview protocol was administered to 12 mathematics teachers and teacher candidates who were first presented a constructed-response version of an item followed by the selected-response version of the same item stem. Our analyses focus on the alignment between responses and scores for eight item stems across the two item types and the identification of items in need of modification. The results have the potential to influence the way test developers generate and use response process evidence to support or refute the assumptions inherent in a particular score interpretation and use.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"10 - 26"},"PeriodicalIF":1.5,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1835910","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42248907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gathering Response Process Data for a Problem-Solving Measure through Whole-Class Think Alouds
Pub Date: 2021-01-02 | DOI: 10.1080/08957347.2020.1835913 | Applied Measurement in Education, 34, 46–60
J. Bostic, T. Sondergeld, G. Matney, G. Stone, Tiara Hicks
ABSTRACT Response process validity evidence provides a window into a respondent’s cognitive processing. The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work was performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades. Data from third-grade students were collected in a 1–1 think-aloud setting and compared to data from similar students who participated in WCTAs. Findings indicated that students performed similarly on the items across the two think-aloud settings. Respondents also needed less encouragement to share ideas aloud during the WCTA than in the 1–1 think aloud, and they communicated feeling more comfortable in the WCTA setting. Drawing the findings together, WCTAs functioned as well as, if not better than, 1–1 think alouds for the purpose of contextualizing third-grade students’ cognitive processes. Future studies using WCTAs are recommended to explore their limitations and other factors that might impact their success as data-gathering tools.
{"title":"Gathering Response Process Data for a Problem-Solving Measure through Whole-Class Think Alouds","authors":"J. Bostic, T. Sondergeld, G. Matney, G. Stone, Tiara Hicks","doi":"10.1080/08957347.2020.1835913","DOIUrl":"https://doi.org/10.1080/08957347.2020.1835913","url":null,"abstract":"ABSTRACT Response process validity evidence provides a window into a respondent’s cognitive processing. The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work is performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades. Data from third-grade students were collected in a 1–1 think-aloud setting and compared to data from similar students as part of WCTAs. Findings indicated that students performed similarly on the items when the two think-aloud settings were compared. Respondents also needed less encouragement to share ideas aloud during the WCTA compared to the 1–1 think aloud. They also communicated feeling more comfortable in the WCTA setting compared to the 1–1 think aloud. Drawing the findings together, WCTAs functioned as well if not better, than 1–1 think alouds for the purpose of contextualizing third-grade students’ cognitive processes. Future studies using WCTAs are recommended to explore their limitations and other factors that might impact their success as data gathering tools.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"46 - 60"},"PeriodicalIF":1.5,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1835913","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47052381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking Think-Alouds: The Often-Problematic Collection of Response Process Data
Pub Date: 2021-01-02 | DOI: 10.1080/08957347.2020.1835911 | Applied Measurement in Education, 34, 61–74
Jacqueline P. Leighton
ABSTRACT The objective of this paper is to comment on the think-aloud methods presented in the three papers included in this special issue. The commentary offered stems from the author’s own psychological investigations of unobservable information processes and the conditions under which the most defensible claims can be advanced. The structure of this commentary is as follows: First, the objective of think-alouds in light of test development and validation goals is considered for each of the three papers in the volume. Second, the response processes (psychological constructs) described in the three studies are assessed vis-à-vis think-aloud methods. Third, the methodological details that are essential to properly evaluate response processing data for educational assessment goals are elaborated. Fourth, the possible impasse of using a psychological technique to collect psychological data about non-psychological content forms the basis of the commentary’s conclusion.
{"title":"Rethinking Think-Alouds: The Often-Problematic Collection of Response Process Data","authors":"Jacqueline P. Leighton","doi":"10.1080/08957347.2020.1835911","DOIUrl":"https://doi.org/10.1080/08957347.2020.1835911","url":null,"abstract":"ABSTRACT The objective of this paper is to comment on the think-aloud methods presented in the three papers included in this special issue. The commentary offered stems from the author’s own psychological investigations of unobservable information processes and the conditions under which the most defensible claims can be advanced. The structure of this commentary is as follows: First, the objective of think-alouds in light of test development and validation goals are considered for each of the three papers in the volume. Second, the response processes (psychological constructs) described in the three studies are assessed vis à vis think-aloud methods. Third, the methodological details that are essential to properly evaluate response processing data for educational assessment goals are elaborated. Fourth, the possible impasse of using a psychological technique to collect psychological data about non-psychological content forms the basis of the commentary’s conclusion.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"61 - 74"},"PeriodicalIF":1.5,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1835911","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42774068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model
Pub Date: 2020-08-25 | DOI: 10.1080/08957347.2020.1789142 | Applied Measurement in Education, 33, 309–330
Zhonghua Zhang
ABSTRACT The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the mathematical formulas for computing the asymptotic standard errors for the parameter scale transformation coefficients and the true score equating coefficients that are estimated using the characteristic curve methods in test equating under the GRM in the context of the common-item nonequivalent groups equating design. Simulated and real data were further used to examine the accuracy of the derivations and compare the performance of the newly developed delta method with that of the multiple imputation method. The results indicated that the standard errors produced by the delta method were extremely close to the criterion empirical standard errors as well as those yielded by the multiple imputation method. The development of the standard error expressions by the delta method in the study has important practical implications.
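As background for the derivation summarized above, the first-order multivariate delta method can be stated generically as follows. This is a sketch only; the article's specific gradient expressions for the GRM scale transformation and true-score equating coefficients are not reproduced in the abstract.

```latex
% Generic first-order delta method; g and Sigma stand in for the specific equating
% coefficients and item-parameter covariance matrices treated in the article.
If $\hat{\boldsymbol{\theta}}$ is approximately normal with mean $\boldsymbol{\theta}$ and
covariance matrix $\boldsymbol{\Sigma}$, then for a differentiable coefficient
$A = g(\boldsymbol{\theta})$,
\[
  \operatorname{Var}\!\bigl(g(\hat{\boldsymbol{\theta}})\bigr)
    \;\approx\; \nabla g(\boldsymbol{\theta})^{\top}\,\boldsymbol{\Sigma}\,\nabla g(\boldsymbol{\theta}),
  \qquad
  \operatorname{SE}(\hat{A})
    \;\approx\; \sqrt{\nabla g(\hat{\boldsymbol{\theta}})^{\top}\,\widehat{\boldsymbol{\Sigma}}\,\nabla g(\hat{\boldsymbol{\theta}})}\,.
\]
```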
{"title":"Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model","authors":"Zhonghua Zhang","doi":"10.1080/08957347.2020.1789142","DOIUrl":"https://doi.org/10.1080/08957347.2020.1789142","url":null,"abstract":"ABSTRACT The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the mathematical formulas for computing the asymptotic standard errors for the parameter scale transformation coefficients and the true score equating coefficients that are estimated using the characteristic curve methods in test equating under the GRM in the context of the common-item nonequivalent groups equating design. Simulated and real data were further used to examine the accuracy of the derivations and compare the performance of the newly developed delta method with that of the multiple imputation method. The results indicated that the standard errors produced by the delta method were extremely close to the criterion empirical standard errors as well as those yielded by the multiple imputation method. The development of the standard error expressions by the delta method in the study has important practical implications.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"33 1","pages":"309 - 330"},"PeriodicalIF":1.5,"publicationDate":"2020-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1789142","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49604565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can Culture Be a Salient Predictor of Test-Taking Engagement? An Analysis of Differential Noneffortful Responding on an International College-Level Assessment of Critical Thinking
Pub Date: 2020-07-29 | DOI: 10.1080/08957347.2020.1789141 | Applied Measurement in Education, 33, 263–279
Joseph A. Rios, Hongwen Guo
ABSTRACT The objective of this study was to evaluate whether differential noneffortful responding (identified via response latencies) was present in four countries that were administered a low-stakes college-level critical thinking assessment. Results indicated significant differences (as large as .90 SD) between nearly all country pairings in the average number of noneffortful responses per test taker. Furthermore, noneffortful responding was found to be associated with a number of individual-level predictors, such as demographics (both gender and academic year), prior ability, and perceived difficulty of the test, though these predictors were found to differ across countries. Ignoring the presence of noneffortful responses was associated with (a) model fit deterioration as well as inflation of reliability, and (b) the inclusion of non-invariant items in the score-linking anchor set. However, no meaningful differences in relative performance were noted once noneffortful responses were accounted for. Implications for test development and improving the validity of score-based inferences from international assessments are discussed.
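The abstract identifies noneffortful responses via response latencies. A common operationalization is to flag a response whose time falls below an item-specific threshold; the sketch below assumes one such rule (10% of each item's median response time, floored at 2 seconds) and simulated data, and is not necessarily the rule used in the article.

```python
# Illustrative latency-based flagging of noneffortful (rapid-guessing) responses.
# Threshold rule and data are assumptions for this sketch, not the article's procedure.
import numpy as np

def flag_noneffortful(rt_matrix, fraction=0.10, floor_seconds=2.0):
    """rt_matrix: (n_examinees, n_items) response times in seconds.
    Returns a boolean matrix; True marks a flagged (noneffortful) response."""
    item_thresholds = np.maximum(fraction * np.median(rt_matrix, axis=0), floor_seconds)
    return rt_matrix < item_thresholds            # thresholds broadcast across examinees

rng = np.random.default_rng(0)
rt = rng.lognormal(mean=3.0, sigma=0.6, size=(500, 30))   # plausible effortful times (seconds)
mask = rng.random(rt.shape) < 0.05                        # inject ~5% rapid guesses
rt[mask] = rng.uniform(0.5, 2.0, size=mask.sum())

flags = flag_noneffortful(rt)
print("Mean noneffortful responses per test taker:", round(flags.sum(axis=1).mean(), 2))
```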
{"title":"Can Culture Be a Salient Predictor of Test-Taking Engagement? An Analysis of Differential Noneffortful Responding on an International College-Level Assessment of Critical Thinking","authors":"Joseph A. Rios, Hongwen Guo","doi":"10.1080/08957347.2020.1789141","DOIUrl":"https://doi.org/10.1080/08957347.2020.1789141","url":null,"abstract":"ABSTRACT The objective of this study was to evaluate whether differential noneffortful responding (identified via response latencies) was present in four countries administered a low-stakes college-level critical thinking assessment. Results indicated significant differences (as large as .90 SD) between nearly all country pairings in the average number of noneffortful responses per test taker. Furthermore, noneffortful responding was found to be associated with a number of individual-level predictors, such as demographics (both gender and academic year), prior ability, and perceived difficulty of the test, though, these predictors were found to differ across countries. Ignoring the presence of noneffortful responses was associated with: (a) model fit deterioration as well as inflation of reliability, and (b) the inclusion of non-invariant items in the score linking anchor set. However, no meaningful differences in relative performance were noted once accounting for noneffortful responses. Implications for test development and improving the validity of score-based inferences from international assessments are discussed.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"33 1","pages":"263 - 279"},"PeriodicalIF":1.5,"publicationDate":"2020-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1789141","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59806029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Reliable Identification and Effectiveness of Computer-Based, Pop-Up Glossaries in Large-Scale Assessments
Pub Date: 2020-07-27 | DOI: 10.1080/08957347.2020.1789137 | Applied Measurement in Education, 33, 378–389
D. Cohen, Alesha D. Ballman, F. Rijmen, Jon Cohen
ABSTRACT Computer-based, pop-up glossaries are perhaps the most promising accommodation aimed at mitigating the influence of linguistic structure and cultural bias on the performance of English Learner (EL) students on statewide assessments. To date, there has been no sufficiently reliable, established procedure for identifying the words that require a glossary for EL students. In the coding procedure, we developed a method to reliably identify words and phrases that require a glossary. The method developed in the coding procedure was then used to provide glossaries for the field-test items of statewide English language arts (ELA) and mathematics assessments across grades 3–11 (Current Study). In the Current Study, we assess the effectiveness, and the influence on construct validity, of a pop-up glossary of the words identified in the coding procedure in a large-scale randomized controlled trial. The results demonstrated that the pop-up glossary accommodation was generally effective for both the ELA and mathematics assessments and did not harm the construct being measured.
{"title":"On the Reliable Identification and Effectiveness of Computer-Based, Pop-Up Glossaries in Large-Scale Assessments","authors":"D. Cohen, Alesha D. Ballman, F. Rijmen, Jon Cohen","doi":"10.1080/08957347.2020.1789137","DOIUrl":"https://doi.org/10.1080/08957347.2020.1789137","url":null,"abstract":"ABSTRACT Computer-based, pop-up glossaries are perhaps the most promising accommodation aimed at mitigating the influence of linguistic structure and cultural bias on the performance of English Learner (EL) students on statewide assessments. To date, there is no established procedure for identifying the words that require a glossary for EL students that is sufficiently reliable. In the coding procedure, we developed a method to reliably identify words and phrases that require a glossary. The method developed in the coding procedure was used to provide glossaries for the field-test items of statewide English language arts (ELA) and mathematics assessments across grades 3–11 (Current Study). In the Current Study, we assess the effectiveness and influence on construct validity of a pop-up glossary of the words identified in the coding procedure in a large scale, randomized controlled trial. The results demonstrated that generally the pop-up glossary accommodation was effective for both the ELA and mathematics assessments and did not harm the construct being measured.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"33 1","pages":"378 - 389"},"PeriodicalIF":1.5,"publicationDate":"2020-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1789137","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59806226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying a Multiple Comparison Control to IRT Item-fit Testing
Pub Date: 2020-07-23 | DOI: 10.1080/08957347.2020.1789138 | Applied Measurement in Education, 33, 362–377
D. Sauder, Christine E. DeMars
ABSTRACT We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the S-X2 . Previous research indicated that the S-X2 has good Type I error control and decent power, but no previous research examined familywise Type I error control. We varied percentage of misfitting items, sample size, and test length, and computed familywise Type I error with no correction, a Bonferroni correction, and a Benjamini-Hochberg correction. The S-X2 controlled item-level and familywise Type I errors when corrections were applied to conditions with no misfitting items. In the presence of misfitting items, the S-X2 exhibited inflated item-level and familywise false hit rates in many conditions, even with familywise Type I error corrections. Lastly, power was low and negatively impacted when either of the familywise Type I error corrections was applied. We suggest using the S-X2 with no familywise Type I error control in conjunction with other methods of assessing item fit (e.g., visual analysis).
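For readers unfamiliar with the two corrections named above, the sketch below shows generic Bonferroni and Benjamini-Hochberg adjustments applied to a vector of item-fit p-values; the p-values and cutoffs are invented for illustration, and this is not the simulation code used in the article.

```python
# Generic Bonferroni and Benjamini-Hochberg adjustments applied to item-fit p-values.
# Illustrative sketch only; the article's simulation design is summarized in the abstract.
import numpy as np

def bonferroni_flags(p_values, alpha=0.05):
    p = np.asarray(p_values, dtype=float)
    return p < alpha / p.size                      # True = item flagged as misfitting

def benjamini_hochberg_flags(p_values, alpha=0.05):
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ranks p-values from smallest to largest
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    flags = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()           # largest rank i with p_(i) <= alpha * i / m
        flags[order[: k + 1]] = True               # reject all hypotheses up to that rank
    return flags

p_vals = [0.001, 0.012, 0.030, 0.045, 0.20, 0.64, 0.81]
print("Bonferroni flags:        ", bonferroni_flags(p_vals))
print("Benjamini-Hochberg flags:", benjamini_hochberg_flags(p_vals))
```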
{"title":"Applying a Multiple Comparison Control to IRT Item-fit Testing","authors":"D. Sauder, Christine E. DeMars","doi":"10.1080/08957347.2020.1789138","DOIUrl":"https://doi.org/10.1080/08957347.2020.1789138","url":null,"abstract":"ABSTRACT We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the S-X2 . Previous research indicated that the S-X2 has good Type I error control and decent power, but no previous research examined familywise Type I error control. We varied percentage of misfitting items, sample size, and test length, and computed familywise Type I error with no correction, a Bonferroni correction, and a Benjamini-Hochberg correction. The S-X2 controlled item-level and familywise Type I errors when corrections were applied to conditions with no misfitting items. In the presence of misfitting items, the S-X2 exhibited inflated item-level and familywise false hit rates in many conditions, even with familywise Type I error corrections. Lastly, power was low and negatively impacted when either of the familywise Type I error corrections was applied. We suggest using the S-X2 with no familywise Type I error control in conjunction with other methods of assessing item fit (e.g., visual analysis).","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"33 1","pages":"362 - 377"},"PeriodicalIF":1.5,"publicationDate":"2020-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2020.1789138","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47875393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}