Yanyan Fu, Edison M. Choe, Hwanggyu Lim, Jaehwa Choi
This case study applied the weak theory of Automatic Item Generation (AIG) to generate isomorphic item instances (i.e., unique but psychometrically equivalent items) for a large-scale assessment. Three representative instances were selected from each item template (i.e., model) and pilot-tested. In addition, a new analytical framework, differential child item functioning (DCIF) analysis, based on the existing differential item functioning statistics, was applied to evaluate the psychometric equivalency of item instances within each template. The results showed that, out of 23 templates, nine successfully generated isomorphic instances, five required minor revisions to make them isomorphic, and the remaining templates required major modifications. The results and insights obtained from the AIG template development procedure may help item writers and psychometricians effectively develop and manage the templates that generate isomorphic instances.
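The DCIF framework is built on existing differential item functioning (DIF) statistics. The paper does not give its formulas here, but one standard DIF statistic it could draw on is the Mantel-Haenszel common odds ratio. The sketch below applies it to a hypothetical setup: two instances of the same template administered to separate groups, matched on total score (the function name and setup are illustrative, not the paper's).

```python
import numpy as np

def mh_odds_ratio(resp_ref, score_ref, resp_foc, score_foc):
    """Mantel-Haenszel common odds ratio across matching-score strata.

    resp_*  : 0/1 responses to the studied item instance
    score_* : matching variable (e.g., total test score) per examinee
    A value near 1.0 suggests the two instances function equivalently.
    """
    strata = np.union1d(np.unique(score_ref), np.unique(score_foc))
    num = den = 0.0
    for s in strata:
        r = resp_ref[score_ref == s]
        f = resp_foc[score_foc == s]
        n = len(r) + len(f)
        if len(r) == 0 or len(f) == 0:
            continue  # stratum must contain both groups
        a = r.sum()          # reference group, correct
        b = len(r) - a       # reference group, incorrect
        c = f.sum()          # focal group, correct
        d = len(f) - c       # focal group, incorrect
        num += a * d / n
        den += b * c / n
    return num / den
```

In a DCIF-style analysis, an odds ratio far from 1.0 for a pair of instances would flag the template as not generating isomorphic items.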
"An Evaluation of Automatic Item Generation: A Case Study of Weak Theory Approach," Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12529, published 2022-10-06.
In addition to an exciting new module on Multidimensional Item Response Theory (MIRT) equating, there are two important announcements regarding the Instructional Topics in Educational Measurement Series (ITEMS). After much discussion with authors, learners, the educational measurement community, and other stakeholders of ITEMS, I am pleased to announce (1) the transfer of the ITEMS portal to the National Council on Measurement in Education (NCME) website and (2) a new digital module format.
Transfer of the ITEMS portal to the NCME website: In 2018, I, along with Matthew Gaertner, led efforts to launch the new NCME website on the Higher Logic platform. Besides bringing a modern look and feel to the organization's web presence, the platform was selected for its flexibility and customizability. In the years since, traffic to the NCME website has continued to increase, and there has been a significant increase in site content (e.g., software database, special interest group community pages). In April of this year, just prior to the NCME Annual Conference, the website committee, led by Erin Banjanovic, released a much-needed reorganization of the content on the site. This wonderful overhaul has made navigating the NCME website easier, with content now in more logical locations. However, noticeably absent from the NCME website has been the ITEMS portal. As a reminder, ITEMS is an NCME publication: a brief summary appears in the Educational Measurement: Issues and Practice journal, with the primary digital content on the ITEMS portal, freely available after registration. The ITEMS portal has been a Learning Management System-based website with many features and room for extension. From the user's perspective, however, it can be complex to navigate, requires moving away from the primary NCME website, and demands a separate log-in.
It is at this time that I am pleased to announce that the ITEMS portal is now available on the NCME website at the following link: https://www.ncme.org/itemsportal
Transferring the ITEMS portal to the NCME website has several immediate benefits. First, all modules will remain free of charge but will no longer require additional registration. Second, they will have a different organizational structure, improving navigation across modules and enabling more efficient access to key information. Finally, they will fall under the NCME brand, sharing the same look and feel as all other content on the NCME website.
Although this issue marks the launch of the ITEMS portal on the NCME website, the transfer of content remains a work in progress. For now, both the old and new ITEMS portals will be available, and all links to the old ITEMS portal will remain functional. However, I would strongly advise all who embed or link to content to begin updating to the portal on the NCME website. Nearly all of the content has been shifted, but if you notice anything missing, or if you have suggestions for enhancing navigation, please do not hesitate to email me at [email protected]. In the coming months, I will be revising and enhancing the look, feel, and navigation of the new ITEMS portal on the NCME website, so do not be surprised if the portal has been updated each time you visit.
New digital module format: Another advantage of the new portal is the ability to fully customize each module. Over the past year since being appointed editor, I have spoken with many module authors and learners. A common theme emerged: a love for the digital modules, but a steep learning curve (and substantial time commitment) to develop a module in the Learning Management System, along with complexity and limitations in using the modules. With the help of the flexible ITEMS portal on the NCME website, and after several brainstorming sessions, I am pleased to announce a new ITEMS module format. Modules will remain digital but with streamlined interactive features, enabling users to reach content more quickly. Each module will include an introductory video summary outlining the learning objectives of the entire module, followed by several sections, each with its own learning objectives, a video of roughly 10 minutes, and interactive learning checks. The sections can be completed in any order, although the modules will be designed for linear presentation. Videos will be available to watch on the website or to download. This will allow course instructors to embed parts of a module in their courses, professionals to share specific content with stakeholders, and learners to download portions for offline viewing. I emphasize that, just as when citing other published material, the module should be cited appropriately when embedded or otherwise used. For convenience, module citations will be provided on the portal.
I am pleased to announce the first digital module in the new format: MIRT equating, by Stella Y. Kim. In this module, Dr. Kim reviews MIRT models, summarizes the recent literature on MIRT equating, discusses the challenges of MIRT equating relative to unidimensional IRT equating, and provides a step-by-step guide to conducting MIRT equating. She also illustrates the methods with activities using FlexMirt and RAGE-RGEQUATE. I am especially grateful to Dr. Kim for her patience with the new development process and for her trust in my vision for the new form of ITEMS modules. I encourage learners interested in MIRT equating to complete this module, and I encourage all ITEMS module learners to explore the new format.
For authors, another benefit of the new format is on the back end. There is no new software to learn, development time is greatly reduced, and nearly 100% of the effort goes into developing content. I have worked, and will continue to work, with authors to make the process as seamless as possible. Accordingly, I encourage anyone interested in creating an ITEMS module to contact me directly. There is an exciting lineup of modules in development, but I am happy to talk with anyone who has an idea for a module!
"ITEMS Corner Update: Announcing Two Significant Changes to ITEMS" by Brian C. Leventhal, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12524, published 2022-09-08. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12524
Sandra M. Sweeney, Sandip Sinharay, Matthew S. Johnson, Eric W. Steinhauer
The focus of this paper is on the empirical relationship between item difficulty and item discrimination. Two studies—an empirical investigation and a simulation study—were conducted to examine the association between item difficulty and item discrimination under classical test theory and item response theory (IRT), and the effects of the association on various quantities of interest. Results from the empirical investigation show that item difficulty and item discrimination are negatively correlated under classical test theory, mostly negatively correlated under the two-parameter logistic model, and mostly positively correlated under the three-parameter logistic model; the magnitude of the correlation varied over the different data sets. Results from the simulation study reveal that a failure to incorporate the correlation between item difficulty and item discrimination in IRT simulations may provide the investigator with inaccurate values of important quantities of interest, and may lead to incorrect operational decisions. Implications to practice and future directions are discussed.
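The simulation-design point above can be made concrete: if item parameters are generated jointly rather than independently, the difficulty-discrimination association is preserved. The sketch below draws 2PL parameters with a built-in correlation (the generating values are illustrative assumptions, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(7)
n_items, rho = 200, -0.4  # illustrative negative a-b correlation

# Draw (log a, b) from a bivariate normal so discrimination stays positive.
sd_log_a, sd_b = 0.3, 1.0
cov = np.array([[sd_log_a**2,            rho * sd_log_a * sd_b],
                [rho * sd_log_a * sd_b,  sd_b**2]])
log_a, b = rng.multivariate_normal([0.0, 0.0], cov, size=n_items).T
a = np.exp(log_a)

# Empirical correlation between discrimination and difficulty
r = np.corrcoef(a, b)[0, 1]
```

Setting rho to 0 in such a design ignores the empirical association between the parameters, which is exactly the omission the study warns can distort quantities of interest.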
"An Investigation of the Nature and Consequence of the Relationship between IRT Difficulty and Discrimination," Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12522, published 2022-09-08.
In this digital ITEMS module, Dr. Stella Kim provides an overview of multidimensional item response theory (MIRT) equating. Traditional unidimensional item response theory (IRT) equating methods impose the sometimes untenable restriction on data that only a single ability is assessed. This module discusses potential sources of multidimensionality and presents potential consequences of multidimensionality on equating. To remedy these effects, MIRT equating can be used as a viable alternative to traditional methods of IRT equating. In conducting MIRT equating, the choice of an appropriate MIRT model is necessary, and thus the module describes several existing MIRT models and illustrates each using hypothetical examples. After a brief description of MIRT models, an extensive review of the current literature is presented to identify gaps in the literature on MIRT equating. Then, the steps for conducting MIRT observed-score equating are described. Finally, the module discusses practical considerations in applying MIRT equating to testing practices and suggests potential areas of research for future studies.
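Among the MIRT models the module describes, the compensatory multidimensional 2PL is a common starting point. Its item response function, in a standard form (not a reproduction of the module's own examples), can be sketched as:

```python
import numpy as np

def m2pl_prob(theta, a, d):
    """Compensatory M2PL item response function.

    P(X = 1 | theta) = 1 / (1 + exp(-(a . theta + d)))
    theta : examinee ability vector
    a     : item discrimination vector (one entry per dimension)
    d     : scalar item intercept (related to difficulty)
    """
    return 1.0 / (1.0 + np.exp(-(np.dot(theta, a) + d)))

# High ability on one dimension can compensate for low ability on another.
p = m2pl_prob(np.array([1.5, -0.5]), np.array([1.2, 0.8]), 0.1)
```

Because abilities combine additively inside the logit, a deficit on one dimension can be offset by strength on another, which is the sense in which the model is "compensatory."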
"Digital Module 29: Multidimensional Item Response Theory Equating" by Stella Y. Kim, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12525, published 2022-09-08. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12525
"On the Cover: Person Infit Density Contour" by Yuan-Ling Liaw, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12526, published 2022-09-08.
Many scholars have compared various item discrimination indices using real or simulated data. Item discrimination indices, such as the item-total correlation, the item-rest correlation, and the IRT item discrimination parameter, provide information about individual differences among all participants. However, some tests aim to select a very limited number of students, examinees, or candidates for schools and job positions. Thus, there is a need to evaluate the performance of classical test theory (CTT) and IRT item discrimination indices when the test purpose is to select a limited number of students. The purpose of the current Monte Carlo study is to evaluate item discrimination indices in the case of selecting a limited number of high-achieving students. The results showed that a special case of Brennan's index, B10–90, provided more accurate information for this specific test purpose. Additionally, the effects of various factors, such as test length, the ability distribution of examinees, and the variance of item difficulty, on item discrimination indices were investigated. The performance of each item discrimination index is discussed in detail.
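Brennan's index is a difference in proportion-correct between an upper and a lower score group. The abstract does not define B10–90 precisely; the sketch below assumes it contrasts the top 10% of examinees with the remaining 90% (an assumption consistent with the selection scenario, but see the paper for the exact definition).

```python
import numpy as np

def brennan_b(item, total, upper_pct=10):
    """Brennan-style discrimination index for a dichotomous item.

    item      : 0/1 responses to the item
    total     : total test scores used to form the groups
    upper_pct : size of the upper group in percent; the default 10
                reflects an assumed reading of B10-90 (top 10% vs.
                bottom 90%), which is not confirmed by the abstract.
    Returns p(correct | upper group) - p(correct | lower group).
    """
    item = np.asarray(item, dtype=float)
    total = np.asarray(total, dtype=float)
    cut = np.percentile(total, 100 - upper_pct)
    upper = total >= cut
    return item[upper].mean() - item[~upper].mean()
```

An index like this rewards items that separate the very top of the distribution from everyone else, which matches a test whose purpose is to select a small number of high achievers.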
"A Special Case of Brennan's Index for Tests That Aim to Select a Limited Number of Students: A Monte Carlo Simulation Study" by Serkan Arikan, Eren Can Aybek, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12528, published 2022-09-08.
We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from summative assessments can support. As a solution, we propose tiered claims, which explicitly distinguish between claims about what students have done or can do on test items—which are typically easier to support under current test designs—and claims about what students could do in the broader domain of performances described by the standards, for which novel evidence is likely required. We discuss the positive implications of tiered claims for test construction, validation, and reporting of results.
"Supporting the Interpretive Validity of Student-Level Claims in Science Assessment with Tiered Claim Structures" by Sanford R. Student, Brian Gong, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12523, published 2022-08-06.
I show that there are better measures of student college performance than grade point average (GPA) by undertaking a fine-grained empirical investigation of grading within a large public university. The value of using GPA as a measure of comparative performance is undermined by academically weaker students taking courses where the grading is more generous. In fact, college courses composed of weaker performing students (whether measured by their relative performance in other classes, SAT scores, or high school GPA) have higher average grades. To partially correct for idiosyncratic grading across classes, alternative measures, student class rank and the student's average class rank, are introduced. In comparison to a student's lower-division grade, the student's lower-division rank is a better predictor of the student's grade in the upper-division course. Course rank and course grade are adjusted to account for different levels of academic competitiveness across courses (more precisely, student fixed-effects are derived). SAT scores and high school GPA are then used to predict college performance. Higher explained variation (R2) is obtained when the dependent variable is average class rank rather than GPA. Still higher explained variation occurs when the dependent variable is adjusted rank.
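The "average class rank" measure can be reconstructed as a within-course percentile rank averaged over a student's courses. The sketch below uses hypothetical records and omits the paper's further adjustment for course competitiveness (the student-fixed-effects step).

```python
import pandas as pd

# Hypothetical records: one row per (student, course) with the course grade.
df = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "course":  ["A",  "B",  "A",  "B",  "A",  "B"],
    "grade":   [3.7,  3.0,  3.3,  3.7,  2.7,  2.3],
})

# Percentile rank within each course (1.0 = top of the class), then
# average across each student's courses.
df["class_rank"] = df.groupby("course")["grade"].rank(pct=True)
avg_rank = df.groupby("student")["class_rank"].mean()
```

Unlike GPA, this measure is unaffected by one course grading more generously than another, since ranking is computed relative to classmates within each course.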
"Average Rank and Adjusted Rank Are Better Measures of College Student Success than GPA" by Donald Wittman, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12521, published 2022-08-01. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12521
Coefficient alpha persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores, they are not appropriate for extending coefficient alpha to correctly estimate the reliability of nonlinearly transformed scaled scores such as percentile ranks and stanines. The current paper reconceptualizes coefficient alpha as a complement of the ratio of two unbiased estimates of the summed score variance: the conditional summed score variance assuming uncorrelated item scores (which gives the error score variance) and the unconditional summed score variance incorporating intercorrelated item scores (which gives the observed score variance). Using this reconceptualization, a new equation for coefficient generalized alpha is introduced for scaled scores. Coefficient alpha is a special case of this new equation, since the latter reduces to coefficient alpha when the scaled scores are the summed scores themselves. Two applications (cognitive and psychological assessments) are used to compare the performance (estimation and bootstrap confidence interval) of the reliability coefficients for different scaled scores. Results support the new equation for coefficient generalized alpha and compare it to coefficient generalized beta for parallel test forms. Coefficient generalized alpha produced different reliability values, which were larger than those of coefficient generalized beta for different scaled scores.
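The paper's generalized alpha for scaled scores is not reproduced in the abstract. As the baseline it generalizes, the standard summed-score coefficient alpha, k/(k-1) * (1 - sum of item variances / variance of summed scores), can be computed directly:

```python
import numpy as np

def coefficient_alpha(X):
    """Cronbach's coefficient alpha for an examinees-by-items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / var(summed score))
    This is the classical summed-score formula, not the paper's
    generalized alpha for nonlinearly transformed scaled scores.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = X.sum(axis=1).var(ddof=1)     # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)
```

The ratio inside the parentheses is exactly the two-variance comparison the paper reinterprets: the numerator ignores item intercorrelations, while the denominator incorporates them.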
"Reconceptualization of Coefficient Alpha Reliability for Test Summed and Scaled Scores" by Rashid S. Almehrizi, Educational Measurement: Issues and Practice, DOI: 10.1111/emip.12520, published 2022-07-07.