Quantifying Item Invariance for the Selection of the Least Biased Assessment
W. Holmes Finch, Brian F. French, Maria E. Hernandez Finch
Journal of Applied Measurement, 20(1), 13-26 (2019)
Citations: 0
Abstract
An important aspect of educational and psychological measurement and evaluation of individuals is the selection of scales with appropriate evidence of reliability and validity for inferences and uses of the scores for the population of interest. One aspect of validity is the degree to which a scale fairly assesses the construct(s) of interest for members of different subgroups within the population. Typically, this issue is addressed statistically through assessment of differential item functioning (DIF) of individual items, or differential bundle functioning (DBF) of sets of items. When selecting an assessment to use for a given application (e.g., measuring intelligence), or which form of an assessment to use in a given instance, researchers need to consider the extent to which the scales work equivalently for all members of the population. Little research has examined methods for comparing the amount or magnitude of DIF/DBF present in two assessments when deciding which assessment to use. The current simulation study examines six statistics for this purpose. Results show that a method based on the random effects item response theory model may be optimal for instrument comparisons, particularly when the assessments being compared are not of the same length.
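To make the notion of item-level DIF concrete: one classical statistic for flagging DIF in a dichotomous item is the Mantel-Haenszel common odds ratio, which compares correct-response odds for a reference and a focal group after matching examinees on total score. The abstract does not name the six statistics the study compared, so this is only an illustrative sketch of the general idea, with hypothetical counts; the function and data below are not from the paper.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item across score strata.

    Each stratum is a tuple (ref_correct, ref_wrong, focal_correct, focal_wrong)
    of response counts for examinees matched on total test score.
    A value near 1.0 suggests little DIF; values far from 1.0 suggest the item
    favors one group even after controlling for ability.
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical counts for one item in three score strata.
no_dif_item = [(40, 10, 38, 12), (30, 20, 29, 21), (15, 35, 16, 34)]
dif_item    = [(40, 10, 25, 25), (30, 20, 15, 35), (15, 35,  5, 45)]

print(round(mantel_haenszel_odds_ratio(no_dif_item), 2))  # close to 1.0
print(round(mantel_haenszel_odds_ratio(dif_item), 2))     # well above 1.0
```

Aggregating such item-level indices over all items of each instrument (or over item bundles, for DBF) is one way to compare the overall amount of bias carried by two competing assessments, which is the selection problem this study addresses.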