Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach.

Psychology Science Quarterly. Publication date: 2009-01-01

Jeanne A Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Joseph P Eimicke, Paul K Crane, Richard N Jones, Jin-Shei Lai, Seung W Choi, Ron D Hays, Bryce B Reeve, Steven P Reise, Paul A Pilkonis, David Cella

Psychology Science Quarterly, 51(2), 148-180. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2844669/pdf/nihms136951.pdf


Abstract

The aims of this paper are to present findings related to differential item functioning (DIF) in the Patient Reported Outcome Measurement Information System (PROMIS) depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data) with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item "I felt I had no energy" was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type I error) was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF were small for the groups studied, although impact was relatively large for some individuals.
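To make the abstract's core machinery concrete, the sketch below shows three pieces in plain Python: category probabilities under Samejima's graded response model, the expected item score (one simple way to express DIF magnitude as a difference between group-specific item curves), and a likelihood ratio DIF test that compares a compact model (item parameters constrained equal across groups) with an augmented model (parameters freed). This is a minimal illustration under stated assumptions, not the authors' software or estimation procedure: the function names are hypothetical, the logistic metric omits the 1.7 scaling constant, model fitting is not shown (the log-likelihoods are assumed to come from an already-fitted IRT program), and all parameter values in the usage lines are invented for illustration.

```python
import math
from typing import Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> list[float]:
    """Hypothetical helper: graded response model P(X = k | theta), k = 0..m-1.

    `thresholds` are the ordered b parameters b_1 < ... < b_{m-1}.
    Assumption: logistic metric without the 1.7 scaling constant.
    """
    # Cumulative (boundary) probabilities P*(X >= k); P*(X >= 0) = 1, P*(X >= m) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def expected_item_score(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Expected score E[X | theta] = sum_k k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(grm_category_probs(theta, a, thresholds)))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df, stdlib only.

    Starts from the df = 1 or df = 2 closed form and steps up by 2 using
    Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def lr_dif_test(loglik_compact: float, loglik_augmented: float, df: int) -> tuple[float, float]:
    """G^2 = -2 (LL_compact - LL_augmented), referred to chi-square with df
    equal to the number of item parameters freed across groups."""
    g2 = -2.0 * (loglik_compact - loglik_augmented)
    return g2, chi2_sf(g2, df)


if __name__ == "__main__":
    # Invented parameters for a 5-category item; the focal group's thresholds
    # are shifted upward by 0.3 to mimic uniform DIF.
    a = 2.0
    b_ref = [-0.5, 0.3, 1.0, 1.8]
    b_foc = [b + 0.3 for b in b_ref]
    gap = expected_item_score(0.5, a, b_ref) - expected_item_score(0.5, a, b_foc)
    print(f"expected-score gap at theta = 0.5: {gap:.3f}")
    # Invented log-likelihoods; freeing a and four thresholds gives df = 5.
    g2, p = lr_dif_test(-1234.5, -1228.0, df=5)
    print(f"G2 = {g2:.2f}, p = {p:.4f}")
```

Keeping the chi-square tail in the standard library avoids a SciPy dependency; in practice `scipy.stats.chi2.sf` would serve the same role. A positive expected-score gap here means the reference group is predicted to score higher on the item at the same level of depression, which is the kind of group difference that the magnitude measures mentioned in the abstract are meant to summarize.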


