Comparative performance of PROMIS Sleep Disturbance computerized adaptive testing algorithms and static short form in postmenopausal women
Andrew Trigg, Claudia Haberland, Huda Shalhoub, Christoph Gerlinger, Christian Seitz
Journal of Patient-Reported Outcomes, 9(1):18. Published 2025-02-17. DOI: 10.1186/s41687-025-00849-6
Abstract
Background: The Patient-Reported Outcomes Measurement Information System (PROMIS) Sleep Disturbance v1.0 item bank (27 items) measures sleep disturbances. Rather than the full item bank, an 8-item short form (PROMIS SD SF 8b) or computerized adaptive testing (CAT) can be used. This study compares the performance of the PROMIS SD SF 8b with two CAT algorithms in postmenopausal women.
Methods: This is a secondary analysis of data collected for the original psychometric testing of the PROMIS Sleep Disturbance item bank, in a sub-sample of women aged ≥55. A graded response model (GRM) was fitted for the item bank, then simulations evaluated the performance of CAT algorithms and the short form, in terms of root mean square error (RMSE) versus the latent trait estimate derived from the full bank. Two CAT algorithms were tested: CAT1 (stop once standard error <0.3 or 12 items administered) and CAT2 (stop once 8 items administered). Convergent and divergent hypotheses for validity were tested through correlations with the Pittsburgh Sleep Quality Index (PSQI) and Epworth Sleepiness Scale (ESS). Known-groups comparisons were made between those with and without self-reported sleep disorder.
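The two stopping rules and the RMSE criterion described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function names and parameter names are hypothetical, and item selection and latent trait estimation under the GRM are deliberately left out.

```python
import math


def cat1_should_stop(standard_error, items_administered,
                     se_threshold=0.3, max_items=12):
    """CAT1 rule: stop once the standard error is below 0.3
    or 12 items have been administered (names are hypothetical)."""
    return standard_error < se_threshold or items_administered >= max_items


def cat2_should_stop(items_administered, fixed_length=8):
    """CAT2 rule: stop once 8 items have been administered."""
    return items_administered >= fixed_length


def rmse(estimates, reference):
    """Root mean square error between trait estimates from a shortened
    administration and reference estimates from the full item bank."""
    if len(estimates) != len(reference):
        raise ValueError("estimates and reference must have equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

In a simulation of this kind, one of the stopping functions would be checked after each administered item, and `rmse` would be computed once per approach across all respondents, with the full-bank estimates as the reference.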
Results: A sample of 337 women was analyzed. Unidimensionality and item-level fit to the GRM were supported; however, the local independence assumption was violated. The CAT1 algorithm administered 4.18 items on average, with a minor decrease in performance (higher RMSE value) compared to CAT2 or the PROMIS SD SF 8b. Administering 8 items adaptively (CAT2) performed similarly to administering 8 fixed items (PROMIS SD SF 8b) (RMSE difference = 0.001). Reliability exceeded 0.90 across most of the latent trait range for all approaches. Correlations with the PSQI and ESS were largely as hypothesized, with minor differences in coefficient values between the approaches (all within 0.05). Women reporting a sleep disorder had greater sleep disturbance than those who did not (p < 0.001 for all).
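The CAT1 stopping threshold and the reported reliability can be related through a common IRT approximation. As a sketch, assuming the latent trait θ is scaled to unit variance in the reference population (an assumption, since the abstract does not state the metric), the conditional reliability at a given trait level is approximately:

```latex
\[
\text{reliability}(\theta) \approx 1 - \mathrm{SE}(\theta)^2,
\qquad
\mathrm{SE}(\theta) = 0.3 \;\Rightarrow\; 1 - 0.3^2 = 0.91 .
\]
```

Under that assumption, stopping once the standard error falls below 0.3 corresponds to a reliability above roughly 0.91.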
Conclusions: The results of this study support using the PROMIS Sleep Disturbance item bank in postmenopausal women. The choice of PROMIS SD SF 8b versus CAT can largely be driven by practical considerations (respondent burden and operational complexity) rather than concerns about differential reliability and validity.