Estimating the Prevalence of Injection Drug Use Among Acute Hepatitis C Cases From a National Surveillance System: Application of Random Forest-Based Multiple Imputation.
Shaoman Yin, Kathleen N Ly, Laurie K Barker, Danae Bixler, Nicola D Thompson, Neil Gupta
Journal of Public Health Management and Practice, 30(5):733-743. Published 2024-09-01 (Epub 2024-07-22). DOI: 10.1097/PHH.0000000000002014. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11883639/pdf/
Abstract
Background: Injection drug use (IDU) is a major contributor to the syndemic of viral hepatitis, human immunodeficiency virus, and drug overdose. However, information on IDU is frequently missing in national viral hepatitis surveillance data, which limits our understanding of the full extent of IDU-associated infections. Multiple imputation by chained equations (MICE) has become a popular approach to address missing data, but its application for IDU imputation is less studied.
Methods: Using the 2019-2021 National Notifiable Diseases Surveillance System acute hepatitis C case data and publicly available county-level measures, we evaluated listwise deletion (LD) and 3 models imputing missing IDU data through MICE: parametric logistic regression, semi-parametric predictive mean matching (PMM), and nonparametric random forest (RF) (both standard RF [sRF] and fast implementation of RF [fRF]).
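As a rough illustration of the multiple-imputation workflow named in the Methods (impute the missing IDU indicator several times, estimate prevalence in each completed data set, then pool), the sketch below uses only the Python standard library. It does not reproduce the authors' MICE models: in place of logistic regression, PMM, or random forest, it assumes a much simpler approximate Bayesian bootstrap within strata. The stratum labels, counts, and function names are all hypothetical, and pooling follows Rubin's rules.

```python
import math
import random


def make_toy_records():
    """Hypothetical data: (stratum, idu) pairs, idu is 1, 0, or None if missing."""
    records = []
    # (stratum, observed IDU=1, observed IDU=0, missing IDU) -- made-up counts
    for stratum, yes, no, missing in [("A", 200, 50, 250), ("B", 180, 270, 50)]:
        records += [(stratum, 1)] * yes + [(stratum, 0)] * no
        records += [(stratum, None)] * missing
    return records


def listwise_prevalence(records):
    """Prevalence among records with an observed value (listwise deletion)."""
    observed = [v for _, v in records if v is not None]
    return sum(observed) / len(observed)


def impute_once(records, rng):
    """One completed data set via an approximate Bayesian bootstrap per stratum."""
    observed = {}
    for stratum, value in records:
        if value is not None:
            observed.setdefault(stratum, []).append(value)
    everyone = [v for values in observed.values() for v in values]
    # Step 1: resample the observed donors with replacement, which
    # reflects uncertainty about each stratum's IDU proportion.
    donors = {s: rng.choices(vals, k=len(vals)) for s, vals in observed.items()}
    fallback = rng.choices(everyone, k=len(everyone))
    # Step 2: fill each missing value with a random draw from its donor pool.
    completed = []
    for stratum, value in records:
        if value is None:
            value = rng.choice(donors.get(stratum, fallback))
        completed.append(value)
    return completed


def pooled_prevalence(records, m=20, seed=0):
    """Pool m completed-data prevalence estimates with Rubin's rules."""
    rng = random.Random(seed)
    n = len(records)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(records, rng)
        p = sum(completed) / n
        estimates.append(p)
        within.append(p * (1 - p) / n)  # binomial variance of a proportion
    q_bar = sum(estimates) / m
    w_bar = sum(within) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total_variance = w_bar + (1 + 1 / m) * between
    return q_bar, math.sqrt(total_variance)
```

In this toy setup the stratum with more IDU also has more missing values, so listwise deletion gives 380/700 (about 0.543), while the pooled estimate lands near 0.60. The direction of that gap mirrors the pattern reported in the Results, but the numbers themselves are invented.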
Results: The estimated IDU prevalence among acute hepatitis C cases increased from 63.5% by LD to 65.1% by logistic regression, 66.9% by PMM, 76.0% by sRF, and 85.1% by fRF. Evaluation studies showed that RF-based MICE imputation, especially fRF, had the highest accuracy (as measured by the smallest raw bias, percent bias, and root mean square error) and the highest efficiency (as measured by the smallest 95% confidence interval width) compared with LD and the other models. Sensitivity analyses indicated that fRF remained robust when data were missing not at random.
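The abstract names four evaluation measures but does not give their formulas. The sketch below assumes the definitions commonly used in simulation studies of imputation methods: raw bias as the mean estimate minus the true value, percent bias as the absolute raw bias relative to the truth, root mean square error over replicates, and the average width of the 95% confidence intervals. The function name and the example numbers are hypothetical.

```python
import math


def evaluate_imputation(estimates, intervals, truth):
    """Summarize replicate estimates against a known true prevalence.

    estimates: one point estimate per simulation replicate
    intervals: matching (lower, upper) 95% confidence limits
    truth:     the true value used to generate the simulated data
    """
    n = len(estimates)
    raw_bias = sum(estimates) / n - truth
    percent_bias = 100 * abs(raw_bias / truth)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    average_width = sum(upper - lower for lower, upper in intervals) / len(intervals)
    return {
        "raw_bias": raw_bias,
        "percent_bias": percent_bias,
        "rmse": rmse,
        "average_width": average_width,
    }


# Made-up replicate results for a method evaluated against a truth of 0.60.
example = evaluate_imputation(
    estimates=[0.58, 0.62, 0.60, 0.64],
    intervals=[(0.54, 0.62), (0.58, 0.66), (0.56, 0.64), (0.60, 0.68)],
    truth=0.60,
)
```

Under these definitions, a method is more accurate when raw bias, percent bias, and RMSE are closer to zero, and more efficient when the average interval width is smaller.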
Conclusion: Our analysis suggested that RF-based MICE imputation, especially fRF, could be a valuable approach for addressing missing IDU data in population-based surveillance systems such as the National Notifiable Diseases Surveillance System. The inclusion of imputed IDU data may enhance the effectiveness of future surveillance and prevention efforts for the IDU-driven syndemic.
About the journal:
Journal of Public Health Management and Practice publishes articles that focus on evidence-based public health practice and research. The journal is a bimonthly peer-reviewed publication guided by a multidisciplinary editorial board of administrators, practitioners, and scientists. It covers a wide range of population health topics, including research to practice; emergency preparedness; bioterrorism; infectious disease surveillance; environmental health; community health assessment; chronic disease prevention and health promotion; and academic-practice linkages.