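Addressing Selection Biases within Electronic Health Record Data for Estimation of Diabetes Prevalence among New York City Young Adults: A Cross-Sectional Study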

BMJ Public Health, published 2024-01-01. DOI: 10.1136/bmjph-2024-001666

Sarah Conderino, Lorna E Thorpe, Jasmin Divers, Sandra S Albrecht, Shannon M Farley, David C Lee, Rebecca Anthopolos

Volume 2, issue 2. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11578099/pdf/


Abstract



Introduction: There is growing interest in using electronic health records (EHRs) for chronic disease surveillance. However, these data are convenience samples of individuals in care and are not representative of the target populations for public health surveillance, which are generally defined as the resident populations of a city, state, or other jurisdiction during the relevant period. We focus on using EHR data to estimate diabetes prevalence among young adults in New York City, because the rising diabetes burden at younger ages calls for better surveillance capacity.

Methods: This article applies common adjustment methods for nonprobability samples, including raking, post-stratification, and multilevel regression with post-stratification, to real and simulated data for cross-sectional estimation of diabetes prevalence among adults aged 18-44 years. In the real data analyses, we externally validate city- and neighborhood-level EHR-based estimates against gold-standard estimates from a local health survey. In the data simulations, we probe the extent to which residual biases remain when selection into the EHR sample is non-ignorable.
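To make the difference between two of these adjustments concrete, here is a minimal sketch on a toy, EHR-like sample. It assumes hypothetical (age group, sex) cells with invented counts, not data from the study, and the function names (`poststratified_prevalence`, `rake`, `weighted_prevalence`) are illustrative only, not the authors' code. Post-stratification needs the full joint population count of every cell; raking (iterative proportional fitting) needs only the marginal totals and rescales cell weights until each margin matches.

```python
# Minimal sketch: post-stratification vs raking on a toy, EHR-like sample.
# All cells, counts, and totals below are hypothetical (not study data).

# EHR sample: (age group, sex) -> (patients in sample, patients with diabetes)
sample = {
    ("18-29", "F"): (400, 8),
    ("18-29", "M"): (200, 6),
    ("30-44", "F"): (300, 18),
    ("30-44", "M"): (100, 9),
}

# Full joint population counts per cell (needed for post-stratification).
population = {
    ("18-29", "F"): 1000,
    ("18-29", "M"): 1000,
    ("30-44", "F"): 1000,
    ("30-44", "M"): 1000,
}

# Marginal population totals only (all that raking needs).
# Each entry: (index of the cell dimension, {level: population total}).
margins = [
    (0, {"18-29": 2000, "30-44": 2000}),  # age group
    (1, {"F": 2000, "M": 2000}),          # sex
]


def crude_prevalence(sample):
    """Unadjusted prevalence: reflects the sample's own composition."""
    n = sum(n_cell for n_cell, _ in sample.values())
    cases = sum(d for _, d in sample.values())
    return cases / n


def poststratified_prevalence(sample, population):
    """Weight each cell's sample prevalence by the cell's population share."""
    total = sum(population.values())
    return sum(population[cell] / total * d / n for cell, (n, d) in sample.items())


def rake(sample, margins, max_iter=1000, tol=1e-12):
    """Iterative proportional fitting: rescale cell weights until every margin matches."""
    weights = {cell: float(n) for cell, (n, _) in sample.items()}
    for _ in range(max_iter):
        largest_change = 0.0
        for dim, targets in margins:
            for level, target in targets.items():
                cells = [c for c in weights if c[dim] == level]
                factor = target / sum(weights[c] for c in cells)
                largest_change = max(largest_change, abs(factor - 1.0))
                for c in cells:
                    weights[c] *= factor
        if largest_change < tol:
            break
    return weights


def weighted_prevalence(sample, weights):
    """Prevalence when each cell carries the total weight given by `weights`."""
    total = sum(weights.values())
    return sum(weights[cell] * d / n for cell, (n, d) in sample.items()) / total


if __name__ == "__main__":
    raked = rake(sample, margins)
    print(f"crude:                {crude_prevalence(sample):.4f}")
    print(f"post-stratified:      {poststratified_prevalence(sample, population):.4f}")
    print(f"raked (margins only): {weighted_prevalence(sample, raked):.4f}")
```

With these invented numbers the crude estimate is 0.041, because the 30-44 age group and men are under-represented in the toy sample. Post-stratification returns 0.05. Raking returns about 0.0495: it matches the age and sex margins exactly but keeps the sample's own age-by-sex association, so it only approximates the post-stratified answer when the joint population table is unknown.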

Results: In the real data analyses, these methods reduced the impact of selection biases on the citywide prevalence estimate relative to the gold standard. Residual biases remained at the neighborhood level, where prevalence tended to be overestimated, especially in neighborhoods where a higher proportion of residents were captured in the sample. Simulation results showed that these methods may be sufficient except when selection into the EHR is non-ignorable, that is, when it depends on unmeasured factors or on diabetes status.
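Why non-ignorable selection defeats these adjustments can be seen in a small expected-value calculation, used here in place of a full random simulation. It assumes a hypothetical population of two age groups with invented prevalences and capture probabilities, and it post-stratifies by age only; it is not the authors' simulation design and does not cover raking or MRP. When capture depends only on age, post-stratifying by age recovers the true prevalence. When residents with diabetes are more likely to be captured, the same adjustment stays biased, since no weighting on age can undo selection that operates within each age cell.

```python
# Expected-value sketch: ignorable vs non-ignorable selection into an EHR sample.
# All numbers are hypothetical. Capture probabilities are applied to expected
# counts, so results are deterministic (no random draws).

# Hypothetical population: age group -> (residents, true diabetes prevalence)
POPULATION = {
    "18-29": (1_000_000, 0.02),
    "30-44": (1_000_000, 0.06),
}


def true_prevalence():
    total = sum(n for n, _ in POPULATION.values())
    return sum(n * p for n, p in POPULATION.values()) / total


def poststratified_estimate(select_prob):
    """Post-stratify expected EHR counts by age group.

    select_prob(age, has_diabetes) -> probability a resident is captured.
    """
    total = sum(n for n, _ in POPULATION.values())
    estimate = 0.0
    for age, (n, p) in POPULATION.items():
        cases = n * p * select_prob(age, True)
        non_cases = n * (1 - p) * select_prob(age, False)
        cell_prev = cases / (cases + non_cases)  # prevalence among captured residents
        estimate += (n / total) * cell_prev      # reweight to the age group's population share
    return estimate


def ignorable(age, has_diabetes):
    # Capture depends only on age, the variable used for post-stratification.
    return 0.6 if age == "30-44" else 0.3


def non_ignorable(age, has_diabetes):
    # Same age pattern, but residents with diabetes are twice as likely to be in care.
    base = 0.3 if age == "30-44" else 0.15
    return 2 * base if has_diabetes else base


if __name__ == "__main__":
    print(f"true prevalence:             {true_prevalence():.4f}")
    print(f"post-stratified, ignorable:  {poststratified_estimate(ignorable):.4f}")
    print(f"post-stratified, diabetes-dependent: {poststratified_estimate(non_ignorable):.4f}")
```

With these inputs the true prevalence is 0.04. The age-only scenario returns 0.04, while the diabetes-dependent scenario returns about 0.076: within each age cell the captured prevalence becomes 2p/(1+p) instead of p, and reweighting the cells cannot detect or remove that distortion.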

Conclusions: While EHRs offer potential to innovate on chronic disease surveillance, care is needed when estimating prevalence for small geographies or when selection is non-ignorable.
