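Addressing Selection Biases within Electronic Health Record Data for Estimation of Diabetes Prevalence among New York City Young Adults: A Cross-Sectional Study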

Sarah Conderino, Lorna E Thorpe, Jasmin Divers, Sandra S Albrecht, Shannon M Farley, David C Lee, Rebecca Anthopolos
BMJ Public Health, volume 2, issue 2. Published 2024-01-01. DOI: https://doi.org/10.1136/bmjph-2024-001666. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11578099/pdf/

Abstract

Introduction: There is growing interest in using electronic health records (EHRs) for chronic disease surveillance. However, these data are convenience samples of in-care individuals and are not representative of the target populations for public health surveillance, which are generally defined, for the relevant period, as the resident populations of a city, state, or other jurisdiction. We focus on using EHR data to estimate diabetes prevalence among young adults in New York City, as the rising diabetes burden at younger ages calls for better surveillance capacity.

Methods: This article applies common methods for nonprobability samples, including raking, post-stratification, and multilevel regression with post-stratification, to real and simulated data for the cross-sectional estimation of diabetes prevalence among those aged 18-44 years. Within the real data analyses, we externally validate city- and neighborhood-level EHR-based estimates against gold-standard estimates from a local health survey. Within the data simulations, we probe the extent to which residual biases remain when selection into the EHR sample is non-ignorable.
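
To make two of the adjustment methods named above concrete, the following is a minimal sketch of post-stratification and raking on a toy sample. It is not the authors' implementation: the cell definitions (age group by sex), all counts, and the function names are hypothetical, and multilevel regression with post-stratification is not shown.

```python
# Toy illustration only: every count below is made up, not data from the study.
# Cells are (age_group, sex) pairs.
ehr_n = {("18-29", "F"): 300, ("18-29", "M"): 100,
         ("30-44", "F"): 400, ("30-44", "M"): 200}      # EHR patients per cell
ehr_cases = {("18-29", "F"): 6, ("18-29", "M"): 3,
             ("30-44", "F"): 32, ("30-44", "M"): 20}    # patients with diabetes
population = {cell: 250_000 for cell in ehr_n}          # assumed census counts


def crude_prevalence(cases, n):
    """Unadjusted prevalence in the convenience sample."""
    return sum(cases.values()) / sum(n.values())


def poststratify(cases, n, pop):
    """Average the cell-level prevalences, weighted by population cell counts."""
    total = sum(pop.values())
    return sum(pop[c] * cases[c] / n[c] for c in pop) / total


def rake(n, margins, iterations=50):
    """Iterative proportional fitting.

    margins is a list with one dict per cell dimension, mapping each level to
    its known population total. Only the margins are needed, not the full
    joint population table.
    """
    weights = {c: float(v) for c, v in n.items()}
    for _ in range(iterations):
        for dim, targets in enumerate(margins):
            for level, target in targets.items():
                current = sum(w for c, w in weights.items() if c[dim] == level)
                factor = target / current
                for c in weights:
                    if c[dim] == level:
                        weights[c] *= factor
    return weights


age_margin = {"18-29": 500_000, "30-44": 500_000}
sex_margin = {"F": 500_000, "M": 500_000}

crude = crude_prevalence(ehr_cases, ehr_n)
poststratified = poststratify(ehr_cases, ehr_n, population)
raked_weights = rake(ehr_n, [age_margin, sex_margin])
raked = (sum(raked_weights[c] * ehr_cases[c] / ehr_n[c] for c in ehr_n)
         / sum(raked_weights.values()))

print(f"crude={crude:.4f} poststratified={poststratified:.4f} raked={raked:.4f}")
```

In this toy example the crude estimate is pulled upward because older patients are over-represented in the sample; both adjustments reweight the cells toward the population structure. Raking matches only the marginal totals, so it can differ slightly from post-stratification, which uses the full joint table.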

Results: Within the real data analyses, these methods reduced the impact of selection biases on the citywide prevalence estimate when compared with the gold standard. Residual biases remained at the neighborhood level, where prevalence tended to be overestimated, especially in neighborhoods where a higher proportion of residents were captured in the sample. Simulation results demonstrated that these methods may be sufficient, except when selection into the EHR is non-ignorable, that is, when it depends on unmeasured factors or on diabetes status.
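
The role of non-ignorable selection can be illustrated with a small expected-value calculation. This is a sketch under stated assumptions, not the study's simulation design: the two strata, their sizes and prevalences, the selection probabilities, and the assumption that people with diabetes are twice as likely to appear in the EHR are all hypothetical.

```python
# Hypothetical population: stratum -> (population size, true diabetes prevalence).
strata = {"low-risk": (600_000, 0.02), "high-risk": (400_000, 0.08)}
base_selection = {"low-risk": 0.10, "high-risk": 0.30}  # P(in EHR) by stratum

true_prevalence = (sum(n * p for n, p in strata.values())
                   / sum(n for n, _ in strata.values()))


def expected_cell_prevalence(p, sel_diabetic, sel_nondiabetic):
    """Expected prevalence among sampled people in one stratum."""
    cases = p * sel_diabetic
    noncases = (1 - p) * sel_nondiabetic
    return cases / (cases + noncases)


def expected_estimates(diabetes_multiplier):
    """Return (crude, post-stratified) expected estimates.

    diabetes_multiplier scales the selection probability of people with
    diabetes. A value of 1.0 means selection depends only on the stratum
    (ignorable given the stratum); any other value makes selection depend on
    the outcome itself (non-ignorable).
    """
    sampled_cases = sampled_total = weighted_prev = pop_total = 0.0
    for name, (n, p) in strata.items():
        sel_nd = base_selection[name]
        sel_d = sel_nd * diabetes_multiplier
        sampled_cases += n * p * sel_d
        sampled_total += n * (p * sel_d + (1 - p) * sel_nd)
        weighted_prev += n * expected_cell_prevalence(p, sel_d, sel_nd)
        pop_total += n
    return sampled_cases / sampled_total, weighted_prev / pop_total


crude_ign, ps_ign = expected_estimates(1.0)
crude_non, ps_non = expected_estimates(2.0)

print(f"true={true_prevalence:.4f}")
print(f"ignorable:     crude={crude_ign:.4f} poststratified={ps_ign:.4f}")
print(f"non-ignorable: crude={crude_non:.4f} poststratified={ps_non:.4f}")
```

When selection depends only on the stratum, post-stratifying on that stratum recovers the true prevalence in expectation. Once selection also depends on diabetes status, the within-stratum prevalences are themselves inflated, and reweighting the strata cannot remove that bias.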

Conclusions: While EHRs offer potential to innovate on chronic disease surveillance, care is needed when estimating prevalence for small geographies or when selection is non-ignorable.
