Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes.

JMIR Medical Informatics | IF 3.1 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics) | Pub Date: 2024-06-08 | DOI: 10.2196/56734

Priyanka Dua Sood, Star Liu, Harold Lehmann, Hadi Kharrazi



Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes.

Background: Increasing and substantial reliance on electronic health records (EHRs) and their data types (i.e., diagnosis (Dx), medication (Rx), and laboratory (Lx)) demands assessment of their data quality (DQ) as a fundamental step, especially since appropriate denominator populations with chronic conditions, such as type 2 diabetes (T2D), need to be identified using commonly available computable phenotype definitions (phenotypes).

Objective: To bridge this gap, our study aims to assess how EHR DQ issues, and the variation and robustness (or lack thereof) of phenotypes, may affect the identification of denominator populations.

Methods: Approximately 208,000 patients with T2D were included in our study, using retrospective EHR data from Johns Hopkins Medical Institution (JHMI) for 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (i.e., age, sex, race, ethnicity), healthcare utilization (inpatient and emergency room visits), and the average Charlson Comorbidity score for each phenotype. We then used different methods to induce (simulate) DQ issues of completeness, accuracy, and timeliness, separately for each phenotype. For induced data incompleteness, our model randomly dropped Dx, Rx, and Lx codes independently at increments of 10%; for induced data inaccuracy, our model randomly replaced a Dx or Rx code with another code of the same data type and induced 2% incremental changes from -100% to +10% in Lx result values; and lastly, for timeliness, our model shifted record dates in 30-day increments up to a year.
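The three induction procedures described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' model: the record layout (dictionaries with `patient`, `code`, `value`, and `date` keys), the function names, and the choice to shift dates forward are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from datetime import date, timedelta

def drop_records(records, fraction, rng):
    """Induced incompleteness: randomly drop the given fraction of records."""
    n_drop = round(len(records) * fraction)
    dropped = set(rng.sample(range(len(records)), n_drop))
    return [r for i, r in enumerate(records) if i not in dropped]

def replace_codes(records, fraction, vocabulary, rng):
    """Induced inaccuracy (Dx or Rx): swap the code of a random fraction of
    records for a different code drawn from the same vocabulary.
    The vocabulary must contain at least two distinct codes."""
    n_swap = round(len(records) * fraction)
    swapped = set(rng.sample(range(len(records)), n_swap))
    out = []
    for i, r in enumerate(records):
        if i in swapped:
            alternatives = [c for c in vocabulary if c != r["code"]]
            r = {**r, "code": rng.choice(alternatives)}
        out.append(r)
    return out

def scale_lab_values(records, pct_change):
    """Induced inaccuracy (Lx): change every result value by pct_change percent."""
    return [{**r, "value": r["value"] * (1 + pct_change / 100)} for r in records]

def shift_dates(records, days):
    """Induced timeliness issue: move every record date by `days`
    (forward shift is an assumption of this sketch)."""
    return [{**r, "date": r["date"] + timedelta(days=days)} for r in records]

# Toy data: ten diagnosis records carrying a T2D code (ICD-10-CM E11.9).
rng = random.Random(0)
dx = [{"patient": i, "code": "E11.9", "date": date(2018, 1, 1)} for i in range(10)]

# Incompleteness in 10% increments, as in the Methods.
for step in range(0, 11):
    remaining = drop_records(dx, step / 10, rng)
    print(f"{step * 10:3d}% dropped -> {len(remaining)} records remain")
```

Each function returns a new list and leaves its input untouched, so a phenotype definition can be re-run on the degraded copy and compared against the result on the original data.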

Results: Less than a quarter (23%) of the population overlapped across all phenotypes using EHR data. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse (CCW) phenotype identified zero patients because its phenotypic characteristics included only Dx codes. Induced inaccuracy and timeliness issues similarly produced variations in the performance of each phenotype, resulting in fewer patients being identified with each incremental change.
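The overlap figure and the Dx-only behavior reported in the Results can be made concrete with a small sketch. It rests on two assumptions not confirmed by the abstract: that overlap is measured against the union of patients identified by any phenotype, and that a Dx-only phenotype can be simplified to "at least one qualifying diagnosis code". The function names and the toy cohorts are hypothetical.

```python
def overlap_fraction(cohorts):
    """Share of patients identified by every phenotype, relative to all
    patients identified by at least one phenotype (assumed denominator)."""
    sets = [set(c) for c in cohorts.values()]
    if not sets:
        return 0.0
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0

def dx_only_phenotype(dx_records, t2d_codes):
    """Hypothetical Dx-only rule: a patient qualifies with any T2D diagnosis code.
    Real definitions such as CCW carry additional criteria not modeled here."""
    return {r["patient"] for r in dx_records if r["code"] in t2d_codes}

# Three toy phenotypes that agree on only two of six patients.
cohorts = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {3, 4, 6}}
print(f"overlap across all phenotypes: {overlap_fraction(cohorts):.0%}")

# With every diagnosis record removed, a Dx-only phenotype finds no one.
print(dx_only_phenotype([], {"E11.9"}))
```

A phenotype that also draws on Rx or Lx data would still identify some patients when all Dx records are missing, which is why robustness to incompleteness differs between definitions.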

Conclusions: We used EHR data with Dx, Rx, and Lx data types from a large tertiary hospital system to understand T2D phenotypic differences and performance. Using induced DQ methods, we learned how DQ issues may affect identification of the denominator populations upon which clinical (e.g., clinical research and trials, population health evaluations) and financial or operational decisions are made. The novel results of our study may inform the shaping of a common T2D computable phenotype definition applicable to clinical informatics, chronic condition management, and additional healthcare industry-wide efforts.


Source journal

JMIR Medical Informatics (Medicine, Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.