Priyanka Dua Sood, Star Liu, Harold Lehmann, Hadi Kharrazi
Published in JMIR Medical Informatics, 2024-06-08. DOI: 10.2196/56734
Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes.
Background: The increasing and substantial reliance on electronic health records (EHRs) and their data types (i.e., diagnoses [Dx], medications [Rx], and laboratory results [Lx]) demands a fundamental assessment of their data quality (DQ). This is especially important because appropriate denominator populations with chronic conditions, such as type 2 diabetes (T2D), must be identified using commonly available computable phenotype definitions (phenotypes).
Objective: To bridge this gap, our study aims to assess how EHR DQ issues, together with variations in phenotypes and their robustness (or lack thereof), may affect the identification of denominator populations.
Methods: Approximately 208,000 patients with T2D were included in our study, using retrospective EHR data from Johns Hopkins Medical Institution (JHMI) for 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (i.e., age, sex, race, and ethnicity), health care utilization (inpatient and emergency room visits), and the average Charlson Comorbidity score for each phenotype. We then used different methods to induce (simulate) the DQ issues of completeness, accuracy, and timeliness separately for each phenotype. For induced incompleteness, our model randomly dropped Dx, Rx, and Lx codes independently in increments of 10%. For induced inaccuracy, our model randomly replaced a Dx or Rx code with another code of the same data type and induced incremental changes of 2% in Lx result values, from -100% to +10%. Lastly, for timeliness, date records were shifted in increments of 30 days, up to 1 year.
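The three induced DQ perturbations described in the Methods can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' model: the `Event` record layout, all function names, the exact-fraction sampling, the forward-only date shift, and the `toy_phenotype` rule (an ICD-10-CM E11.* diagnosis code or an HbA1c result of at least 6.5 dated within 2017-2019) are hypothetical, and the rule is not one of the five phenotypes evaluated in the study.

```python
import random
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical EHR entry; the study's actual data model is not described."""
    patient_id: str
    dtype: str                     # "Dx", "Rx" or "Lx"
    code: str
    on: date
    value: Optional[float] = None  # Lx result value; None for Dx and Rx


def _pick(events: Sequence[Event], dtype: str, fraction: float, rng: random.Random) -> set:
    """Indices of a random `fraction` of the events of one data type."""
    idx = [i for i, e in enumerate(events) if e.dtype == dtype]
    return set(rng.sample(idx, round(fraction * len(idx))))


def drop_codes(events, dtype, fraction, rng) -> List[Event]:
    """Incompleteness: remove a random `fraction` of one data type's entries."""
    gone = _pick(events, dtype, fraction, rng)
    return [e for i, e in enumerate(events) if i not in gone]


def replace_codes(events, dtype, fraction, vocabulary, rng) -> List[Event]:
    """Inaccuracy (Dx/Rx): swap chosen codes for a different code of the same type."""
    chosen = _pick(events, dtype, fraction, rng)
    out = []
    for i, e in enumerate(events):
        if i in chosen:
            e = replace(e, code=rng.choice([c for c in vocabulary if c != e.code]))
        out.append(e)
    return out


def scale_lab_values(events, pct) -> List[Event]:
    """Inaccuracy (Lx): change every lab result by `pct` percent (-100 zeroes it)."""
    return [replace(e, value=e.value * (1 + pct / 100))
            if e.dtype == "Lx" and e.value is not None else e
            for e in events]


def shift_dates(events, days) -> List[Event]:
    """Timeliness: move every entry's date forward by `days` (the direction is an assumption)."""
    return [replace(e, on=e.on + timedelta(days=days)) for e in events]


def toy_phenotype(events, start=date(2017, 1, 1), end=date(2019, 12, 31)) -> set:
    """Hypothetical T2D rule: an E11.* Dx or HbA1c >= 6.5 dated inside [start, end]."""
    hits = set()
    for e in events:
        if not start <= e.on <= end:
            continue
        if e.dtype == "Dx" and e.code.startswith("E11"):
            hits.add(e.patient_id)
        elif e.dtype == "Lx" and e.code == "HbA1c" and e.value is not None and e.value >= 6.5:
            hits.add(e.patient_id)
    return hits


if __name__ == "__main__":
    rng = random.Random(0)
    events = [
        Event("p1", "Dx", "E11.9", date(2018, 3, 1)),
        Event("p2", "Lx", "HbA1c", date(2019, 12, 15), 7.1),
        Event("p3", "Rx", "metformin", date(2018, 6, 1)),
    ]
    print(sorted(toy_phenotype(events)))                              # ['p1', 'p2']
    print(sorted(toy_phenotype(drop_codes(events, "Dx", 1.0, rng))))  # ['p2']
    print(sorted(toy_phenotype(scale_lab_values(events, -10))))       # ['p1']: 7.1 -> 6.39
    print(sorted(toy_phenotype(shift_dates(events, 30))))             # ['p1']: p2 moves into 2020
```

Sweeping `fraction` over 0.0, 0.1, ..., 1.0, `pct` over `range(-100, 11, 2)`, and `days` over `range(30, 361, 30)`, and counting the cohort at each step, would mirror the incremental design in spirit, again under the assumptions above.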
Results: Less than a quarter (23%) of the population overlapped across all phenotypes using EHR data. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse (CCW) phenotype identified zero patients, because its phenotypic characteristics included only Dx codes. Induced inaccuracy and timeliness similarly showed variations in the performance of each phenotype, resulting in fewer patients being identified with each incremental change.
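The overlap figure and the CCW result can be illustrated with plain Python sets. The sketch below assumes that "overlap" means the share of patients identified by at least one phenotype who are identified by every phenotype (the abstract does not state the denominator), and it uses a hypothetical Dx-only rule in place of the actual CCW definition; the cohorts and records are invented toy data.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (patient_id, data type, code).
Record = Tuple[str, str, str]


def overlap_share(cohorts: Dict[str, Set[str]]) -> float:
    """Fraction of patients found by any phenotype who are found by all of them (assumed definition)."""
    union = set().union(*cohorts.values())
    if not union:
        return 0.0
    common = set.intersection(*cohorts.values())
    return len(common) / len(union)


def dx_only_phenotype(records: Iterable[Record]) -> Set[str]:
    """Hypothetical rule that, like the CCW phenotype described above, looks at Dx codes only."""
    return {pid for pid, dtype, code in records if dtype == "Dx" and code.startswith("E11")}


def drop_all(records: Iterable[Record], dtype: str) -> List[Record]:
    """100% incompleteness for one data type: remove every record of that type."""
    return [r for r in records if r[1] != dtype]


if __name__ == "__main__":
    cohorts = {
        "A": {"p1", "p2", "p3", "p4"},
        "B": {"p1", "p2", "p5"},
        "C": {"p1", "p6"},
    }
    print(round(overlap_share(cohorts), 3))  # 0.167: only p1 of 6 patients is in every cohort

    records = [("p1", "Dx", "E11.9"), ("p1", "Rx", "metformin"), ("p2", "Lx", "HbA1c")]
    print(dx_only_phenotype(records))                   # {'p1'}
    print(dx_only_phenotype(drop_all(records, "Dx")))   # set(): no Dx left, so nobody qualifies
```

A phenotype that also accepts Rx or Lx evidence would still find some patients after `drop_all(records, "Dx")`, which is the intuition behind the CCW phenotype's collapse to zero.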
Conclusions: We used EHR data with Dx, Rx, and Lx data types from a large tertiary hospital system to understand T2D phenotypic differences and performance. Using induced DQ methods, we learned how DQ issues may affect the identification of the denominator populations upon which clinical (e.g., clinical research and trials, population health evaluations) and financial or operational decisions are made. The novel results from our study may help shape a common T2D computable phenotype definition applicable to clinical informatics, chronic condition management, and broader health care industry efforts.
About the journal:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, who are the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than those that would be published in the Journal of Medical Internet Research.