An EHR Data Quality Evaluation Approach Based on Medical Knowledge and Text Matching
Nanya Chen, Jiangtao Ren
IRBM, Vol. 44, No. 5, Article 100782, October 2023. DOI: 10.1016/j.irbm.2023.100782
Journal Article · JCR Q1 (Engineering, Biomedical) · Impact factor 5.6
https://www.sciencedirect.com/science/article/pii/S1959031823000313
Citations: 0
Abstract
Introduction: Medical artificial intelligence based on Electronic Health Records (EHR) has recently become a significant research field, and EHR data have been widely used in clinical decision support systems and medical diagnosis systems. However, because EHRs are designed to record patients' disease information rather than to support research and discovery, data quality problems can hinder their use in research. Evaluating the data quality of EHRs before they are used in medical artificial intelligence is therefore a meaningful and challenging task. Most current EHR data quality evaluation methods rely on conventional evaluation indicators and rarely incorporate clinical evidence.
Materials and methods: We propose an EHR data quality evaluation approach based on clinical evidence and a deep text matching model. First, based on medical knowledge of a particular disease, we establish a list of standard clinical evidence descriptions, such as typical symptoms and special signs. We then use the text matching model to find the relevant clinical evidence in the EHR, and finally evaluate the quality of the EHR based on the quantity and quality of the relevant clinical evidence found.
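The three steps above (build an evidence list, match it against the record, score what was found) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses a deep text matching model, whereas this sketch swaps in a simple word-overlap (containment) score. The evidence list, the EHR sentences, the 0.6 match threshold, and the "coverage × mean match score" rule are all hypothetical choices made for the example.

```python
from __future__ import annotations

import re

# Hypothetical standard clinical evidence list for an illustrative disease.
# The paper derives such lists from medical knowledge; these items are invented.
STANDARD_EVIDENCE = [
    "fever",
    "productive cough",
    "chest pain",
    "crackles on auscultation",
]

# Hypothetical EHR, already split into sentences.
EHR_SENTENCES = [
    "Patient reports fever for three days.",
    "Productive cough with yellow sputum.",
    "Vital signs stable.",
]


def tokens(text: str) -> set[str]:
    """Lower-case the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def containment(evidence: str, sentence: str) -> float:
    """Fraction of the evidence tokens that also occur in the sentence.

    A crude lexical stand-in for the paper's deep text matching model.
    """
    ev = tokens(evidence)
    if not ev:
        return 0.0
    return len(ev & tokens(sentence)) / len(ev)


def find_evidence(
    sentences: list[str], evidence_list: list[str], threshold: float = 0.6
) -> dict[str, tuple[str, float]]:
    """Map each evidence item to its best-matching sentence if the score clears the threshold."""
    found: dict[str, tuple[str, float]] = {}
    for ev in evidence_list:
        best_sentence, best_score = None, 0.0
        for sentence in sentences:
            score = containment(ev, sentence)
            if score > best_score:
                best_sentence, best_score = sentence, score
        if best_sentence is not None and best_score >= threshold:
            found[ev] = (best_sentence, best_score)
    return found


def quality_score(found: dict[str, tuple[str, float]], evidence_list: list[str]) -> float:
    """Hypothetical score: evidence coverage (quantity) times mean match score (quality)."""
    if not evidence_list or not found:
        return 0.0
    coverage = len(found) / len(evidence_list)
    mean_match = sum(score for _, score in found.values()) / len(found)
    return coverage * mean_match


if __name__ == "__main__":
    matches = find_evidence(EHR_SENTENCES, STANDARD_EVIDENCE)
    print(sorted(matches))  # ['fever', 'productive cough']
    print(quality_score(matches, STANDARD_EVIDENCE))  # 0.5
```

Here two of the four evidence items are found with perfect overlap, so coverage is 0.5 and the record scores 0.5. A cutoff on this score could then separate higher- from lower-quality records, though any such cutoff would have to be chosen on labeled data. Plain word overlap also ignores negation ("no fever" still matches "fever") and synonyms, which is one motivation for a learned matcher.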
Results: Experimental results on more than 1,000 EHRs covering two types of diseases show that our approach can effectively distinguish high-quality EHRs from low-quality ones, and that the high-quality EHRs identified generally contain sufficient and consistent information related to disease diagnosis.
Conclusions: Experimental results on a real-world dataset demonstrate the effectiveness of our EHR data quality evaluation approach based on medical knowledge and text matching.
About the journal:
IRBM is the journal of the AGBM (Alliance for Engineering in Biology and Medicine / Alliance pour le génie biologique et médical), the SFGBM (BioMedical Engineering French Society / Société française de génie biologique médical) and the AFIB (French Association of Biomedical Engineers / Association française des ingénieurs biomédicaux).
As a vehicle of information and knowledge in the field of biomedical technologies, IRBM is devoted to fundamental as well as clinical research. Biomedical engineering and the use of new technologies are the cornerstones of IRBM, providing authors and users with the latest information. Its six issues per year offer reviews (state of the art and current knowledge), original articles directed at fundamental research, and articles focusing on biomedical engineering. All articles are peer reviewed, with reviewers acting as guarantors of IRBM's scientific and medical content. IRBM covers all disciplines of biomedical engineering; accordingly, published papers include those covering technological and methodological developments in:
- Physiological and biological signal processing (EEG, MEG, ECG…)
- Medical image processing
- Biomechanics
- Biomaterials
- Medical physics
- Biophysics
- Physiological and biological sensors
- Information technologies in healthcare
- Disability research
- Computational physiology
- …