The Secondary Use of Electronic Health Records for Data Mining: Data Characteristics and Challenges

ACM Computing Surveys (CSUR), Pub Date: 2022-01-18, DOI: 10.1145/3490234

Tabinda Sarwar, S. Seifollahi, Jeffrey A Chan, Xiuzhen Zhang, V. Aksakalli, I. Hudson, Karin M. Verspoor, L. Cavedon

Pages: 1-40

Citations: 18

Abstract

The primary objective of implementing Electronic Health Records (EHRs) is to improve the management of patients’ health-related information. However, these records have also been used extensively for the secondary purposes of clinical research and improving healthcare practice. EHRs provide a rich set of information that includes demographics, medical history, medications, laboratory test results, and diagnoses. Data mining and analytics techniques have extensively exploited EHR information to study patient cohorts for various clinical and research applications, such as phenotype extraction, precision medicine, intervention evaluation, and disease prediction, detection, and progression. However, the presence of diverse data types and their associated characteristics poses many challenges to the use of EHR data. In this article, we provide an overview of the information found in EHR systems and the characteristics of that information that could be utilized for secondary applications. We first discuss the different types of data stored in EHRs, followed by the data transformations necessary for data analysis and mining. We then discuss the data quality issues and characteristics of EHRs, along with the relevant methods used to address them. This survey also highlights how the various data types are used in different applications. Hence, this article can serve as a primer for researchers seeking to understand the use of EHRs for data mining and analytics purposes.




Source journal

ACM Computing Surveys (CSUR)

Self-citation rate

0.00%

Number of published articles