Research data warehouse: using electronic health records to conduct population-based observational studies
Wansu Chen, Fagen Xie, Don P Mccarthy, Kristi L Reynolds, Mingsum Lee, Karen J Coleman, Darios Getahun, Corinna Koebnick, Steve J Jacobsen
JAMIA Open, volume 6, issue 2, ooad039. Published 2023-07-01.
DOI: 10.1093/jamiaopen/ooad039 (https://doi.org/10.1093/jamiaopen/ooad039)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10284679/pdf/
Abstract
Background: Electronic health records and many legacy systems contain rich longitudinal data that can be used for research; however, these data are typically not readily available to researchers.
Materials and methods: At Kaiser Permanente Southern California (KPSC), a research data warehouse (RDW) has been developed and maintained since the late 1990s and was widely extended in 2006. It aggregates and standardizes data collected from internal sources and a few external sources. This article provides a high-level overview of the RDW and discusses challenges common to data warehouses or repositories for research use. To demonstrate the application of the data, we report the volume, patient characteristics, and age-adjusted prevalence of selected medical conditions and utilization rates of selected medical procedures.
Results: A total of 105 million person-years of health plan enrollment were recorded in the RDW between 1981 and 2018, with most healthcare utilization data available since the early or mid-1990s. Among active enrollees on December 31, 2018, 15% were ≥65 years of age; 33.9% were non-Hispanic white, 43.3% Hispanic, 11.0% Asian, and 8.4% African American; and 34.4% of children (2-17 years old) and 72.1% of adults (≥18 years old) were overweight or obese. The age-adjusted prevalence of asthma, atrial fibrillation, diabetes mellitus, hypercholesteremia, and hypertension increased between 2001 and 2018. Hospitalization and emergency department (ED) visit rates appeared lower, and office visit rates appeared higher, at KPSC than the reported US averages.
Discussion and conclusion: Although the RDW is unique to KPSC, its methodologies and experience may provide useful insights for researchers of other healthcare systems worldwide in the era of big data analysis.
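The abstract reports age-adjusted prevalence but does not state how the adjustment was done. As an illustration only, the sketch below assumes direct standardization, a common approach in which each age group's prevalence is weighted by that group's share of a fixed standard population. The function name, the age groups, the counts, and the weights are all hypothetical and are not taken from the article or the RDW.

```python
from typing import Dict, Tuple


def age_adjusted_prevalence(
    strata: Dict[str, Tuple[int, int]],
    standard_weights: Dict[str, float],
) -> float:
    """Directly standardized prevalence (illustrative sketch).

    strata: age group -> (number of cases, number of people) in the study population.
    standard_weights: age group -> share of the chosen standard population.
    """
    # Normalize by the weights actually used, so the weights need not sum to 1.
    total_weight = sum(standard_weights[group] for group in strata)
    adjusted = 0.0
    for group, (cases, population) in strata.items():
        if population <= 0:
            raise ValueError(f"age group {group!r} has no population")
        # Age-specific prevalence, weighted by the standard population share.
        adjusted += (cases / population) * standard_weights[group]
    return adjusted / total_weight


# Hypothetical counts and weights, chosen only to make the arithmetic easy to follow.
example_strata = {"0-17": (50, 1000), "18-64": (200, 2000), "65+": (300, 1000)}
example_weights = {"0-17": 0.25, "18-64": 0.60, "65+": 0.15}

crude = sum(c for c, _ in example_strata.values()) / sum(
    n for _, n in example_strata.values()
)
adjusted = age_adjusted_prevalence(example_strata, example_weights)
print(f"crude: {crude:.4f}, age-adjusted: {adjusted:.4f}")
```

In this made-up example the crude prevalence is 550/4000 = 0.1375, while the adjusted value is 0.05 × 0.25 + 0.10 × 0.60 + 0.30 × 0.15 = 0.1175. The gap arises because the hypothetical study population is older than the hypothetical standard, which is why adjustment matters when comparing prevalence across years with a shifting age mix.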