Harmonizing population health data into OMOP common data model: a demonstration using COVID-19 sero-surveillance data from Nairobi Urban Health and Demographic Surveillance System.
Michael Ochola, Sylvia Kiwuwa-Muyingo, Tathagata Bhattacharjee, David Amadi, Maureen Ng'etich, Damazo Kadengye, Henry Owoko, Boniface Igumba, Jay Greenfield, Jim Todd, Agnes Kiragga
Frontiers in Digital Health, volume 7, article 1423621. Published 2025-01-28 (eCollection). DOI: https://doi.org/10.3389/fdgth.2025.1423621. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822943/pdf/
Citations: 0
Abstract
Background: Observational health data are collected in different formats and structures, making them challenging to analyze with common tools. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a standardized data model that can harmonize observational health data.
Objective: This paper demonstrates the use of the OMOP CDM to harmonize COVID-19 sero-surveillance data from the Nairobi Urban Health and Demographic Surveillance System (HDSS).
Methods: We extracted data from the Nairobi Urban HDSS COVID-19 sero-surveillance database and mapped them to the OMOP CDM. We used open-source Observational Health Data Sciences and Informatics (OHDSI) tools, including WhiteRabbit, RabbitInAHat, and USAGI. The steps included data profiling (scanning), mapping the vocabularies using the offline USAGI and online ATHENA, and designing the extract, transform, and load (ETL) process using RabbitInAHat. The ETL process was implemented using Pentaho Data Integration community edition software and structured query language (SQL). The target OMOP CDM can now be used to analyze the prevalence of COVID-19 antibodies in the Nairobi Urban HDSS population.
Results: We successfully mapped the Nairobi Urban HDSS COVID-19 sero-surveillance data to the OMOP CDM. The standardized dataset included information on demographics, COVID-19 symptoms, vaccination, and COVID-19 antibody test results.
Conclusions: The OMOP CDM is a valuable tool for harmonizing observational health data. Using the OMOP CDM facilitates the sharing and analysis of observational health data, leading to a better understanding of disease conditions and trends and improving evidence-based population health strategies.
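As an illustration of the ETL step described under Methods, the following is a minimal sketch, not the authors' Pentaho/SQL implementation. It loads a few made-up source rows into a reduced subset of the OMOP CDM `person` and `measurement` tables in an in-memory SQLite database, then computes antibody prevalence from the standardized tables. The source rows, the function names `build_cdm` and `seroprevalence`, and the concept IDs in the 2,000,000,000 range (the range OMOP reserves for local concepts) are assumptions made for illustration; real standard concept IDs would come from the USAGI and ATHENA mapping. The gender concept IDs 8507 and 8532 are the standard OMOP concepts for male and female.

```python
import sqlite3

# Hypothetical source rows: (participant id, sex, birth year, test date, result).
SOURCE_ROWS = [
    ("P001", "M", 1985, "2021-03-02", "positive"),
    ("P002", "F", 1990, "2021-03-02", "negative"),
    ("P003", "F", 1978, "2021-03-05", "positive"),
]

# Standard OMOP gender concepts (male, female).
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

# Placeholder local concept IDs (assumption); a real ETL would use the
# standard concepts selected during vocabulary mapping.
ANTIBODY_TEST_CONCEPT = 2_000_000_001
RESULT_CONCEPTS = {"positive": 2_000_000_002, "negative": 2_000_000_003}


def build_cdm(rows):
    """Load source rows into a reduced person/measurement schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id           INTEGER PRIMARY KEY,
            gender_concept_id   INTEGER NOT NULL,
            year_of_birth       INTEGER NOT NULL,
            person_source_value TEXT
        );
        CREATE TABLE measurement (
            measurement_id         INTEGER PRIMARY KEY,
            person_id              INTEGER NOT NULL REFERENCES person(person_id),
            measurement_concept_id INTEGER NOT NULL,
            measurement_date       TEXT NOT NULL,
            value_as_concept_id    INTEGER,
            value_source_value     TEXT
        );
        """
    )
    for person_id, (src_id, sex, birth_year, test_date, result) in enumerate(rows, start=1):
        # Unmapped source codes fall back to concept 0, the OMOP "no matching concept".
        conn.execute(
            "INSERT INTO person VALUES (?, ?, ?, ?)",
            (person_id, GENDER_CONCEPTS.get(sex, 0), birth_year, src_id),
        )
        conn.execute(
            "INSERT INTO measurement (person_id, measurement_concept_id, "
            "measurement_date, value_as_concept_id, value_source_value) "
            "VALUES (?, ?, ?, ?, ?)",
            (person_id, ANTIBODY_TEST_CONCEPT, test_date,
             RESULT_CONCEPTS.get(result, 0), result),
        )
    conn.commit()
    return conn


def seroprevalence(conn):
    """Share of tested persons with at least one positive antibody result."""
    positive, tested = conn.execute(
        """
        SELECT COUNT(DISTINCT CASE WHEN value_as_concept_id = ? THEN person_id END),
               COUNT(DISTINCT person_id)
        FROM measurement
        WHERE measurement_concept_id = ?
        """,
        (RESULT_CONCEPTS["positive"], ANTIBODY_TEST_CONCEPT),
    ).fetchone()
    return positive / tested if tested else 0.0


if __name__ == "__main__":
    cdm = build_cdm(SOURCE_ROWS)
    print(f"Seroprevalence: {seroprevalence(cdm):.2%}")
```

Because the prevalence query refers only to CDM table and column names and to concept IDs, the same query could in principle run against any site that has loaded its data into the OMOP CDM with the same concept mappings, which is the sharing benefit the Conclusions describe.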