Evolution of a Graph Model for the OMOP Common Data Model
Mengjia Kang, Jose A Alvarado-Guzman, Luke V Rasmussen, Justin B Starren
Applied Clinical Informatics, 15(5), 1056-1065 (2024). DOI: 10.1055/s-0044-1791487
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11617070/pdf/
Abstract
Objective: Graph databases for electronic health record (EHR) data have become a useful tool for clinical research in recent years, but there is a lack of published methods to transform relational databases to a graph database schema. We developed a graph model for the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that can be reused across research institutions.
Methods: We created and evaluated four models, representing two different strategies, for converting the standardized clinical and vocabulary tables of OMOP into a property graph model within the Neo4j graph database. Using the Successful Clinical Response in Pneumonia Therapy (SCRIPT) and Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning (CRITICAL) cohorts as test datasets of different sizes, we compared two of the resulting graph models on database performance, including database building time, query complexity, and runtime, for both cohorts.
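The abstract does not spell out the two conversion strategies, so the following is only a minimal sketch, assuming a plain in-memory structure instead of Neo4j, of how rows from OMOP tables such as `person`, `condition_occurrence`, and `concept` might be mapped to a property graph in two ways: keeping a concept as an attribute of an event node, or promoting it to a shared node connected by an edge. The labels (`Person`, `ConditionOccurrence`, `Concept`), relationship types (`HAS_CONDITION`, `HAS_CONCEPT`), the `convert` function, and all data values are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy rows shaped like OMOP CDM v5.x tables; IDs and names are made up.
PERSONS = [{"person_id": 10}, {"person_id": 11}]
CONCEPTS = [
    {"concept_id": 1001, "concept_name": "Pneumonia"},
    {"concept_id": 1002, "concept_name": "Sepsis"},
]
CONDITIONS = [
    {"condition_occurrence_id": 1, "person_id": 10,
     "condition_concept_id": 1001, "condition_start_date": "2020-01-05"},
    {"condition_occurrence_id": 2, "person_id": 10,
     "condition_concept_id": 1002, "condition_start_date": "2020-01-07"},
    {"condition_occurrence_id": 3, "person_id": 11,
     "condition_concept_id": 1001, "condition_start_date": "2021-03-02"},
]


@dataclass
class PropertyGraph:
    """In-memory stand-in for a property graph: labeled nodes with properties, typed edges."""
    nodes: dict = field(default_factory=dict)  # node key -> (label, properties)
    edges: list = field(default_factory=list)  # (source key, relationship type, target key)

    def add_node(self, key, label, **props):
        # setdefault keeps the first definition, so a shared Concept node is created only once
        self.nodes.setdefault(key, (label, props))

    def add_edge(self, src, rel_type, dst):
        self.edges.append((src, rel_type, dst))


def convert(persons, conditions, concepts, strategy):
    """Map person/condition_occurrence/concept rows to a graph under one of two hypothetical strategies."""
    if strategy not in ("attribute", "topology"):
        raise ValueError(f"unknown strategy: {strategy}")
    graph = PropertyGraph()
    names = {c["concept_id"]: c["concept_name"] for c in concepts}
    for p in persons:
        graph.add_node(("person", p["person_id"]), "Person", person_id=p["person_id"])
    for row in conditions:
        event = ("condition", row["condition_occurrence_id"])
        cid = row["condition_concept_id"]
        if strategy == "attribute":
            # Attribute-style: the concept ID is just a property on the event node
            graph.add_node(event, "ConditionOccurrence", condition_concept_id=cid,
                           start_date=row["condition_start_date"])
        else:
            # Topology-style: the concept is a shared node reached through an edge
            graph.add_node(event, "ConditionOccurrence", start_date=row["condition_start_date"])
            graph.add_node(("concept", cid), "Concept", concept_id=cid, concept_name=names[cid])
            graph.add_edge(event, "HAS_CONCEPT", ("concept", cid))
        # In both strategies the patient is linked to each of its condition events
        graph.add_edge(("person", row["person_id"]), "HAS_CONDITION", event)
    return graph


for strategy in ("attribute", "topology"):
    g = convert(PERSONS, CONDITIONS, CONCEPTS, strategy)
    print(strategy, len(g.nodes), "nodes,", len(g.edges), "edges")
```

On this toy input the attribute strategy yields 5 nodes and 3 edges, and the topology strategy yields 7 nodes and 6 edges: the topology version spends extra nodes and edges so that "which patients have concept X" becomes a traversal from a concept node rather than a filter over event properties.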
Results: Utilizing a graph schema that was optimized for storing critical information as topology rather than attributes resulted in a significant improvement in both data creation and querying. The graph database for our larger cohort, CRITICAL, can be built within 1 hour for 134,145 patients, with a total of 749,011,396 nodes and 1,703,560,910 edges.
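To illustrate why storing information as topology rather than attributes can speed up querying, here is a toy comparison, hypothetical and not a benchmark of Neo4j or of the authors' models: finding the patients with a given condition concept either by scanning every event and checking a property, or by starting at the concept and following only its own links. The event tuples, concept IDs, and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical events: (condition_occurrence_id, person_id, condition_concept_id)
EVENTS = [(1, 10, 1001), (2, 10, 1002), (3, 11, 1001), (4, 12, 1003), (5, 12, 1003)]


def patients_by_attribute_scan(events, concept_id):
    """Attribute-style: every event is inspected and its concept property compared."""
    inspected = 0
    persons = set()
    for _occ_id, person_id, cid in events:
        inspected += 1
        if cid == concept_id:
            persons.add(person_id)
    return persons, inspected


def build_concept_adjacency(events):
    """Topology-style: each concept keeps direct links to its own events (built once, at load time)."""
    adjacency = defaultdict(list)
    for occ_id, person_id, cid in events:
        adjacency[cid].append((occ_id, person_id))
    return adjacency


def patients_by_traversal(adjacency, concept_id):
    """Start at the concept and follow only the events linked to it."""
    inspected = 0
    persons = set()
    for _occ_id, person_id in adjacency.get(concept_id, []):
        inspected += 1
        persons.add(person_id)
    return persons, inspected


adjacency = build_concept_adjacency(EVENTS)
print(patients_by_attribute_scan(EVENTS, 1001))
print(patients_by_traversal(adjacency, 1001))
```

For concept 1001 both functions return patients 10 and 11, but the scan inspects all 5 events while the traversal touches only the 2 events linked to that concept. In this toy the adjacency is paid for once when the structure is built, which is why build time and query time are worth measuring separately.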
Discussion: To our knowledge, this is the first generalized solution to convert the OMOP CDM to a graph-optimized schema. Despite being developed for studies at a single institution, the modeling method can be applied to other OMOP CDM v5.x databases. Our evaluation with the SCRIPT and CRITICAL cohorts and comparison between the current and previous versions show advantages in code simplicity, database building, and query speed.
Conclusion: We developed a method for converting OMOP CDM databases into graph databases. Our experiments revealed that the final model outperformed the initial relational-to-graph transformation in both code simplicity and query efficiency, particularly for complex queries.