Converting Health Level 7 Clinical Document Architecture (CDA) documents to Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) by leveraging CDA Template definitions.
Florian Katsch, Rada Hussein, Tanja Stamm, Georg Duftschmid
JAMIA Open. 2025;8(2):ooaf022. doi:10.1093/jamiaopen/ooaf022. Published 2025-03-26.
Abstract
Objectives: This work aims to develop a methodology for transforming Health Level 7 (HL7) Clinical Document Architecture (CDA) documents into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The described method seeks to improve the Extract, Transform, Load (ETL) design process by using HL7 CDA Template definitions and the CDA Refined Message Information Model (CDA R-MIM).
Material and methods: Our approach uses HL7 CDA Templates to define structural and semantic mappings, with the CDA R-MIM supporting semantic alignment with the OMOP CDM. On this basis, we developed a tool named CDA Rabbit that generates Rabbit-In-a-Hat project files from HL7 CDA Template definitions and was successfully integrated into the existing OMOP toolchain.
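Rabbit-In-a-Hat normally takes its source tables and fields from a WhiteRabbit scan report. As a rough illustration of how a CDA Template definition could stand in for such a scan, the sketch below flattens one template into a table/field description and attaches candidate OMOP targets based on its R-MIM act class. It is not CDA Rabbit and does not reproduce the Rabbit-In-a-Hat file format; the classes, the output dictionary, the `RMIM_TO_OMOP` table and the template OID `2.999.1` (from the ISO/ITU-T example arc) are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical starting points: CDA R-MIM act class -> candidate OMOP CDM tables.
# The final table depends on the domain of the mapped OMOP concept.
RMIM_TO_OMOP = {
    "Observation": ["measurement", "observation", "condition_occurrence"],
    "Procedure": ["procedure_occurrence"],
    "SubstanceAdministration": ["drug_exposure"],
    "Encounter": ["visit_occurrence"],
}


@dataclass
class TemplateElement:
    path: str       # element path relative to the template's root act
    datatype: str   # CDA datatype, e.g. "CD", "PQ", "IVL_TS"
    min_card: int   # lower cardinality bound
    max_card: str   # upper cardinality bound, "1" or "*"


@dataclass
class CdaTemplate:
    template_id: str
    name: str
    rmim_class: str
    elements: list[TemplateElement] = field(default_factory=list)


def template_to_source_table(t: CdaTemplate) -> dict:
    """Flatten one template into a table/field description (hypothetical format)."""
    return {
        "table": f"{t.name} ({t.template_id})",
        "fields": [
            {
                "name": e.path,
                "type": e.datatype,
                "required": e.min_card >= 1,
                "repeating": e.max_card != "1",
            }
            for e in t.elements
        ],
        "candidate_targets": RMIM_TO_OMOP.get(t.rmim_class, []),
    }


# Usage with a hypothetical laboratory-result template.
lab = CdaTemplate("2.999.1", "LabResult", "Observation", [
    TemplateElement("code", "CD", 1, "1"),
    TemplateElement("value", "PQ", 1, "1"),
    TemplateElement("effectiveTime", "IVL_TS", 0, "1"),
])
src = template_to_source_table(lab)
print(src["table"])  # LabResult (2.999.1)
```

Because OMOP assigns records to tables by the domain of the mapped standard concept, a CDA `Observation` can end up in several target tables; the candidate list only narrows the design space.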
Results: We tested our approach using 13 CDA Templates from the Austrian national EHR system (ELGA) and 430 anonymized CDA test documents, which were mapped to 10 OMOP CDM tables. The data quality assessment with OMOP's DataQualityDashboard showed a 99% pass rate, indicating a robust and accurate data transformation.
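For orientation, a pass rate like the one reported is a ratio of passed checks to evaluated checks. The minimal sketch below assumes that checks not applicable to the data are left out of the denominator and that checks which errored count as failures; the status labels are simplified, and this is not the DataQualityDashboard implementation or its output format.

```python
from collections import Counter


def pass_rate(statuses):
    """Percentage of evaluated checks that passed.

    Assumptions: "not_applicable" checks are excluded from the denominator,
    "error" checks count as failures, and any other label is ignored.
    """
    counts = Counter(statuses)
    evaluated = counts["passed"] + counts["failed"] + counts["error"]
    if evaluated == 0:
        raise ValueError("no evaluated checks")
    return 100.0 * counts["passed"] / evaluated


# Usage with made-up statuses (not the study's results).
print(pass_rate(["passed", "passed", "passed", "failed", "not_applicable"]))  # 75.0
```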
Conclusion: This study presents a novel framework for transforming HL7 CDA documents into the OMOP CDM using template definitions and the CDA R-MIM. The methodology improves semantic interoperability, mapping reusability, and ETL design efficiency. Future work should focus on automating code generation, improving data profiling, and addressing cyclic dependencies within CDA Templates. The presented approach supports improved secondary use of health data and research while adhering to standardized data models and semantics.
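Cyclic dependencies matter because naively expanding a template into the templates it includes never terminates if a template directly or indirectly includes itself. A minimal sketch, assuming template inclusions are available as a mapping from template ID to included template IDs (a hypothetical representation), detects such a cycle with a depth-first search:

```python
from typing import Dict, List, Optional


def find_template_cycle(includes: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one inclusion cycle as a list of template IDs, or None if acyclic.

    Depth-first search with three colors: reaching a child that is still on
    the current path (GRAY) closes a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in includes}
    path: List[str] = []

    def visit(t: str) -> Optional[List[str]]:
        color[t] = GRAY
        path.append(t)
        for child in includes.get(t, []):
            state = color.get(child, WHITE)
            if state == GRAY:
                # child is an ancestor on the current path: report the loop
                return path[path.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK  # fully explored, cannot be part of a new cycle
        return None

    for t in list(includes):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# Usage with hypothetical template IDs: T1 includes T2, T2 includes T3,
# and T3 refers back to T2.
includes = {"T1": ["T2"], "T2": ["T3"], "T3": ["T2"]}
print(find_template_cycle(includes))  # ['T2', 'T3', 'T2']
```

Once a cycle is found, expansion could, for example, stop at a fixed nesting depth; this is one option, not the authors' stated solution.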
Discussion: Using CDA Templates for ETL design addresses common ETL challenges, such as limited data accessibility during ETL design, by decoupling the process from the actual CDA instances. Future work could extend this approach to automatically generate boilerplate code, address cyclic dependencies within CDA Templates, and adapt the method for use with FHIR profiles.
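The decoupling works because mapping rules are keyed to template identifiers rather than to observed instance data, so a rule written at design time applies to any document that conforms to the template. The hand-written sketch below uses Python's standard `xml.etree.ElementTree` to pull code, value and unit from every element that carries a given `templateId`; the function name, the OID `2.999.1` and the trimmed, synthetic document fragment (including its illustrative LOINC code and value) are hypothetical, and the code is not the generated boilerplate the authors mention as future work.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # CDA R2 namespace


def extract_by_template(cda_xml: str, template_root: str) -> list[dict]:
    """Collect code/value/unit from every element that declares the given templateId."""
    root = ET.fromstring(cda_xml)
    rows = []
    for el in root.iter():
        template_ids = el.findall("hl7:templateId", NS)
        if not any(t.get("root") == template_root for t in template_ids):
            continue
        code = el.find("hl7:code", NS)
        value = el.find("hl7:value", NS)
        rows.append({
            "element": el.tag.rsplit("}", 1)[-1],  # drop the namespace prefix
            "code": code.get("code") if code is not None else None,
            "value": value.get("value") if value is not None else None,
            "unit": value.get("unit") if value is not None else None,
        })
    return rows


# Usage with a synthetic, heavily trimmed CDA fragment.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component><structuredBody><component><section>
    <entry>
      <observation classCode="OBS" moodCode="EVN">
        <templateId root="2.999.1"/>
        <code code="2345-7" codeSystem="2.16.840.1.113883.6.1"/>
        <value xsi:type="PQ" value="95" unit="mg/dL"/>
      </observation>
    </entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""

print(extract_by_template(SAMPLE, "2.999.1"))
# [{'element': 'observation', 'code': '2345-7', 'value': '95', 'unit': 'mg/dL'}]
```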