Enhancing clinical data warehousing with provenance data to support longitudinal analyses and large file management: The gitOmmix approach for genomic and image data

Journal of Biomedical Informatics; IF 4.5; Region 2 (Medicine); Q2 (Computer Science, Interdisciplinary Applications); Pub Date: 2025-03-01; Epub Date: 2025-02-12; DOI: 10.1016/j.jbi.2025.104788

Maxime Wack, Adrien Coulet, Anita Burgun, Bastien Rance

Journal of Biomedical Informatics, volume 163, Article 104788. Source: https://www.sciencedirect.com/science/article/pii/S1532046425000176

Abstract

Background:

If hospital Clinical Data Warehouses are to support today’s focus on personalized medicine, they need to track patients longitudinally and manage the large data sets generated by whole genome sequencing, RNA analyses, and complex imaging studies. Current Clinical Data Warehouses address neither issue. This paper reports on methods that enrich current systems with provenance data, allowing patient histories to be followed longitudinally, and that manage the linking and versioning of large data sets from any source. The methods are open source and applicable to any clinical data warehouse system, whatever data schema it uses.

Method:

We introduce gitOmmix, an approach that overcomes these limitations, and illustrate its usefulness in the management of medical omics data. gitOmmix relies on (i) a file versioning system: git, (ii) an extension that handles large files: git-annex, (iii) a provenance knowledge graph: PROV-O, and (iv) an alignment between the git versioning information and the provenance knowledge graph.
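The alignment in item (iv) can be pictured with a minimal Python sketch. The abstract does not describe the actual mapping used by gitOmmix, so everything here is an assumption for illustration: the `Commit` record, the `commit_to_triples` helper, the `EX` namespace, and the choice to model each commit as a `prov:Activity` that generates one `prov:Entity`. Only the PROV-O terms themselves (`prov:Entity`, `prov:Activity`, `prov:wasGeneratedBy`, `prov:used`, `prov:wasDerivedFrom`) come from the W3C vocabulary.

```python
from dataclasses import dataclass

# Real W3C namespaces.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for this sketch only.
EX = "http://example.org/gitommix-sketch/"


@dataclass(frozen=True)
class Commit:
    """Hypothetical summary of one git commit (not a gitOmmix data structure)."""
    sha: str        # git commit hash
    produced: str   # name of the data object added by the commit
    inputs: tuple   # names of the data objects the commit relied on


def commit_to_triples(commit):
    """Describe one commit as PROV-O triples (assumed mapping)."""
    activity = EX + "commit/" + commit.sha
    entity = EX + "data/" + commit.produced
    triples = [
        (activity, RDF_TYPE, PROV + "Activity"),
        (entity, RDF_TYPE, PROV + "Entity"),
        (entity, PROV + "wasGeneratedBy", activity),
    ]
    for name in commit.inputs:
        source = EX + "data/" + name
        triples.append((activity, PROV + "used", source))
        triples.append((entity, PROV + "wasDerivedFrom", source))
    return triples


# Illustrative values: a variant file derived from one patient sample.
example_commit = Commit(sha="abc123", produced="variants.vcf", inputs=("sample_S1",))
for subject, predicate, obj in commit_to_triples(example_commit):
    print(subject, predicate, obj)
```

Keeping the commit hash inside the activity identifier is one simple way to let a graph query lead back to the exact versioned state in git; a real deployment would more likely store the triples in an RDF library or triple store than in plain tuples.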

Results:

Capabilities inherited from git and git-annex enable retracing the history of a clinical interpretation back to the patient sample, through supporting data and analyses. In addition, the provenance knowledge graph, aligned with the git versioning information, enables querying and browsing provenance relationships between these elements.
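As a rough illustration of retracing a clinical interpretation back to the patient sample, the following Python sketch walks `prov:wasDerivedFrom` links in a small in-memory graph. The entity names, the `retrace` helper, and the tuple-based graph are hypothetical; the abstract does not specify how gitOmmix stores or queries its provenance graph.

```python
from collections import deque

WAS_DERIVED_FROM = "prov:wasDerivedFrom"  # PROV-O property, abbreviated

# Hypothetical provenance chain, written as (subject, predicate, object).
example_graph = [
    ("clinical_interpretation", WAS_DERIVED_FROM, "variant_calls"),
    ("variant_calls", WAS_DERIVED_FROM, "sequencing_reads"),
    ("sequencing_reads", WAS_DERIVED_FROM, "patient_sample"),
]


def retrace(triples, start):
    """Return start plus every entity it derives from, in breadth-first order."""
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == WAS_DERIVED_FROM:
            parents.setdefault(subject, []).append(obj)

    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in visited:
                visited.append(parent)
                queue.append(parent)
    return visited


print(retrace(example_graph, "clinical_interpretation"))
# prints ['clinical_interpretation', 'variant_calls', 'sequencing_reads', 'patient_sample']
```

The same traversal handles branching histories, for example an interpretation that draws on both genomic and imaging data, because each entity may list several parents.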

Conclusion:

gitOmmix adds a provenance layer to Clinical Data Warehouses (CDWs), while scaling to large files and remaining agnostic to the CDW system. For these reasons, we think that it is a viable and generalizable solution for omics clinical studies.

Source journal

Journal of Biomedical Informatics (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 8.90

Self-citation rate: 6.70%

Articles published: 243

Review time: 32 days

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.