EHR-based prediction modelling meets multimodal deep learning: A systematic review of structured and textual data fusion methods

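Information Fusion, Volume 118, Article 102981. IF 15.5; Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence). Pub Date: 2025-06-01. Epub Date: 2025-02-01. DOI: 10.1016/j.inffus.2025.102981

Source: https://www.sciencedirect.com/science/article/pii/S1566253525000545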

Ariel Soares Teles, Ivan Rodrigues de Moura, Francisco Silva, Angus Roberts, Daniel Stahl


Citations: 0

Abstract



EHR-based prediction modelling meets multimodal deep learning: A systematic review of structured and textual data fusion methods

Electronic Health Records (EHRs) have transformed healthcare by digitally consolidating patient medical history, encompassing structured data (e.g., demographic data, lab results), and unstructured textual data (e.g., clinical notes). These data hold significant potential for predictive modelling, and recent studies have dedicated efforts to leverage the different modalities in a cohesive and effective manner to improve predictive accuracy. This Systematic Literature Review (SLR) addresses the application of Multimodal Deep Learning (MDL) methods in EHR-based prediction modelling, specifically through the fusion of structured and textual data. Following PRISMA guidelines, we conducted a comprehensive literature search across six article databases, using a carefully designed search string. After applying inclusion and exclusion criteria, we selected 77 primary studies. Data extraction was standardized using a structured form based on the CHARMS checklist. We categorized and analysed the fusion strategies employed across the studies. By combining structured and textual data at the input level, early fusion enabled models to learn joint feature representations from the beginning, whether in vectorized representations or data textualization. Intermediate fusion, which delays integration, was particularly useful for tasks where each modality provides unique insights that need to be processed independently before being combined. Late fusion enabled modularity by integrating outputs from unimodal models, which is suitable when EHR structured and textual data have varying quality or reliability. We also identified trends and open issues that need attention. This review contributes a comprehensive understanding of EHR data fusion practices using MDL, highlighting potential pathways for future research and development in health informatics.
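The three fusion strategies named in the abstract differ mainly in where the two modalities are combined. The sketch below illustrates that difference with forward passes only. It is a toy illustration, not a method taken from any of the reviewed studies: the feature values, layer sizes, random untrained weights, the weighted average used for late fusion, and all function names (`early_fusion`, `intermediate_fusion`, `late_fusion`, `textualize`) are assumptions made for this example. The text vector stands in for whatever embedding a real model would compute from a clinical note.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init(n_out, n_in, rng):
    """Random, untrained parameters (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def early_fusion(structured, text_vec, head):
    """Concatenate both modalities at the input and feed one joint model."""
    return sigmoid(linear(structured + text_vec, *head)[0])


def intermediate_fusion(structured, text_vec, enc_s, enc_t, head):
    """Encode each modality separately, then combine the hidden representations."""
    h_s = relu(linear(structured, *enc_s))
    h_t = relu(linear(text_vec, *enc_t))
    return sigmoid(linear(h_s + h_t, *head)[0])


def late_fusion(structured, text_vec, model_s, model_t, weight_s=0.5):
    """Combine the outputs of two independent unimodal models (weighted average)."""
    p_s = sigmoid(linear(structured, *model_s)[0])
    p_t = sigmoid(linear(text_vec, *model_t)[0])
    return weight_s * p_s + (1.0 - weight_s) * p_t


def textualize(record):
    """Early fusion by data textualization: render structured fields as text
    that could be prepended to a clinical note before tokenization."""
    return "; ".join(f"{key}: {value}" for key, value in record.items())


rng = random.Random(0)
structured = [0.62, 1.0, 0.35]       # made-up scaled age, sex flag, scaled lab value
text_vec = [0.1, -0.4, 0.8, 0.05]    # made-up stand-in for a note embedding

p_early = early_fusion(structured, text_vec, init(1, 7, rng))
p_mid = intermediate_fusion(
    structured, text_vec, init(2, 3, rng), init(2, 4, rng), init(1, 4, rng)
)
p_late = late_fusion(structured, text_vec, init(1, 3, rng), init(1, 4, rng))

print(p_early, p_mid, p_late)
print(textualize({"age": 71, "sex": "F"}))
```

In the late-fusion function, `weight_s` is one simple way to express the point about varying data quality: a less reliable modality can be given a smaller weight without retraining the other unimodal model.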


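Source journal: Information Fusion (Engineering and Technology; Computer Science: Theory and Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months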

About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.