Multimodal representations of biomedical knowledge from limited training whole slide images and reports using deep learning

Medical Image Analysis · IF 10.7 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Tier 1 (Medicine) · Pub Date: 2024-08-14 · DOI: 10.1016/j.media.2024.103303
Niccolò Marini, Stefano Marchesin, Marek Wodzinski, Alessandro Caputo, Damian Podareanu, Bryan Cardenas Guevara, Svetla Boytcheva, Simona Vatrano, Filippo Fraggetta, Francesco Ciompi, Gianmaria Silvello, Henning Müller, Manfredo Atzori
{"title":"Multimodal representations of biomedical knowledge from limited training whole slide images and reports using deep learning","authors":"Niccolò Marini ,&nbsp;Stefano Marchesin ,&nbsp;Marek Wodzinski ,&nbsp;Alessandro Caputo ,&nbsp;Damian Podareanu ,&nbsp;Bryan Cardenas Guevara ,&nbsp;Svetla Boytcheva ,&nbsp;Simona Vatrano ,&nbsp;Filippo Fraggetta ,&nbsp;Francesco Ciompi ,&nbsp;Gianmaria Silvello ,&nbsp;Henning Müller ,&nbsp;Manfredo Atzori","doi":"10.1016/j.media.2024.103303","DOIUrl":null,"url":null,"abstract":"<div><p>The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized high-level description of the findings identified within data and often only concerning a small part of the image. However, only a few methods allow to effectively link the visual content of images with the textual content of reports, preventing medical specialists from properly benefitting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture creating a robust biomedical data representation encoding fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. The architecture is trained on over 6,000 colon whole slide Images (WSI), paired with the corresponding report, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from pathology workflow and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Noticeably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture that can be adopted as a backbone to solve peculiar tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and allows to halve the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint. The combination of images and reports exploiting self-supervised algorithms allows to mine databases without needing new annotations provided by experts, extracting new information. 
In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way to advancements in medicine and biomedical analysis domains, not limited to histopathology.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"97 ","pages":"Article 103303"},"PeriodicalIF":10.7000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1361841524002287/pdfft?md5=73a7966410c3f9ed908cd48c6bfefa5b&pid=1-s2.0-S1361841524002287-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841524002287","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized, high-level description of the findings identified within the data, often concerning only a small part of the image. However, only a few methods can effectively link the visual content of images with the textual content of reports, preventing medical specialists from fully benefiting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture that creates a robust biomedical data representation by encoding fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. It is trained on over 6,000 colon whole slide images (WSIs), each paired with the corresponding report, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from the pathology workflows and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Notably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture can be adopted as a backbone for specific downstream tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and halves the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint. Combining images and reports with self-supervised algorithms makes it possible to mine databases and extract new information without requiring new expert annotations. In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way to advancements in medicine and biomedical analysis domains, not limited to histopathology.
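
The paper's exact architecture is described in the full text; purely as a rough illustration of the kind of image–text alignment the abstract describes, the sketch below trains a shared embedding space for WSI and report embeddings with a symmetric contrastive (CLIP-style) loss. Everything here is an assumption for illustration: the class names, dimensions, and choice of loss are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch (NOT the authors' implementation): CLIP-style contrastive
# alignment between whole-slide-image (WSI) embeddings and report embeddings.
# All names, dimensions, and the loss choice are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalAligner(nn.Module):
    def __init__(self, img_dim=768, txt_dim=512, shared_dim=256):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)
        self.log_temp = nn.Parameter(torch.tensor(0.0))  # learnable temperature

    def forward(self, img_emb, txt_emb):
        # L2-normalize so dot products are cosine similarities.
        z_img = F.normalize(self.img_proj(img_emb), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_emb), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, log_temp):
    # Symmetric InfoNCE: matched WSI/report pairs sit on the diagonal.
    logits = z_img @ z_txt.t() * log_temp.exp()
    targets = torch.arange(z_img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: a batch of 8 precomputed WSI and report embeddings.
model = MultimodalAligner()
img_emb = torch.randn(8, 768)   # e.g. aggregated patch features per WSI
txt_emb = torch.randn(8, 512)   # e.g. report/concept encoder outputs
z_img, z_txt = model(img_emb, txt_emb)
loss = contrastive_loss(z_img, z_txt, model.log_temp)

# Cross-modal retrieval then needs no extra training: rank reports for a
# query image by cosine similarity in the shared space.
sims = z_img[0] @ z_txt.t()        # similarity of WSI 0 to every report
best_report = sims.argmax().item()
```

Once the two modalities share an embedding space, cross-modal retrieval and concept linking reduce to nearest-neighbor search over cosine similarities, which is consistent with the abstract's claim that these tasks come "by architectural design without further training".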

Source journal

Medical Image Analysis (Engineering, Biomedical)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles per year: 309
Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.
Latest articles in this journal

- Corrigendum to "Detection and analysis of cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge" [Medical Image Analysis, April 2022, Volume 77, 102333].
- Editorial for Special Issue on Foundation Models for Medical Image Analysis.
- Few-shot medical image segmentation with high-fidelity prototypes.
- The Developing Human Connectome Project: A fast deep learning-based pipeline for neonatal cortical surface reconstruction.
- AutoFOX: An automated cross-modal 3D fusion framework of coronary X-ray angiography and OCT.