Radiology Reports Improve Visual Representations Learned from Radiographs.

Proceedings of Machine Learning Research. Pub Date: 2023-07-01

Haoxu Huang, Samyak Rawlekar, Sumit Chopra, Cem M Deniz

Volume 227, pages 1385-1405. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11234265/pdf/


Abstract

Although the human ability to visually understand the structure of the world plays a crucial role in perceiving it and making appropriate decisions, human perception does not rely solely on vision; it combines information from acoustic, verbal, and visual stimuli. An active area of research focuses on designing efficient frameworks that adapt to multiple modalities and, ideally, improve performance on existing tasks. While numerous frameworks have proved effective on natural-image datasets such as ImageNet, only a limited number of studies have been carried out in the biomedical domain. In this work, we extend the frameworks available for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question: "Among multi-modal learning, self-supervised learning, and joint learning that uses both strategies, which one most improves the visual representation for downstream chest radiograph classification tasks?" Our experiments indicate that in limited-label settings with 1% and 10% labeled data, joint learning with multi-modal and self-supervised models outperforms self-supervised learning and is on par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets. The code is publicly available online.
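
To make the three strategies compared in the abstract concrete, the following is a minimal sketch of how a joint objective could combine a multi-modal (image-report) term with a self-supervised (image-image) term. The abstract does not specify the loss functions or the weighting, so everything here is an assumption for illustration: the use of an InfoNCE-style contrastive loss for both terms, the names `info_nce` and `joint_loss`, and the `weight` and `temperature` parameters are all hypothetical and are not taken from the paper or its released code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, targets, temperature=0.1):
    """InfoNCE-style loss: anchors[i] should match targets[i] against all targets.

    Assumed loss for illustration; the abstract does not name one.
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def joint_loss(img_view1, img_view2, report_emb, weight=0.5, temperature=0.1):
    """Weighted sum of a multi-modal term and a self-supervised term (hypothetical).

    img_view1, img_view2: embeddings of two augmented views of each radiograph.
    report_emb: embeddings of the paired radiology reports.
    Returns (total, multi_modal_term, self_supervised_term).
    """
    # Multi-modal term: symmetric image-to-report and report-to-image matching.
    multi_modal = 0.5 * (
        info_nce(img_view1, report_emb, temperature)
        + info_nce(report_emb, img_view1, temperature)
    )
    # Self-supervised term: two views of the same image should agree.
    self_supervised = 0.5 * (
        info_nce(img_view1, img_view2, temperature)
        + info_nce(img_view2, img_view1, temperature)
    )
    total = weight * multi_modal + (1.0 - weight) * self_supervised
    return total, multi_modal, self_supervised
```

Under this sketch, setting `weight=1.0` reduces to multi-modal learning alone and `weight=0.0` to self-supervised learning alone, which mirrors the three settings the abstract compares. In practice the embeddings would come from trained image and text encoders rather than hand-written lists.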


