Rethinking masked image modelling for medical image representation

IF 10.7 · Zone 1, Medicine · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Medical Image Analysis · Pub Date: 2024-08-17 · DOI: 10.1016/j.media.2024.103304

Yutong Xie, Lin Gu, Tatsuya Harada, Jianpeng Zhang, Yong Xia, Qi Wu

Medical Image Analysis, volume 98, Article 103304. Publication type: Journal Article.

Citations: 0

Abstract

Masked Image Modelling (MIM), a form of self-supervised learning, has achieved significant success in computer vision by improving image representations using unannotated data. Traditional MIM methods typically mask patches sampled at random across the image. However, random masking may not be well suited to medical imaging, whose characteristics differ from those of natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach and, to our knowledge, the first to employ radiological reports to guide the masking and restoration of the informative areas of images, encouraging the network to explore stronger semantic representations from medical images. We introduce two mutually complementary masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports (e.g., cardiac, edema, vascular, pulmonary) to identify symptom clues and guide mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images from the masks produced by the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label/class image classification, pneumothorax segmentation, and medical image–report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image–report pre-training counterparts. Code is available at https://github.com/YtongXie/MedIM.
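The contrast between random masking and report-guided masking can be illustrated with a minimal sketch. The abstract does not say how KDM or SDM score image regions, so everything below is an assumption made for illustration: the function names `random_mask` and `text_guided_mask` are hypothetical, patches and text units (MeSH words or sentences) are assumed to share an embedding space, and the rule "mask the patches most similar to any text unit" is a stand-in, not the authors' implementation (see their repository for that).

```python
import numpy as np


def random_mask(num_patches, mask_ratio, rng):
    """Baseline MIM masking: pick patches uniformly at random."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def text_guided_mask(patch_emb, text_emb, mask_ratio):
    """Hypothetical report-guided masking.

    patch_emb: (num_patches, dim) image patch embeddings.
    text_emb:  (num_text, dim) embeddings of MeSH words or report sentences.
    Masks the patches whose best cosine similarity to any text unit is highest.
    """
    # L2-normalise so the dot product is a cosine similarity.
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    similarity = p @ t.T                # (num_patches, num_text)
    score = similarity.max(axis=1)      # best-matching text unit per patch
    num_masked = int(round(len(score) * mask_ratio))
    top = np.argsort(-score, kind="stable")[:num_masked]
    mask = np.zeros(len(score), dtype=bool)
    mask[top] = True
    return mask


# Toy example: 4 patches in a 2-D embedding space, one text embedding.
patches = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]])
text = np.array([[1.0, 0.0]])
guided = text_guided_mask(patches, text, mask_ratio=0.5)
baseline = random_mask(16, mask_ratio=0.75, rng=np.random.default_rng(0))
```

In this toy setup the guided mask covers the two patches aligned with the text embedding, whereas the random baseline ignores content entirely. In a MIM pipeline, the masked patches would then be the reconstruction targets.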




Source journal

Medical Image Analysis (Engineering & Technology; Engineering: Biomedical)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.