Uni4Eye++: A General Masked Image Modeling Multi-modal Pre-training Framework for Ophthalmic Image Classification and Segmentation.

IEEE Transactions on Medical Imaging. Pub Date: 2024-07-02. DOI: 10.1109/TMI.2024.3422102

Zhiyuan Cai, Li Lin, Huaqing He, Pujin Cheng, Xiaoying Tang



Abstract

A large-scale labeled dataset is a key factor in the success of supervised deep learning in most ophthalmic image analysis scenarios. However, annotated data are often limited in ophthalmic image analysis, since manual annotation is time-consuming and labor-intensive. Self-supervised learning (SSL) methods offer a way to make better use of unlabeled data, as they do not require massive annotations. To utilize as many unlabeled ophthalmic images as possible, it is necessary to break the dimension barrier, making use of both 2D and 3D images simultaneously while alleviating catastrophic forgetting. In this paper, we propose a universal self-supervised Transformer framework, named Uni4Eye++, to discover intrinsic image characteristics and capture domain-specific feature embeddings in ophthalmic images. Uni4Eye++ can serve as a global feature extractor and is built on a Masked Image Modeling task with a Vision Transformer architecture. Building on our previous work, Uni4Eye, we further employ an image-entropy-guided masking strategy to reconstruct more informative patches and a dynamic head generator module to alleviate modality confusion. We evaluate the pre-trained Uni4Eye++ encoder by fine-tuning it on multiple downstream ophthalmic image classification and segmentation tasks. Comparisons with other state-of-the-art SSL pre-training methods establish the superiority of Uni4Eye++. Our code is available on GitHub¹.
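The abstract does not spell out how the image-entropy-guided masking works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a 2D grayscale image scaled to [0, 1], computes the Shannon entropy of each non-overlapping patch's intensity histogram, and samples the patches to mask with probability proportional to that entropy. The function names `patch_entropy` and `entropy_guided_mask`, the bin count, the patch size, and the mask ratio are all hypothetical choices made for illustration.

```python
import numpy as np


def patch_entropy(image, patch_size, bins=32):
    """Shannon entropy (in bits) of each non-overlapping patch's histogram.

    Assumes `image` is a 2D float array with values in [0, 1].
    Patches are returned in row-major order over the patch grid.
    """
    h, w = image.shape
    gh, gw = h // patch_size, w // patch_size
    # Crop to a multiple of patch_size, then flatten each patch into a row.
    patches = (
        image[: gh * patch_size, : gw * patch_size]
        .reshape(gh, patch_size, gw, patch_size)
        .transpose(0, 2, 1, 3)
        .reshape(gh * gw, -1)
    )
    entropies = np.empty(gh * gw)
    for i, p in enumerate(patches):
        counts, _ = np.histogram(p, bins=bins, range=(0.0, 1.0))
        probs = counts[counts > 0] / p.size
        entropies[i] = -np.sum(probs * np.log2(probs))
    return entropies


def entropy_guided_mask(image, patch_size=16, mask_ratio=0.75, rng=None):
    """Boolean mask over patches; True marks a patch to hide and reconstruct.

    Hypothetical rule: patches are drawn without replacement with
    probability proportional to their entropy, so information-rich
    patches are masked more often than flat background.
    """
    rng = np.random.default_rng() if rng is None else rng
    ent = patch_entropy(image, patch_size)
    n_mask = int(round(mask_ratio * ent.size))
    weights = ent + 1e-6  # small floor so zero-entropy patches stay drawable
    chosen = rng.choice(ent.size, size=n_mask, replace=False,
                        p=weights / weights.sum())
    mask = np.zeros(ent.size, dtype=bool)
    mask[chosen] = True
    return mask
```

Under these assumptions, an image whose left half is constant and whose right half is noisy would have nearly all of its masked patches fall in the noisy half, which is the intended bias toward more informative regions. A uniform random mask, as used in a plain masked autoencoder, corresponds to replacing `weights` with a constant.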


