Masked pre-training of transformers for histology image analysis

Shuai Jiang, Liesbeth Hondelink, Arief A. Suriawinata, Saeed Hassanpour
Journal: Journal of Pathology Informatics
Publication date: 2024-05-31
Publication type: Journal Article
DOI: 10.1016/j.jpi.2024.100386
Article page: https://www.sciencedirect.com/science/article/pii/S2153353924000257
Citation count: 0

Abstract

In digital pathology, whole-slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Vision transformer (ViT) models have recently emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. However, because of the large number of model parameters and the limited amount of labeled data, applying transformer models to WSIs remains challenging. In this study, we propose a pretext task to train the transformer model in a self-supervised manner. Our model, MaskHIT, uses the transformer output to reconstruct masked patches, with reconstruction quality measured by a contrastive loss. We pre-trained the MaskHIT model using over 7000 WSIs from TCGA and extensively evaluated its performance in multiple experiments covering survival prediction, cancer subtype classification, and grade prediction tasks. Our experiments demonstrate that the pre-training procedure enables context-aware understanding of WSIs, facilitates the learning of representative histological features based on patch positions and visual patterns, and is essential for the ViT model to achieve optimal results on WSI-level tasks. The pre-trained MaskHIT surpasses various multiple instance learning approaches by 3% and 2% on survival prediction and cancer subtype classification tasks, respectively, and also outperforms recent state-of-the-art transformer-based methods. Finally, a comparison of the attention maps generated by the MaskHIT model with pathologists' annotations indicates that the model can accurately identify clinically relevant histological structures on the whole slide for each task.
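
The abstract describes the pretext task only at a high level: some patches are masked, the transformer output is used to reconstruct them, and the reconstruction is scored with a contrastive loss. The sketch below illustrates that idea under loudly stated assumptions. The InfoNCE-style form of the loss, the temperature, the 50% mask ratio, and the feature dimensions are all hypothetical choices, not details taken from the paper, and the transformer itself is replaced by stand-in predictions.

```python
import numpy as np

def info_nce_loss(predicted, target, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed form, not the paper's exact loss).

    Each predicted feature should be most similar to the target feature of the
    same masked patch and less similar to the targets of the other masked patches.
    """
    # L2-normalize rows so that dot products are cosine similarities.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = p @ t.T / temperature                 # shape: (n_masked, n_masked)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching (positive) pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
patch_features = rng.normal(size=(16, 32))  # hypothetical: 16 patches, 32-dim features
mask = np.zeros(16, dtype=bool)
mask[rng.choice(16, size=8, replace=False)] = True  # hypothetical 50% mask ratio

targets = patch_features[mask]
# Stand-ins for the transformer output at the masked positions.
good_predictions = targets + 0.01 * rng.normal(size=targets.shape)
random_predictions = rng.normal(size=targets.shape)

loss_good = info_nce_loss(good_predictions, targets)
loss_random = info_nce_loss(random_predictions, targets)
print(f"near-perfect reconstruction: {loss_good:.4f}")
print(f"random reconstruction:       {loss_random:.4f}")
```

In this toy setup, predictions close to the true masked-patch features give a much lower loss than random predictions, which is the signal a model would be trained to minimize during pre-training.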

Source journal
Journal of Pathology Informatics
Subject area: Medicine, Pathology and Forensic Medicine
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 2
Review time: 18 weeks
About the journal: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.