Pyramid Tokens-to-Token Vision Transformer for Thyroid Pathology Image Classification

Peng Yin, Bo Yu, Cheng-wei Jiang, Hechang Chen
{"title":"Pyramid Tokens-to-Token Vision Transformer for Thyroid Pathology Image Classification","authors":"Peng Yin, Bo Yu, Cheng-wei Jiang, Hechang Chen","doi":"10.1109/IPTA54936.2022.9784139","DOIUrl":null,"url":null,"abstract":"Histopathological image contains rich phenotypic information, which is beneficial to classifying tumor subtypes and predicting the development of diseases. The vast size of pathological slides makes it impossible to directly train whole slide images (WSI) on convolutional neural networks (CNNs). Most of the previous weakly supervision works divide high-resolution WSIs into small image patches and separately input them into the CNN to classify them as tumors or normal areas. The first difficulty is that although the method based on the CNN framework achieves a high accuracy rate, it increases the model parameters and computational complexity. The second difficulty is balancing the relationship between accuracy and model compu-tation. It makes the model maintain and improve the classification accuracy as much as possible based on the lightweight. In this paper, we propose a new lightweight architecture called Pyramid Tokens-to-Token VIsion Transformer (PyT2T-ViT) with multiple instance learning based on Vision Transformer. We introduce the feature extractor of the model with Token-to-Token ViT (T2T-ViT) to reduce the model parameters. The performance of the model is improved by combining the image pyramid of multiple receptive fields so that it can take into account the local and global features of the cell structure at a single scale. We applied the method to our collection of 560 thyroid pathology images from the same institution, model parameters and computation were greatly reduced. The classification effect is significantly better than the CNN-based method.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPTA54936.2022.9784139","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Histopathological images contain rich phenotypic information that is useful for classifying tumor subtypes and predicting disease progression. The enormous size of pathological slides makes it impossible to train convolutional neural networks (CNNs) directly on whole slide images (WSIs). Most previous weakly supervised works divide high-resolution WSIs into small image patches and feed them separately into a CNN to classify them as tumor or normal regions. The first difficulty is that, although CNN-based methods achieve high accuracy, they increase model parameters and computational complexity. The second difficulty is balancing accuracy against model computation, i.e., keeping the model lightweight while maintaining and, where possible, improving classification accuracy. In this paper, we propose a new lightweight architecture called Pyramid Tokens-to-Token Vision Transformer (PyT2T-ViT) with multiple instance learning, built on the Vision Transformer. We adopt Tokens-to-Token ViT (T2T-ViT) as the model's feature extractor to reduce the number of parameters. Model performance is further improved by combining an image pyramid with multiple receptive fields, so that the model captures both local and global features of cell structure at a single scale. We applied the method to our collection of 560 thyroid pathology images from a single institution; model parameters and computation were greatly reduced, and the classification performance is significantly better than that of CNN-based methods.
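The abstract does not include an implementation, so the following is a minimal, hypothetical PyTorch sketch of the three ideas it describes: an image pyramid over each WSI patch to obtain multiple receptive fields, a Tokens-to-Token style soft split feeding a small transformer encoder, and attention-based multiple-instance pooling that yields a tumor/normal prediction. All layer sizes, the class names (T2TBlock, PyT2TSketch), and the attention-pooling choice are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch only; assumptions are marked in comments.
import torch
import torch.nn as nn
import torch.nn.functional as F


class T2TBlock(nn.Module):
    """Tokens-to-Token style step: soft-split overlapping local windows into
    tokens, mix them with one transformer layer, project to a narrower dim."""
    def __init__(self, in_ch=3, token_dim=64, kernel=7, stride=4, padding=2):
        super().__init__()
        unfold_dim = in_ch * kernel * kernel
        self.unfold = nn.Unfold(kernel_size=kernel, stride=stride, padding=padding)
        self.mixer = nn.TransformerEncoderLayer(
            d_model=unfold_dim, nhead=1, dim_feedforward=token_dim,
            batch_first=True)
        self.proj = nn.Linear(unfold_dim, token_dim)

    def forward(self, x):                         # x: (B, C, H, W)
        tokens = self.unfold(x).transpose(1, 2)   # (B, N, C*k*k)
        return self.proj(self.mixer(tokens))      # (B, N, token_dim)


class PyT2TSketch(nn.Module):
    """Image pyramid -> T2T tokenization per scale -> shared transformer
    encoder -> attention-weighted (MIL-style) pooling -> class logits.
    Scales, depth, and pooling are assumptions for illustration."""
    def __init__(self, scales=(1.0, 0.5), token_dim=64, depth=2, n_classes=2):
        super().__init__()
        self.scales = scales
        self.t2t = T2TBlock(in_ch=3, token_dim=token_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.attn_pool = nn.Linear(token_dim, 1)   # simplified attention pooling
        self.head = nn.Linear(token_dim, n_classes)

    def forward(self, x):                          # x: (B, 3, H, W) patches
        token_sets = []
        for s in self.scales:                      # build the image pyramid
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode="bilinear", align_corners=False)
            token_sets.append(self.t2t(xs))
        tokens = self.encoder(torch.cat(token_sets, dim=1))
        w = torch.softmax(self.attn_pool(tokens), dim=1)   # (B, N, 1) weights
        bag = (w * tokens).sum(dim=1)              # attention-weighted bag vector
        return self.head(bag)


if __name__ == "__main__":
    model = PyT2TSketch()
    logits = model(torch.randn(2, 3, 112, 112))    # two toy patches
    print(logits.shape)                            # torch.Size([2, 2])
```

Concatenating tokens from all pyramid levels before a shared encoder lets self-attention mix coarse and fine receptive fields at a single scale, which is the intuition the abstract points to; the published model's tokenization, depth, and MIL aggregation will differ.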