A reconstruction and convolution operations enabled variant vision transformer with gastroscopic images for automatic locating of polyps in Internet of Medical Things
Zhe Qin, Yaqiong Zhang, Jian Li, Deming Li, Yanqing Mo, Liyang Wang, Peiyu Qian, Li Feng
Journal: Information Fusion (Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.inffus.2023.102007
Published: 2023-09-07 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S1566253523003238
Citations: 0
Abstract
Gastric polyps are an important cause of gastric disease. Computer-aided diagnosis based on convolutional neural networks (CNNs) can automatically locate polyps in gastroscopic images, improving diagnostic efficiency. However, because polyp regions in gastroscopic images are small, CNN-based methods suffer from a high missed-detection rate. To address this problem, we propose a reconstruction and convolution operations enabled variant vision transformer (RCVViT) that automatically locates polyps in gastroscopic images. RCVViT uses the vision transformer as its backbone model; its self-attention mechanism captures contextual information, so irregularly shaped polyps and polyps with small areas can be detected effectively. A feedforward neural network (FNN) and a CNN are used together to flatten each image patch into a one-dimensional vector; the advantage of combining the FNN and CNN is that both the local feature information and the structural information of the polyp region are preserved. In addition, we use an Internet of Medical Things (IoMT) platform to collect and analyze patients' medical data so that diseases can be diagnosed in a timely manner. Finally, multiple experimental results on real gastroscopic datasets demonstrate the superiority of the RCVViT model.
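The patch-flattening and self-attention steps described above can be sketched in NumPy. This is an illustrative sketch of standard ViT-style tokenization (flattening patches into one-dimensional vectors and letting every patch token attend to all others), not the authors' RCVViT: the CNN branch, the reconstruction operations, and the learned multi-head projections are omitted, and all array sizes and function names here are assumptions for illustration.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an H x W image into non-overlapping square patches,
    each flattened to a one-dimensional vector (ViT-style tokenization)."""
    h, w = image.shape
    p = patch_size
    return (image.reshape(h // p, p, w // p, p)
                 .transpose(0, 2, 1, 3)      # group rows/cols of each patch
                 .reshape(-1, p * p))        # one flat vector per patch

def embed_patches(patches, weight):
    """FNN-style linear projection of flattened patches into token
    embeddings; RCVViT additionally fuses CNN features, omitted here."""
    return patches @ weight

def self_attention(x):
    """Single-head scaled dot-product self-attention (no learned Q/K/V
    projections): every patch token aggregates context from all others."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))                  # toy grayscale image
patches = patchify(img, 16)                          # 16 patches, 256-dim each
tokens = embed_patches(patches, rng.standard_normal((256, 128)))
ctx = self_attention(tokens)                         # context-mixed tokens
print(patches.shape, tokens.shape, ctx.shape)        # (16, 256) (16, 128) (16, 128)
```

The key point the abstract makes is visible in `self_attention`: each output token is a weighted mixture over all patches, which is how global context reaches small or irregularly shaped polyp regions that a purely local convolution might miss.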
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.