A. Ding, Ying Li, Qilei Chen, Yu Cao, Benyuan Liu, Shu Han Chen, Xiaowei Liu
{"title":"Gastric Location Classification During Esophagogastroduodenoscopy Using Deep Neural Networks","authors":"A. Ding, Ying Li, Qilei Chen, Yu Cao, Benyuan Liu, Shu Han Chen, Xiaowei Liu","doi":"10.1109/BIBE52308.2021.9635273","DOIUrl":null,"url":null,"abstract":"Esophagogastroduodenoscopy (EGD) is a common procedure that visualizes the esophagus, stomach, and the duodenum by inserting a camera, attached to a long flexible tube, into the patient's mouth and down the stomach. A comprehensive EGD needs to examine all gastric locations, but since the camera is controlled manually, it is easy to miss some surface area and create diagnostic blind spots, which often result in life-costing oversights of early gastric cancer and other serious illnesses. In order to address this problem, we train a convolutional neural network to classify gastric locations based on the camera feed during an EGD, and based on the classifier and a triggering algorithm we propose, we build a video processing system that checks off each location as visited, allowing human operators to keep track of which locations they have visited and which they have not. Based on collected clinical patient reports, we consider six gastric locations, and we add a background class to our classifier to accomodate for the frames in EGD videos that do not resemble the six defined classes (including when the camera is outside of the patient body). Our best classifier achieves 98 % accuracy within the six gastric locations and 88 % accuracy including the background class, and our video processing system clearly checks off gastric locations in an expected order when being tested on recorded EGD videos. Lastly, we use class activation mapping to provide human-readable insight into how our trained classifier works.","PeriodicalId":343724,"journal":{"name":"2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)","volume":"57 50","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BIBE52308.2021.9635273","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Esophagogastroduodenoscopy (EGD) is a common procedure that visualizes the esophagus, stomach, and duodenum by inserting a camera, attached to a long flexible tube, into the patient's mouth and down into the stomach. A comprehensive EGD needs to examine all gastric locations, but since the camera is controlled manually, it is easy to miss some surface area and create diagnostic blind spots, which often result in life-threatening oversights of early gastric cancer and other serious illnesses. To address this problem, we train a convolutional neural network to classify gastric locations based on the camera feed during an EGD, and based on the classifier and a triggering algorithm we propose, we build a video processing system that checks off each location as visited, allowing human operators to keep track of which locations they have visited and which they have not. Based on collected clinical patient reports, we consider six gastric locations, and we add a background class to our classifier to accommodate frames in EGD videos that do not resemble the six defined classes (including when the camera is outside the patient's body). Our best classifier achieves 98% accuracy within the six gastric locations and 88% accuracy including the background class, and our video processing system clearly checks off gastric locations in the expected order when tested on recorded EGD videos. Lastly, we use class activation mapping to provide human-readable insight into how our trained classifier works.
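To make the check-off idea concrete, below is a minimal illustrative sketch of how per-frame classifier outputs could drive a location checklist. The abstract does not specify the authors' triggering algorithm, so the rule used here (a location is marked visited once it is the confident top prediction for a run of consecutive frames), the class names, and the thresholds are all assumptions made for illustration only.

```python
from collections import deque

# Assumed label set: the paper uses six gastric locations plus a background
# class, but the specific location names below are placeholders.
LOCATIONS = ["cardia", "fundus", "body", "angularis", "antrum", "pylorus"]
BACKGROUND = "background"
CLASSES = LOCATIONS + [BACKGROUND]


class LocationChecklist:
    """Checks off gastric locations from per-frame classifier probabilities.

    NOTE: this is not the authors' triggering algorithm (which the abstract
    does not detail); it is a simple stand-in that marks a location as
    visited once it has been the top prediction, above a confidence
    threshold, for `window` consecutive frames.
    """

    def __init__(self, window=15, threshold=0.9):
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)          # recent confident labels
        self.visited = {loc: False for loc in LOCATIONS}

    def update(self, class_probs):
        """class_probs: dict mapping class name -> softmax probability."""
        top_class = max(class_probs, key=class_probs.get)
        confident = class_probs[top_class] >= self.threshold
        self.recent.append(top_class if confident else None)

        # Trigger only when one gastric location dominates the whole window;
        # background frames never check anything off.
        if (len(self.recent) == self.window
                and top_class in LOCATIONS
                and all(label == top_class for label in self.recent)):
            self.visited[top_class] = True
        return self.visited
```

In use, each decoded video frame would be passed through the trained CNN to obtain class probabilities, and `update` would be called once per frame; the `visited` dictionary then provides the running checklist shown to the operator. The window length and confidence threshold would need to be tuned to the video frame rate and classifier calibration.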