Background and Aims
Inflammatory bowel disease (IBD) is a global disease with rising incidence. However, few studies have addressed computer-assisted diagnosis of IBD from pathological images. Following the UK and Chinese IBD diagnostic guidelines, we therefore developed an artificial intelligence-assisted diagnostic system for histologic grading of inflammatory activity in ulcerative colitis (UC).
Methods
We proposed an efficient deep-learning (DL) method for grading inflammatory activity in whole-slide images (WSIs) of UC pathology. The model was built using 603 UC WSIs from Nanjing Drum Tower Hospital, which formed the training set and the internal test set. A further 212 UC WSIs from Zhujiang Hospital served as an external test set. First, a ResNet50 model pre-trained on the ImageNet dataset was used to extract features from the image patches of each WSI. Next, a multi-instance learning (MIL) approach with embedded self-attention aggregated the tissue patch features into a representation of the entire WSI. Finally, the model was trained on the aggregated features and the WSI-level annotations provided by senior gastrointestinal pathologists to predict the level of inflammatory activity in UC WSIs.
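The aggregation step can be illustrated with a minimal, dependency-free sketch of attention-based MIL pooling. This is one common formulation (a tanh scoring layer followed by a softmax over patches) and is not necessarily the architecture used in this study. The function name `attention_pool`, the weight matrices `V` and `w`, and the toy dimensions are all assumptions made for illustration; in practice the weights would be learned and the patch features would come from the pre-trained ResNet50.

```python
import math
import random


def attention_pool(patch_features, V, w):
    """Aggregate patch feature vectors into one slide-level vector.

    patch_features: list of K vectors of length D (one per tissue patch)
    V: hidden projection, H rows of length D (hypothetical learned weights)
    w: scoring vector of length H (hypothetical learned weights)
    """
    # Score each patch: s_k = w . tanh(V h_k)
    scores = []
    for h in patch_features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))

    # Softmax over patches (subtract the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The attention-weighted sum of patch features represents the whole slide
    dim = len(patch_features[0])
    slide = [sum(a * h[d] for a, h in zip(weights, patch_features))
             for d in range(dim)]
    return slide, weights


# Toy example: 5 patches with 8-dimensional features and random weights
rng = random.Random(0)
patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
V = [[rng.gauss(0, 0.1) for _ in range(8)] for _ in range(4)]
w = [rng.gauss(0, 0.1) for _ in range(4)]
slide_vec, attn = attention_pool(patches, V, w)
```

The attention weights sum to one across patches, so they can also be read as a rough indication of which patches contribute most to the slide-level prediction.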
Results
For distinguishing the presence or absence of inflammatory activity, the area under the curve (AUC) in the internal test set was 0.863 (95% confidence interval [CI] 0.829, 0.898), with a sensitivity of 0.913 (95% CI 0.866, 0.961) and a specificity of 0.816 (95% CI 0.771, 0.861). The AUC in the external test set was 0.947 (95% CI 0.939, 0.955), with a sensitivity of 0.889 (95% CI 0.837, 0.940) and a specificity of 0.858 (95% CI 0.777, 0.939). For distinguishing different levels of inflammatory activity in UC, the average Macro-AUC in the internal and external test sets was 0.827 (95% CI 0.803, 0.850) and 0.908 (95% CI 0.882, 0.935), respectively. The average Micro-AUC in the internal and external test sets was 0.816 (95% CI 0.792, 0.840) and 0.898 (95% CI 0.869, 0.926), respectively.
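For readers unfamiliar with these metrics, the sketch below shows how sensitivity, specificity, AUC, and one-vs-rest Macro- and Micro-AUC are commonly computed. It uses the standard textbook definitions, which may differ in detail from the averaging procedure used in this study. All function names and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def binary_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, probs, n_classes):
    """One-vs-rest AUC: macro averages the per-class AUCs, micro pools
    every (sample, class) pair into a single binary problem."""
    per_class = [
        binary_auc([1 if t == c else 0 for t in y_true],
                   [p[c] for p in probs])
        for c in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    flat_true = [1 if t == c else 0 for t in y_true for c in range(n_classes)]
    flat_score = [p[c] for p in probs for c in range(n_classes)]
    micro = binary_auc(flat_true, flat_score)
    return macro, micro


# Toy binary example (made-up labels and scores)
labels = [0, 1, 1, 0]
scores = [0.1, 0.9, 0.3, 0.35]
preds = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, preds)
auc = binary_auc(labels, scores)

# Toy three-grade example with perfectly separated probabilities
grades = [0, 1, 2]
grade_probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
macro_auc, micro_auc = macro_micro_auc(grades, grade_probs, 3)
```

Macro averaging weights every activity grade equally, whereas micro averaging weights every slide equally, so the two can diverge when the grades are imbalanced.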
Conclusions
Comparison with diagnoses made by pathologists at different levels of expertise showed that the algorithm reached a proficiency comparable to that of a pathologist with 5 years of experience. Furthermore, our algorithm outperformed other MIL algorithms.