Recognition of Diabetic Retinopathy Grades Based on Data Augmentation and Attention Mechanisms
Authors: Xueri Li, Li Wen, Fanyu Du, Lei Yang, Jianfang Wu
Journal: International Journal of Imaging Systems and Technology, Vol. 34, No. 6
Published: 2024-10-22
DOI: 10.1002/ima.23201
Citations: 0
Abstract
Diabetic retinopathy is a complication of diabetes and one of the leading causes of vision loss. Early detection and treatment are essential to prevent vision loss. Deep learning has made great strides in medical image processing and can serve as an aid to medical practitioners. However, several factors make it hard to train deep learning models for this task: imbalanced datasets, sparse focal areas, small differences between adjacent disease grades, and varied manifestations within the same disease grade. As a result, model generalization performance and robustness are inadequate. To address the imbalance in sample numbers between classes, this work proposes using a VQ-VAE to reconstruct affine-transformed images, which enriches and balances the dataset. Test results show that the model's average reconstruction error is 0.0001 and that the mean structural similarity (SSIM) between reconstructed and original images is 0.967. This indicates that the reconstructed images differ from the originals yet still belong to the same category, so they expand and diversify the dataset. To address focal-area sparsity and the small differences between disease grades, this work uses ResNeXt50 as the backbone network. It builds several attention networks by modifying the network structure and embedding different attention modules. Experiments show that the convolutional attention network outperforms ResNeXt50 in Precision, Sensitivity, Specificity, F1 Score, Quadratic Weighted Kappa coefficient, and Accuracy. It is also more robust against salt-and-pepper noise, Gaussian noise, and gradient perturbation. Finally, Grad-CAM was used to plot heat maps of each model's recognition of the fundus images. The heat maps show that the attention network attends to the fundus image more effectively than the non-attention network ResNeXt50.
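The abstract scores models with the Quadratic Weighted Kappa coefficient and tests their robustness against salt-and-pepper and Gaussian noise. The sketch below shows one common way to compute these. It is not the authors' code, and several details are assumptions:

- Five ordinal grades labelled 0 to 4 (the usual diabetic retinopathy scale; the abstract does not state the number of grades).
- Images stored as floats in [0, 1].
- The function names and the noise parameters `amount` and `sigma` are hypothetical.

The gradient-perturbation test is left out because it requires a trained model.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Cohen's kappa with quadratic weights for ordinal grades 0..num_classes-1.

    Assumption: five grades by default; undefined if every label and
    prediction falls in a single grade (zero denominator).
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed matrix O[i, j]: number of samples with true grade i, predicted grade j.
    observed = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(observed, (y_true, y_pred), 1)
    # Expected matrix E: outer product of the two marginal histograms,
    # scaled to the same total count as O (agreement expected by chance).
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic disagreement weights w[i, j] = (i - j)^2 / (K - 1)^2.
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def add_salt_and_pepper(image, amount=0.05, rng=None):
    """Hypothetical helper: set about `amount` of the pixel locations to 0 or 1.

    Roughly half of the affected pixels become 0 (pepper), half become 1 (salt);
    all channels of a chosen pixel are changed together. Returns a new array.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.asarray(image, dtype=float).copy()
    draw = rng.random(noisy.shape[:2])  # one uniform draw per pixel location
    noisy[draw < amount / 2] = 0.0
    noisy[(draw >= amount / 2) & (draw < amount)] = 1.0
    return noisy


def add_gaussian_noise(image, sigma=0.05, rng=None):
    """Hypothetical helper: add zero-mean Gaussian noise, then clip back to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    return np.clip(image + rng.normal(0.0, sigma, size=image.shape), 0.0, 1.0)
```

Quadratic weights penalize a prediction two grades away four times as heavily as a one-grade miss. This suits ordinal grading, where adjacent grades differ only slightly. A robustness test along these lines compares the metrics on clean images with the same metrics on images passed through the noise helpers.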
About the journal:
The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals.
IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging.
The journal is also open to imaging studies of the human body and of animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, and replication studies, as well as negative results, are also considered.
The scope of the journal includes, but is not limited to, the following in the context of biomedical research:
Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS etc.;
Neuromodulation and brain stimulation techniques such as TMS and tDCS;
Software and hardware for imaging, especially related to human and animal health;
Image segmentation in normal and clinical populations;
Pattern analysis and classification using machine learning techniques;
Computational modeling and analysis;
Brain connectivity and connectomics;
Systems-level characterization of brain function;
Neural networks and neurorobotics;
Computer vision, based on human/animal physiology;
Brain-computer interface (BCI) technology;
Big data, databasing and data mining.