Objective
We aimed to improve the calibration of machine learning models for glaucoma classification with a specialized loss function, Confidence-Calibrated Label Smoothing (CC-LS). CC-LS combines label smoothing with a confidence penalty, tailored to glaucoma detection, and is designed to improve calibration without compromising accuracy.
Design
This study focuses on the development and evaluation of a calibrated deep learning model.
Participants
The study uses fundus images from two external datasets, the Online Retinal Fundus Image Database for Glaucoma Analysis and Research (482 normal, 168 glaucoma) and the Retinal Fundus Glaucoma Challenge (720 normal, 80 glaucoma), together with a large internal dataset (4639 images per class), to improve the model's generalizability. The model's clinical performance is validated on a large test set drawn from the internal dataset (47 913 normal, 1629 glaucoma).
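The class counts above imply quite different glaucoma prevalences across the datasets. The short Python sketch below only tabulates them from the reported numbers. It shows that the internal dataset is balanced, while the internal test set is about 3% glaucoma, a gap worth keeping in mind when interpreting calibrated probabilities. The dictionary and function names are illustrative, not taken from the study.

```python
# Class counts (normal, glaucoma) as reported in the Participants section.
DATASETS = {
    "Online Retinal Fundus Image Database for Glaucoma Analysis and Research": (482, 168),
    "Retinal Fundus Glaucoma Challenge": (720, 80),
    "Internal dataset": (4639, 4639),
    "Internal test set": (47913, 1629),
}


def glaucoma_prevalence(normal: int, glaucoma: int) -> float:
    """Fraction of images labeled glaucoma."""
    return glaucoma / (normal + glaucoma)


for name, (normal, glaucoma) in DATASETS.items():
    total = normal + glaucoma
    print(f"{name}: {total} images, {glaucoma_prevalence(normal, glaucoma):.1%} glaucoma")
# Prevalences printed: 25.8%, 10.0%, 50.0%, 3.3% glaucoma, respectively.
```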
Methods
The CC-LS loss combines label smoothing, which softens the one-hot training targets to discourage extreme predictions and reduce overfitting, with a confidence-based penalty that discourages the model from assigning high confidence to incorrect classifications. We train models with CC-LS and compare their performance with that of models trained using conventional loss functions.
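The abstract does not give the exact CC-LS formula, so the following Python sketch is only one plausible reading of the description: cross-entropy against label-smoothed targets, plus a penalty equal to the predicted probability of the top class whenever that class is wrong. The function name `cc_ls_loss`, the parameters `smoothing` and `penalty_weight`, and their default values are hypothetical. Note that this differs from the widely used entropy-based confidence penalty, which penalizes low-entropy outputs on every sample rather than only on misclassified ones.

```python
import math
from typing import Sequence


def cc_ls_loss(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    smoothing: float = 0.1,       # hypothetical label-smoothing strength
    penalty_weight: float = 0.5,  # hypothetical weight of the confidence penalty
    eps: float = 1e-12,
) -> float:
    """Illustrative CC-LS-style loss (assumed form, not the published definition).

    probs:  per-sample class probabilities, e.g. softmax outputs
    labels: integer class indices, e.g. 0 = normal, 1 = glaucoma
    """
    n_classes = len(probs[0])
    total = 0.0
    for p, y in zip(probs, labels):
        # Label smoothing: move `smoothing` of the target mass to a uniform distribution.
        target = [smoothing / n_classes] * n_classes
        target[y] += 1.0 - smoothing
        # Cross-entropy against the smoothed target; eps guards against log(0).
        ce = -sum(t * math.log(max(pk, eps)) for t, pk in zip(target, p))
        # Confidence penalty (assumed form): if the top class is wrong, add its
        # probability, so confident mistakes cost more than hesitant ones.
        predicted = max(range(n_classes), key=lambda k: p[k])
        penalty = p[predicted] if predicted != y else 0.0
        total += ce + penalty_weight * penalty
    return total / len(labels)


# Usage on a toy batch (made-up probabilities, not study data).
batch_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
batch_labels = [0, 0, 1]
print(cc_ls_loss(batch_probs, batch_labels))
```

A real training loop would express the same computation with tensor operations in a deep learning framework so that gradients can be computed; plain Python is used here only to make each term visible.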
Main Outcome Measures
Model performance is evaluated with the Brier score (a measure of calibration), sensitivity, specificity, false positive rate, and weighted accuracy, supplemented by qualitative heatmap analyses.
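As a sketch of how these outcome measures can be computed for a binary normal/glaucoma classifier, the Python functions below take predicted glaucoma probabilities and 0/1 labels. Two details are assumptions not stated in the abstract: predictions are thresholded at 0.5, and "weighted accuracy" is read as balanced accuracy, the mean of sensitivity and specificity. The function names are illustrative.

```python
from typing import Dict, Sequence


def brier_score(p_glaucoma: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted glaucoma probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(p_glaucoma, labels)) / len(labels)


def threshold_metrics(
    p_glaucoma: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics with glaucoma (label 1) as the positive class."""
    preds = [1 if p >= threshold else 0 for p in p_glaucoma]  # assumed 0.5 cutoff
    pairs = list(zip(preds, labels))
    tp = sum(1 for yhat, y in pairs if yhat == 1 and y == 1)
    fn = sum(1 for yhat, y in pairs if yhat == 0 and y == 1)
    tn = sum(1 for yhat, y in pairs if yhat == 0 and y == 0)
    fp = sum(1 for yhat, y in pairs if yhat == 1 and y == 0)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        # Assumption: weighted accuracy taken as balanced accuracy.
        "weighted_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example (made-up probabilities, not study data).
probs = [0.9, 0.8, 0.3, 0.2, 0.7]
labels = [1, 1, 1, 0, 0]
print(round(brier_score(probs, labels), 3))  # 0.214
print(threshold_metrics(probs, labels))
```

For reference, a model that always outputs 0.5 has a Brier score of 0.25 on any binary dataset, since every term is (0.5 - y)^2 = 0.25; lower scores are better.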
Results
Preliminary results show that models trained with the CC-LS loss are better calibrated than those trained with conventional losses, with a Brier score of 0.098, while achieving a sensitivity of 81%, specificity of 80%, and weighted accuracy of 80%. The gain in calibration does not come at the cost of classification accuracy.
Conclusions
The CC-LS loss function is a significant step toward deploying machine learning models for glaucoma diagnosis. By improving calibration, CC-LS helps clinicians interpret and trust the predicted probabilities, making artificial intelligence-driven diagnostic tools more clinically viable. Clinically, this greater trust and interpretability may lead to more timely and appropriate interventions, improving patient outcomes and safety.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.