Background
Computer-aided detection (CAD) is emerging as an adjunct to the use of the chest X-ray (CXR) in screening for pulmonary tuberculosis (TB). CAD for silicosis, a fibrotic lung disease due to silica dust and a strong risk factor for TB, is at an earlier stage of development and, unlike CAD for TB, depends on expert human reading for validation. For all CAD systems, an important step is the choice of threshold for classifying images as positive or negative for the disease in question. The objective of this article is to present an analytic approach to the choice of threshold in using CAD systems for silicosis.
Methods
Drawing on receiver operating characteristic (ROC) curve data from a published study on agreement between CAD and two expert readings of silicosis, we compared two criteria for choosing the sensitivity/specificity combination: the Youden Index and a minimum sensitivity of 90%. We explored the impact of criterion selection, silicosis definition, and reader on the choice and interpretation of threshold, as well as the influence of positive predictive value (PPV) derived from the prevalence of disease at screening. We present a novel technique for using two CAD thresholds to distinguish images with a high likelihood of being positive or negative from those characterized by uncertainty.
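As an illustration of the two selection criteria, the following is a minimal sketch, assuming ROC data are available as (threshold, sensitivity, specificity) triples. The function names and the example points are hypothetical and invented for illustration; they are not data or code from the study. Choosing the most specific point among those meeting the sensitivity floor is one reasonable reading of the "minimum sensitivity" criterion, not necessarily the exact rule applied.

```python
from typing import List, Tuple

# Each ROC point is (threshold, sensitivity, specificity).
RocPoint = Tuple[float, float, float]


def youden_threshold(roc: List[RocPoint]) -> RocPoint:
    """Return the ROC point maximising J = sensitivity + specificity - 1."""
    return max(roc, key=lambda p: p[1] + p[2] - 1.0)


def min_sensitivity_threshold(roc: List[RocPoint], floor: float = 0.90) -> RocPoint:
    """Among points with sensitivity >= floor, return the most specific one.

    Assumption: ties on the floor are resolved by maximising specificity.
    """
    eligible = [p for p in roc if p[1] >= floor]
    if not eligible:
        raise ValueError("no ROC point reaches the required sensitivity")
    return max(eligible, key=lambda p: p[2])


# Hypothetical ROC points, invented for illustration only (not study data).
roc_example: List[RocPoint] = [
    (0.2, 0.98, 0.40),
    (0.4, 0.92, 0.65),
    (0.6, 0.80, 0.85),
    (0.8, 0.55, 0.97),
]

youden_choice = youden_threshold(roc_example)
sensitive_choice = min_sensitivity_threshold(roc_example, floor=0.90)
```

With these invented points the two criteria select different thresholds, which is the kind of divergence the comparison is designed to expose.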
Results
The sample comprised 501 CXR images from ex-gold miners. Derived thresholds varied across the two criteria, as well as across silicosis definition and expert reader. Varying the notional disease prevalence produced large differences in PPV and, therefore, in the proportion of false positives. The implications of these variations for threshold choice are described for three use cases: annual screening of active miners, outreach screening of former miners, and adjudication of claims for silicosis compensation.
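The dependence of PPV on prevalence follows from Bayes' rule: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). A minimal sketch is given below; the function name and the input values are hypothetical and are not results from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence                    # expected true-positive fraction
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # expected false-positive fraction
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results expected; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point and notional prevalences (not study values).
low_prev_ppv = ppv(sensitivity=0.90, specificity=0.80, prevalence=0.05)
high_prev_ppv = ppv(sensitivity=0.90, specificity=0.80, prevalence=0.50)

# The share of positive calls that are false is 1 - PPV.
low_prev_false_share = 1.0 - low_prev_ppv
```

At a fixed operating point, the lower notional prevalence yields a much lower PPV, so a larger share of CAD-positive images would be false positives.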
Conclusion
In applying CAD to silicosis, users need to establish the use case, their preference for the sensitivity/specificity trade-off, and the silicosis definition, and to consider the effect of disease prevalence. System developers need to take inter-reader variation in validation exercises into account and present this information transparently. A two-threshold model has potential utility in situations of high screening volume where there is a significant cost associated with referral for confirmation of diagnosis.
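The two-threshold idea can be sketched as a three-way triage rule. This is an illustrative sketch only: the function name, the category labels, the handling of scores exactly at a threshold, and the example threshold values are all assumptions, and how the two thresholds are derived is not specified here.

```python
def triage(score: float, lower: float, upper: float) -> str:
    """Classify a CAD score using two thresholds.

    Assumed convention: scores below `lower` are treated as likely negative,
    scores at or above `upper` as likely positive, and anything in between
    as uncertain (for example, a candidate for expert reading or referral).
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score < lower:
        return "likely negative"
    if score >= upper:
        return "likely positive"
    return "uncertain"


# Hypothetical thresholds and scores, for illustration only.
labels = [triage(s, lower=0.3, upper=0.7) for s in (0.1, 0.5, 0.9)]
```

Only images in the uncertain band would then need further review, which is where the potential saving arises when screening volume is high and referral is costly.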