Objectives
To evaluate, in a retrospective external validation study, the efficacy of combining a predictive artificial intelligence (AI) model with an image similarity model to risk-stratify thyroid nodules.
Methods
Two datasets were used to determine the efficacy of the AI application. The first was the Stanford dataset, consisting of ultrasound images of 192 nodules obtained between April 2017 and May 2018; the second was a private practice dataset of 118 thyroid nodule images obtained between January 2018 and December 2023. All nodules had a definitive diagnosis by cytology or surgical pathology. The AI application was used to predict the diagnosis and the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) score.
Results
In the Stanford dataset, the AI application predicted malignancy with a sensitivity of 1.0 and a specificity of 0.55. The positive predictive value (PPV) was 0.18, the negative predictive value (NPV) was 1.0, and the area under the receiver operating characteristic curve (AUC-ROC) was 0.78. The ACR TI-RADS-based clinical recommendation had a polychoric correlation of 0.67. In the private dataset, the AI application predicted malignancy with a sensitivity of 0.91 and a specificity of 0.95. The PPV was 0.80, the NPV was 0.98, the AUC-ROC was 0.93, and the accuracy was 0.94. The ACR TI-RADS-based score had a polychoric correlation of 0.94.
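The abstract does not state how these metrics were computed. As a minimal sketch, assuming the standard definitions with malignancy as the positive class, the Python below derives sensitivity, specificity, PPV, NPV and accuracy from a 2x2 table and computes AUC-ROC with the rank-based (Mann-Whitney) formulation. `ConfusionCounts`, `diagnostic_metrics` and `auc_roc` are hypothetical names; the study's own counts and scores are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical 2x2 table; malignant is the positive class."""
    tp: int  # malignant nodules predicted malignant
    fp: int  # benign nodules predicted malignant
    tn: int  # benign nodules predicted benign
    fn: int  # malignant nodules predicted benign


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero
    (e.g. PPV when no nodule is predicted malignant)."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


def auc_roc(scores_malignant: list[float], scores_benign: list[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen malignant nodule gets
    a higher score than a randomly chosen benign one, counting ties as one
    half. This equals the Mann-Whitney U statistic divided by
    len(scores_malignant) * len(scores_benign)."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))
```

For example, a hypothetical table with 9 true positives, 3 false positives, 27 true negatives and 1 false negative gives a sensitivity of 0.90, a specificity of 0.90 and a PPV of 0.75. The pairwise AUC loop grows with the product of the two class sizes, which is negligible at the scale of a few hundred nodules.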
Conclusion
The AI application showed high sensitivity and NPV in both datasets, demonstrated the potential to reduce the need for fine-needle aspiration by 61.5%, and correlated strongly with ACR TI-RADS. However, PPV varied between the datasets, possibly because of differences in image selection and in the prevalence of malignancy. If implemented widely and consistently across clinical settings, this approach could reduce the patient burden associated with an invasive procedure and possibly lower health care spending.
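The conclusion attributes the PPV difference partly to the prevalence of malignancy. By Bayes' theorem, for a fixed sensitivity and specificity, PPV rises with prevalence, while NPV falls (or stays at 1.0 when sensitivity is 1.0). The hypothetical `predictive_values` helper below is a minimal sketch of that dependence; it assumes sensitivity and specificity transfer unchanged between populations, which is itself a simplification, and the prevalence values it loops over are illustrative rather than taken from either dataset.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) implied by Bayes' theorem for a test with the given
    sensitivity and specificity applied to a population with the given
    prevalence of malignancy."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    # Expected fractions of the whole population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hold the Stanford operating point fixed (sensitivity 1.0, specificity 0.55)
    # and vary prevalence; these prevalence values are illustrative only.
    for prevalence in (0.05, 0.10, 0.20, 0.40):
        ppv, npv = predictive_values(1.0, 0.55, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Holding the Stanford operating point fixed, moving prevalence from 0.05 to 0.40 raises the implied PPV from about 0.10 to about 0.60 while NPV stays at 1.0, which illustrates why a PPV estimated in one clinical setting may not carry over to another.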