Combining Image Similarity and Predictive AI Models to Decrease Subjectivity in Thyroid Nodule Diagnosis and Improve Malignancy Prediction
Govind Nair, Aishwarya Vedula, Ethan Thomas Johnson, Johnson Thomas, Rajshree Patel, Jennifer Cheng, Ramya Vedula
Endocrine Practice. Published 2024-08-08. DOI: 10.1016/j.eprac.2024.08.001
Abstract
Objectives: To evaluate the efficacy of combining a predictive artificial intelligence (AI) model with an image similarity model to risk-stratify thyroid nodules, in a retrospective external validation study.
Methods: Two datasets were used to determine the efficacy of the AI application. The first was the Stanford dataset, comprising ultrasound images of 192 nodules collected between April 2017 and May 2018. The second was a private practice dataset consisting of 118 thyroid nodule images collected between January 2018 and December 2023. All nodules had a definitive diagnosis by cytology or surgical pathology. The AI application was used to predict the diagnosis and the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) score.
Results: In the Stanford dataset, the AI application predicted malignancy with a sensitivity of 1.0 and a specificity of 0.55. Positive predictive value (PPV) was 0.18 and negative predictive value (NPV) was 1.0. The area under the receiver operating characteristic curve (AUC-ROC) was 0.78. The ACR TI-RADS-based clinical recommendation had a polychoric correlation of 0.67. In the private practice dataset, the AI application predicted malignancy with a sensitivity of 0.91 and a specificity of 0.95. PPV was 0.8 and NPV was 0.98. AUC-ROC was 0.93 and accuracy was 0.94. The ACR TI-RADS-based score had a polychoric correlation of 0.94.
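For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity, PPV, NPV, and accuracy follow from a 2x2 confusion matrix. The class name, function name, and counts are hypothetical and chosen only for illustration; they are not the study's data or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predictions with the reference diagnosis."""
    tp: int  # malignant nodules predicted malignant
    fp: int  # benign nodules predicted malignant
    tn: int  # benign nodules predicted benign
    fn: int  # malignant nodules predicted benign


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic accuracy metrics for one dataset."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of malignant nodules detected
        "specificity": c.tn / (c.tn + c.fp),  # share of benign nodules cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of positive calls that are malignant
        "npv": c.tn / (c.tn + c.fn),          # share of negative calls that are benign
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only (NOT taken from the study).
example = ConfusionCounts(tp=8, fp=4, tn=36, fn=2)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are each computed within one reference class (malignant or benign), whereas PPV and NPV mix both classes and so depend on how many malignant nodules the dataset contains.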
Conclusion: The AI application showed consistently high sensitivity and NPV across the two datasets, demonstrated the potential for a 61.5% reduction in the need for fine needle aspiration (FNA), and correlated strongly with ACR TI-RADS. However, PPV varied between the datasets, possibly because of differences in image selection and in the prevalence of malignancy. If implemented widely and consistently across clinical settings, this approach could reduce the patient burden associated with an invasive procedure and possibly decrease health care spending.
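The dependence of PPV on prevalence mentioned in the conclusion can be illustrated with Bayes' rule. The sketch below is not the authors' analysis: the function names, the fixed sensitivity and specificity of 0.90, and the prevalence values are all assumptions picked for illustration.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical operating point and prevalences (assumptions, not study values).
SENS, SPEC = 0.90, 0.90
for prev in (0.05, 0.10, 0.20):
    print(
        f"prevalence={prev:.2f}  "
        f"PPV={ppv(SENS, SPEC, prev):.3f}  "
        f"NPV={npv(SENS, SPEC, prev):.3f}"
    )
```

With sensitivity and specificity held fixed, PPV rises as prevalence rises while NPV stays high at low prevalence, which is one reason the same model can show different PPV in datasets with different proportions of malignant nodules.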
About the journal:
Endocrine Practice (ISSN: 1530-891X), a peer-reviewed journal published twelve times a year, is the official journal of the American Association of Clinical Endocrinologists (AACE). The primary mission of Endocrine Practice is to enhance the health care of patients with endocrine diseases through continuing education of practicing endocrinologists.