Deep learning models for thyroid nodules diagnosis of fine-needle aspiration biopsy: a retrospective, prospective, multicentre study in China

IF 23.8 · Zone 1 (Medicine) · Q1 Medical Informatics · Lancet Digital Health · Pub Date: 2024-06-06 · DOI: 10.1016/S2589-7500(24)00085-2

Jue Wang MS , Nafen Zheng BSc , Huan Wan BSc , Qinyue Yao MS , Shijun Jia MS , Xin Zhang MS , Sha Fu MD , Jingliang Ruan MD , Gui He BSc , Xulin Chen MS , Suiping Li MS , Rui Chen BSc , Boan Lai BSc , Jin Wang PhD , Prof Qingping Jiang MD , Prof Nengtai Ouyang MD , Yin Zhang PhD

Lancet Digital Health, volume 6, issue 7, pages e458-e469. Journal Article.


Abstract

Background

Accurately distinguishing between malignant and benign thyroid nodules through fine-needle aspiration cytopathology is crucial for appropriate therapeutic intervention. However, cytopathologic diagnosis is time-consuming and hindered by the shortage of experienced cytopathologists. Reliable assistive tools could improve the efficiency and accuracy of cytopathologic diagnosis. We aimed to develop and test an artificial intelligence (AI)-assistive system for thyroid cytopathologic diagnosis according to the Thyroid Bethesda Reporting System.

Methods

11 254 whole-slide images (WSIs) from 4037 patients were used to train deep learning models. Among the selected WSIs, annotation was done manually at the cell level by cytopathologists according to the second edition (2017) of The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC). A retrospective dataset of 5638 WSIs from 2914 patients at four medical centres was used for validation. For the prospective study of AI model performance, 469 patients were recruited and their 537 thyroid nodule samples were used. Cohorts for training and validation were enrolled between Jan 1, 2016, and Aug 1, 2022, and the prospective dataset was recruited between Aug 1, 2022, and Jan 1, 2023. The performance of our AI models was estimated as the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy, positive predictive value, and negative predictive value. The primary outcomes were the prediction sensitivity and specificity of the model to assist cyto-diagnosis of thyroid nodules.
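
The performance measures listed above can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the function names `binary_metrics` and `auroc` and all counts and scores are hypothetical, and the AUROC is computed with the generic pairwise (Mann-Whitney) definition, which may differ from the software the authors used.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    ppv: float  # positive predictive value
    npv: float  # negative predictive value


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Derive the five threshold-based metrics from confusion-matrix counts."""
    return BinaryMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / (tp + fp + tn + fn),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
m = binary_metrics(tp=40, fp=10, tn=90, fn=10)
print(m)
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a toy example; a rank-based implementation would be preferable for thousands of WSIs.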

Findings

The AUROC of TBSRTC III+ (which distinguishes benign from TBSRTC classes III, IV, V, and VI) was 0·930 (95% CI 0·921–0·939) for Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSMH) internal validation and 0·944 (0·929–0·959), 0·939 (0·924–0·955), and 0·971 (0·938–1·000) for The First People's Hospital of Foshan (FPHF), Sichuan Cancer Hospital & Institute (SCHI), and The Third Affiliated Hospital of Guangzhou Medical University (TAHGMU) medical centres, respectively. The AUROC of TBSRTC V+ (which distinguishes benign from TBSRTC classes V and VI) was 0·990 (95% CI 0·986–0·995) for SYSMH internal validation and 0·988 (0·980–0·995), 0·965 (0·953–0·977), and 0·991 (0·972–1·000) for FPHF, SCHI, and TAHGMU medical centres, respectively. For the prospective study at SYSMH, the AUROCs of TBSRTC III+ and TBSRTC V+ were 0·977 and 0·981, respectively. With the assistance of AI, the specificity of junior cytopathologists increased from 0·887 (95% CI 0·844–0·922) to 0·993 (0·974–0·999) and their accuracy improved from 0·877 (0·846–0·904) to 0·948 (0·926–0·965). 186 atypia of undetermined significance samples from 186 patients with BRAF mutation information were collected; 43 of them harboured the BRAF^V600E mutation. 91% (39/43) of BRAF^V600E-positive atypia of undetermined significance samples were identified as malignant by the AI models.
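
As a worked check of the 39/43 proportion reported above, the sketch below recomputes the percentage and attaches a 95% Wilson score interval. The abstract does not give an interval for this proportion or say which interval method the study used, so the Wilson method and the function name `wilson_interval` are assumptions made purely for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile. This is a generic
    textbook formula, not necessarily the method used in the study.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 39 of 43 BRAF V600E-positive samples were called malignant (from the abstract).
proportion = 39 / 43
low, high = wilson_interval(39, 43)
print(f"{proportion:.1%} (95% Wilson CI {low:.3f} to {high:.3f})")
```

With only 43 samples the interval is wide, which is worth bearing in mind when reading the 91% figure.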

Interpretation

In this study, we developed an AI-assisted model named the Thyroid Patch-Oriented WSI Ensemble Recognition (ThyroPower) system, which facilitates rapid and robust cyto-diagnosis of thyroid nodules, potentially enhancing the diagnostic capabilities of cytopathologists. Moreover, it serves as a potential solution to mitigate the scarcity of cytopathologists.

Funding

Guangdong Science and Technology Department.

Translation

For the Chinese translation of the abstract, see the Supplementary Materials section.




Source journal

Lancet Digital Health Multiple-

CiteScore: 41.20

Self-citation rate: 1.60%

Articles published: 232

Review time: 13 weeks

Journal description: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.