Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer.

JCO Clinical Cancer Informatics (IF 3.3, JCR Q2 in Oncology). Published 2024-08-01. DOI: 10.1200/CCI.24.00021

Sunkyu Kim, Seung-Seob Kim, Eejung Kim, Michael Cecchini, Mi-Suk Park, Ji A Choi, Sung Hyun Kim, Ho Kyoung Hwang, Chang Moo Kang, Hye Jin Choi, Sang Joon Shin, Jaewoo Kang, Choong-Kun Lee

JCO Clinical Cancer Informatics, vol. 8, e2400021 (2024).


Abstract

Purpose: To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP).

Methods: Serial free-text CT reports and survival information were extracted for consecutive patients diagnosed with pancreatic cancer at a Korean tertiary hospital, and deep-transfer-learning-based NLP models were retrospectively trained and tested on these data. The external testing data set comprised randomly selected patients with pancreatic cancer, together with their serial CT reports, from an independent tertiary hospital in the United States. The concordance index (c-index) between predicted and actual survival and the area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated.
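The two evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: it assumes Harrell's formulation of the c-index (the abstract does not say which variant was used), assumes a higher model score means higher risk, and, for the 1-year AUROC, assumes that patients censored before one year are excluded because their 1-year status is unknown. The function names `harrell_c_index` and `auroc_one_year` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's c-index (assumed variant); higher `risk` means earlier death expected."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the earlier patient died;
        # tied times are simply skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # earlier death was ranked as higher risk
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auroc_one_year(times: Sequence[float], events: Sequence[bool],
                   risk: Sequence[float], horizon: float = 365.0) -> float:
    """AUROC for death within `horizon` (assumed to be in days)."""
    pos, neg = [], []
    for t, e, r in zip(times, events, risk):
        if t <= horizon and e:
            pos.append(r)      # died within the horizon
        elif t > horizon:
            neg.append(r)      # known to be alive at the horizon
        # otherwise censored at or before the horizon: outcome unknown, excluded
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Mann-Whitney form: share of (positive, negative) pairs ranked correctly.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): follow-up in days, death observed?, model risk score.
times = [100, 200, 400, 500, 300]
events = [True, True, False, True, False]
risk = [0.9, 0.4, 0.2, 0.5, 0.5]
print(round(harrell_c_index(times, events, risk), 3))  # 0.714
print(auroc_one_year(times, events, risk))             # 0.75
```

Dropping patients censored before the horizon is the simplest choice for a binary 1-year label; the abstract does not state how such patients were handled, and censoring-aware alternatives such as inverse-probability-of-censoring weighting exist.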

Results: Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports were allocated to the training data set and 670 patients with 3,058 CT reports to the internal testing data set. A ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained only on each patient's first CT report achieved a c-index of 0.653 and an AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports, starting from the initial report, improved the c-index to 0.811 and the AUROC to 0.911. On the external testing set of 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed that our model interprets reports in context rather than relying on specific phrases.
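The serial-report setting can be made concrete with a small data-preparation sketch. The abstract states only that the model was trained on up to 15 consecutive reports starting from the initial report; it does not describe how those reports were encoded or combined for ClinicalBERT. The sketch therefore covers only the selection step. The `Report` record, its fields, and `first_n_reports` are hypothetical names; only the cap of 15 comes from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    """Hypothetical record for one free-text CT report."""
    patient_id: str
    exam_date: date
    text: str


def first_n_reports(reports: list[Report], n: int = 15) -> dict[str, list[str]]:
    """Group reports by patient and keep up to the first `n` in exam-date order."""
    by_patient: dict[str, list[Report]] = defaultdict(list)
    for rep in reports:
        by_patient[rep.patient_id].append(rep)
    return {
        pid: [rep.text for rep in sorted(reps, key=lambda rep: rep.exam_date)[:n]]
        for pid, reps in by_patient.items()
    }


# Toy input, deliberately out of date order (texts are placeholders).
reports = [
    Report("P1", date(2020, 3, 1), "follow-up CT (toy text)"),
    Report("P1", date(2020, 1, 5), "initial CT (toy text)"),
    Report("P2", date(2021, 6, 9), "initial CT (toy text)"),
]
selected = first_n_reports(reports)
print(selected["P1"])  # ['initial CT (toy text)', 'follow-up CT (toy text)']
```

Calling `first_n_reports(reports, n=1)` yields the first-report-only input, which corresponds to the single-report baseline described above; patients with fewer than 15 reports simply keep all of theirs.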

Conclusion: A deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. The developed model, which extracts survival information solely from serial radiology reports, can support clinical decisions.


