Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer.

JCO Clinical Cancer Informatics (IF 3.3, Q2 Oncology) | Pub Date: 2024-08-01 | DOI: 10.1200/CCI.24.00021
Sunkyu Kim, Seung-Seob Kim, Eejung Kim, Michael Cecchini, Mi-Suk Park, Ji A Choi, Sung Hyun Kim, Ho Kyoung Hwang, Chang Moo Kang, Hye Jin Choi, Sang Joon Shin, Jaewoo Kang, Choong-Kun Lee
Citations: 0

Abstract

Purpose: To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP).

Methods: Deep-transfer-learning-based NLP models were retrospectively trained and tested on serial, free-text CT reports and survival information extracted for consecutive patients diagnosed with pancreatic cancer at a Korean tertiary hospital. Randomly selected patients with pancreatic cancer from an independent tertiary hospital in the United States, together with their serial CT reports, formed the external testing data set. The concordance index (c-index) between predicted and actual survival and the area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated.
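The two evaluation metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy follow-up times, event flags, and risk scores below are all invented for illustration, and the c-index shown is the common Harrell formulation, which the abstract does not explicitly specify.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable patient pairs in which the
    patient who died earlier was also assigned the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose shorter time is censored (no death observed).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): follow-up in months, 1 = death observed, 0 = censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.6, 0.7, 0.3, 0.1]
died_within_1y = [1, 1, 0, 0, 0]  # binary label for the 1-year endpoint

print(concordance_index(times, events, risk))
print(auroc(died_within_1y, risk))
```

The c-index uses the full, possibly censored, survival times, whereas the AUROC reduces the outcome to a single binary question at the 1-year mark, which is why the abstract reports both.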

Results: Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports and 670 patients with 3,058 CT reports were allocated to the training and internal testing data sets, respectively. A ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained on only the first CT report of each patient showed a c-index of 0.653 and an AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports, starting from the initial report, showed an improved c-index of 0.811 and AUROC of 0.911. On the external testing set of 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed that our model interprets context beyond specific phrases.
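As a rough illustration of the "up to 15 consecutive reports from the initial report" setting, the sketch below selects each patient's earliest reports in chronological order. The record layout `(patient_id, report_date, text)` and the function name are assumptions made for this example. The abstract does not describe how the selected reports are combined inside the model, so that step is deliberately left out.

```python
from collections import defaultdict
from datetime import date

MAX_REPORTS = 15  # cap stated in the abstract


def first_serial_reports(reports, max_reports=MAX_REPORTS):
    """Group (patient_id, report_date, text) rows by patient and keep the
    earliest `max_reports` report texts in chronological order."""
    by_patient = defaultdict(list)
    for patient_id, report_date, text in reports:
        by_patient[patient_id].append((report_date, text))
    return {
        pid: [text for _, text in sorted(rows, key=lambda r: r[0])[:max_reports]]
        for pid, rows in by_patient.items()
    }


# Hypothetical example: two reports for one patient, supplied out of order.
rows = [
    ("P1", date(2020, 3, 1), "follow-up CT"),
    ("P1", date(2020, 1, 1), "initial CT"),
]
print(first_serial_reports(rows))
```

Setting `max_reports=1` reproduces the single, first-report setting that the abstract uses as its baseline.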

Conclusion: A deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. The developed model can support clinical decisions using survival information extracted solely from serial radiology reports.

Source journal metrics: CiteScore 6.20 | Self-citation rate 4.80% | Articles published 190