利用自然语言处理技术从放射学报告中推断转移性疾病的部位。

IF 3.3 Q2 ONCOLOGY JCO Clinical Cancer Informatics Pub Date : 2024-05-01 DOI:10.1200/CCI.23.00122
See Boon Tay, Guat Hwa Low, Gillian Jing En Wong, Han Jieh Tey, Fun Loon Leong, Constance Li, Melvin Lee Kiang Chua, Daniel Shao Weng Tan, Choon Hua Thng, Iain Bee Huat Tan, Ryan Shea Ying Cong Tan
{"title":"利用自然语言处理技术从放射学报告中推断转移性疾病的部位。","authors":"See Boon Tay, Guat Hwa Low, Gillian Jing En Wong, Han Jieh Tey, Fun Loon Leong, Constance Li, Melvin Lee Kiang Chua, Daniel Shao Weng Tan, Choon Hua Thng, Iain Bee Huat Tan, Ryan Shea Ying Cong Tan","doi":"10.1200/CCI.23.00122","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports.</p><p><strong>Methods: </strong>A set of 4,522 computed tomography (CT) reports of 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. We also developed an NLP information extraction (IE) system (on the basis of named entity recognition, assertion status detection, and relation extraction) for comparison. Model performances were measured by F1 scores on test and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports.</p><p><strong>Results: </strong>The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. The IE system attained F1 scores of 0.89, 0.83, and 0.81, respectively, on external validation sets including additional cancer types, positron emission tomography-CT ,and magnetic resonance imaging scans, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were higher in de novo stage IV versus recurrent patients (29.7% <i>v</i> 12.2%; <i>P</i> < .001). Conversely, lung-only metastases were more frequent in recurrent versus de novo stage IV patients (17.2% <i>v</i> 7.3%; <i>P</i> < .001).</p><p><strong>Conclusion: </strong>We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. It has explainable methods and performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and identify potential patients for oligometastatic interventions.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2300122"},"PeriodicalIF":3.3000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371090/pdf/","citationCount":"0","resultStr":"{\"title\":\"Use of Natural Language Processing to Infer Sites of Metastatic Disease From Radiology Reports at Scale.\",\"authors\":\"See Boon Tay, Guat Hwa Low, Gillian Jing En Wong, Han Jieh Tey, Fun Loon Leong, Constance Li, Melvin Lee Kiang Chua, Daniel Shao Weng Tan, Choon Hua Thng, Iain Bee Huat Tan, Ryan Shea Ying Cong Tan\",\"doi\":\"10.1200/CCI.23.00122\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports.</p><p><strong>Methods: </strong>A set of 4,522 computed tomography (CT) reports of 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. We also developed an NLP information extraction (IE) system (on the basis of named entity recognition, assertion status detection, and relation extraction) for comparison. Model performances were measured by F1 scores on test and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports.</p><p><strong>Results: </strong>The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. The IE system attained F1 scores of 0.89, 0.83, and 0.81, respectively, on external validation sets including additional cancer types, positron emission tomography-CT ,and magnetic resonance imaging scans, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were higher in de novo stage IV versus recurrent patients (29.7% <i>v</i> 12.2%; <i>P</i> < .001). Conversely, lung-only metastases were more frequent in recurrent versus de novo stage IV patients (17.2% <i>v</i> 7.3%; <i>P</i> < .001).</p><p><strong>Conclusion: </strong>We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. It has explainable methods and performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and identify potential patients for oligometastatic interventions.</p>\",\"PeriodicalId\":51626,\"journal\":{\"name\":\"JCO Clinical Cancer Informatics\",\"volume\":\"8 \",\"pages\":\"e2300122\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371090/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JCO Clinical Cancer Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1200/CCI.23.00122\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ONCOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JCO Clinical Cancer Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1200/CCI.23.00122","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ONCOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

目的:评估从放射学报告中推断转移部位的自然语言处理(NLP)方法:我们使用了一组包含550名14种癌症患者的4522份计算机断层扫描(CT)报告,对四种临床大型语言模型(LLM)进行了微调,以对转移部位进行多标签分类。我们还开发了一个 NLP 信息提取(IE)系统(基于命名实体识别、断言状态检测和关系提取)进行比较。模型性能通过测试集和三个外部验证集上的 F1 分数来衡量。最佳模型被用于一项队列研究中的转移频率分析,该队列研究包括 6555 名患者和 53838 份 CT 报告:RadBERT、BioBERT、GatorTron-base 和 GatorTron-medium LLM 在测试集上的 F1 分数分别为 0.84、0.87、0.89 和 0.91。IE 系统表现最佳,F1 得分为 0.93。按癌症类型划分,IE 系统的 F1 得分为 0.89 到 0.96 不等。在包括其他癌症类型、正电子发射断层扫描和磁共振成像扫描在内的外部验证集上,IE 系统的 F1 分数分别为 0.89、0.83 和 0.81。在我们的队列研究中,我们发现结直肠癌新发 IV 期患者的肝转移率高于复发患者(29.7% 对 12.2%;P < .001)。相反,复发的IV期患者与新发的IV期患者相比,仅肺转移的发生率更高(17.2% 对 7.3%;P < .001):我们开发了一种 IE 系统,可从放射学报告中准确推断多种原发性癌症的转移部位。结论:我们开发的 IE 系统能从放射学报告中准确推断出多种原发性癌症的转移部位,它具有可解释的方法,其表现优于一些临床 LLM。推断出的转移表型可加强癌症研究数据库和临床试验匹配,并识别出潜在的寡转移干预患者。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Use of Natural Language Processing to Infer Sites of Metastatic Disease From Radiology Reports at Scale.

Purpose: To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports.

Methods: A set of 4,522 computed tomography (CT) reports of 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. We also developed an NLP information extraction (IE) system (on the basis of named entity recognition, assertion status detection, and relation extraction) for comparison. Model performances were measured by F1 scores on test and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports.

Results: The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. The IE system attained F1 scores of 0.89, 0.83, and 0.81, respectively, on external validation sets including additional cancer types, positron emission tomography-CT ,and magnetic resonance imaging scans, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were higher in de novo stage IV versus recurrent patients (29.7% v 12.2%; P < .001). Conversely, lung-only metastases were more frequent in recurrent versus de novo stage IV patients (17.2% v 7.3%; P < .001).

Conclusion: We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. It has explainable methods and performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and identify potential patients for oligometastatic interventions.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
6.20
自引率
4.80%
发文量
190
期刊最新文献
Development, Validation, and Clinical Utility of Electronic Patient-Reported Outcome Measure-Enhanced Prediction Models for Overall Survival in Patients With Advanced Non-Small Cell Lung Cancer Receiving Immunotherapy. Metastatic Versus Localized Disease as Inclusion Criteria That Can Be Automatically Extracted From Randomized Controlled Trials Using Natural Language Processing. Identifying Oncology Patients at High Risk for Potentially Preventable Emergency Department Visits Using a Novel Definition. Use of Patient-Reported Outcomes in Risk Prediction Model Development to Support Cancer Care Delivery: A Scoping Review. Optimizing End Points for Phase III Cancer Trials.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1