Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition.

JCO Clinical Cancer Informatics (IF 3.3, JCR Q2, Oncology). Pub Date: 2024-06-01. DOI: 10.1200/CCI.23.00166
Xu Zuo, Ashok Kumar, Shuhan Shen, Jianfu Li, Grace Cong, Edward Jin, Qingxia Chen, Jeremy L Warner, Ping Yang, Hua Xu
Volume 8, article e2300166. Publication type: Journal Article.

Abstract

Purpose: The RECIST guidelines provide a standardized approach for evaluating the response of cancer to treatment, allowing for consistent comparison of treatment efficacy across different therapies and patients. However, collecting such information from electronic health records manually can be extremely labor-intensive and time-consuming because of the complexity and volume of clinical notes. The aim of this study is to apply natural language processing (NLP) techniques to automate this process, minimizing manual data collection efforts, and improving the consistency and reliability of the results.

Methods: We proposed a complex, hybrid NLP system that automates extracting, linking, and summarizing anticancer therapies and their associated RECIST-like responses from narrative clinical text. The system combines machine learning-based, deep learning-based, and rule-based modules for named entity recognition, assertion classification, relation extraction, and text normalization, each addressing a different challenge in extracting therapy and response information. We then evaluated the system's performance on two independent test sets from different institutions to demonstrate its effectiveness and generalizability.
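The four tasks named in the Methods can be illustrated with a deliberately small, rule-based sketch. This is not the authors' system, which relies on trained models. The drug lexicon, trigger phrases, negation rule, and function names below are all hypothetical. Only the four response categories (CR, PR, SD, PD) come from the RECIST definition itself.

```python
import re
from dataclasses import dataclass

# Text normalization: map surface phrases to the four RECIST categories.
# The trigger phrases are illustrative assumptions, not the paper's rules.
RECIST_PATTERNS = {
    "CR": r"\bcomplete response\b",
    "PR": r"\bpartial response\b",
    "SD": r"\bstable disease\b",
    "PD": r"\b(?:progressive disease|disease progression)\b",
}

# Named entity recognition stand-in: a toy drug lexicon (assumption).
THERAPIES = ["carboplatin", "paclitaxel", "pembrolizumab", "cisplatin"]

# Assertion classification stand-in: a negation cue shortly before the mention.
NEGATION = re.compile(r"\b(?:no|not|without)\b[^.]{0,30}$", re.IGNORECASE)


@dataclass(frozen=True)
class Mention:
    kind: str      # "therapy" or "response"
    norm: str      # normalized drug name or RECIST category
    start: int     # character offset in the sentence
    negated: bool = False


def find_mentions(sentence):
    """Return therapy and response mentions in order of appearance."""
    mentions = []
    for drug in THERAPIES:
        for m in re.finditer(rf"\b{re.escape(drug)}\b", sentence, re.IGNORECASE):
            mentions.append(Mention("therapy", drug, m.start()))
    for label, pattern in RECIST_PATTERNS.items():
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            negated = bool(NEGATION.search(sentence[:m.start()]))
            mentions.append(Mention("response", label, m.start(), negated))
    return sorted(mentions, key=lambda x: x.start)


def link(sentence):
    """Relation extraction stand-in: pair every affirmed response with
    every therapy mentioned in the same sentence."""
    mentions = find_mentions(sentence)
    therapies = [m for m in mentions if m.kind == "therapy"]
    responses = [m for m in mentions if m.kind == "response" and not m.negated]
    return [(t.norm, r.norm) for r in responses for t in therapies]
```

Sentence-level co-occurrence is the crudest possible linking rule. It is used here only to make the data flow between the four tasks visible, and would not be expected to generalize across institutions the way the evaluated system is meant to.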

Results: The system used the domain-specific language models BioBERT and BioClinicalBERT for high-performance identification of therapy mentions and for extraction and categorization of RECIST responses. The best-performing model achieved a score of 0.66 in linking therapy and RECIST response mentions, and end-to-end performance peaked at 0.74 after relation normalization, indicating substantial efficacy with room for improvement.
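The abstract reports these scores without naming the metric. Assuming, for illustration only, an F1-style measure over linked (therapy, response) pairs, such a score could be computed as in the sketch below. The function name and the example pairs are hypothetical and do not come from the paper.

```python
def pair_prf(gold, predicted):
    """Precision, recall, and F1 over sets of (therapy, response) pairs.

    Assumes exact-match scoring: a predicted pair counts only if the same
    pair appears in the gold annotations.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up annotations, used only to exercise the function.
gold_pairs = [("carboplatin", "PR"), ("paclitaxel", "PR"), ("pembrolizumab", "PD")]
predicted_pairs = [("carboplatin", "PR"), ("paclitaxel", "PR"), ("cisplatin", "SD")]
precision, recall, f1 = pair_prf(gold_pairs, predicted_pairs)
```

Under this assumed scoring, normalizing relations before comparison (for example, mapping different surface forms of a response to one category) can raise the score by turning near-misses into exact matches. That would be consistent with the higher end-to-end figure, although the abstract does not state the mechanism.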

Conclusion: We developed, implemented, and tested a system that extracts cancer treatment and efficacy assessment information from clinical notes. We expect this system will support future cancer research, particularly oncologic studies that focus on efficiently assessing the effectiveness and reliability of cancer therapeutics.
