
JCO Clinical Cancer Informatics: Latest Articles

Automated, High-Throughput Platform to Generate a High-Reliability, Comprehensive Rectal Cancer Database.
IF 4.2 Q2 Medicine Pub Date: 2024-05-01 DOI: 10.1200/CCI.23.00219
Neal Bhutiani, Mahmoud M G Yousef, Abdelrahman Yousef, Mohammad Zeineddine, Mark Knafl, Olivia Ratliff, Uditha P Fernando, Anastasia Turin, Fadl A Zeineddine, Jeff Jin, Kristin Alfaro-Munoz, Drew Goldstein, George J Chang, Scott Kopetz, John Paul Shen, Abhineet Uppal

Purpose: Dynamic operations platforms allow for cross-platform data extraction, integration, and analysis, although application of these platforms to large-scale oncology enterprises has not been described. This study presents a pipeline for automated, high-fidelity extraction, integration, and validation of cross-platform oncology data in patients undergoing treatment for rectal cancer at a single, high-volume institution.

Methods: A dynamic operations platform was used to identify patients with rectal cancer treated at MD Anderson Cancer Center between 2016 and 2022 who had magnetic resonance imaging (MRI) and preoperative treatment details available in the electronic health record (EHR). Demographic, clinicopathologic, tumor mutation, radiographic, and treatment data were extracted from the EHR using a methodology adaptable to any disease site. Data accuracy was assessed by manual review. Accuracy before and after implementation of synoptic reporting was determined for MRI data.

Results: A total of 516 patients with localized rectal cancer were included. In the era after institutional adoption of synoptic reports, the dynamic operations platform extracted T (tumor) category data from the EHR with 95% accuracy compared with 87% before the use of synoptic reports, and N (lymph node) category with 88% compared with 58%. Correct extraction of pelvic sidewall adenopathy was 94% compared with 78%, and extramural vascular invasion accuracy was 99% compared with 89%. Neoadjuvant chemotherapy and radiation data were 99% accurate for patients who had synoptic data sources.
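
The before/after comparison reported above amounts to scoring platform-extracted values against manual review, grouped by reporting era. A minimal sketch of that calculation follows; the record layout, era labels, and toy values are assumptions for illustration and are not taken from the study's pipeline.

```python
from collections import defaultdict


def accuracy_by_era(records):
    """Return {era: fraction of records where the extracted value equals manual review}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for era, extracted, manual in records:
        total[era] += 1
        if extracted == manual:
            correct[era] += 1
    return {era: correct[era] / total[era] for era in total}


# Hypothetical toy records: (era, platform-extracted T category, manually reviewed T category)
records = [
    ("pre_synoptic", "T3", "T3"),
    ("pre_synoptic", "T2", "T3"),   # extraction disagrees with manual review
    ("post_synoptic", "T3", "T3"),
    ("post_synoptic", "T4", "T4"),
]

accuracy = accuracy_by_era(records)
print(accuracy)
```

The same function could be applied per variable (T category, N category, pelvic sidewall adenopathy, extramural vascular invasion) by passing one record list per variable.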

Conclusion: Using dynamic operations platforms enables automated cross-platform integration of multiparameter oncology data with high fidelity in patients undergoing multimodality treatment for rectal cancer. These pipelines can be adapted to other solid tumors and, together with standardized reporting, can increase efficiency in clinical research and the translation of actionable findings toward optimizing patient outcomes.

Citations: 0
Extracting Electronic Health Record Neuroblastoma Treatment Data With High Fidelity Using the REDCap Clinical Data Interoperability Services Module.
IF 3.3 Q2 ONCOLOGY Pub Date: 2024-05-01 DOI: 10.1200/CCI.24.00009
Brian Furner, Alex Cheng, Ami V Desai, Daniel J Benedetti, Debra L Friedman, Kirk D Wyatt, Michael Watkins, Samuel L Volchenboum, Susan L Cohn

Purpose: Although the International Neuroblastoma Risk Group Data Commons (INRGdc) has enabled seminal large cohort studies, the research is limited by the lack of real-world, electronic health record (EHR) treatment data. To address this limitation, we evaluated the feasibility of extracting treatment data directly from EHRs using the REDCap Clinical Data Interoperability Services (CDIS) module for future submission to the INRGdc.

Methods: Patients enrolled on the Children's Oncology Group neuroblastoma biology study ANBL00B1 (ClinicalTrials.gov identifier: NCT00904241) who received care at the University of Chicago (UChicago) or the Vanderbilt University Medical Center (VUMC) after the go-live dates for the Fast Healthcare Interoperability Resources (FHIR)-compliant EHRs were identified. Antineoplastic drug orders were extracted using the CDIS module. To validate the CDIS output, antineoplastic agents extracted through FHIR were compared with those queried through EHR relational databases (UChicago's Clinical Research Data Warehouse and VUMC's Epic Clarity database) and manual chart review.

Results: The analytic cohort consisted of 41 patients at UChicago and 32 VUMC patients. Antineoplastic drug orders were identified in the extracted EHR records of 39 (95.1%) UChicago patients and 26 (81.3%) VUMC patients. Manual chart review confirmed that patients with missing (n = 8) or discontinued (n = 1) orders in the CDIS output did not receive antineoplastic agents during the timeframe of the study. More than 99% of the antineoplastic drug orders in the EHR relational databases were identified in the corresponding CDIS output.
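
The validation step described above, checking how many drug orders found in the relational databases also appear in the CDIS output, can be expressed as a set comparison. The sketch below assumes each order can be reduced to a (patient, drug, date) tuple; that key, the function name, and the toy rows are hypothetical and do not reflect the actual REDCap CDIS or Epic Clarity schemas.

```python
def order_coverage(reference_orders, extracted_orders):
    """Fraction of reference orders that also appear in the extracted set."""
    reference = set(reference_orders)
    if not reference:
        return 1.0  # nothing to find, so nothing was missed
    return len(reference & set(extracted_orders)) / len(reference)


# Hypothetical toy data: (patient_id, drug, order_date)
warehouse_orders = [
    ("p1", "cyclophosphamide", "2020-01-05"),
    ("p1", "topotecan", "2020-01-05"),
    ("p2", "carboplatin", "2020-02-10"),
]
cdis_orders = [
    ("p1", "cyclophosphamide", "2020-01-05"),
    ("p2", "carboplatin", "2020-02-10"),
]

coverage = order_coverage(warehouse_orders, cdis_orders)
print(f"{coverage:.3f}")
```

In practice the matching key matters: drug names would likely need normalizing to a shared vocabulary before comparison, which this sketch leaves out.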

Conclusion: Our results demonstrate the feasibility of extracting EHR treatment data with high fidelity using HL7-FHIR via REDCap CDIS for future submission to the INRGdc.

Citations: 0
Navigating the Complexities of Artificial Intelligence-Enabled Real-World Data Collection for Oncology Pharmacovigilance.
IF 4.2 Q2 Medicine Pub Date: 2024-05-01 DOI: 10.1200/CCI.24.00051
Jack Gallifant, Leo Anthony Celi, Elad Sharon, Danielle S Bitterman

This new editorial discusses the promise and challenges of successful integration of natural language processing methods into electronic health records for timely, robust, and fair oncology pharmacovigilance.

Citations: 0
Value of Real-World Evidence for Treatment Selection: A Case Study in Colon Cancer.
IF 4.2 Q2 Medicine Pub Date: 2024-05-01 DOI: 10.1200/CCI.23.00186
Lingjie Shen, Anja van Gestel, Peter Prinsen, Geraldine Vink, Felice N van Erning, Gijs Geleijnse, Maurits Kaptein

Purpose: Real-world evidence (RWE), derived from analysis of real-world data (RWD), has the potential to guide personalized treatment decisions. However, because of potential confounding, generating valid RWE is challenging. This study demonstrates how to responsibly generate RWE for treatment decisions. We validate our approach by demonstrating that, using RWD alone, we can uncover an existing adjuvant chemotherapy (ACT) guideline for stage II and III colon cancer (CC) that came about using both data from randomized controlled trials and expert consensus.

Methods: Data from the population-based Netherlands Cancer Registry from a total of 27,056 patients with stage II and III CC who underwent curative surgery were analyzed to estimate the overall survival (OS) benefit of ACT. Focusing on 5-year OS, the benefit of ACT was estimated for each patient using G-computation methods by adjusting for patient and tumor characteristics and estimated propensity score. Subsequently, on the basis of these estimates, an ACT decision tree was constructed.
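
G-computation, as named above, fits an outcome model on treatment and covariates, then predicts each patient's outcome under both treatment options and takes the difference. The sketch below illustrates the idea on synthetic data with a single covariate and a linear outcome model. It is a simplification under stated assumptions: the study models 5-year OS with patient and tumor characteristics and a propensity score, none of which are reproduced here, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (assumption): one covariate x that drives both treatment and outcome.
x = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-x))            # sicker/healthier patients treated at different rates
a = rng.binomial(1, p_treat)              # treatment indicator (1 = treated)
# True individual treatment effect is 0.10 + 0.05 * x.
y = 0.5 + 0.2 * x + a * (0.10 + 0.05 * x) + rng.normal(scale=0.1, size=n)


def design(treatment, covariate):
    """Design matrix with intercept, covariate, treatment, and their interaction."""
    return np.column_stack(
        [np.ones_like(covariate), covariate, treatment, treatment * covariate]
    )


# Step 1: fit the outcome model E[Y | A, X] by least squares.
beta, *_ = np.linalg.lstsq(design(a, x), y, rcond=None)

# Step 2: predict every patient's outcome under treatment and under no treatment.
y_if_treated = design(np.ones(n), x) @ beta
y_if_untreated = design(np.zeros(n), x) @ beta

# Step 3: the per-patient difference is the estimated benefit.
benefit = y_if_treated - y_if_untreated
mean_benefit = benefit.mean()

# Naive comparison of treated vs untreated, which ignores confounding by x.
naive_diff = y[a == 1].mean() - y[a == 0].mean()

print(f"G-computation mean benefit: {mean_benefit:.3f}")
print(f"Naive difference:           {naive_diff:.3f}")
```

Because treated patients in this synthetic cohort have systematically higher `x`, the naive difference overstates the benefit, whereas the adjusted estimate lands close to the simulated average effect of 0.10. The per-patient `benefit` values are what a downstream decision tree would be fit on.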

Results: The constructed decision tree corresponds to the current Dutch guideline: patients with stage III or stage II with T stage 4 should receive surgery and ACT, whereas patients with stage II with T stage 3 should only receive surgery. Interestingly, we do not find sufficient RWE to conclude against ACT for stage II with T stage 4 and microsatellite instability-high (MSI-H), a recent addition to the current guideline.
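
The decision tree stated in the results can be written as a small rule function. This is only a restatement of the sentence above for illustration; the function name, argument encoding, and return labels are assumptions, and the MSI-H question raised in the abstract is deliberately not encoded.

```python
def recommend_treatment(stage: str, t_stage: int) -> str:
    """Encode the reported tree: stage III or stage II/T4 -> surgery + ACT; stage II/T3 -> surgery only."""
    if stage == "III" or (stage == "II" and t_stage == 4):
        return "surgery + ACT"
    if stage == "II" and t_stage == 3:
        return "surgery only"
    # Combinations not covered by the abstract are rejected rather than guessed.
    raise ValueError(f"no rule for stage {stage!r} with T stage {t_stage}")


print(recommend_treatment("III", 3))
print(recommend_treatment("II", 4))
print(recommend_treatment("II", 3))
```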

Conclusion: RWE, if used carefully, can provide a valuable addition to our construction of evidence on clinical decision making and therefore ultimately affect treatment guidelines. In addition to validating the ACT decisions advised in the current Dutch guideline, this paper suggests additional attention should be paid to MSI-H in future iterations of the guideline.

Citations: 0
Use of Natural Language Understanding to Facilitate Surgical De-Escalation of Axillary Staging in Patients With Breast Cancer.
IF 4.2 Q2 Medicine Pub Date: 2024-05-01 DOI: 10.1200/CCI.23.00177
Neil Carleton, Gilan Saadawi, Priscilla F McAuliffe, Atilla Soran, Steffi Oesterreich, Adrian V Lee, Emilia J Diego

Purpose: Natural language understanding (NLU) may be particularly well equipped for enhanced data capture from the electronic health record given its examination of both content-driven and context-driven extraction.

Methods: We developed and applied an NLU model to examine rates of pathological node positivity (pN+) and rates of lymphedema to determine whether omission of routine axillary staging could be extended to younger patients with estrogen receptor-positive (ER+)/cN0 disease.

Results: We found that rates of pN+ and arm lymphedema were similar between patients age 55-69 years and ≥70 years, with rates of lymphedema exceeding rates of pN+ for clinical stage T1c and smaller disease.

Conclusion: Data from our NLU model suggest that omission of sentinel lymph node biopsy might be extended beyond the Choosing Wisely recommendations, which are limited to those older than 70 years, to all postmenopausal women with early-stage ER+/cN0 disease. These data support the recently reported SOUND trial results and provide additional granularity to facilitate surgical de-escalation.

Citations: 0
Extraction and Imputation of Eastern Cooperative Oncology Group Performance Status From Unstructured Oncology Notes Using Language Models.
IF 4.2 Q2 Medicine Pub Date: 2024-05-01 DOI: 10.1200/CCI.23.00269
Wenxin Xu, Bowen Gu, William E Lotter, Kenneth L Kehl

Purpose: Eastern Cooperative Oncology Group (ECOG) performance status (PS) is a key clinical variable for cancer treatment and research, but it is usually only recorded in unstructured form in the electronic health record. We investigated whether natural language processing (NLP) models can impute ECOG PS using unstructured note text.

Materials and methods: Medical oncology notes were identified from all patients with cancer at our center from 1997 to 2023 and divided at the patient level into training (approximately 80%), tuning/validation (approximately 10%), and test (approximately 10%) sets. Regular expressions were used to extract explicitly documented PS. Extracted PS labels were used to train NLP models to impute ECOG PS (0-1 v 2-4) from the remainder of the notes (with regular expression-extracted PS documentation removed). We assessed associations between imputed PS and overall survival (OS).
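
The labeling step described above has three parts: find explicitly documented PS with a regular expression, collapse it to the binary target (0-1 v 2-4), and remove the matched text so a model must infer PS from the rest of the note. A minimal sketch follows. The pattern is an assumption written for illustration and is far narrower than what real clinical notes would require; it is not the authors' expression.

```python
import re
from typing import Optional

# Hypothetical pattern: "ECOG" or "performance status", an optional qualifier,
# an optional connector, then a single digit 0-4.
ECOG_PATTERN = re.compile(
    r"\b(?:ECOG|performance status)\b"
    r"(?:\s+(?:PS|performance status|score))?"
    r"\s*(?:is|of|=|:)?\s*([0-4])\b",
    re.IGNORECASE,
)


def extract_ecog(note: str) -> Optional[int]:
    """Return the first explicitly documented ECOG PS in the note, or None."""
    match = ECOG_PATTERN.search(note)
    return int(match.group(1)) if match else None


def is_poor_ps(ecog: int) -> bool:
    """Binary target used in the abstract: 0-1 (good) versus 2-4 (poor)."""
    return ecog >= 2


def strip_ecog(note: str) -> str:
    """Remove the matched PS documentation so it cannot leak into model input."""
    return ECOG_PATTERN.sub("", note)


note = "Doing well on therapy. ECOG PS: 1. Continue current regimen."
print(extract_ecog(note))
print(strip_ecog(note))
```

Stripping the matched span before training is what turns the task from extraction into imputation: the label comes from the regular expression, but the model only sees the surrounding clinical narrative.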

Results: ECOG PS was extracted using regular expressions from 495,862 notes, corresponding to 79,698 patients. A Transformer-based Longformer model imputed PS with high discrimination (test set area under the receiver operating characteristic curve 0.95, area under the precision-recall curve 0.73). Imputed poor PS was associated with worse OS, including among notes with no explicit documentation of PS detected (OS hazard ratio, 11.9; 95% CI, 11.1 to 12.8).
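
The discrimination figure quoted above, area under the receiver operating characteristic curve (AUROC), equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that definition on toy values; the labels and scores are invented for illustration and have no connection to the study's data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = poor PS (2-4), 0 = good PS (0-1); scores are model probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

The pairwise loop is quadratic and only suitable for small examples; at the scale of hundreds of thousands of notes a rank-based implementation would be the practical choice.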

Conclusion: NLP models can be used to impute performance status from unstructured oncologist notes at scale. This may aid the annotation of oncology data sets for clinical outcomes research and cancer care delivery.

Citations: 0
Phenotyping Hepatic Immune-Related Adverse Events in the Setting of Immune Checkpoint Inhibitor Therapy.
IF 3.3 Q2 Medicine Pub Date : 2024-05-01 DOI: 10.1200/CCI.23.00159
Theodore C Feldman, David E Kaplan, Albert Lin, Jennifer La, Jerry S H Lee, Mayada Aljehani, David P Tuck, Mary T Brophy, Nathanael R Fillmore, Nhan V Do

Purpose: We present and validate a rule-based algorithm for the detection of moderate to severe liver-related immune-related adverse events (irAEs) in a real-world patient cohort. The algorithm can be applied to studies of irAEs in large data sets.

Methods: We developed a set of criteria to define hepatic irAEs. The criteria include: the temporality of elevated laboratory measurements in the first 2-14 weeks of immune checkpoint inhibitor (ICI) treatment, steroid intervention within 2 weeks of the onset of elevated laboratory measurements, and intervention with a duration of at least 2 weeks. These criteria are based on the kinetics of patients who experienced moderate to severe hepatotoxicity (Common Terminology Criteria for Adverse Events grades 2-4). We applied these criteria to a retrospective cohort of 682 patients diagnosed with hepatocellular carcinoma and treated with ICI. All patients were required to have baseline laboratory measurements before and after the initiation of ICI.
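
The three timing criteria above can be expressed as a small rule check. This is a sketch under stated assumptions rather than the published algorithm: the `PatientTimeline` fields and the function name are hypothetical, the window boundaries are treated as inclusive, "within 2 weeks of the onset" is read as steroids starting on or after onset, and the definition of an elevated laboratory value is left outside the sketch because the abstract does not state it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PatientTimeline:
    # Hypothetical record layout, for illustration only.
    ici_start: date                      # first ICI dose
    lab_elevation_onset: Optional[date]  # first elevated liver laboratory value
    steroid_start: Optional[date]        # first steroid dose
    steroid_end: Optional[date]          # last steroid dose


def meets_hepatic_irae_criteria(p: PatientTimeline) -> bool:
    """Apply the three timing rules; a missing date fails the check."""
    if p.lab_elevation_onset is None or p.steroid_start is None or p.steroid_end is None:
        return False
    # Criterion 1: elevation begins 2-14 weeks after ICI initiation.
    onset_delay = p.lab_elevation_onset - p.ici_start
    if not timedelta(weeks=2) <= onset_delay <= timedelta(weeks=14):
        return False
    # Criterion 2: steroids begin within 2 weeks of the onset of elevation.
    steroid_delay = p.steroid_start - p.lab_elevation_onset
    if not timedelta(0) <= steroid_delay <= timedelta(weeks=2):
        return False
    # Criterion 3: the steroid intervention lasts at least 2 weeks.
    return p.steroid_end - p.steroid_start >= timedelta(weeks=2)


example = PatientTimeline(date(2022, 1, 1), date(2022, 2, 1), date(2022, 2, 5), date(2022, 3, 1))
print(meets_hepatic_irae_criteria(example))
```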

Results: A set of 63 equally sampled patients was reviewed by two blinded clinical adjudicators. Disagreements were reviewed, and consensus was taken to be the ground truth. Of these patients, 25 were identified as having irAEs, of which 16 were determined to be hepatic irAEs; 36 patients were classified as nonadverse events, and two patients were of indeterminate status. Reviewers agreed in 44 of 63 patients, including 19 patients with irAEs (0.70 concordance, Fleiss' kappa: 0.43). By comparison, the algorithm achieved a sensitivity and specificity of identifying hepatic irAEs of 0.63 and 0.81, respectively, with a test efficiency (percent correctly classified) of 0.78 and outcome-weighted F1 score of 0.74.
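
For readers unfamiliar with the reported measures, the sketch below shows how sensitivity, specificity, test efficiency, and F1 follow from a 2x2 confusion matrix. The counts are hypothetical and are not the study's data, since the abstract does not report the confusion matrix. The F1 computed here is the plain positive-class F1, not the outcome-weighted variant reported above.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # true-positive rate
    specificity = tn / (tn + fp)                   # true-negative rate
    efficiency = (tp + tn) / (tp + fp + tn + fn)   # fraction correctly classified
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "efficiency": efficiency,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
print(metrics)
```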

Conclusion: The algorithm achieves greater concordance with the ground truth than either individual clinical adjudicator for the detection of irAEs.

Citations: 0
Assessment of BRCA1 and BRCA2 Germline Variant Data From Patients With Breast Cancer in a Real-World Data Registry.
IF 4.2 Q2 Medicine Pub Date : 2024-05-01 DOI: 10.1200/CCI.23.00251
Thales C Nepomuceno, Paulo Lyra, Jianbin Zhu, Fanchao Yi, Rachael H Martin, Daniel Lupu, Luke Peterson, Lauren C Peres, Anna Berry, Edwin S Iversen, Fergus J Couch, Qianxing Mo, Alvaro N Monteiro

Purpose: The emergence of large real-world clinical databases and tools to mine electronic medical records has allowed for an unprecedented look at large data sets with clinical and epidemiologic correlates. In clinical cancer genetics, real-world databases allow for the investigation of prevalence and effectiveness of prevention strategies and targeted treatments and for the identification of barriers to better outcomes. However, real-world data sets have inherent biases and problems (eg, selection bias, incomplete data, measurement error) that may hamper adequate analysis and affect statistical power.

Methods: Here, we leverage a real-world clinical data set from a large health network for patients with breast cancer tested for variants in BRCA1 and BRCA2 (N = 12,423). We conducted data cleaning and harmonization, cross-referenced with publicly available databases, performed variant reassessment and functional assays, and used functional data to inform a variant's clinical significance, applying the American College of Medical Genetics and Genomics and Association for Molecular Pathology guidelines.

Results: In the cohort, White and Black patients were over-represented, whereas non-White Hispanic and Asian patients were under-represented. Incorrect or missing variant designations were the most significant contributor to data loss. While manual curation corrected many incorrect designations, a sizable fraction of patient carriers remained with incorrect or missing variant designations. Despite the large number of patients with clinical significance not reported, original reported clinical significance assessments were accurate. Reassessment of variants in which clinical significance was not reported led to a marked improvement in data quality.

Conclusion: We identify the most common issues with BRCA1 and BRCA2 testing data entry and suggest approaches to minimize data loss and keep interpretation of clinical significance of variants up to date.

Citations: 0
Use of Natural Language Processing to Infer Sites of Metastatic Disease From Radiology Reports at Scale.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-05-01 DOI: 10.1200/CCI.23.00122
See Boon Tay, Guat Hwa Low, Gillian Jing En Wong, Han Jieh Tey, Fun Loon Leong, Constance Li, Melvin Lee Kiang Chua, Daniel Shao Weng Tan, Choon Hua Thng, Iain Bee Huat Tan, Ryan Shea Ying Cong Tan

Purpose: To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports.

Methods: A set of 4,522 computed tomography (CT) reports of 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. We also developed an NLP information extraction (IE) system (on the basis of named entity recognition, assertion status detection, and relation extraction) for comparison. Model performances were measured by F1 scores on test and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports.
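
Because each report can mention several metastatic sites, the F1 score here is a multilabel measure. The sketch below shows one common way to compute it, micro-averaging over all (report, site) pairs. The abstract does not state which averaging the authors used, so the choice of micro-averaging, the function name `micro_f1`, and the toy labels are all assumptions.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-report sets of metastatic-site labels."""
    tp = fp = fn = 0
    for gold_sites, predicted_sites in zip(gold, predicted):
        tp += len(gold_sites & predicted_sites)   # sites correctly predicted
        fp += len(predicted_sites - gold_sites)   # sites predicted but absent
        fn += len(gold_sites - predicted_sites)   # sites present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: three reports, the last with no metastatic site.
gold = [{"liver", "lung"}, {"bone"}, set()]
predicted = [{"liver"}, {"bone", "brain"}, set()]
print(micro_f1(gold, predicted))
```

Micro-averaging weights frequent sites more heavily; a macro average over sites would instead give rare sites equal weight.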

Results: The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. On external validation sets comprising additional cancer types, positron emission tomography-CT scans, and magnetic resonance imaging scans, the IE system attained F1 scores of 0.89, 0.83, and 0.81, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were more frequent in de novo stage IV than in recurrent patients (29.7% v 12.2%; P < .001). Conversely, lung-only metastases were more frequent in recurrent than in de novo stage IV patients (17.2% v 7.3%; P < .001).

Conclusion: We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. It has explainable methods and performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and identify potential patients for oligometastatic interventions.

Citations: 0