Pub Date: 2024-12-01 | Epub Date: 2024-12-11 | DOI: 10.1200/CCI.24.00123
Zacharie Hamilton, Aseem Aseem, Zhengjia Chen, Noor Naffakh, Natalie M Reizine, Frank Weinberg, Shikha Jain, Larry G Kessler, Vijayakrishna K Gadi, Christopher Bun, Ryan H Nguyen
Purpose: Precision oncology in non-small cell lung cancer (NSCLC) relies on biomarker testing for clinical decision making. Despite its importance, challenges like the lack of genomic oncology training, nonstandardized biomarker reporting, and a rapidly evolving treatment landscape hinder its practice. Generative artificial intelligence (AI), such as ChatGPT, offers promise for enhancing clinical decision support. Effective performance metrics are crucial to evaluate these models' accuracy and their propensity for producing incorrect or hallucinated information. We assessed various ChatGPT versions' ability to generate accurate next-generation sequencing reports and treatment recommendations for NSCLC, using a novel Generative AI Performance Score (G-PS), which considers accuracy, relevancy, and hallucinations.
Methods: We queried ChatGPT versions for first-line NSCLC treatment recommendations with a Food and Drug Administration-approved targeted therapy, using a zero-shot prompt approach for eight oncogenes. Responses were assessed against National Comprehensive Cancer Network (NCCN) guidelines for accuracy, relevance, and hallucinations, with the G-PS yielding scores from -1 (all hallucinations) to 1 (fully NCCN-compliant recommendations). G-PS was designed as a composite measure with a base score for correct recommendations (weighted for preferred treatments) and a penalty for hallucinations.
Results: Analyzing 160 responses, generative pre-trained transformer (GPT)-4 outperformed GPT-3.5, showing higher base score (90% v 60%; P < .01) and fewer hallucinations (34% v 53%; P < .01). GPT-4's overall G-PS was significantly higher (0.34 v -0.15; P < .01), indicating superior performance.
Conclusion: This study highlights the rapid improvement of generative AI in matching treatment recommendations with biomarkers in precision oncology. Although the hallucination rate improved in the GPT-4 model, future generative AI use in clinical care requires high accuracy with minimal to no room for hallucinations. The G-PS represents a novel metric for quantifying generative AI utility in health care against national guidelines, with potential for adaptation beyond precision oncology.
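As described, the G-PS combines a base score for correct recommendations (weighted toward NCCN-preferred treatments) with a hallucination penalty, spanning -1 to 1. A minimal sketch of such a composite score, with assumed weights and per-response normalization (the paper defines the exact scheme):

```python
def g_ps(recommendations, preferred_weight=1.0, other_weight=0.5):
    """Illustrative G-PS: base credit for correct recommendations,
    weighted toward NCCN-preferred treatments, minus a penalty for
    hallucinated ones. The weights and normalization are assumptions
    for illustration; the paper defines the exact scheme.

    Each element of `recommendations` is labeled 'preferred' (correct
    and NCCN-preferred), 'other' (correct but not preferred), or
    'hallucinated'.
    """
    if not recommendations:
        return 0.0
    base = sum(preferred_weight if r == "preferred" else other_weight
               for r in recommendations if r != "hallucinated")
    penalty = sum(1.0 for r in recommendations if r == "hallucinated")
    # Normalizing by the number of recommendations keeps the score in
    # [-1, 1]: all hallucinations -> -1, all preferred picks -> 1.
    return (base - penalty) / len(recommendations)
```

Under this normalization, a response that is entirely hallucinated scores -1 and one consisting only of preferred NCCN treatments scores 1, matching the range stated in the abstract.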
Title: Comparative Analysis of Generative Pre-Trained Transformer Models in Oncogene-Driven Non-Small Cell Lung Cancer: Introducing the Generative Artificial Intelligence Performance Score. JCO Clinical Cancer Informatics. 2024;8:e2400123.
Pub Date: 2024-12-01 | Epub Date: 2024-12-11 | DOI: 10.1200/CCI.24.00126
Li-Ching Chen, Travis Zack, Arda Demirci, Madhumita Sushil, Brenda Miao, Corynn Kasap, Atul Butte, Eric A Collisson, Julian C Hong
Purpose: We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.
Methods: We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using generative pre-trained transformer (GPT)-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.
Results: Among 164 patients with pancreatic tumors, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). The open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled at deriving correct inferences directly from objective findings. Most tested models demonstrated proficiency in identifying disease-containing anatomic locations from a list of choices, with GPT-4 and Llama3-8B showing near-parity in precision and recall for disease site identification. However, open models struggled to differentiate benign from malignant postsurgical changes, affecting their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating variability in human judgment.
Conclusion: LLMs, especially GPT-4, are proficient in deriving oncologic insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and health care analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable data set for further LLM research in oncology.
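The 75.5% correctness figure above is reported as micro-averaged F1. Micro-averaging pools true positives, false positives, and false negatives across all classes before computing precision and recall; a minimal sketch:

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1 for single-label classification: pool true
    positives, false positives, and false negatives over all classes
    before computing precision and recall."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    # In single-label classification every misclassification counts once
    # as a false positive and once as a false negative, so micro-F1
    # reduces to plain accuracy.
    fp = fn = len(y_true) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because each report receives exactly one status label, micro-F1 here coincides with overall accuracy, which is why the abstract can describe it as "correctness."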
Title: Assessing Large Language Models for Oncology Data Inference From Radiology Reports. JCO Clinical Cancer Informatics. 2024;8:e2400126.
Purpose: Post-sustained virologic response (SVR) screening following clinical guidelines does not address individual risk of hepatocellular carcinoma (HCC). Our aim is to provide tailored screening for patients by using machine learning to predict HCC incidence after SVR.
Methods: Using clinical data from 1,028 patients with SVR, we developed an HCC prediction model using a random survival forest (RSF). Model performance was assessed using Harrell's c-index and validated in an independent cohort of 737 patients with SVR. Shapley additive explanations (SHAP) facilitated feature quantification, whereas optimal cutoffs were determined using maximally selected rank statistics. We used Kaplan-Meier analysis to compare cumulative HCC incidence between risk groups.
Results: We achieved c-index scores and 95% CIs of 0.90 (0.85 to 0.94) and 0.80 (0.74 to 0.85) in the derivation and validation cohorts, respectively, in a model using platelet count, gamma-glutamyl transpeptidase, sex, age, and ALT. Stratification resulted in four risk groups: low, intermediate, high, and very high. The 5-year cumulative HCC incidence rates and 95% CIs for these groups were as follows: derivation: 0% (0 to 0), 3.8% (0.6 to 6.8), 26.2% (17.2 to 34.3), and 54.2% (20.2 to 73.7), respectively, and validation: 0.7% (0 to 1.6), 7.1% (2.7 to 11.3), 5.2% (0 to 10.8), and 28.6% (0 to 55.3), respectively.
Conclusion: The integration of RSF and SHAP enabled accurate HCC risk classification after SVR, which may facilitate individualized HCC screening strategies and more cost-effective care.
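The stratification step maps each patient's RSF-predicted risk score to one of the four groups via cutoffs derived from maximally selected rank statistics. A sketch with hypothetical cutoff values (the actual cutoffs are not reported in the abstract):

```python
import bisect

# Hypothetical cutoffs for illustration; the study derives its cutoffs
# from the derivation cohort using maximally selected rank statistics.
RISK_CUTOFFS = (0.1, 0.3, 0.6)
RISK_GROUPS = ("low", "intermediate", "high", "very high")

def assign_risk_group(risk_score, cutoffs=RISK_CUTOFFS):
    """Map an RSF-predicted risk score to one of four risk groups by
    locating it among the pre-derived cutoffs."""
    return RISK_GROUPS[bisect.bisect_right(cutoffs, risk_score)]
```

Once every patient is assigned a group, the cumulative HCC incidence curves reported above can be compared between groups with Kaplan-Meier analysis.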
Title: Prediction of Hepatocellular Carcinoma After Hepatitis C Virus Sustained Virologic Response Using a Random Survival Forest Model. Authors: Hikaru Nakahara, Atsushi Ono, C Nelson Hayes, Yuki Shirane, Ryoichi Miura, Yasutoshi Fujii, Serami Murakami, Kenji Yamaoka, Hauri Bao, Shinsuke Uchikawa, Hatsue Fujino, Eisuke Murakami, Tomokazu Kawaoka, Daiki Miki, Masataka Tsuge, Shiro Oka. DOI: 10.1200/CCI.24.00108. JCO Clinical Cancer Informatics. 2024;8:e2400108.
Pub Date: 2024-12-01 | Epub Date: 2024-12-20 | DOI: 10.1200/CCI.24.00132
Gurjyot K Doshi, Andrew J Osterland, Ping Shi, Annette Yim, Viviana Del Tejo, Sarah B Guttenplan, Samantha Eiffert, Xin Yin, Lisa Rosenblatt, Paul R Conkling
Purpose: Nivolumab plus ipilimumab (NIVO + IPI) is a first-in-class combination immunotherapy for the treatment of intermediate- or poor (I/P)-risk advanced or metastatic renal cell carcinoma (mRCC). Currently, there are limited real-world data regarding clinical effectiveness beyond 12-24 months from treatment initiation. In this real-world study, treatment patterns and clinical outcomes were evaluated for NIVO + IPI in a community oncology setting.
Methods: A retrospective analysis using electronic medical record data from The US Oncology Network examined patients with I/P-risk clear cell mRCC who initiated first-line (1L) NIVO + IPI between January 4, 2018, and December 31, 2019, with follow-up until June 30, 2022. Baseline demographics, clinical characteristics, treatment patterns, clinical effectiveness, and safety outcomes were assessed descriptively. Overall survival (OS) and real-world progression-free survival (rwPFS) were analyzed using Kaplan-Meier methods.
Results: Among 187 patients identified (median follow-up, 22.4 months), with median age 63 (range, 30-89) years, 74 (39.6%) patients had poor risk and 37 (19.8%) patients had Eastern Cooperative Oncology Group performance status score ≥2. Of 86 patients who received second-line therapy, 54.7% received cabozantinib and 10.5% received pazopanib. The median (95% CI) OS and rwPFS were 38.4 (24.7-46.1) months and 11.1 (7.5-15.0) months, respectively. Treatment-related adverse events (TRAEs) were reported in 89 (47.6%) patients, including fatigue (n = 25, 13.4%) and rash (n = 19, 10.2%).
Conclusion: This study provides data to support the understanding of the real-world utilization and long-term effectiveness of 1L NIVO + IPI in patients with I/P-risk mRCC. TRAE rates were low relative to clinical trials.
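The OS and rwPFS medians above are Kaplan-Meier estimates. For reference, a minimal product-limit estimator handling right-censoring (the data in the example are illustrative, not the study's patient-level data):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.
    times: follow-up time per patient; events: 1 = event observed,
    0 = censored. Returns (event_time, S(t)) pairs at each event time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    order = sorted(range(len(times)), key=lambda i: times[i])
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = censored = 0
        # Group all patients sharing this follow-up time.
        while i < len(order) and times[order[i]] == t:
            if events[order[i]]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Censored patients leave the risk set without an event.
        at_risk -= deaths + censored
    return curve
```

The median OS or rwPFS is then read off as the first time at which the estimated survival function drops to 0.5 or below.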
Title: Real-World Outcomes in Patients With Metastatic Renal Cell Carcinoma Treated With First-Line Nivolumab Plus Ipilimumab in the United States. JCO Clinical Cancer Informatics. 2024;8:e2400132.
Pub Date: 2024-12-01 | Epub Date: 2024-12-17 | DOI: 10.1200/CCI-24-00196
Bradley D McDowell, Michael A O'Rorke, Mary C Schroeder, Elizabeth A Chrischilles, Christine M Spinka, Lemuel R Waitman, Kelechi Anuforo, Alejandro Araya, Haddyjatou Bah, Jackson Barlocker, Sravani Chandaka, Lindsay G Cowell, Carol R Geary, Snehil Gupta, Benjamin D Horne, Boyd M Knosp, Albert M Lai, Vasanthi Mandhadi, Abu Saleh Mohammad Mosa, Phillip Reeder, Giyung Ryu, Brian Shukwit, Claire Smith, Alexander J Stoddard, Mahanazuddin Syed, Shorabuddin Syed, Bradley W Taylor, Jeffrey J VanWormer
Purpose: Electronic health records (EHRs) comprise a rich source of real-world data for cancer studies, but they often lack critical structured data elements such as diagnosis date and disease stage. Fortunately, such concepts are available from hospital cancer registries. We describe experiences from integrating cancer registry data with EHR and billing data in an interoperable data model across a multisite clinical research network.
Methods: After sites implemented cancer registry data into a tumor table compatible with the PCORnet Common Data Model (CDM), distributed queries were performed to assess quality issues. After these issues were remediated, another query produced descriptive frequencies of cancer types and demographic characteristics, including linked BMI. We also report two current use cases of the new resource.
Results: Eleven sites implemented the tumor table, yielding a resource with data for 572,902 tumors. Institutional and technical barriers were surmounted to accomplish this. Variations in racial and ethnic distributions across the sites were observed; the percent of tumors among Black patients ranged from <1% to 15% across sites, and the percent of tumors among Hispanic patients ranged from 1% to 46% across sites. Current use cases include a pragmatic prospective cohort study of a rare cancer and a retrospective cohort study leveraging body size and chemotherapy dosing.
Conclusion: Integrating cancer registry data with the PCORnet CDM across multiple institutions creates a powerful resource for cancer studies. It provides a wider array of structured, cancer-relevant concepts, and it allows investigators to examine variability in those concepts across many treatment environments. Having the CDM tumor table in place enhances the network's effectiveness for real-world cancer research.
Title: Implementing Cancer Registry Data With the PCORnet Common Data Model: The Greater Plains Collaborative Experience. JCO Clinical Cancer Informatics. 2024;8:e2400196.
Pub Date: 2024-12-01 | Epub Date: 2024-11-27 | DOI: 10.1200/CCI-24-00150
Paul Windisch, Fabio Dennstädt, Carole Koechli, Robert Förster, Christina Schröder, Daniel M Aebersold, Daniel R Zwahlen
Purpose: Extracting inclusion and exclusion criteria in a structured, automated fashion remains a challenge in developing better search functionalities or automating systematic reviews of randomized controlled trials in oncology. The question "Did this trial enroll patients with localized disease, metastatic disease, or both?" could be used to narrow down the number of potentially relevant trials when conducting a search.
Methods: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. Five hundred trials were used to develop and validate three different models, with 100 trials held out for testing. The test set was also used to evaluate the performance of GPT-4o on the same task.
Results: In the test set, a rule-based system using regular expressions achieved F1 scores of 0.72 for the prediction of whether the trial allowed for the inclusion of patients with localized disease and 0.77 for metastatic disease. A transformer-based machine learning (ML) model achieved F1 scores of 0.97 and 0.88, respectively. A combined approach where the rule-based system was allowed to over-rule the ML model achieved F1 scores of 0.97 and 0.89, respectively. GPT-4o achieved F1 scores of 0.87 and 0.92, respectively.
Conclusion: Automatic classification of cancer trials with regard to the inclusion of patients with localized and/or metastatic disease is feasible. Turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine. Increasingly large language models can reduce or eliminate the need for previous training on the task at the expense of increased computational power and, in turn, cost.
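The rule-based component and its over-rule of the machine learning model can be sketched as follows; the patterns shown are hypothetical stand-ins, since the study's actual regular expressions are not given in the abstract:

```python
import re

# Hypothetical patterns standing in for the study's actual rules, which
# are not reported in the abstract.
METASTATIC_RULE = re.compile(r"\b(metastatic|stage IV)\b", re.IGNORECASE)
LOCALIZED_RULE = re.compile(r"\b(localized|resectable|early[- ]stage)\b",
                            re.IGNORECASE)

def rule_based_metastatic(criteria_text):
    """Return True/False when a rule fires, or None when the rules abstain."""
    if METASTATIC_RULE.search(criteria_text):
        return True
    if LOCALIZED_RULE.search(criteria_text):
        return False
    return None

def combined_prediction(criteria_text, ml_prediction):
    """The combined approach from the abstract: the rule-based system is
    allowed to over-rule the machine learning model whenever a rule fires,
    and the ML prediction is kept otherwise."""
    rule = rule_based_metastatic(criteria_text)
    return ml_prediction if rule is None else rule
```

This shape explains the reported scores: the ML model dominates overall performance, while high-precision rules correct a small number of its errors, nudging the combined F1 from 0.88 to 0.89 for metastatic disease.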
Title: Metastatic Versus Localized Disease as Inclusion Criteria That Can Be Automatically Extracted From Randomized Controlled Trials Using Natural Language Processing. JCO Clinical Cancer Informatics. 2024;8:e2400150.
Pub Date : 2024-12-01Epub Date: 2024-12-10DOI: 10.1200/CCI.23.00263
Dorian Culié, Renaud Schiappa, Sara Contu, Eva Seutin, Tanguy Pace-Loscos, Gilles Poissonnet, Agathe Villarme, Alexandre Bozec, Emmanuel Chamorey
Purpose: Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.
Materials and methods: We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from spaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.
Results: Performance remained consistent across the test and validation sets, with most variables (30/42) exceeding 90% on precision, recall, and F1 score in both sets. Results varied by variable: pathologic tumor stage achieved 100% precision, recall, and F1 score, versus 45%, 28%, and 32%, respectively, for the number of nodules in the test set. Surgical and preanesthesia reports demonstrated particularly high performance.
Conclusion: Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.
{"title":"Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.","authors":"Dorian Culié, Renaud Schiappa, Sara Contu, Eva Seutin, Tanguy Pace-Loscos, Gilles Poissonnet, Agathe Villarme, Alexandre Bozec, Emmanuel Chamorey","doi":"10.1200/CCI.23.00263","DOIUrl":"https://doi.org/10.1200/CCI.23.00263","url":null,"abstract":"<p><strong>Purpose: </strong>Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.</p><p><strong>Materials and methods: </strong>We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from SpaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.</p><p><strong>Results: </strong>Performance remained consistent across the test and validation sets, with the majority of variables (30/42) achieving performance metrics exceeding 90% for all metrics in both sets. 
Results differed according to the variables; pathologic tumor stage score achieved 100% in precision, recall, and F1 score, versus 45%, 28%, and 32% for the number of nodules in the test set, respectively. Surgical and preanesthesia reports demonstrated particularly high performance.</p><p><strong>Conclusion: </strong>Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2300263"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
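The keyword-extraction-rule component mentioned in the methods can be illustrated with plain regular expressions. This is a hedged sketch of that layer only — the actual RUBY-THYRO pipeline combines six spaCy CNN models with rules and postprocessing, and the variables and patterns below are hypothetical:

```python
import re

# Hypothetical rules for two of the 42 variables (pT stage, nodule count);
# the study's real annotation schema and rules are not reproduced here.
PT_STAGE = re.compile(r"\bpT([0-4][ab]?)\b")
NODULE_COUNT = re.compile(r"\b(\d+)\s+nodules?\b", re.IGNORECASE)

def extract_variables(report: str) -> dict:
    """Structure free-text pathology report fragments into named variables."""
    stage = PT_STAGE.search(report)
    count = NODULE_COUNT.search(report)
    return {
        "pT_stage": stage.group(1) if stage else None,
        "nodule_count": int(count.group(1)) if count else None,
    }
```

Outputs like these are what get compared against the expert-annotated gold standard to compute per-variable precision, recall, and F1.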
Pub Date : 2024-12-01Epub Date: 2024-12-23DOI: 10.1200/CCI.24.00010
Lie Cai, Thomas M Deutsch, Chris Sidey-Gibbons, Michelle Kobel, Fabian Riedel, Katharina Smetanay, Carlo Fremd, Laura Michel, Michael Golatta, Joerg Heil, Andreas Schneeweiss, André Pfob
Purpose: Toxicity from systemic cancer treatment is a major source of anxiety for patients and a challenge to treatment plans. We aimed to develop machine learning algorithms for upfront prediction of an individual's risk of experiencing treatment-relevant toxicity during the course of treatment.
Methods: Clinical records were retrieved from a single-center, consecutive cohort of patients who underwent neoadjuvant treatment for early breast cancer. We developed and validated machine learning algorithms to predict grade 3 or 4 toxicity (anemia, neutropenia, deviation of liver enzymes, nephrotoxicity, thrombopenia, electrolyte disturbance, or neuropathy). We used 10-fold cross-validation to develop two algorithms (logistic regression with elastic net penalty [GLM] and support vector machines [SVMs]). Algorithm predictions were compared with documented toxicity events and diagnostic performance was evaluated via area under the curve (AUROC).
Results: A total of 590 patients were identified, 432 in the development set and 158 in the validation set. The median age was 51 years, and 55.8% (329 of 590) experienced grade 3 or 4 toxicity. The performance improved significantly when adding referenced treatment information (referenced regimen, referenced summation dose intensity product) in addition to patient and tumor variables: GLM AUROC 0.59 versus 0.75, P = .02; SVM AUROC 0.64 versus 0.75, P = .01.
Conclusion: The individual risk of treatment-relevant toxicity can be predicted using machine learning algorithms. We demonstrate a promising way to improve efficacy and facilitate proactive toxicity management of systemic cancer treatment.
{"title":"Machine Learning to Predict the Individual Risk of Treatment-Relevant Toxicity for Patients With Breast Cancer Undergoing Neoadjuvant Systemic Treatment.","authors":"Lie Cai, Thomas M Deutsch, Chris Sidey-Gibbons, Michelle Kobel, Fabian Riedel, Katharina Smetanay, Carlo Fremd, Laura Michel, Michael Golatta, Joerg Heil, Andreas Schneeweiss, André Pfob","doi":"10.1200/CCI.24.00010","DOIUrl":"10.1200/CCI.24.00010","url":null,"abstract":"<p><strong>Purpose: </strong>Toxicity to systemic cancer treatment represents a major anxiety for patients and a challenge to treatment plans. We aimed to develop machine learning algorithms for the upfront prediction of an individual's risk of experiencing treatment-relevant toxicity during the course of treatment.</p><p><strong>Methods: </strong>Clinical records were retrieved from a single-center, consecutive cohort of patients who underwent neoadjuvant treatment for early breast cancer. We developed and validated machine learning algorithms to predict grade 3 or 4 toxicity (anemia, neutropenia, deviation of liver enzymes, nephrotoxicity, thrombopenia, electrolyte disturbance, or neuropathy). We used 10-fold cross-validation to develop two algorithms (logistic regression with elastic net penalty [GLM] and support vector machines [SVMs]). Algorithm predictions were compared with documented toxicity events and diagnostic performance was evaluated via area under the curve (AUROC).</p><p><strong>Results: </strong>A total of 590 patients were identified, 432 in the development set and 158 in the validation set. The median age was 51 years, and 55.8% (329 of 590) experienced grade 3 or 4 toxicity. 
The performance improved significantly when adding referenced treatment information (referenced regimen, referenced summation dose intensity product) in addition to patient and tumor variables: GLM AUROC 0.59 versus 0.75, <i>P</i> = .02; SVM AUROC 0.64 versus 0.75, <i>P</i> = .01.</p><p><strong>Conclusion: </strong>The individual risk of treatment-relevant toxicity can be predicted using machine learning algorithms. We demonstrate a promising way to improve efficacy and facilitate proactive toxicity management of systemic cancer treatment.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400010"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670908/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142883088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
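The modeling setup described above — elastic-net-penalized logistic regression evaluated with 10-fold cross-validated AUROC — can be sketched with scikit-learn. The data here are simulated stand-ins, not the study's patient, tumor, or referenced-treatment variables:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated cohort: 200 patients, 10 numeric features, binary toxicity label.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Elastic-net logistic regression (the "GLM" of the paper) requires the saga solver.
glm = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = cross_val_score(glm, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUROC over 10 folds: {aucs.mean():.2f}")
```

Comparing such cross-validated AUROCs with and without a feature group (here, the referenced treatment information) is the kind of ablation the reported 0.59 versus 0.75 contrast reflects.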
Pub Date : 2024-12-01Epub Date: 2024-12-03DOI: 10.1200/CCI.24.00056
Joshi Hogenboom, Aiara Lobo Gomes, Andre Dekker, Winette Van Der Graaf, Olga Husson, Leonard Wee
Purpose: Research on rare diseases and atypical health care demographics is often slowed by high interparticipant heterogeneity and overall scarcity of data. Synthetic data (SD) have been proposed as a means of data sharing, enlargement, and diversification, artificially reproducing real phenomena while obscuring real patient data. The utility of SD is actively scrutinized in health care research, but the role of sample size in the actionability of SD is insufficiently explored. We aim to understand the interplay of actionability and sample size by generating SD sets of varying sizes from gradually diminishing amounts of real individuals' data. We evaluate the actionability of SD in a highly heterogeneous and rare demographic: adolescents and young adults (AYAs) with cancer.
Methods: A population-based cross-sectional cohort study of 3,735 AYAs was subsampled at random to produce 13 training data sets of varying sample sizes. We studied four distinct generator architectures built on the open-source Synthetic Data Vault library. Each architecture was used to generate SD of varying sizes on the basis of each of the aforementioned training subsets. SD actionability was assessed by comparing the resulting SD with their respective real data against three metrics: veracity, utility, and privacy concealment.
Results: All examined generator architectures yielded actionable data when generating SD of sizes similar to the real data. Larger SD sample sizes increased veracity but also tended to increase privacy risk. Using fewer training participants led to faster convergence in veracity but partially exacerbated privacy concealment issues.
Conclusion: SD is a potentially promising option for data sharing and data augmentation, yet sample size plays a significant role in its actionability. SD generation should go hand-in-hand with consistent scrutiny, and sample size should be carefully considered in this process.
{"title":"Actionability of Synthetic Data in a Heterogeneous and Rare Health Care Demographic: Adolescents and Young Adults With Cancer.","authors":"Joshi Hogenboom, Aiara Lobo Gomes, Andre Dekker, Winette Van Der Graaf, Olga Husson, Leonard Wee","doi":"10.1200/CCI.24.00056","DOIUrl":"10.1200/CCI.24.00056","url":null,"abstract":"<p><strong>Purpose: </strong>Research on rare diseases and atypical health care demographics is often slowed by high interparticipant heterogeneity and overall scarcity of data. Synthetic data (SD) have been proposed as means for data sharing, enlargement, and diversification, by artificially generating real phenomena while obscuring the real patient data. The utility of SD is actively scrutinized in health care research, but the role of sample size for actionability of SD is insufficiently explored. We aim to understand the interplay of actionability and sample size by generating SD sets of varying sizes from gradually diminishing amounts of real individuals' data. We evaluate the actionability of SD in a highly heterogeneous and rare demographic: adolescents and young adults (AYAs) with cancer.</p><p><strong>Methods: </strong>A population-based cross-sectional cohort study of 3,735 AYAs was subsampled at random to produce 13 training data sets of varying sample sizes. We studied four distinct generator architectures built on the open-source Synthetic Data Vault library. Each architecture was used to generate SD of varying sizes on the basis of each aforementioned training subsets. SD actionability was assessed by comparing the resulting SD with their respective real data against three metrics-veracity, utility, and privacy concealment.</p><p><strong>Results: </strong>All examined generator architectures yielded actionable data when generating SD with sizes similar to the real data. Large SD sample size increased veracity but generally increased privacy risks. 
Using fewer training participants led to faster convergence in veracity, but partially exacerbated privacy concealment issues.</p><p><strong>Conclusion: </strong>SD is a potentially promising option for data sharing and data augmentation, yet sample size plays a significant role in its actionability. SD generation should go hand-in-hand with consistent scrutiny, and sample size should be carefully considered in this process.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400056"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11627331/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
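The veracity and privacy-concealment metrics above can be approximated with very simple proxies. A hedged sketch, assuming toy numeric arrays rather than the study's AYA cohort or the Synthetic Data Vault's own evaluators: veracity as agreement of marginal means, and privacy as the smallest distance from any synthetic record to a real one (near-zero distances suggest a synthetic row nearly copies a real patient):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins: "real" cohort and "synthetic" sample with the same schema
# (e.g., age in years and a continuous biomarker), 300 rows each.
real = rng.normal(loc=[50.0, 1.6], scale=[12.0, 0.3], size=(300, 2))
synthetic = rng.normal(loc=[50.0, 1.6], scale=[12.0, 0.3], size=(300, 2))

def veracity_gap(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Crude veracity proxy: largest absolute gap between column means (lower is better)."""
    return float(np.abs(real.mean(axis=0) - synthetic.mean(axis=0)).max())

def min_nn_distance(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Crude privacy proxy: smallest Euclidean distance from a synthetic row to a real row."""
    d = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=2)
    return float(d.min())
```

Tracking how proxies like these move as the synthetic sample grows mirrors the paper's finding that larger SD sets gain veracity while privacy risk creeps up.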