
JCO Clinical Cancer Informatics: Latest Articles

Validation of Non-Small Cell Lung Cancer Clinical Insights Using a Generalized Oncology Natural Language Processing Model.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-09-01 DOI: 10.1200/CCI.23.00099
Rachel C Kenney, Xiaoren Chen, Kazuki Shintani, Clara Gagnon, John Liu, Stacey DaCosta Byfield, Lorre Ochs, Anne-Marie Currie

Purpose: Limited studies have used natural language processing (NLP) in the context of non-small cell lung cancer (NSCLC). This study aimed to validate the application of an NLP model to an NSCLC cohort by extracting NSCLC concepts from free-text medical notes and converting them to structured, interpretable data.

Methods: Patients with a lung neoplasm, NSCLC histology, and treatment information in their notes were selected from a repository of over 27 million patients. From these, 200 were randomly selected for this study with the longest and the most recent note included for each patient. An NLP model developed and validated on a large solid and blood cancer oncology cohort was applied to this NSCLC cohort. Two certified tumor registrars and a curator abstracted concepts from the notes: neoplasm, histology, stage, TNM values, and metastasis sites. This manually abstracted gold standard was compared with the NLP model output. Precision and recall scores were calculated.

Results: The NLP model extracted the NSCLC concepts with excellent precision and recall with the following scores, respectively: Lung neoplasm 100% and 100%, NSCLC histology 99% and 88%, histology correctly linked to neoplasm 98% and 79%, stage value 98.8% and 92%, stage TNM value 93% and 98%, and metastasis site 97% and 89%. High precision is related to a low number of false positives, and therefore, extracted concepts are likely accurate. High recall indicates that the model captured most of the desired concepts.

Conclusion: This study validates that Optum's oncology NLP model has high precision and recall with clinical real-world data and is a reliable model to support research studies and clinical trials. This validation study shows that our nonspecific solid tumor and blood cancer oncology model is generalizable to successfully extract clinical information from specific cancer cohorts.
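
For readers less familiar with the two metrics reported above: precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN count true positives, false positives, and false negatives against the manually abstracted gold standard. The following is a minimal Python sketch of that arithmetic. The function name and the counts are hypothetical illustrations and are not taken from the study.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. Returns 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one concept (NOT data from the paper):
# 88 concepts found correctly, 1 spurious extraction, 12 missed.
precision, recall = precision_recall(tp=88, fp=1, fn=12)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A high precision with a lower recall, as in this made-up example, means that what the model extracts is nearly always right but some gold-standard concepts are missed.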

Citations: 0
Increasing Power in Phase III Oncology Trials With Multivariable Regression: An Empirical Assessment of 535 Primary End Point Analyses.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-09-01 DOI: 10.1200/CCI.24.00102
Alexander D Sherry, Adina H Passy, Zachary R McCaw, Joseph Abi Jaoude, Timothy A Lin, Ramez Kouzy, Avital M Miller, Gabrielle S Kupferman, Esther J Beck, Pavlos Msaouel, Ethan B Ludmir

Purpose: A previous study demonstrated that power against the (unobserved) true effect for the primary end point (PEP) of most phase III oncology trials is low, suggesting an increased risk of false-negative findings in the field of late-phase oncology. Fitting models with prognostic covariates is a potential solution to improve power; however, the extent to which trials leverage this approach, and its impact on trial interpretation at scale, is unknown. To that end, we hypothesized that phase III trials using multivariable PEP analyses are more likely to demonstrate superiority versus trials with univariable analyses.

Methods: PEP analyses were reviewed from trials registered on ClinicalTrials.gov. Adjusted odds ratios (aORs) were calculated by logistic regressions.

Results: Of the 535 trials enrolling 454,824 patients, 69% (n = 368) used a multivariable PEP analysis. Trials with multivariable PEP analyses were more likely to demonstrate PEP superiority (57% [209 of 368] v 42% [70 of 167]; aOR, 1.78 [95% CI, 1.18 to 2.72]; P = .007). Among trials with a multivariable PEP model, 16 conditioned on covariates and 352 stratified by covariates. However, 108 (35%) of 312 trials with stratified analyses lost power by categorizing a continuous variable, which was especially common among immunotherapy trials (aOR, 2.45 [95% CI, 1.23 to 4.92]; P = .01).

Conclusion: Trials increasing power by fitting multivariable models were more likely to demonstrate PEP superiority than trials with unadjusted analysis. Underutilization of conditioning models and empirical power loss associated with covariate categorization required by stratification were identified as barriers to power gains. These findings underscore the opportunity to increase power in phase III trials with conventional methodology and improve patient access to effective novel therapies.
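
As a rough check on the headline comparison, an unadjusted odds ratio can be computed directly from the counts given in the Results (209 of 368 multivariable trials v 70 of 167 univariable trials showing superiority). The sketch below uses the standard Woolf (log) confidence interval. It is not the authors' analysis: the reported aOR of 1.78 comes from a logistic regression adjusted for covariates that the abstract does not list, so the unadjusted figure is expected to differ somewhat. The function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Woolf (log-scale) confidence interval.

    a, b: events and non-events in group 1
    c, d: events and non-events in group 2
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract's Results paragraph.
or_, lower, upper = odds_ratio_with_ci(209, 368 - 209, 70, 167 - 70)
print(f"unadjusted OR = {or_:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```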

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371366/pdf/
Citations: 0
Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.24.00034
Elisabetta Munzone, Antonio Marra, Federico Comotto, Lorenzo Guercio, Claudia Anna Sangalli, Martina Lo Cascio, Eleonora Pagan, Davide Sangalli, Ilaria Bigoni, Francesca Maria Porta, Marianna D'Ercole, Fabiana Ritorti, Vincenzo Bagnardi, Nicola Fusco, Giuseppe Curigliano

Purpose: Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language.

Methods: During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation.

Results: The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second approach, training an NER-NLP model, reached an accuracy of 97.8%. Notably, the model performed especially well for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0).

Conclusion: The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors.
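
The K-fold cross-validation mentioned in the Methods can be sketched as follows: the reports are partitioned into K folds, and each fold serves once as the held-out test set while the remaining folds are used for training. The abstract does not state the value of K, nor whether reports were shuffled or grouped by patient, so k = 5 and contiguous unshuffled folds are assumptions made purely for illustration. The helper name is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one; every index is in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 513 reports as in the abstract; k = 5 is an assumed value.
for fold_number, (train, test) in enumerate(k_fold_indices(513, 5), start=1):
    print(f"fold {fold_number}: train={len(train)} test={len(test)}")
```

In practice, reports from the same patient would usually be kept in the same fold to avoid leakage between training and test data; this sketch does not do that.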

Citations: 0
Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.23.00235
Alaa Albashayreh, Anindita Bandyopadhyay, Nahid Zeinali, Min Zhang, Weiguo Fan, Stephanie Gilbertson White

Purpose: Identifying cancer symptoms in electronic health record (EHR) narratives is feasible with natural language processing (NLP). However, more efficient NLP systems are needed to detect various symptoms and distinguish observed symptoms from negated symptoms and medication-related side effects. We evaluated the accuracy of NLP in (1) detecting 14 symptom groups (ie, pain, fatigue, swelling, depressed mood, anxiety, nausea/vomiting, pruritus, headache, shortness of breath, constipation, numbness/tingling, decreased appetite, impaired memory, disturbed sleep) and (2) distinguishing observed symptoms in EHR narratives among patients with cancer.

Methods: We extracted 902,508 notes for 11,784 unique patients diagnosed with cancer and developed a gold standard corpus of 1,112 notes labeled for presence or absence of 14 symptom groups. We trained an embeddings-augmented NLP system integrating human and machine intelligence and conventional machine learning algorithms. NLP metrics were calculated on a gold standard corpus subset for testing.

Results: The interannotator agreement for labeling the gold standard corpus was excellent at 92%. The embeddings-augmented NLP model achieved the best performance (F1 score = 0.877). The highest NLP accuracy was observed in pruritus (F1 score = 0.937) while the lowest accuracy was in swelling (F1 score = 0.787). After classifying the entire data set with embeddings-augmented NLP, we found that 41% of the notes included symptom documentation. Pain was the most documented symptom (29% of all notes) while impaired memory was the least documented (0.7% of all notes).

Conclusion: We illustrated the feasibility of detecting 14 symptom groups in EHR narratives and showed that an embeddings-augmented NLP system outperforms conventional machine learning algorithms in detecting symptom information and differentiating observed symptoms from negated symptoms and medication-related side effects.
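
The F1 scores quoted above are the harmonic mean of precision and recall, F1 = 2PR / (P + R), so a model scores well only when both are high. A minimal Python sketch, with hypothetical input values that are not taken from the study:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall pair (NOT values from the paper).
print(round(f1_score(0.9, 0.6), 3))
```

Because the harmonic mean is pulled toward the smaller value, a symptom group with strong precision but weak recall still ends up with a modest F1.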

Citations: 0
Interinstitutional Approach to Advancing Geospatial Technologies for US Cancer Centers.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.24.00099
Todd Burus, Josh Martinez, Peter DelNero, Sam Pepper, Isuru Ratnayake, Debora L Oh, Christopher McNair, Hope Krebill, Dinesh Pal Mudaranthakam
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11296499/pdf/
Citations: 0
Insights Into the Patient Experience of Hormone Therapy for Early Breast Cancer Treatment Using Patient Forum Discussions and Natural Language Processing.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.24.00038
Sameet Sreenivasan, Chao Fang, Emuella M Flood, Natasha Markuzon, Jasmine Y Y Sze

Purpose: Understanding the real-world experience of patients with early breast cancer (eBC) is imperative for optimizing outcomes and evolving patient care. However, there is a lack of patient-level data, hindering clinical development. This social listening study was performed to understand patient insights into symptoms and impacts of hormone therapy (HT) for eBC using posts from patient forums on breastcancer.org to inform future clinical research.

Methods: Natural language processing (NLP) and machine learning techniques were used to identify themes related to eBC from a sample of 500,000 posts. After relevant data selection, 362,074 eBC posts were retained for further analysis of symptoms and impacts related to HT, as well as insights into symptom severity, pain locations, and symptom management using exercise and yoga.

Results: Overall, 32 symptoms and nine impacts had significant associations with ≥one HT. Hot flush (relative risk [RR], 6.70 [95% CI, 3.36 to 13.36]), arthralgia (RR, 6.67 [95% CI, 3.53 to 12.59]), weight increased (RR, 4.83 [95% CI, 3.20 to 7.28]), mood swings (RR, 7.36 [95% CI, 5.75 to 9.42]), insomnia (RR, 4.76 [95% CI, 3.14 to 7.22]), and depression (RR, 3.05 [95% CI, 1.71 to 5.44]) demonstrated the strongest associations. Severe headache, dizziness, back pain, and muscle spasms showed significant associations with ≥one HT despite their low overall prevalence in eBC posts.

Conclusion: The social listening approach allowed the identification of real-world insights from posts specific to eBC HT from a large-scale online breast cancer forum that captured experiences from a uniquely diverse group of patients. Using NLP has the potential to scale analysis of patient feedback and reveal actionable insights into patient experiences of treatment that can inform the development of future therapies and improve the care of patients with eBC.
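
A relative risk of the kind reported in the Results compares how often a symptom is mentioned in one group of posts with how often it is mentioned in a comparison group. The abstract does not describe how the RRs or their confidence intervals were derived, so the sketch below is only one common approach (the log-scale Katz interval), and the counts and function name are hypothetical.

```python
import math
from typing import Tuple


def relative_risk_with_ci(events_exposed: int, n_exposed: int,
                          events_control: int, n_control: int,
                          z: float = 1.96) -> Tuple[float, float, float]:
    """Relative risk and log-scale (Katz) confidence interval."""
    risk_exposed = events_exposed / n_exposed
    risk_control = events_control / n_control
    rr = risk_exposed / risk_control
    se_log_rr = math.sqrt(1 / events_exposed - 1 / n_exposed
                          + 1 / events_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts (NOT data from the paper): a symptom is mentioned in
# 30 of 100 posts about a therapy and in 10 of 100 comparison posts.
rr, lower, upper = relative_risk_with_ci(30, 100, 10, 100)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1, as in this made-up example, corresponds to what the abstract calls a significant association.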

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371083/pdf/
Citations: 0
Machine Learning-Based Prediction of 1-Year Survival Using Subjective and Objective Parameters in Patients With Cancer.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.24.00041
Maria Rosa Salvador Comino, Paul Youssef, Anna Heinzelmann, Florian Bernhardt, Christin Seifert, Mitra Tewes

Purpose: Palliative care is recommended for patients with cancer with a life expectancy of <12 months. Machine learning (ML) techniques can help in predicting survival outcomes among patients with cancer and may help distinguish who benefits the most from palliative care support. We aim to explore the importance of several objective and subjective self-reported variables. Subjective variables were collected through electronic psycho-oncologic and palliative care self-assessment screenings. We used these variables to predict 1-year mortality.

Materials and methods: Between April 1, 2020, and March 31, 2021, a total of 265 patients with advanced cancer completed a patient-reported outcome tool. We documented objective and subjective variables collected from electronic health records, self-reported subjective variables, and all clinical variables combined. We used logistic regression (LR), decision trees, and random forests, evaluated with 20-fold cross-validation, to predict 1-year mortality. We analyzed the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC), and the feature importance of the ML models.
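As a rough illustration of the 20-fold cross-validation mentioned in the methods, the sketch below splits 265 patient indices into 20 held-out folds. It is not the authors' code: the function names `k_fold_indices` and `train_test_splits`, the shuffling seed, and the round-robin fold assignment are all assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists, holding out one fold at a time."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 265 patients and 20 folds, the figures given in the abstract.
splits = list(train_test_splits(265, 20))
fold_sizes = sorted(len(test) for _, test in splits)
```

With 265 patients, each model is fit on roughly 95% of the cohort and scored on the remaining 13 or 14 patients, and every patient is held out exactly once.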

Results: Clinical nonpatient variables (LR reaches 0.81 [ROC-AUC] and 0.72 [F1 score]) are much more predictive than subjective patient-reported variables (LR reaches 0.55 [ROC-AUC] and 0.52 [F1 score]).
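To make the two reported metrics concrete, the following minimal sketch computes ROC-AUC and the F1 score from scratch. The labels and scores are made-up toy values, not study data, and the 0.5 decision threshold and the helper names `roc_auc` and `f1_score` are assumptions for illustration only.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = died within 1 year) and predicted probabilities; hypothetical values.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
```

Because a ROC-AUC of 0.5 corresponds to ranking patients at random, the 0.55 reported for subjective variables is only slightly better than chance, whereas 0.81 indicates substantial discrimination.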

Conclusion: The results show that objective variables used in this study are much more predictive than subjective patient-reported variables, which measure subjective burden. These findings indicate that subjective burden cannot be reliably used to predict survival. Further research is needed to clarify the role of self-reported patient burden and mortality prediction using ML.

Citations: 0
Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.23.00258
Man Luo, Shubham Trivedi, Allison W Kurian, Kevin Ward, Theresa H M Keegan, Daniel Rubin, Imon Banerjee

Purpose: Patient-centered outcomes (PCOs) are pivotal in cancer treatment, as they directly reflect patients' quality of life. Although multiple studies suggest that factors affecting breast cancer-related morbidity and survival are influenced by treatment side effects and adherence to long-term treatment, such data are generally only available on a smaller scale or from a single center. The primary challenge with collecting these data is that the outcomes are captured as free text in clinical narratives written by clinicians.

Materials and methods: Given the complexity of PCO documentation in these narratives, computerized methods are necessary to unlock the wealth of information buried in unstructured text notes that often document PCOs. Inspired by the success of large language models (LLMs), we examined the adaptability of three LLMs (GPT-2, BioGPT, and PMC-LLaMA) on PCO tasks across three institutions: Mayo Clinic, Emory University Hospital, and Stanford University. We developed an open-source framework for fine-tuning LLMs that can directly extract five different categories of PCOs from clinic notes.

Results: We found that these LLMs without fine-tuning (zero-shot) struggle with challenging PCO extraction tasks, displaying almost random performance, even when given some task-specific examples (few-shot learning). Our fine-tuned, task-specific models perform notably better than their non-fine-tuned counterparts. Moreover, the fine-tuned GPT-2 model performed significantly better than the other two, larger LLMs.

Conclusion: Our findings indicate that although LLMs serve as effective general-purpose models for tasks across various domains, they require fine-tuning when applied to the clinical domain. Our proposed approach has the potential to lead to more efficient, adaptable models for PCO information extraction, reducing reliance on extensive computational resources while still delivering superior performance for specific tasks.

Citations: 0
Cureit: An End-to-End Pipeline for Implementing Mixture Cure Models With an Application to Liposarcoma Data.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.23.00234
Karissa Whiting, Teng Fei, Samuel Singer, Li-Xuan Qin

Purpose: Cure models are a useful alternative to Cox proportional hazards models in oncology studies when there is a subpopulation of patients who will not experience the event of interest. Although software is available to fit cure models, there are limited tools to evaluate, report, and visualize model results. This article introduces the cureit R package, an end-to-end pipeline for building mixture cure models, and demonstrates its use in a data set of patients with primary extremity and truncal liposarcoma.

Methods: To assess associations between liposarcoma histologic subtypes and disease-specific death (DSD) in patients treated at Memorial Sloan Kettering Cancer Center between July 1982 and September 2017, mixture cure models were fit and evaluated using the cureit package. Liposarcoma histologic subtypes were defined as well-differentiated, dedifferentiated, myxoid, round cell, and pleomorphic.

Results: Compared with the well-differentiated subtype, all other analyzed liposarcoma histologic subtypes were significantly associated with higher DSD in cure models. In multivariable models, myxoid (odds ratio [OR], 6.25 [95% CI, 1.32 to 29.6]) and round cell (OR, 16.2 [95% CI, 2.80 to 93.2]) liposarcoma had higher incidences of DSD than well-differentiated liposarcoma. By contrast, dedifferentiated liposarcoma was associated with the latency of DSD (hazard ratio, 10.6 [95% CI, 1.48 to 75.9]). Pleomorphic liposarcomas had significantly higher risk in both the incidence and the latency of DSD (P < .0001). Brier scores indicated comparable predictive accuracy between cure and Cox models.
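The odds ratios and hazard ratios above correspond to the two components of a mixture cure model. The block below writes out the standard formulation with a logistic incidence part and a proportional hazards latency part. The abstract does not spell out the exact specification used by cureit, so this form and the symbols \(\pi\), \(\gamma\), \(\beta\), \(x\), and \(z\) are assumptions for illustration.

```latex
% Population survival: cured patients (probability 1 - \pi) never experience the event;
% uncured patients (probability \pi) follow the latency survival function S_u.
\[
  S_{\mathrm{pop}}(t \mid x, z) \;=\; 1 - \pi(z) \;+\; \pi(z)\, S_u(t \mid x)
\]
% Incidence: logistic model for the probability of being uncured.
\[
  \operatorname{logit} \pi(z) \;=\; \gamma_0 + \gamma^{\top} z ,
  \qquad \mathrm{OR}_j = \exp(\gamma_j)
\]
% Latency: proportional hazards model among the uncured.
\[
  S_u(t \mid x) \;=\; S_0(t)^{\exp(\beta^{\top} x)} ,
  \qquad \mathrm{HR}_j = \exp(\beta_j)
\]
```

Under this reading, an OR above 1 means a subtype raises the probability of ever experiencing DSD, while an HR above 1 means that, among patients who do experience it, the event tends to occur sooner.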

Conclusion: We developed the cureit pipeline to fit and evaluate mixture cure models and demonstrated its clinical utility in the liposarcoma disease setting, providing insights into subtype-specific associations with incidence and/or latency.

Citations: 0
Evaluation of Real-World Tumor Response Derived From Electronic Health Record Data Sources: A Feasibility Analysis in Patients With Metastatic Non-Small Cell Lung Cancer Treated With Chemotherapy.
IF 3.3 Q2 ONCOLOGY Pub Date : 2024-08-01 DOI: 10.1200/CCI.24.00091
Brittany A McKelvey, Elizabeth Garrett-Mayer, Donna R Rivera, Amy Alabaster, Hillary S Andrews, Elizabeth G Bond, Thomas D Brown, Amanda Bruno, Lauren Damato, Janet L Espirito, Laura L Fernandes, Eric Hansen, Paul Kluetz, Xinran Ma, Andrea McCracken, Pallavi S Mishra-Kalyani, Yanina Natanzon, Danielle Potter, Nicholas J Robert, Lawrence Schwartz, Regina Schwind, Connor Sweetnam, Joseph Wagner, Mark D Stewart, Jeff D Allen

Purpose: Real-world data (RWD) holds promise for ascribing a real-world (rw) outcome to a drug intervention; however, ascertaining rw-response to treatment from RWD can be challenging. Friends of Cancer Research formed a collaboration to assess available data attributes related to rw-response across RWD sources to inform methods for capturing, defining, and evaluating rw-response.

Materials and methods: This retrospective noninterventional (observational) study included seven electronic health record data companies (data providers) providing summary-level deidentified data from 200 patients diagnosed with metastatic non-small cell lung cancer (mNSCLC) and treated with first-line platinum doublet chemotherapy following a common protocol. Data providers reviewed the availability and frequency of data components to assess rw-response (ie, images, radiology imaging reports, and clinician response assessments). A common protocol was used to assess and report rw-response end points, including rw-response rate (rwRR), rw-duration of response (rwDOR), and the association of rw-response with rw-overall survival (rwOS), rw-time to treatment discontinuation (rwTTD), and rw-time to next treatment (rwTTNT).
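A response rate of this kind reduces to simple counting once each patient has a best documented response. The sketch below is only an illustration under stated assumptions: it treats complete response (CR) and partial response (PR) as responses, uses evaluable patients as the denominator, and runs on a hypothetical ten-patient cohort. None of these choices, nor the function name `rw_response_rate`, are taken from the study protocol.

```python
from collections import Counter

# Assumption: a complete or partial response counts as a response.
RESPONDER_CATEGORIES = {"CR", "PR"}


def rw_response_rate(best_responses):
    """Return (response rate among evaluable patients, proportion evaluable).

    best_responses holds one entry per patient; None means no clinician
    response assessment was available for that patient.
    """
    evaluable = [r for r in best_responses if r is not None]
    if not evaluable:
        raise ValueError("no evaluable patients")
    counts = Counter(evaluable)
    responders = sum(counts[c] for c in RESPONDER_CATEGORIES)
    return responders / len(evaluable), len(evaluable) / len(best_responses)


# Hypothetical cohort of 10 patients; not data from the study.
cohort = ["PR", "SD", None, "CR", "PD", "PR", None, "SD", "PD", "PR"]
rate, evaluable_share = rw_response_rate(cohort)
```

Reporting the evaluable share alongside the rate matters because data sets with fewer documented assessments can otherwise appear to have different response rates for reasons unrelated to treatment.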

Results: The availability and timing of clinician assessments were relatively consistent across data sets, in contrast to images and image reports. Real-world response was analyzed using clinician response assessments (median proportion of patients evaluable, 77.5%), which had the highest consistency in the timing of assessments. Relative consistency was observed across data sets for rwRR (median 46.5%), as well as the median and directionality of rwOS, rwTTD, and rwTTNT. There was variability in rwDOR across data sets.

Conclusion: This collaborative effort demonstrated the feasibility of aligning disparate data sources to evaluate rw-response end points using clinician-documented responses in patients with mNSCLC. Heterogeneity exists in the availability of data components to assess response and related rw-end points, and further work is needed to inform drug effectiveness evaluation within RWD sources.

Citations: 0