Sameet Sreenivasan, Chao Fang, Emuella M Flood, Natasha Markuzon, Jasmine Y Y Sze
Purpose: Understanding the real-world experience of patients with early breast cancer (eBC) is imperative for optimizing outcomes and evolving patient care. However, there is a lack of patient-level data, hindering clinical development. This social listening study was performed to understand patient insights into symptoms and impacts of hormone therapy (HT) for eBC using posts from patient forums on breastcancer.org to inform future clinical research.
Methods: Natural language processing (NLP) and machine learning techniques were used to identify themes related to eBC from a sample of 500,000 posts. After relevant data selection, 362,074 eBC posts were retained for further analysis of symptoms and impacts related to HT, as well as insights into symptom severity, pain locations, and symptom management using exercise and yoga.
Results: Overall, 32 symptoms and nine impacts had significant associations with at least one HT. Hot flush (relative risk [RR], 6.70 [95% CI, 3.36 to 13.36]), arthralgia (RR, 6.67 [95% CI, 3.53 to 12.59]), weight increased (RR, 4.83 [95% CI, 3.20 to 7.28]), mood swings (RR, 7.36 [95% CI, 5.75 to 9.42]), insomnia (RR, 4.76 [95% CI, 3.14 to 7.22]), and depression (RR, 3.05 [95% CI, 1.71 to 5.44]) demonstrated the strongest associations. Severe headache, dizziness, back pain, and muscle spasms showed significant associations with at least one HT despite their low overall prevalence in eBC posts.
Conclusion: The social listening approach allowed the identification of real-world insights from posts specific to eBC HT from a large-scale online breast cancer forum that captured experiences from a uniquely diverse group of patients. Using NLP has the potential to scale analysis of patient feedback and reveal actionable insights into patient experiences of treatment that can inform the development of future therapies and improve the care of patients with eBC.
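The relative risks above compare how often a symptom is mentioned in posts associated with a therapy versus posts that are not. The abstract does not state how the intervals were derived, so the following is only a minimal sketch of one common approach (the log-normal, or Katz, approximation). The function name and all counts are hypothetical and are not taken from the study.

```python
import math


def relative_risk(exposed_with, exposed_total, unexposed_with, unexposed_total, z=1.96):
    """Return (RR, lower, upper) using the log-normal (Katz) approximation.

    exposed_*   : posts that mention the therapy (hypothetical counts)
    unexposed_* : posts that do not mention the therapy (hypothetical counts)
    """
    risk_exposed = exposed_with / exposed_total
    risk_unexposed = unexposed_with / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / exposed_with - 1 / exposed_total
        + 1 / unexposed_with - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical example: a symptom appears in 120 of 1,000 therapy posts
# and in 40 of 2,000 posts that do not mention the therapy.
rr, ci_low, ci_high = relative_risk(120, 1000, 40, 2000)
print(f"RR = {rr:.2f} (95% CI, {ci_low:.2f} to {ci_high:.2f})")
```

An interval that excludes 1 is what the abstract would describe as a significant association under this kind of calculation.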
Insights Into the Patient Experience of Hormone Therapy for Early Breast Cancer Treatment Using Patient Forum Discussions and Natural Language Processing. JCO Clinical Cancer Informatics, volume 8, e2400038. Published 2024-08-01. DOI: 10.1200/CCI.24.00038. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371083/pdf/
Maria Rosa Salvador Comino, Paul Youssef, Anna Heinzelmann, Florian Bernhardt, Christin Seifert, Mitra Tewes
Purpose: Palliative care is recommended for patients with cancer with a life expectancy of <12 months. Machine learning (ML) techniques can help in predicting survival outcomes among patients with cancer and may help distinguish who benefits the most from palliative care support. We aim to explore the importance of several objective and subjective self-reported variables. Subjective variables were collected through electronic psycho-oncologic and palliative care self-assessment screenings. We used these variables to predict 1-year mortality.
Materials and methods: Between April 1, 2020, and March 31, 2021, a total of 265 patients with advanced cancer completed a patient-reported outcome tool. We documented objective and subjective variables collected from electronic health records, self-reported subjective variables, and all clinical variables combined. We used logistic regression (LR), decision trees, and random forests, evaluated with 20-fold cross-validation, to predict 1-year mortality. We analyzed the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC), and the feature importance of the ML models.
Results: Clinical nonpatient variables (LR reaches 0.81 [ROC-AUC] and 0.72 [F1 score]) are much more predictive than subjective patient-reported variables (LR reaches 0.55 [ROC-AUC] and 0.52 [F1 score]).
Conclusion: The results show that objective variables used in this study are much more predictive than subjective patient-reported variables, which measure subjective burden. These findings indicate that subjective burden cannot be reliably used to predict survival. Further research is needed to clarify the role of self-reported patient burden and mortality prediction using ML.
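The two metrics reported above can be illustrated on a toy example. This is a sketch using only the Python standard library, not the authors' pipeline. The labels, scores, and the 0.5 decision threshold are assumptions chosen for illustration.

```python
def roc_auc(labels, scores):
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_score(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed decision threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 1 = died within 1 year, scores = predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
f1 = f1_score(labels, scores)
print(f"ROC-AUC = {auc:.3f}, F1 = {f1:.3f}")
```

A ROC-AUC near 0.5, such as the 0.55 reported for the patient-reported variables, means the model ranks patients little better than chance.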
Machine Learning-Based Prediction of 1-Year Survival Using Subjective and Objective Parameters in Patients With Cancer. JCO Clinical Cancer Informatics, volume 8, e2400041. Published 2024-08-01. DOI: https://doi.org/10.1200/CCI.24.00041
Man Luo, Shubham Trivedi, Allison W Kurian, Kevin Ward, Theresa H M Keegan, Daniel Rubin, Imon Banerjee
Purpose: Patient-centered outcomes (PCOs) are pivotal in cancer treatment, as they directly reflect patients' quality of life. Although multiple studies suggest that factors affecting breast cancer-related morbidity and survival are influenced by treatment side effects and adherence to long-term treatment, such data are generally only available on a smaller scale or from a single center. The primary challenge with collecting these data is that the outcomes are captured as free text in clinical narratives written by clinicians.
Materials and methods: Given the complexity of PCO documentation in these narratives, computerized methods are necessary to unlock the wealth of information buried in unstructured text notes that often document PCOs. Inspired by the success of large language models (LLMs), we examined the adaptability of three LLMs, GPT-2, BioGPT, and PMC-LLaMA, on PCO tasks across three institutions, Mayo Clinic, Emory University Hospital, and Stanford University. We developed an open-source framework for fine-tuning LLMs that can directly extract the five different categories of PCO from the clinic notes.
Results: We found that these LLMs without fine-tuning (zero-shot) struggle with challenging PCO extraction tasks, displaying almost random performance, even with some task-specific examples (few-shot learning). The performance of our fine-tuned, task-specific models is notably superior to that of their non-fine-tuned counterparts. Moreover, the fine-tuned GPT-2 model has demonstrated a significantly better performance than the other two larger LLMs.
Conclusion: Our discovery indicates that although LLMs serve as effective general-purpose models for tasks across various domains, they require fine-tuning when applied to the clinician domain. Our proposed approach has the potential to lead to more efficient, adaptable models for PCO information extraction, reducing reliance on extensive computational resources while still delivering superior performance for specific tasks.
Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit. JCO Clinical Cancer Informatics, volume 8, e2300258. Published 2024-08-01. DOI: https://doi.org/10.1200/CCI.23.00258
Karissa Whiting, Teng Fei, Samuel Singer, Li-Xuan Qin
Purpose: Cure models are a useful alternative to Cox proportional hazards models in oncology studies when there is a subpopulation of patients who will not experience the event of interest. Although software is available to fit cure models, there are limited tools to evaluate, report, and visualize model results. This article introduces the cureit R package, an end-to-end pipeline for building mixture cure models, and demonstrates its use in a data set of patients with primary extremity and truncal liposarcoma.
Methods: To assess associations between liposarcoma histologic subtypes and disease-specific death (DSD) in patients treated at Memorial Sloan Kettering Cancer Center between July 1982 and September 2017, mixture cure models were fit and evaluated using the cureit package. Liposarcoma histologic subtypes were defined as well-differentiated, dedifferentiated, myxoid, round cell, and pleomorphic.
Results: All other analyzed liposarcoma histologic subtypes were significantly associated with higher DSD in cure models compared with the well-differentiated subtype. In multivariable models, myxoid (odds ratio [OR], 6.25 [95% CI, 1.32 to 29.6]) and round cell (OR, 16.2 [95% CI, 2.80 to 93.2]) liposarcoma had higher incidences of DSD compared with well-differentiated liposarcoma. By contrast, dedifferentiated liposarcoma was associated with the latency of DSD (hazard ratio, 10.6 [95% CI, 1.48 to 75.9]). Pleomorphic liposarcomas had significantly higher risk in both incidence and the latency of DSD (P < .0001). Brier scores indicated comparable predictive accuracy between cure and Cox models.
Conclusion: We developed the cureit pipeline to fit and evaluate mixture cure models and demonstrated its clinical utility in the liposarcoma disease setting, providing insights into the subtype-specific associations with incidence and/or latency.
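The distinction between incidence (reported as odds ratios) and latency (reported as hazard ratios) follows from the two-part structure of a mixture cure model. The sketch below illustrates that structure in Python. It is not the cureit API, which is an R package. It assumes a logistic incidence model and, for simplicity, an exponential latency distribution. All function names and parameter values are hypothetical.

```python
import math


def uncured_probability(intercept, log_odds_ratio, x):
    """Incidence part: logistic model for the probability of being uncured."""
    return 1 / (1 + math.exp(-(intercept + log_odds_ratio * x)))


def population_survival(t, p_uncured, baseline_hazard, log_hazard_ratio, x):
    """Mixture cure survival: S(t) = (1 - p) + p * S_uncured(t).

    Latency part: exponential survival among the uncured, with a
    proportional-hazards effect of the covariate x (an assumed form).
    """
    hazard = baseline_hazard * math.exp(log_hazard_ratio * x)
    survival_uncured = math.exp(-hazard * t)
    return (1 - p_uncured) + p_uncured * survival_uncured


# Hypothetical parameters: x = 1 marks a higher-risk subtype, x = 0 the reference.
intercept, log_or, log_hr = -2.0, math.log(2.0), math.log(1.5)
p_reference = uncured_probability(intercept, log_or, 0)
p_subtype = uncured_probability(intercept, log_or, 1)

for years in (0, 5, 50):
    s = population_survival(years, p_subtype, 0.2, log_hr, 1)
    print(f"t = {years:>2} years: S(t) = {s:.3f}")
```

The survival curve starts at 1 and levels off at the cured fraction, 1 - p, rather than falling to zero. That plateau is what separates a cure model from a standard Cox model.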
Cureit: An End-to-End Pipeline for Implementing Mixture Cure Models With an Application to Liposarcoma Data. JCO Clinical Cancer Informatics, volume 8, e2300234. Published 2024-08-01. DOI: https://doi.org/10.1200/CCI.23.00234
Brittany A McKelvey, Elizabeth Garrett-Mayer, Donna R Rivera, Amy Alabaster, Hillary S Andrews, Elizabeth G Bond, Thomas D Brown, Amanda Bruno, Lauren Damato, Janet L Espirito, Laura L Fernandes, Eric Hansen, Paul Kluetz, Xinran Ma, Andrea McCracken, Pallavi S Mishra-Kalyani, Yanina Natanzon, Danielle Potter, Nicholas J Robert, Lawrence Schwartz, Regina Schwind, Connor Sweetnam, Joseph Wagner, Mark D Stewart, Jeff D Allen
Purpose: Real-world data (RWD) holds promise for ascribing a real-world (rw) outcome to a drug intervention; however, ascertaining rw-response to treatment from RWD can be challenging. Friends of Cancer Research formed a collaboration to assess available data attributes related to rw-response across RWD sources to inform methods for capturing, defining, and evaluating rw-response.
Materials and methods: This retrospective noninterventional (observational) study included seven electronic health record data companies (data providers) providing summary-level deidentified data from 200 patients diagnosed with metastatic non-small cell lung cancer (mNSCLC) and treated with first-line platinum doublet chemotherapy following a common protocol. Data providers reviewed the availability and frequency of data components to assess rw-response (ie, images, radiology imaging reports, and clinician response assessments). A common protocol was used to assess and report rw-response end points, including rw-response rate (rwRR), rw-duration of response (rwDOR), and the association of rw-response with rw-overall survival (rwOS), rw-time to treatment discontinuation (rwTTD), and rw-time to next treatment (rwTTNT).
Results: The availability and timing of clinician assessments were relatively consistent across data sets in contrast to images and image reports. Real-world response was analyzed using clinician response assessments (median proportion of patients evaluable, 77.5%), which had the highest consistency in the timing of assessments. Relative consistency was observed across data sets for rwRR (median 46.5%), as well as the median and directionality of rwOS, rwTTD, and rwTTNT. There was variability in rwDOR across data sets.
Conclusion: This collaborative effort demonstrated the feasibility of aligning disparate data sources to evaluate rw-response end points using clinician-documented responses in patients with mNSCLC. Heterogeneity exists in the availability of data components to assess response and related rw-end points, and further work is needed to inform drug effectiveness evaluation within RWD sources.
Evaluation of Real-World Tumor Response Derived From Electronic Health Record Data Sources: A Feasibility Analysis in Patients With Metastatic Non-Small Cell Lung Cancer Treated With Chemotherapy. JCO Clinical Cancer Informatics, volume 8, e2400091. Published 2024-08-01. DOI: 10.1200/CCI.24.00091. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371119/pdf/
Sunkyu Kim, Seung-Seob Kim, Eejung Kim, Michael Cecchini, Mi-Suk Park, Ji A Choi, Sung Hyun Kim, Ho Kyoung Hwang, Chang Moo Kang, Hye Jin Choi, Sang Joon Shin, Jaewoo Kang, Choong-Kun Lee
Purpose: To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP).
Methods: Deep-transfer-learning-based NLP models were retrospectively trained and tested with serial, free-text CT reports, and survival information of consecutive patients diagnosed with pancreatic cancer in a Korean tertiary hospital was extracted. Randomly selected patients with pancreatic cancer and their serial CT reports from an independent tertiary hospital in the United States were included in the external testing data set. The concordance index (c-index) of predicted survival and actual survival, and area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated.
Results: Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports and 670 patients with 3,058 CT reports were allocated to training and internal testing data sets, respectively. A ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained on only the first CT report of each patient showed a c-index of 0.653 and AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports from the initial report showed an improved c-index of 0.811 and AUROC of 0.911. On the external testing set with 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed our model's contextual interpretation beyond specific phrases.
Conclusion: A deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. Clinical decisions can be supported by the developed model, with survival information extracted solely from serial radiology reports.
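The c-index reported above measures how well predicted survival orders patients relative to their observed survival, while accounting for censoring. The abstract does not say which variant was used. What follows is a minimal sketch of Harrell's concordance index under that assumption, with hypothetical follow-up times, event indicators, and predictions.

```python
def concordance_index(times, events, predicted):
    """Harrell's c-index for predictions where higher means longer survival.

    A pair (i, j) is comparable when patient i has the shorter follow-up
    time and an observed event (not censored). The pair is concordant when
    the model also predicts shorter survival for patient i.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical cohort: survival in months, 1 = death observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
predicted = [1.0, 3.0, 2.0, 4.0]  # predicted survival (any monotone score works)

c_index = concordance_index(times, events, predicted)
print(f"c-index = {c_index:.2f}")  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the improvement from 0.653 to 0.811 when more serial reports are used.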
Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer. JCO Clinical Cancer Informatics, volume 8, e2400021. Published 2024-08-01. DOI: https://doi.org/10.1200/CCI.24.00021
Emmanuelle Kempf, Sonia Priou, Ariel Cohen, Akram Redjdal, Etienne Guével, Xavier Tannier
The More, the Better? Modalities of Metastatic Status Extraction on Free Medical Reports Based on Natural Language Processing. JCO Clinical Cancer Informatics, volume 8, e2400026. Published 2024-08-01. DOI: https://doi.org/10.1200/CCI.24.00026
Xabier García-Albéniz, John Hsu, Ruth Etzioni, June M Chan, Joy Shi, Barbra Dickerman, Miguel A Hernán
Purpose: No consensus about the effectiveness of prostate-specific antigen (PSA) screening exists among clinical guidelines, especially for the elderly. Randomized trials of PSA screening have yielded different results, partly because of variations in adherence, and it is unlikely that new trials will be conducted. Our objective was to estimate the effect of annual PSA screening on prostate cancer (PC) mortality in Medicare beneficiaries age 67-84 years.
Methods: This is a large-scale, population-based, observational study of two screening strategies: annual PSA screening and no screening. We used data from 537,599 US Medicare (2001-2008) beneficiaries age 67-84 years who had a good life expectancy, no previous PC, and no PSA test in the 2 years before baseline. We estimated the 8-year PC mortality and incidence, treatments for PC, and treatment complications of PSA screening.
Results: In men age 67-74 years, the estimated difference in 8-year risk of PC death between PSA screening and no screening was -2.3 (95% CI, -4.1 to -1.1) deaths per 1,000 men (a negative risk difference favors screening). Treatment complications were more frequent under PSA screening than under no screening. In men age 75-84 years, risk difference estimates were closer to zero.
Conclusion: Our estimates suggest that, under conventional statistical criteria, annual PSA screening for 8 years is highly compatible with reductions in PC mortality of one to four fewer PC deaths per 1,000 screened men age 67-74 years. As with any study using real-world data, the estimates could be affected by residual confounding.
{"title":"Prostate-Specific Antigen Screening and Prostate Cancer Mortality: An Emulation of Target Trials in US Medicare.","authors":"Xabier García-Albéniz, John Hsu, Ruth Etzioni, June M Chan, Joy Shi, Barbra Dickerman, Miguel A Hernán","doi":"10.1200/CCI.24.00094","DOIUrl":"10.1200/CCI.24.00094","url":null,"abstract":"<p><strong>Purpose: </strong>No consensus about the effectiveness of prostate-specific antigen (PSA) screening exists among clinical guidelines, especially for the elderly. Randomized trials of PSA screening have yielded different results, partly because of variations in adherence, and it is unlikely that new trials will be conducted. Our objective was to estimate the effect of annual PSA screening on prostate cancer (PC) mortality in Medicare beneficiaries age 67-84 years.</p><p><strong>Methods: </strong>This is a large-scale, population-based, observational study of two screening strategies: annual PSA screening and no screening. We used data from 537,599 US Medicare (2001-2008) beneficiaries age 67-84 years who had a good life expectancy, no previous PC, and no PSA test in the 2 years before baseline. We estimated the 8-year PC mortality and incidence, treatments for PC, and treatment complications of PSA screening.</p><p><strong>Results: </strong>In men age 67-74 years, the estimated difference in 8-year risk of PC death between PSA screening and no screening was -2.3 (95% CI, -4.1 to -1.1) deaths per 1,000 men (a negative risk difference favors screening). Treatment complications were more frequent under PSA screening than under no screening. In men age 75-84 years, risk difference estimates were closer to zero.</p><p><strong>Conclusion: </strong>Our estimates suggest that under conventional statistical criteria, annual PSA screening for 8 years is highly compatible with reductions of PC mortality from four to one fewer PC deaths per 1,000 screened men age 67-74 years. 
As with any study using real-world data, the estimates could be affected by residual confounding.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400094"},"PeriodicalIF":3.3,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11579517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142005809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
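To put the reported risk difference on another scale, the short Python sketch below converts a risk difference expressed per 1,000 men into the number of men who would need to be screened to avert one PC death. The function name and the "number needed to screen" framing are illustrative assumptions and are not taken from the study; the inputs are the point estimate and 95% CI limits quoted in the abstract for men age 67-74 years. Inverting the CI limits directly is only a rough approximation, and the caveat about residual confounding applies equally to these derived figures.

```python
def number_needed_to_screen(risk_difference_per_1000: float) -> float:
    """Hypothetical helper: invert an absolute risk difference given per 1,000 men."""
    if risk_difference_per_1000 >= 0:
        raise ValueError("a negative risk difference (favoring screening) is required")
    return 1000.0 / abs(risk_difference_per_1000)


# Values quoted in the abstract (men age 67-74 years, 8-year risk of PC death).
point_estimate = number_needed_to_screen(-2.3)
ci_bounds = (number_needed_to_screen(-4.1), number_needed_to_screen(-1.1))

print(round(point_estimate), [round(bound) for bound in ci_bounds])
```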
Timothy J Brown, Phyllis A Gimotty, Ronac Mamtani, Thomas B Karasic, Yu-Xiao Yang
Purpose: Systemic therapy with atezolizumab and bevacizumab can extend life for patients with advanced hepatocellular carcinoma (HCC). However, there is substantial variability in response to therapy and overall survival. Although current prognostic models have been validated in HCC, they primarily consider covariates that may be reflective of the severity of the underlying liver disease of patients with HCC. We developed and internally validated a classification and regression tree (CART) to identify patient characteristics associated with risks of early mortality, at or before 6 months from treatment initiation.
Methods: This retrospective cohort study used the nationwide Flatiron Health electronic health record-derived deidentified database and included patients with a diagnosis of HCC after January 1, 2020, who received initial systemic therapy with atezolizumab and bevacizumab. CART was developed from available baseline clinical and demographic information to predict mortality within 6 months from treatment initiation. Model characteristics were compared with those of the albumin-bilirubin (ALBI) model, and the model was further validated against a contemporary validation cohort of patients after a data update.
Results: A total of 293 patients were analyzed. The CART identified seven cohorts of patients from baseline demographic and laboratory characteristics. The model had an area under the receiver operating characteristic curve (AUROC) of 0.739 (95% CI, 0.683 to 0.794) for predicting 6-month mortality. This model was internally valid and performed more favorably than the ALBI model, which had an AUROC of 0.608 (95% CI, 0.557 to 0.660). The model applied to the contemporary validation cohort (n = 111) had an AUROC of 0.666 (95% CI, 0.506 to 0.826).
Conclusion: Using CART, we identified unique cohorts of patients with HCC treated with atezolizumab and bevacizumab with distinct risks of early mortality. This approach outperformed the ALBI model and used clinical and laboratory characteristics that are readily available to oncologists caring for these patients.
{"title":"Classification and Regression Trees to Predict for Survival for Patients With Hepatocellular Carcinoma Treated With Atezolizumab and Bevacizumab.","authors":"Timothy J Brown, Phyllis A Gimotty, Ronac Mamtani, Thomas B Karasic, Yu-Xiao Yang","doi":"10.1200/CCI.23.00220","DOIUrl":"10.1200/CCI.23.00220","url":null,"abstract":"<p><strong>Purpose: </strong>Systemic therapy with atezolizumab and bevacizumab can extend life for patients with advanced hepatocellular carcinoma (HCC). However, there is substantial variability in response to therapy and overall survival. Although current prognostic models have been validated in HCC, they primarily consider covariates that may be reflective of the severity of the underlying liver disease of patients with HCC. We developed and internally validated a classification and regression tree (CART) to identify patient characteristics associated with risks of early mortality, at or before 6 months from treatment initiation.</p><p><strong>Methods: </strong>This retrospective cohort study used the nationwide Flatiron Health electronic health record-derived deidentified database and included patients with a diagnosis of HCC after January 1, 2020, who received initial systemic therapy with atezolizumab and bevacizumab. CART was developed from available baseline clinical and demographic information to predict mortality within 6 months from treatment initiation. Model characteristics were compared to the albumin-bilirubin (ALBI) model and was further validated against a contemporary validation cohort of patients after a data update.</p><p><strong>Results: </strong>A total of 293 patients were analyzed. The CART identified seven cohorts of patients from baseline demographic and laboratory characteristics. The model had an area under the receiver operating curve (AUROC) of 0.739 (95% CI, 0.683 to 0.794) for predicting 6-month mortality. 
This model was internally valid and performed more favorably than the ALBI model, which had an AUROC of 0.608 (95% CI, 0.557 to 0.660). The model applied to the contemporary validation cohort (n = 111) had an AUROC of 0.666 (95% CI, 0.506 to 0.826).</p><p><strong>Conclusion: </strong>Using CART, we identified unique cohorts of patients with HCC treated with atezolizumab and bevacizumab with distinct risks of early mortality. This approach outperformed the ALBI model and used clinical and laboratory characteristics that are readily available to oncologists caring for these patients.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2300220"},"PeriodicalIF":3.3,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11296500/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141876704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
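The AUROC values reported above can be read as the probability that a randomly chosen patient who died within 6 months received a higher predicted risk than a randomly chosen patient who did not. The Python sketch below computes that quantity directly from pairwise comparisons. It is a minimal illustration, not the authors' code: the function name, the toy labels, and the toy scores are all hypothetical, and no study data are reproduced.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for the event (for example, death within 6 months), 0 otherwise.
    scores: predicted risk; higher means the event is judged more likely.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0   # correctly ordered pair
            elif pos_score == neg_score:
                wins += 0.5   # tied scores count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
print(toy_auroc)
```

An AUROC of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering, which is the scale on which the 0.739 and 0.608 figures are compared.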
Ayyuce Begum Bektas, Lynn Hakki, Asama Khan, Maria Widmar, Iris H Wei, Emmanouil Pappou, J Joshua Smith, Garrett M Nash, Philip B Paty, Julio Garcia-Aguilar, Andrea Cercek, Zsofia Stadler, Neil H Segal, Jinru Shia, Mithat Gonen, Martin R Weiser
Purpose: The outcome for patients with nonmetastatic, microsatellite instability (MSI) colon cancer is favorable; however, high-risk cohorts exist. This study aimed to develop and validate a nomogram model to predict freedom from recurrence (FFR) for patients with resected MSI colon cancer.
Patients and methods: Data from patients who underwent curative resection of stage I, II, or III MSI colon cancer in 2014-2021 (model training cohort, 384 patients, 33 events; median follow-up, 38.8 months) were retrospectively collected from institutional databases. Variables associated with recurrence in multivariable analysis were selected for inclusion in the clinical calculator. The calculator's predictive accuracy was measured with the concordance index and validated using data from patients who underwent treatment for MSI colon cancer in 2007-2013 (validation cohort, 164 patients, eight events; median follow-up, 84.8 months).
Results: T category and number of positive lymph nodes were significantly associated with recurrence in multivariable analysis and were selected for inclusion in the clinical calculator. The calculator's concordance index for FFR in the model training cohort was 0.812 (95% CI, 0.742 to 0.873), compared with 0.759 (95% CI, 0.683 to 0.840) for the staging schema of the eighth edition of the American Joint Committee on Cancer Staging Manual. The concordance index for the validation cohort was 0.744 (95% CI, 0.666 to 0.822), confirming robust predictive accuracy.
Conclusion: Although patients with nonmetastatic MSI colon cancer generally had favorable outcomes, those with advanced T category and multiple metastatic lymph nodes had a higher risk of recurrence. The clinical calculator identified patients with MSI colon cancer at high risk for recurrence, and this could inform surveillance strategies. In addition, the model could be used in trial design to identify patients suitable for novel adjuvant therapy.
{"title":"Clinical Calculator for Predicting Freedom From Recurrence After Resection of Stage I-III Colon Cancer in Patients With Microsatellite Instability.","authors":"Ayyuce Begum Bektas, Lynn Hakki, Asama Khan, Maria Widmar, Iris H Wei, Emmanouil Pappou, J Joshua Smith, Garrett M Nash, Philip B Paty, Julio Garcia-Aguilar, Andrea Cercek, Zsofia Stadler, Neil H Segal, Jinru Shia, Mithat Gonen, Martin R Weiser","doi":"10.1200/CCI.23.00233","DOIUrl":"10.1200/CCI.23.00233","url":null,"abstract":"<p><strong>Purpose: </strong>Outcome for patients with nonmetastatic, microsatellite instability (MSI) colon cancer is favorable: however, high-risk cohorts exist. This study was aimed at developing and validating a nomogram model to predict freedom from recurrence (FFR) for patients with resected MSI colon cancer.</p><p><strong>Patients and methods: </strong>Data from patients who underwent curative resection of stage I, II, or III MSI colon cancer in 2014-2021 (model training cohort, 384 patients, 33 events; median follow-up, 38.8 months) were retrospectively collected from institutional databases. Variables associated with recurrence in multivariable analysis were selected for inclusion in the clinical calculator. The calculator's predictive accuracy was measured with the concordance index and validated using data from patients who underwent treatment for MSI colon cancer in 2007-2013 (validation cohort, 164 patients, eight events; median follow-up, 84.8 months).</p><p><strong>Results: </strong>T category and number of positive lymph nodes were significantly associated with recurrence in multivariable analysis and were selected for inclusion in the clinical calculator. The calculator's concordance index for FFR in the model training cohort was 0.812 (95% CI, 0.742 to 0.873), compared with 0.759 (95% CI, 0.683 to 0.840) for the staging schema of the eighth edition of the American Joint Committee on Cancer Staging Manual. 
The concordance index for the validation cohort was 0.744 (95% CI, 0.666 to 0.822), confirming robust predictive accuracy.</p><p><strong>Conclusion: </strong>Although in general patients with nonmetastatic MSI colon cancer had favorable outcome, patients with advanced T category and multiple metastatic lymph nodes had higher risk of recurrence. The clinical calculator identified patients with MSI colon cancer at high risk for recurrence, and this could inform surveillance strategies. In addition, the model could be used in trial design to identify patients suitable for novel adjuvant therapy.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2300233"},"PeriodicalIF":3.3,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323037/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
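The concordance index used to assess the calculator measures how often, among comparable pairs of patients, the patient who recurred earlier was also assigned the higher predicted risk. The Python sketch below shows one common form of the calculation (Harrell's C) for right-censored follow-up. It is a simplified illustration under stated assumptions: the function name and toy data are hypothetical, pairs with identical follow-up times are skipped, and this is not claimed to be the exact procedure the authors used.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C for right-censored data.

    times:  follow-up time for each patient.
    events: 1 if recurrence was observed, 0 if the patient was censored.
    risks:  predicted risk; higher means earlier recurrence is expected.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0   # earlier recurrence had the higher risk
        elif risks[first] == risks[second]:
            concordant += 0.5   # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, for illustration only.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 0]
toy_risks = [0.9, 0.3, 0.1, 0.5]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
print(toy_c_index)
```

As with the AUROC, a value of 0.5 indicates ordering no better than chance and 1.0 indicates perfect ordering.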