Pub Date : 2025-12-20 DOI: 10.1016/j.radonc.2025.111360
Abel Bregman , Jikke J. Rutgers , Truls Andersen , Arjen van der Schaaf , Charlotte L. Brouwer , Geert O. Janssens , Eelco W. Hoving , Maarten H. Lequin , Rutger A.J. Nievelstein , Stefan Both , Johannes A. Langendijk , Hiska L. van der Weide , John H. Maduro , Dirk Wagenaar
Background/purpose
Radiation-induced contrast enhancement (RICE) has been identified as a metric of subclinical toxicity. The aim of this study was to develop a normal tissue complication probability (NTCP) model for RICE after proton therapy in paediatric patients with posterior fossa tumours.
Materials and methods
Paediatric patients (n = 75) treated with proton radiotherapy (49.6–59.4 Gy (RBE)), either focally or in combination with craniospinal axis irradiation, were included. Follow-up magnetic resonance imaging scans were evaluated for RICE. Dose (D), dose multiplied by dose-averaged linear energy transfer (D⋅LETd), organ-at-risk association, RICE association and age data were extracted to construct a multivariable logistic regression model to predict RICE in voxels. NTCP was calculated using univariable logistic regression with RICE status as the dependent variable and the expected number of RICE voxels as the independent variable.
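The two-stage structure described above (a voxel-level logistic model whose summed probabilities feed a patient-level logistic model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the coefficient dictionary keys and all coefficient values are hypothetical, and the organ-at-risk term is reduced to a single pons indicator for brevity.

```python
import math


def logistic(z):
    """Standard logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def voxel_probability(dose, dose_let, age, in_pons, coef):
    """Voxel-level RICE probability from a multivariable logistic model.

    `coef` holds hypothetical coefficients; the abstract reports none.
    """
    z = (coef["intercept"]
         + coef["dose"] * dose          # D in Gy (RBE)
         + coef["dose_let"] * dose_let  # D.LETd in Gy.keV/um
         + coef["age"] * age            # age in years
         + coef["pons"] * in_pons)      # 1 if the voxel lies in the pons, else 0
    return logistic(z)


def expected_rice_voxels(voxels, age, coef):
    """Expected number of RICE voxels: the sum of the voxel probabilities."""
    return sum(
        voxel_probability(v["dose"], v["dose_let"], age, v["in_pons"], coef)
        for v in voxels
    )


def patient_ntcp(expected_voxels, b0, b1):
    """Patient-level NTCP: univariable logistic model on the expected voxel count."""
    return logistic(b0 + b1 * expected_voxels)
```

With fitted coefficients in place of the placeholders, `patient_ntcp(expected_rice_voxels(voxels, age, coef), b0, b1)` would give one NTCP value per patient.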
Results
A total of 60 RICE lesions were identified in 23 patients (30.7 %), of which 36 (60 %) were located in the brainstem, primarily the pons. We observed an increased density of RICE voxels in regions combining high D (38.32–55.94 Gy) with medium-to-high LETd (1.89–6.00 keV/µm). In addition, younger age and anatomical location in the pons were identified as independent risk factors for RICE. The NTCP model's optimism-corrected area under the receiver operating characteristic curve (AUC) was 0.79, and the Brier score was 0.16.
Conclusion
We developed models to predict RICE in paediatric patients. The RICE probability at both the voxel and the patient level increased with D and D⋅LETd, with younger age, and with location in the pons.
Title: Development of the first prediction model for radiation-induced contrast enhancement after proton therapy for posterior fossa tumours in paediatric patients. Radiotherapy and Oncology, volume 216, Article 111360.
Pub Date : 2025-12-19 DOI: 10.1016/j.radonc.2025.111351
Mathijs G. Dassen , Marcel van Herk , Marnix G. Witte , Tomas Janssen , Floris Pos , Uulke A. van der Heide
Purpose
Planning target volume (PTV) margin recipes assume that all parts of the target are equally important. For the prostate clinical target volume (CTV), this assumption does not hold. We evaluated the impact of the spatial probability distribution of microscopic disease in the prostate on CTV-to-PTV margins.
Materials and methods
A prostate with a volume of 44 cm³ was defined as CTVprostate. Homogeneous dose distributions were created with margins ranging from 0 to 5 mm. The gross tumor volume (GTV) was assumed to be covered with a separate margin. Microscopic satellites were sampled within the CTVprostate from a histopathology-based probability distribution for a range of numbers (1–10) and sizes (0.02–0.2 cm³) to define CTVsatellites. Geometric errors were sampled from a 3D Gaussian distribution, simulating online adaptive treatment in 5 fractions. Each CTV was shifted with respect to the dose according to each total error. The PTV margin ensuring 95 % of the prescribed dose to the CTVsatellites in 90 % of simulations was determined and compared with that for the CTVprostate.
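The error-sampling step can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the authors' simulation: it treats the target as a single point at the CTV edge, assumes an idealised sharp-edged dose (no penumbra, no dose blurring over fractions), and approximates the total error per axis as the systematic error plus the mean of the per-fraction random errors. All function names are hypothetical.

```python
import math
import random


def sample_total_errors(n_sim, sigma_sys, sigma_rand, n_fractions=5, seed=0):
    """Sample the magnitude (mm) of the total 3D geometric error per simulated treatment."""
    rng = random.Random(seed)
    magnitudes = []
    for _ in range(n_sim):
        components = []
        for _axis in range(3):
            # One systematic error per treatment, one random error per fraction.
            systematic = rng.gauss(0.0, sigma_sys)
            mean_random = sum(
                rng.gauss(0.0, sigma_rand) for _ in range(n_fractions)
            ) / n_fractions
            components.append(systematic + mean_random)
        magnitudes.append(math.sqrt(sum(c * c for c in components)))
    return magnitudes


def coverage_fraction(errors, margin):
    """Fraction of simulations in which the shifted point stays within the margin."""
    return sum(e <= margin for e in errors) / len(errors)


def smallest_margin(errors, margins, required=0.90):
    """Smallest candidate margin reaching the required coverage, or None."""
    for m in sorted(margins):
        if coverage_fraction(errors, m) >= required:
            return m
    return None
```

A call such as `smallest_margin(sample_total_errors(10000, 0.5, 1.5), range(0, 6))` mirrors the 0–5 mm margin scan; because of the simplifications above, its result should not be expected to reproduce the margins reported in the study.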
Results
For systematic errors with width (Σ) 0.5 mm and random errors with width (σr) 1.5 mm, the margin for the CTVprostate was 3 mm, whereas for each definition of CTVsatellites this margin was 0–1 mm. For σr = 2.7 mm, a margin of 5 mm was adequate for the CTVprostate and 2–3 mm for all except the most favourable and most unfavourable CTVsatellites definitions.
Conclusion
The CTV-to-PTV margins used in online adaptive radiotherapy for prostate cancer can be reduced by ∼2 mm, if the GTV is covered with an adequate margin.
Title: Are the CTV-to-PTV margins currently used in online adaptive radiotherapy for prostate cancer too large? The impact of the distribution of microscopic disease on treatment margin requirements. Radiotherapy and Oncology, volume 216, Article 111351.
Pub Date : 2025-12-19 DOI: 10.1016/j.radonc.2025.111349
Wei Zou , Lei Dong , Arnaud Pin , Rasmus Nilsson , Michele Kim , Ontida Apinorasethkul , Julia Pakela , Andrew Friberg , Brandon Koger , Rudi Labarbe , Carolina Llina Fuentes , Keith Cengel , Erik Traneus , Swati Girdhani , Costas Koumenis , Francois Vander Stappen , Eric Diffenderfer , Jeffrey Bradley , Boon-Keng Kevin Teo , Alexander Lin
Purpose
Clinical translation of ultra-high dose rate (UHDR) delivery to harness the potential FLASH effect requires a treatment planning system (TPS) that can optimize and calculate dose and dose rate in patients. Proton conformal FLASH treatment aims to deliver pencil beam scanning (PBS) Bragg peaks to the tumor region at UHDR. In this work, we conducted a treatment planning study for head and neck (H&N) re-irradiation patients using a research version of a commercial TPS paired with conformal FLASH hardware integrated into the nozzle of a clinical cyclotron-based system.
Methods
Fifteen H&N patients were planned for re-irradiation of 40 GyRBE in 5 fractions to the area of intact tumor. The TPS was configured with validated UHDR beam measurements to generate optimized patient FLASH plans with one or two beams, delivered as single-beam-per-fraction (SBPF). Each beam consisted of a deliverable mono-energetic PBS map, a 3D-printable conformal energy modulator design, a selection of aluminum range shifter plates, and a brass aperture. Python scripts with machine-specific delivery timing parameters were used for Monte Carlo dose and dose rate calculations. Clinical VMAT and IMPT plans were also generated for dosimetric comparison.
Results
All plans met the tumor target and OAR planning objectives. Conformal FLASH plans showed dose distributions very similar to those of the clinical IMPT plans. Compared to VMAT plans, both IMPT and FLASH plans had a smaller low-dose region and lower maximum cord dose D0.03 cc (8.37 ± 0.94 vs. 3.19 ± 3.81 and 4.32 ± 3.12 GyRBE, respectively), contra-lateral parotid mean dose (1.88 ± 0.99 vs. 0.00 ± 0.01 and 0.00 ± 0.00 GyRBE, respectively) and contra-lateral submandibular gland mean dose (2.49 ± 1.06 vs. 0.14 ± 0.13 and 0.19 ± 0.19 GyRBE, respectively). With a 500 nA quasi-continuous nozzle beam current, the mean dose-averaged dose rate in the CTVs of these 15 patients reached 95.75 ± 22.78 Gy/s.
Conclusions
We report deliverable proton conformal FLASH treatment plans for H&N re-irradiation patients, created using the innovative hardware configuration and measured beam data at our institution. The FLASH plans were very similar in quality to clinical IMPT plans and were deliverable with our proton machine. The machine-specific 3D dose rate distribution can be calculated and displayed in the TPS.
Title: Deliverable proton conformal FLASH radiotherapy treatment planning for head and neck re-irradiation patients. Radiotherapy and Oncology, volume 216, Article 111349.
Pub Date : 2025-12-18 DOI: 10.1016/j.radonc.2025.111353
S.P.M. de Vette, J.A. Langendijk, N.M. Sijtsema, L.V. van Dijk
Title: In response to Liu et al. Radiotherapy and Oncology, volume 216, Article 111353.
Pub Date : 2025-12-18 DOI: 10.1016/j.radonc.2025.111350
Gibson C. Ugwu , Farzad Jalali , Geoffrey Liu , Guojun Li , Johannes Albertus Langendijk , Behrooz Z. Alizadeh
An increasing number of artificial intelligence (AI) and machine learning (ML) models are being developed to predict radiation-induced toxicities (RITs) in patients with head and neck cancer (HNC), but their performance and reliability remain uncertain. This systematic review and meta-analysis evaluated the predictive accuracy and methodological quality of these models. We comprehensively searched PubMed, EMBASE, Web of Science, and the Cochrane Library to identify studies reporting on ML/AI models for predicting RITs in HNC patients. Eligible studies were assessed for risk of bias using the PROBAST tool, and key performance metrics, including the area under the receiver operating characteristic curve (AUROC), were extracted. A hierarchical multilevel meta-analysis was performed to estimate pooled AUROC values, and subgroup analyses explored the influence of study characteristics on model performance. A total of 67 studies reporting 568 models were included, showing moderate discriminatory power of ML/AI models, with a pooled AUROC of 0.76 (95 % CI: 0.73–0.78). Nonetheless, substantial heterogeneity was observed across studies. Incorporating imaging biomarkers significantly improved model performance. Prospective and internal validation showed comparable performance; external validation shows true generalisability. The predominance of retrospective designs and variability in predictor selection may have introduced bias, affecting model reliability and generalisability. ML/AI models hold promise for predicting RITs in HNC patients, but methodological constraints limit their applicability. Standardised and transparent reporting of model development and validation processes is vital for improving comparability among studies. Future research should explore hybrid modelling methods and the integration of clinical, dosimetric, radiomic, and genomic data to boost predictive accuracy.
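To give a feel for what pooling AUROC values involves, the sketch below uses plain fixed-effect inverse-variance pooling on the logit scale. This is a simpler stand-in, not the hierarchical multilevel model used in the review: it ignores between-study heterogeneity and the nesting of models within studies. The function name and the input values in the usage note are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_auroc_fixed_effect(aurocs, standard_errors, z=1.96):
    """Fixed-effect inverse-variance pooling of AUROCs on the logit scale.

    Returns (pooled AUROC, lower CI bound, upper CI bound).
    """
    # Delta method: SE on the logit scale is SE / (AUROC * (1 - AUROC)).
    logits = [logit(a) for a in aurocs]
    variances = [(se / (a * (1.0 - a))) ** 2
                 for a, se in zip(aurocs, standard_errors)]
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, logits)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (inv_logit(pooled),
            inv_logit(pooled - z * pooled_se),
            inv_logit(pooled + z * pooled_se))
```

For example, `pool_auroc_fixed_effect([0.72, 0.80, 0.75], [0.03, 0.04, 0.02])` returns a pooled estimate with its 95 % interval; a multilevel random-effects model would typically give a wider interval when heterogeneity is substantial.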
Title: The actual performance of ML/AI models in predicting radiation-induced toxicity in head and neck cancer: a systematic review and meta-analysis. Radiotherapy and Oncology, volume 216, Article 111350.
Pub Date : 2025-12-17 DOI: 10.1016/j.radonc.2025.111348
Federico Mastroleo , Mariana Borras-Osorio , Shiv P. Patel , Sarah Peterson , Renthony Wilson , Mi Zhou , Satomi Shiraishi , Andrew Y.K. Foong , David M. Routman , Mark R. Waddle
Background
Accurate toxicity assessment is critical in oncology trials, yet current reporting frameworks such as the Common Terminology Criteria for Adverse Events (CTCAE) remain labor-intensive and subject to inter-observer variability. Large language models (LLMs) offer potential to automate extraction and grading of adverse events from clinical notes and patient-reported outcomes (PROs), but their comparative performance and cost-effectiveness remain underexplored.
Methods
We evaluated five off-the-shelf LLMs (Gemini 2.0 Flash, Gemini 2.5 Flash, Gemini 2.5 Pro, GPT-4o, and GPT-5) using a rule-augmented few-shot prompting strategy to extract CTCAE-graded gastrointestinal and genitourinary toxicities from a prospective prostate radiotherapy trial (NCT02874014; n = 55 patients, 8968 toxicity records). Binary and grade-level accuracy, precision, recall, specificity, F1 score, Cohen’s kappa, and computational costs were assessed.
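The binary metrics listed above all derive from one confusion matrix. The sketch below computes them for 0/1 labels (1 = toxicity present); the function name and the convention of returning 0.0 for undefined ratios are assumptions made for this illustration, not details from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, specificity, F1 and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - chance) / (1.0 - chance) if chance != 1.0 else 1.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "kappa": kappa}
```

Kappa can be much lower than accuracy when one class dominates, which is one way to read high binary accuracy alongside only moderate agreement.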
Results
All models achieved high binary accuracy (84.6–87.4 %) and moderate grade accuracy (79.1–82.3 %). GPT-4o reached the best binary (87.4 %) and grade (83.5 %) accuracy, while Gemini 2.5 Pro demonstrated highest sensitivity (74.0 %). Specificity peaked with GPT-4o (96.0 %). Cohen’s kappa values indicated moderate agreement (0.552–0.560 for binary; 0.401–0.465 for grades). Costs for the entire extraction varied substantially: Gemini 2.0 Flash delivered competitive accuracy at $0.77 total, whereas Gemini 2.5 Pro and GPT-5 exceeded $21.
Conclusions
Off-the-shelf LLMs can extract clinically relevant toxicities with performance approaching human inter-rater reliability, at variable but often negligible costs. While grade-level accuracy remains limited, LLM integration into oncology workflows is feasible, offering scalable, low-cost support for toxicity monitoring and data abstraction in clinical research.
Title: Large language models for toxicity extraction in oncology trials: A real-world benchmark in prostate radiotherapy. Radiotherapy and Oncology, volume 216, Article 111348.
Purpose: This study reports the long-term outcomes of simultaneous integrated boost radiotherapy (SIB-RT) combined with oral S-1 chemotherapy (CRTCT) in patients aged ≥ 70 years with inoperable esophageal squamous cell carcinoma (ESCC).
Methods: In this multicenter, phase III randomized trial, patients with inoperable, locally advanced stage II-IV ESCC were randomized to receive either CRTCT or RT alone. The primary endpoint was overall survival (OS); secondary endpoints included progression-free survival (PFS), restricted mean survival time (RMST), and patterns of failure.
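Restricted mean survival time is the area under the survival curve up to a chosen horizon. The sketch below computes it from a Kaplan-Meier estimate; it is a minimal illustration with hypothetical function names, and the truncation time `tau` is an input because the abstract does not state the horizon used in the trial.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (event time, survival probability).

    `events` holds 1 for an observed event (death) and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area
```

Computing `rmst` separately for each arm with the same `tau` and taking the difference gives the between-arm contrast in mean survival over that horizon.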
Results: After a median follow-up of 75 months, patients in the CRTCT group had longer OS than those in the RT group (hazard ratio (HR), 0.74; 95 % confidence interval (CI), 0.57-0.95; P = 0.02), with better 5-year OS rates (34.1 % vs. 23.6 %, P = 0.02), PFS (31.1 % vs. 20.9 %, P = 0.02), and RMST (33.0 vs. 27.0 months, P = 0.02). CRTCT's effect on OS stabilized over 5 years. Locoregional failure was lower in the CRTCT group (HR, 0.64; 95 % CI, 0.45-0.92; P = 0.02), with similar distant and mixed failure risks.
Conclusions: SIB-RT combined with oral S-1 chemotherapy significantly improved long-term survival and might therefore be considered a standard of care for elderly patients with inoperable ESCC. Trial registration: ClinicalTrials.gov Identifier: NCT0297969.
Long-term outcomes of S-1-based chemoradiotherapy in inoperable elderly patients with esophageal carcinoma: A multicenter, randomized, phase III clinical trial. Junqiang Chen, Xiao Chang, Wenyang Liu, Xiaomin Wang, Xiaolin Ge, Ke Liu, Lei Deng, Miaomiao Hu, Jianchao Lu, Wei Wang, Haiwen Zhou, Shuai Qie, Jihong Zhang, Weiming Han, Wenqing Wang, Zongmei Zhou, Xin Wang, Jun Liang, Nan Bi, Tao Zhang, Jianyang Wang, Yirui Zhai, Lan Wang, Yu Lin, Yidian Zhao, Qingsong Pang, Xinchen Sun, Yonggang Shi, Kaixian Zhang, Ling Li, Qifeng Wang, Minghe Li, Hongyun Shi, Zhilong Yu, Wei Deng, Chun Han, Junjie Wang, Wanqing Chen, Wencheng Zhang, Zefen Xiao. Radiotherapy and Oncology, Article 111334. Publication date: 2025-12-16. DOI: 10.1016/j.radonc.2025.111334
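The restricted mean survival time (RMST) reported for this trial is, by definition, the area under the survival curve up to a chosen time horizon. The following is a minimal sketch of that calculation using a Kaplan-Meier estimate, not the trial's actual analysis code. The function names, the horizon `tau`, and the toy follow-up data are all hypothetical and chosen only for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last step before tau up to tau itself.
    area += prev_s * (tau - prev_t)
    return area


# Hypothetical toy data (months), NOT taken from the trial.
toy_times = [10, 20, 30, 40]
toy_events = [1, 1, 1, 1]
toy_rmst = rmst(toy_times, toy_events, tau=40)
print(f"RMST up to 40 months: {toy_rmst:.1f}")
```

Comparing two arms then amounts to computing `rmst` for each arm with the same `tau` and taking the difference; a real analysis would also need a variance estimate to produce the P value.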
Pub Date : 2025-12-16 DOI: 10.1016/j.radonc.2025.111333
Lily Nguyen, Yuze Song, Anna Dornisch, Madison Baxter, Tristan Barrett, Anders M. Dale, Mukesh Harisinghani, Sophia C. Kamran, Michael A. Liss, Robert T. Dess, Daniel J.A. Margolis, Eric P. Weinberg, Tyler M. Seibert
Purpose
Precise delineation of genitourinary structures during prostate cancer (PCa) care is critical to optimize treatment delivery while minimizing toxicity and injury. The Prostate and UREthra on MRI (PURE-MRI) study is an international, prospective study to assess physicians’ accuracy in segmenting the prostate and urethra on MRI.
Methods
Physicians who diagnose or treat PCa were invited to contour prostate and urethra on patient cases using standard T2-weighted MRI (all planes). We compared these contours to reference consensus segmentations produced by a multidisciplinary panel of experts. We also evaluated performance of validated auto-segmentation AI tools. Accuracy was assessed with spatial and volumetric analyses. A mixed effects model was used to evaluate potential factors influencing contour performance.
Results
62 specialists from 11 countries created 114 prostate and 110 urethra contours. Prostate median (min, max) accuracy for physicians [vs. AI] was Dice score: 0.92 (0.62, 0.95) [vs. 0.95 (0.94, 0.96)], maximum deviation inside prostate: 3.4 mm (1.0, 12.4) [vs. 3.0 mm (3.0, 3.0)], maximum deviation beyond prostate: 5.3 mm (2.4, 17.3) [vs. 3.9 mm (3.1, 4.9)], mean deviation (per case) from the reference prostate: 1.6 mm (0.9, 3.9) [vs. 1.2 mm (1.1, 1.6)]. Urethra accuracy was Dice score: 0.33 (0.03, 0.69) [vs. 0.41 (0.35, 0.48)], coverage: 36 % (3 %, 96 %) [vs. 81 % (80 %, 91 %)], maximum (2D) deviation beyond urethra: 1.6 mm (1.0, 2.3) [vs. 1.7 mm (1.3, 2.1)].
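For readers unfamiliar with the overlap metrics quoted above, the sketch below shows how a Dice score and a coverage fraction can be computed from two segmentations represented as sets of voxel coordinates. It is an illustration only, not the study's analysis code. The function names and the toy voxel sets are hypothetical, and the definition of coverage used here (the fraction of reference voxels included in the physician contour) is an assumption, since the abstract does not define it.

```python
def dice_score(contour, reference):
    """Dice similarity: 2 * |A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not contour and not reference:
        return 1.0  # two empty segmentations agree trivially
    overlap = len(contour & reference)
    return 2.0 * overlap / (len(contour) + len(reference))


def coverage(contour, reference):
    """Assumed definition: share of reference voxels inside the contour."""
    if not reference:
        raise ValueError("reference segmentation is empty")
    return len(contour & reference) / len(reference)


# Hypothetical toy example: a 4 x 4 reference slice and a contour
# shifted by one voxel along x.
reference = {(x, y, 0) for x in range(4) for y in range(4)}
contour = {(x + 1, y, 0) for x in range(4) for y in range(4)}

print("Dice:", dice_score(contour, reference))
print("Coverage:", coverage(contour, reference))
```

The two metrics can diverge for thin structures such as the urethra: a contour that is too generous can reach high coverage while still scoring a low Dice, because Dice also penalizes voxels drawn outside the reference.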
Conclusion
Physicians contour the prostate on MRI with overall Dice score >0.90, though typical cases still include at least one error >5 mm and sometimes >10 mm. Physician urethra contours were less accurate, with typical coverage of <40 % of the reference urethra (compared to >80 % for AI). Physician trainees performed similarly to experienced clinicians. AI tools give comparable accuracy to practicing physicians for prostate contours and achieve better coverage of the urethra.
PURE-MRI: An international study assessing physician accuracy in delineating the prostate and urethra on prostate MRI. Radiotherapy and Oncology, volume 216, Article 111333. Publication date: 2025-12-16. DOI: 10.1016/j.radonc.2025.111333