Pub Date: 2025-07-01; DOI: 10.3171/2025.4.FOCUS2597
Yuki Shinya, Abdul Karim Ghaith, Sukwoo Hong, Justine S Herndon, Sandhya R Palit, Dana Erickson, Irina Bancos, Miguel Saez-Alegre, Ramin A Morshed, Carlos Pinheiro Neto, Fredric B Meyer, John L D Atkinson, Jamie J Van Gompel
Objective: Patients with growth hormone (GH)-secreting pituitary adenomas (PAs) experience various symptoms and comorbidities, which can ultimately lead to increased mortality. This study aimed to develop and validate a machine learning (ML) model for predicting long-term outcomes in patients with GH-secreting PAs following endonasal transsphenoidal surgery (ETS).
Methods: The authors conducted a retrospective three-institution cohort study that included patients with GH-secreting PAs treated with ETS between 2013 and 2023. Clinical, radiological, and biochemical data were collected. The main outcome of interest was the intervention-free rate (IFR) after primary ETS. Supervised ML algorithms, including decision trees and random forests, were developed to predict the IFR. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC) and Shapley Additive Explanations (SHAP) values.
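As a reading aid for the AUROC metric named in the Methods, the following is a minimal sketch that computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half). The function name `auroc` and the toy labels and scores are hypothetical illustrations, not study data, and the abstract does not state which software the authors used.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = needed additional intervention, 0 = intervention-free.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
print(f"AUROC = {toy_auc:.3f}")
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.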
Results: The median follow-up for 100 patients with GH-secreting PAs (53% female) was 64 months (range 1-130 months). Additional intervention for persistent or recurrent acromegaly was required in 32% of patients. Following primary ETS alone, the 3-year IFR was 70% and the 5-year IFR was 67%. Multiple ML models were developed and evaluated using AUROCs. The decision tree analysis achieved an accuracy of 81% and emphasized the importance of both gross-total resection (GTR) and patient age in determining the long-term IFR. To better understand the factors that contributed to model performance, SHAP analysis was applied to the best-performing model. The SHAP dependence plots showed that key factors associated with a longer IFR included tumor size < 9 mm, GTR, patient age > 65 years, and Knosp grade 0.
Conclusions: This ML model offers a more nuanced and potentially more accurate way to identify patients who are likely to develop recurrent or persistent acromegaly after primary ETS and to require additional treatment. Following external validation, this ML model could improve personalized treatment planning and follow-up strategies and enhance patient care and resource allocation in clinical practice.
Title: A supervised machine learning approach for predicting the need for postsurgical intervention in acromegaly. Journal: Neurosurgical Focus (IF 3.3) 59(1):E10. Publication type: Journal Article.
Pub Date: 2025-07-01; DOI: 10.3171/2025.4.FOCUS25207
Eunice Yang, Harrison Howell, Praveen V Mummaneni, Dean Chou, Mohamad Bydon, Erica F Bisson, Christopher I Shaffrey, Oren N Gottfried, Anthony L Asher, Domagoj Coric, Eric A Potts, Kevin T Foley, Michael Y Wang, Kai-Ming Fu, Michael S Virk, John J Knightly, Scott Meyer, Paul Park, Cheerag D Upadhyaya, Chun-Po Yen, Juan S Uribe, Luis M Tumialán, Jay D Turner, Regis W Haid, Andrew K Chan
Objective: Coexisting medical conditions are increasingly prevalent in surgical populations. The impact of multiple comorbidities on patient-reported outcomes (PROs) and endotypes of frequently co-occurring conditions for cervical spondylotic myelopathy (CSM) remain unclear. This study explores whether CSM patients with multimorbidity have worse baseline and postoperative PROs and less functional improvement after surgery compared to those with few or no comorbidities. The authors also investigated whether distinct comorbidity endotypes exist among CSM surgical patients and whether they influence postoperative outcomes.
Methods: The prospective Quality Outcomes Database (QOD) was used to assess patients undergoing surgery for CSM. Multimorbidity was defined as ≥ 2 chronic conditions, including diabetes, coronary artery disease, peripheral vascular disease, arthritis, chronic renal disease, chronic obstructive pulmonary disease, Parkinson's disease, multiple sclerosis, depression, and anxiety. Baseline characteristics and 24-month PROs were assessed across multiple-comorbidity status, including modified Japanese Orthopaedic Association (mJOA), Neck Disability Index (NDI), visual analog scale for neck and arm pain, EQ-5D, and patient satisfaction scores. Clusters were identified from the full cohort using k-medoids, revealing subgroups with similar comorbidity endotypes.
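The k-medoids step described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' pipeline: it uses a simple alternating (assign, then re-pick medoid) scheme, Hamming distance on binary comorbidity flags, a deterministic start from the first k distinct records, and three made-up comorbidity columns. The abstract does not specify the distance measure, initialization, or software actually used.

```python
def hamming(a, b):
    """Number of comorbidity flags on which two patients differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: hamming(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; returns (medoid indices, cluster labels)."""
    # Assumption for reproducibility: start from the first k distinct records.
    medoids = []
    for i, p in enumerate(points):
        if all(points[m] != p for m in medoids):
            medoids.append(i)
        if len(medoids) == k:
            break
    if len(medoids) < k:
        raise ValueError("fewer than k distinct points")
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # The medoid is the member with the smallest total distance to its cluster.
            best = min(
                members,
                key=lambda i: sum(hamming(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Hypothetical binary flags per patient: (diabetes, arthritis, depression).
toy_patients = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0),
    (1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 1, 1),
]
toy_medoids, toy_clusters = k_medoids(toy_patients, k=3)
print(toy_medoids, toy_clusters)
```

Unlike k-means, each cluster center is an actual patient record, which keeps the centers interpretable as a concrete comorbidity profile.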
Results: The final cohort included 1141 CSM patients (83.1% reaching 24-month follow-up), with 761 (66.7%) having 0 or 1 comorbidity and 380 (33.3%) ≥ 2 comorbidities. The multimorbidity cohort was older (mean age 62.6 ± 11.2 vs 59.5 ± 12.0 years, p < 0.001), more likely to be female (52.9% vs 44.7%, p = 0.011), and had a higher BMI (mean 31.1 ± 6.7 vs 29.7 ± 6.2 kg/m², p < 0.001). Multimorbidity patients exhibited worse mJOA, NDI, and EQ-5D scores at baseline and 24 months (p < 0.05). On multivariable analysis, the total number of comorbidities was not significantly associated with any PRO measures. Four comorbidity clusters were identified: low burden, arthritis, diabetes, and high burden. On one-way ANOVA, the baseline mJOA score was significantly different across clusters (p = 0.003). At 24 months, the mJOA score was significantly lower in the diabetes and high-burden endotypes. Twenty-four-month score change and minimal clinically important difference (MCID) achievement of all PROs remained similar across clusters (p > 0.05).
Conclusions: While patients with multimorbidity have worse baseline and postoperative PROs, they achieve similar functional and pain-related improvements following CSM surgery. Similarly, the comorbidity endotypes identified in this QOD cohort suggest that certain patterns of coexisting chronic conditions, such as overlapping diabetes and arthritis, are associated with different levels of disability but may not diminish the effectiveness of surgical intervention.
Title: Defining cervical spondylotic myelopathy surgical endotypes using comorbidity clustering: a Quality Outcomes Database cervical spondylotic myelopathy study. Journal: Neurosurgical Focus (IF 3.3) 59(1):E4. Publication type: Journal Article.
Pub Date: 2025-07-01; DOI: 10.3171/2025.4.FOCUS24834
Benjamin S Hopkins, Jonathan Dallas, James Yu, Robert G Briggs, Lawrance K Chung, David J Cote, David Gomez, Ishan Shah, John D Carmichael, John C Liu, William J Mack, Gabriel Zada
Objective: Document dictation remains a significant clinical burden. Generative artificial intelligence (AI) systems utilizing transformer-based technology offer efficient speech processing methods that could streamline clinical documentation. This study aimed to evaluate the potential of generative AI in enhancing dictation efficiency and workflow within a targeted neurosurgical practice.
Methods: Ten operative reports from both cranial and spinal neurosurgical procedures were dictated and recorded by three independent physicians. The audio files were processed by 1) a modified speech-to-text model implemented based on a backbone architecture created by OpenAI's Whisper model and 2) Nuance's Dragon Medical One as a comparative commercial standard. Word error rate (WER) was manually reviewed.
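For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below implements that standard definition with a word-level edit distance; the function name and the example sentences are hypothetical, and the study itself reports that WER was reviewed manually rather than computed this way.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical dictation snippet: one substitution and one deletion in 8 words.
toy_reference = "the dura was opened in a cruciate fashion"
toy_transcript = "the dura was open in cruciate fashion"
toy_wer = word_error_rate(toy_reference, toy_transcript)
print(f"WER = {toy_wer:.2%}")
```

Because insertions are counted against the reference length, WER can exceed 100% for very noisy transcripts.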
Results: The mean WER was 1.75% for Whisper and 1.54% for Dragon (p = 0.080). When linguistic errors were excluded, Whisper outperformed Dragon, with a mean WER of 0.50% versus 1.34% (p < 0.001), as well as in the mean number of total errors (Whisper: 6.1, Dragon: 9.7; p = 0.002). For all unstratified dictations, a positive correlation was seen between total errors and word count (p < 0.001, R² = 0.37), as well as between total errors and recording length (p < 0.001, R² = 0.22). A positive correlation was noted between words spoken per second and total errors for Dragon (p = 0.020, R² = 0.18), but not for Whisper (p = 0.205, R² = 0.06). Similarly, when analyzing linguistic errors only, this trend held for Dragon (p = 0.014, R² = 0.20), but not for Whisper (p = 0.331, R² = 0.03).
Conclusions: An AI-based model performed at a noninferior rate compared to a commercially available speech-to-text dictation program. Generative models offer potential benefits, such as contextual inference, that show promise in limiting errors at higher dictation speeds and in adjusting for impure input data.
Title: The use of generative artificial intelligence-based dictation in a neurosurgical practice: a pilot study. Journal: Neurosurgical Focus (IF 3.3) 59(1):E8. Publication type: Journal Article.
Pub Date: 2025-07-01; DOI: 10.3171/2025.4.FOCUS25225
Austin A Barr, Eddie Guo, Brij S Karmur, Emre Sezgin
Objective: Use of neurosurgical data for clinical research and machine learning (ML) model development is often limited by data availability, sample sizes, and regulatory constraints. Synthetic data offer a potential solution to challenges associated with accessing, sharing, and using real-world data (RWD). The aim of this study was to evaluate the capability of generating synthetic neurosurgical data with a generative adversarial network and large language model (LLM) to augment RWD, perform secondary analyses in place of RWD, and train an ML model to predict postoperative outcomes.
Methods: Synthetic data were generated with a conditional tabular generative adversarial network (CTGAN) and the LLM GPT-4o based on a real-world neurosurgical dataset of 140 older adults who underwent neurosurgical interventions. Each model was used to generate datasets at equivalent (n = 140) and amplified (n = 1000) sample sizes. Data fidelity was evaluated by comparing univariate and bivariate statistics to the RWD. Privacy evaluation involved measuring the uniqueness of generated synthetic records. Utility was assessed by: 1) reproducing and extending clinical analyses on predictors of Karnofsky Performance Status (KPS) deterioration at discharge and a prolonged postoperative intensive care unit (ICU) stay, and 2) training a binary ML classifier on amplified synthetic datasets to predict KPS deterioration on RWD.
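Two of the quantities mentioned above can be illustrated with a short sketch. The abstract does not define how record "uniqueness" was measured, so the first function assumes one plausible reading: the share of synthetic records that do not exactly duplicate any real record. The second function is the standard F1 score for a binary classifier. All function names, records, and labels below are hypothetical toy values, not study data.

```python
def uniqueness(synthetic, real):
    """Share of synthetic records with no exact duplicate among the real records."""
    if not synthetic:
        raise ValueError("synthetic dataset is empty")
    real_set = {tuple(r) for r in real}
    copies = sum(tuple(s) in real_set for s in synthetic)
    return 1.0 - copies / len(synthetic)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical records: (age, sex, preoperative KPS).
toy_real = [(72, "M", 80), (68, "F", 90), (81, "F", 70)]
toy_synth = [(72, "M", 80), (70, "F", 90), (79, "M", 60), (66, "F", 80)]
toy_uniqueness = uniqueness(toy_synth, toy_real)

# Hypothetical labels: 1 = KPS deterioration at discharge.
toy_truth = [1, 0, 1, 1, 0, 0]
toy_preds = [1, 0, 0, 1, 1, 0]
toy_f1 = f1_score(toy_truth, toy_preds)
print(f"uniqueness = {toy_uniqueness:.2f}, F1 = {toy_f1:.3f}")
```

Exact-match checks are only a coarse privacy signal; a synthetic record can differ from every real record and still sit very close to one of them.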
Results: Both the CTGAN and GPT-4o generated complete, high-fidelity synthetic tabular datasets. GPT-4o matched or exceeded CTGAN across all measured fidelity, utility, and privacy metrics. All significant clinical predictors of KPS deterioration and prolonged ICU stay were retained in the GPT-4o-generated synthetic data, with some differences observed in effect sizes. Preoperative KPS was not preserved as a significant predictor in the CTGAN-generated data. The ML classifier trained on GPT-4o data outperformed the model trained on CTGAN data, achieving a higher F1 score (0.725 vs 0.688) for predicting KPS deterioration.
Conclusions: This study demonstrated a promising ability to produce high-fidelity synthetic neurosurgical data using generative models. Synthetic neurosurgical data present a potential solution to critical limitations in data availability for neurosurgical research. Further investigation is necessary to enhance synthetic data utility for secondary analyses and ML model training, and to evaluate synthetic data generation methods across other datasets, including clinical trial data.
Title: Synthetic neurosurgical data generation with generative adversarial networks and large language models: an investigation on fidelity, utility, and privacy. Journal: Neurosurgical Focus (IF 3.3) 59(1):E17. Publication type: Journal Article.
Pub Date: 2025-07-01; DOI: 10.3171/2025.4.FOCUS25245
Sameer Sundrani, Derek J Doss, Graham W Johnson, Harsh Jain, Omar Zakieh, Adam M Wegner, Julian G Lugo-Pico, Amir M Abtahi, Byron F Stephens, Scott L Zuckerman
Objective: Mechanical complications are a vexing occurrence after adult spinal deformity (ASD) surgery. While achieving ideal spinal alignment in ASD surgery is critical, alignment alone may not fully explain all mechanical complications. The authors sought to determine which combination of inputs produced the most sensitive and specific machine learning model to predict mechanical complications using postoperative alignment, bone quality, and soft tissue data.
Methods: A retrospective cohort study was performed in patients undergoing ASD surgery from 2009 to 2021. Inclusion criteria were a fusion ≥ 5 levels, sagittal/coronal deformity, and at least 2 years of follow-up. The primary exposure variables were 1) alignment, evaluated in both the sagittal and coronal planes using the L1-pelvic angle ± 3°, L4-S1 lordosis, sagittal vertical axis, pelvic tilt, and coronal vertical axis; 2) bone quality, evaluated by the T-score from a dual-energy x-ray absorptiometry scan; and 3) soft tissue, evaluated by the paraspinal muscle-to-vertebral body ratio and fatty infiltration. The primary outcome was mechanical complications. Alongside demographic data in each model, 7 machine learning models with all combinations of domains (alignment, bone quality, and soft tissue) were trained. The positive predictive value (PPV) was calculated for each model.
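The count of 7 models follows from taking every non-empty subset of the three domains (2³ − 1 = 7). A minimal sketch of that enumeration is shown below; the domain labels and the `demographics` tag are hypothetical identifiers chosen for illustration, not names from the study.

```python
from itertools import combinations

# Hypothetical labels for the three input domains named in the abstract.
DOMAINS = ("alignment", "bone_quality", "soft_tissue")


def domain_subsets(domains):
    """All non-empty combinations of the given domains, smallest first."""
    return [
        combo
        for size in range(1, len(domains) + 1)
        for combo in combinations(domains, size)
    ]


subsets = domain_subsets(DOMAINS)
for combo in subsets:
    # Demographic data accompany every model, per the abstract.
    print(("demographics",) + combo)
print(len(subsets), "feature sets")
```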
Results: Of 231 patients (24% male) undergoing ASD surgery with a mean age of 64 ± 17 years, 147 (64%) developed at least one mechanical complication. The model with alignment alone performed poorly, with a PPV of 0.85. However, the model with alignment, bone quality, and soft tissue achieved a high PPV of 0.90, sensitivity of 0.67, and specificity of 0.84. Moreover, the model with alignment alone failed to predict 15 of 100 complications, whereas the model with all three domains failed to predict only 10 of 100.
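For reference, the three metrics reported above are all derived from the same confusion matrix. The sketch below shows the standard definitions; the counts are arbitrary toy numbers chosen for easy arithmetic and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """PPV, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # predicted complications that occurred
        "sensitivity": tp / (tp + fn),  # actual complications that were predicted
        "specificity": tn / (tn + fp),  # complication-free cases correctly cleared
    }


# Arbitrary toy counts (hypothetical), not study data.
toy_metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV depends on how common the outcome is in the evaluated sample, so it is best read alongside sensitivity and specificity rather than on its own.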
Conclusions: These results support the notion that not every mechanical failure is explained by alignment alone. The authors found that a combination of alignment, bone quality, and soft tissue provided the most accurate prediction of mechanical complications after ASD surgery. While achieving optimal alignment is essential, additional data including bone and soft tissue are necessary to minimize mechanical complications.
Title: Does alignment alone predict mechanical complications after adult spinal deformity surgery? A machine learning comparison of alignment, bone quality, and soft tissue. Journal: Neurosurgical Focus (IF 3.3) 59(1):E15. Publication type: Journal Article.
Pub Date: 2025-07-01; DOI: 10.3171/2025.4.FOCUS25163
Mathijs de Boer, Jesse A M van Doormaal, Mare H Köllen, Lambertus W Bartels, Pierre A J T Robe, Tristan P C van Doormaal
Objective: The aim of this study was to develop and validate a fully automatic anatomical landmark localization and trajectory planning method for external ventricular drain (EVD) placement using CT or MRI.
Methods: The authors used 125 preoperative CT and 137 contrast-enhanced T1-weighted MRI scans to generate 3D surface meshes of patients' skin and ventricular systems. Seven anatomical landmarks were manually annotated to train a neural network for automatic landmark localization. The model's accuracy was assessed by calculating the mean Euclidean distance of predicted landmarks to the ground truth. Kocher's point and EVD trajectories were automatically calculated with the foramen of Monro as the target. Performance was evaluated using Kakarla grades, as assessed by 3 clinicians. Interobserver agreement was measured with Pearson correlation, and scores were aggregated using majority voting. Ordinal linear regressions were used to assess whether modality or placement side had an effect on Kakarla grades. The impact of landmark localization error on the final EVD plan was also evaluated.
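Two of the evaluation steps above lend themselves to a short illustration: the mean Euclidean distance between predicted and annotated landmarks, and majority voting over three raters' grades. The sketch below is a toy under stated assumptions. The coordinates and grades are made up, the function names are hypothetical, and the fallback to the median grade when all three raters disagree is an assumption, since the abstract does not say how such cases were resolved.

```python
import math
from collections import Counter


def mean_euclidean_error(predicted, truth):
    """Mean straight-line distance (in the input units) between paired 3D points."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("need two non-empty point lists of equal length")
    return sum(math.dist(p, t) for p, t in zip(predicted, truth)) / len(predicted)


def majority_grade(grades):
    """Grade chosen by more than half of the raters; otherwise the median grade."""
    grade, count = Counter(grades).most_common(1)[0]
    if count > len(grades) / 2:
        return grade
    # Assumption: with no majority, fall back to the middle grade.
    return sorted(grades)[len(grades) // 2]


# Hypothetical landmark coordinates in millimetres.
toy_predicted = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_annotated = [(3.0, 4.0, 0.0), (10.0, 0.0, 2.0)]
toy_error = mean_euclidean_error(toy_predicted, toy_annotated)

# Hypothetical Kakarla grades from 3 raters for two trajectories.
toy_consensus = [majority_grade([1, 1, 2]), majority_grade([1, 2, 3])]
print(f"mean error = {toy_error:.1f} mm, consensus grades = {toy_consensus}")
```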
Results: The automated landmark localization model achieved a mean error of 4.0 mm (SD 2.6 mm). Trajectory planning generated a trajectory for all patients, with a Kakarla grade of 1 in 92.9% of cases. Statistical analyses indicated a strong interobserver agreement and no significant differences between modalities (CT vs MRI) or EVD placement sides. The location of Kocher's point and the target point were significantly correlated to nasion landmark localization error, with median drifts of 9.38 mm (95% CI 1.94-19.16 mm) and 3.91 mm (95% CI 0.18-26.76 mm) for Kocher's point and the target point, respectively.
Conclusions: The presented method was efficient and robust for landmark localization and accurate EVD trajectory planning. The short processing time thereby also provides a base for use in emergency settings.
Title: Fully automatic anatomical landmark localization and trajectory planning for navigated external ventricular drain placement. Journal: Neurosurgical Focus 59(1):E14.
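The two evaluation steps named in the Methods above (mean Euclidean landmark error and majority voting over three raters) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the coordinates, and the rule that a tie yields no consensus are assumptions made here for the example.

```python
import math
from collections import Counter


def mean_landmark_error(predicted, ground_truth):
    """Mean Euclidean distance between paired landmarks (same unit as the inputs, e.g. mm)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty landmark lists of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def majority_grade(grades):
    """Grade chosen by a strict majority of raters, or None if there is none (assumed tie rule)."""
    grade, count = Counter(grades).most_common(1)[0]
    return grade if count > len(grades) / 2 else None


# Hypothetical 3D coordinates in mm, not data from the study.
predicted = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ground_truth = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
error_mm = mean_landmark_error(predicted, ground_truth)  # distances are 5 mm and 0 mm

# Hypothetical Kakarla grades from 3 raters for one trajectory.
consensus = majority_grade([1, 1, 2])
```

With three raters and three possible grades, a three-way split has no majority; how such cases were resolved is not stated in the abstract, so the sketch simply returns None.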
Objective: Endoscopic endonasal transsphenoidal surgery (EETS) is a minimally invasive procedure that accesses the sellar and parasellar regions. Various anatomical structures must be identified during the operation, particularly the sella turcica and internal carotid artery (ICA) bilaterally. In the present retrospective cohort study, the authors aimed to evaluate the performance of a deep learning (DL) model in detecting the sella turcica and ICA bilaterally in EETS video footage, with the goal of recognizing crucial landmarks and preventing potentially fatal injury.
Methods: The authors collected images from the endoscopic video footage of 98 patients who had undergone EETS from January 2015 to June 2024. The ICAs and sella turcica were labeled by neurosurgeons, and the entire dataset was divided into training, validation, and test datasets at a ratio of 7:2:1. The model for ICA and sella turcica detection was trained using the YOLOv5s object detection architecture, and precision, recall, mean average precision (mAP)@0.5, and mAP@0.5:0.95 were reported during the validation process. Moreover, the confusion matrix and area under the receiver operating characteristic curve (AUC) were assessed on unseen images from the test dataset.
Results: The DL model had precision, recall, mAP@0.5, and mAP@0.5:0.95 of 0.942, 0.955, 0.969, and 0.617, respectively, for all objects during training with validation. When the model was tested on unseen images, the AUC was 0.97 (95% CI 0.95-0.98) and the average precision was 0.99 (95% CI 0.99-1.00). For ICA detection with a multiclass approach, the AUCs were 0.98 (95% CI 0.97-0.99) for the absence of any ICA, 0.93 (95% CI 0.91-0.95) for 1 ICA in the image, and 0.95 (95% CI 0.93-0.96) for both ICAs in the image. Additionally, accuracy for the ICA and sella turcica was 0.958 and 0.965, respectively.
Conclusions: Complex anatomical landmarks should be recognized during EETS. The computer vision model was effective in detecting the sella turcica and ICA bilaterally, as well as in identifying and avoiding fatal complications. To generalize reliably, the model requires novel, unseen data from various settings for refinement and transfer learning.
Title: Image-based detection of the internal carotid arteries and sella turcica in endoscopic endonasal transsphenoidal surgery. Authors: Thara Tunthanathip, Thakul Oearsakul, Chin Taweesomboonyat, Nuttha Sanghan, Rakkrit Duangsoithong. Journal: Neurosurgical Focus 59(1):E11. Pub Date: 2025-07-01. DOI: 10.3171/2025.4.FOCUS24940.
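For readers unfamiliar with the detection metrics reported above: in the common convention, a predicted box counts as a true positive at mAP@0.5 when its intersection over union (IoU) with a labeled box is at least 0.5, while mAP@0.5:0.95 averages over stricter IoU thresholds from 0.5 to 0.95, which is why it is usually the lower number. The sketch below shows IoU and precision/recall from counts. It is an illustration with hypothetical boxes and counts, not the study's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical boxes in pixels: the prediction is shifted by half a box width.
predicted_box = (0, 0, 10, 10)
labeled_box = (5, 0, 15, 10)
overlap = iou(predicted_box, labeled_box)  # intersection 50, union 150
is_match_at_05 = overlap >= 0.5            # this shifted box would not count at IoU 0.5

# Hypothetical counts, not results from the study.
precision, recall = precision_recall(tp=9, fp=1, fn=3)
```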
Pub Date : 2025-07-01 DOI: 10.3171/2025.4.FOCUS25157
Christian J Quinones, Deepak Kumbhare, Matthew Palfreeman, Udaysinh Rathod, Devesh Sarda, Subhajit Chakrabarty, Bharat Guthikonda, Stanley Hoang
Objective: Robotics and artificial intelligence (AI) are being increasingly integrated in spine surgery. One emerging application of AI is in hand motion detection to assess surgical skill. However, no standardized framework currently exists for evaluating trainee proficiency in spine surgery. This proof-of-concept study applied AI-based motion analysis and a machine learning (ML) pipeline to evaluate hand movements during lumbar pedicle screw placement, aiming to generate objective metrics for skill assessment.
Methods: AI-based motion tracking was used to analyze hand movements during pedicle screw placement on a lumbar spine sawbone model. Video recordings of hand movements during freehand (FH) and robot-assisted (RB) pedicle screw placement were analyzed to extract metrics including distance, displacement, speed, velocity, acceleration, jerk, and normalized jerk index. Due to the limited number of participants, data augmentation techniques were used to generate synthetic data to expand the dataset. Extracted and derived kinematic metrics were then evaluated for their ability to predict training level and surgical technique.
Results: In general, procedure time and movement distance appeared to decrease with increasing trainee experience, with more pronounced improvements in FH procedures. Kinematic analysis trended toward a reduction in speed, displacement, and jerk variability across training years. RB procedures were associated with reduced movement variability as extremes in velocity, acceleration, and jerk were limited. ML models were able to classify augmented data by training level and procedure type with acceptable accuracy.
Conclusions: This proof-of-concept study presents a data processing pipeline capable of analyzing metrics to quantify surgical proficiency during spinal procedures. The methods described demonstrate the feasibility of using AI-driven video analysis to assess hand motion. The study also highlights specific motion-based metrics that can distinguish between FH and RB techniques and correlate with surgical training level. These findings lay the groundwork for developing a standardized, objective framework for proficiency assessment in spine surgery.
Title: Kinematic analysis of lumbar pedicle screw placement using an artificial intelligence framework. Journal: Neurosurgical Focus 59(1):E9.
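Several of the kinematic metrics listed in the Methods above can be derived from tracked hand positions by finite differences. The sketch below assumes positions sampled at a fixed interval and uses hypothetical values. It does not reproduce the authors' tracking pipeline, and it omits the normalized jerk index because the abstract does not give the formula used.

```python
import math


def path_metrics(points):
    """Return (distance, displacement) for a list of (x, y) positions.

    Distance is the total path length; displacement is the straight line
    from the first to the last position.
    """
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(steps), math.dist(points[0], points[-1])


def finite_difference(values, dt):
    """First-order forward difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Hypothetical out-and-back hand path: long distance, zero displacement.
distance, displacement = path_metrics([(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)])

# Hypothetical 1D positions x = t^2 sampled once per second.
positions = [0.0, 1.0, 4.0, 9.0, 16.0]
velocity = finite_difference(positions, dt=1.0)         # first derivative
acceleration = finite_difference(velocity, dt=1.0)      # second derivative
jerk = finite_difference(acceleration, dt=1.0)          # third derivative
speed = [abs(v) for v in velocity]                      # speed is the magnitude of velocity
```

Each differencing step shortens the series by one sample and amplifies tracking noise, so real video-derived trajectories are usually smoothed first; whether and how that was done here is not stated in the abstract.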
Pub Date : 2025-06-01 DOI: 10.3171/2025.3.FOCUS2532
Shane Shahrestani, Catherine Garcia, Andrew M Miller, Robin Babadjouni, Andre E Boyke, Miguel Quintero-Consuegra, Rohin Singh, Alexander Tuchman, Corey T Walker
Objective: The aim of this study was to develop and compare 4 predictive algorithms, including logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and neural network (NN), for perioperative outcomes in adult spinal deformity (ASD) surgery. By evaluating these models, the authors sought to explore how linear and nonlinear interactions unique to each outcome influence predictive accuracy, emphasizing the need for outcome-specific model selection.
Methods: A retrospective cohort of 7430 patients (mean age of 67 years) who underwent multilevel thoracolumbar deformity correction was identified using the Nationwide Readmissions Database (2016-2019). Predictor variables included demographic data, frailty status, comorbidity indices, nutritional status, and hospital characteristics. Outcomes assessed were prolonged hospital length of stay (LOS), nonroutine discharge, top-quartile all-payer cost, 30-day readmission, and posthemorrhagic anemia. Models were trained on 75% of the dataset and tested on the remaining 25%. LR served as the baseline parametric model, while RF and GBM employed ensemble methods to handle nonlinear interactions, and NN used hidden layers optimized via backpropagation. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values, and DeLong's test was used for statistical comparisons.
Results: RF achieved the highest AUC for LOS (0.713), while GBM excelled for posthemorrhagic anemia (AUC = 0.717). LR provided consistent moderate accuracy across all outcomes (AUC range 0.556-0.690). NN underperformed (AUC range 0.540-0.665), likely due to dataset size limitations. Significant differences were observed between models for prediction of LOS and posthemorrhagic anemia (p < 0.05), with RF and GBM performing the best as they capture nonlinear interactions effectively.
Conclusions: The results highlight that no single algorithm universally outperforms others across all perioperative outcomes, as each model captures different linear and nonlinear heterogeneities. Careful consideration of the outcome's unique characteristics is essential when selecting a predictive model for ASD surgery. These findings support the integration of tailored machine learning approaches to optimize patient-specific risk stratification and perioperative care.
Title: Optimizing predictive model performance in adult spinal deformity surgery: a comparative head-to-head analysis of learning models for perioperative complications. Journal: Neurosurgical Focus 58(6):E12.
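The evaluation design described above (a 75%/25% split and AUC as the performance measure) can be sketched without any ML library. The AUC below uses its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels, scores, seed, and function names are hypothetical, and DeLong's test is not implemented here.

```python
import random


def split_indices(n, train_fraction=0.75, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Cohort size taken from the abstract; the split itself is only an illustration.
train_idx, test_idx = split_indices(7430)

# Hypothetical outcome labels and predicted risks for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
model_auc = auc(labels, scores)  # 7 of the 9 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which puts the reported range of roughly 0.54 to 0.72 in context.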
Pub Date : 2025-06-01 DOI: 10.3171/2025.3.FOCUS2510
Anthony L Mikula, Zach Pennington, Nikita Lakomkin, Michael L Martini, Abdelrahman M Hamouda, Ahmad Nassr, Brett Freedman, Arjun S Sebastian, William W Cross, Christopher P Ames, Benjamin D Elder, Jeremy L Fogelson
Objective: The purpose of this study was to evaluate the risks and benefits of removing painful pelvic/iliac screws in spine fusion surgery patients.
Methods: A retrospective review identified patients who had traditional iliac and S2-alar-iliac (S2AI) screws removed for pain. The minimum follow-up was 24 months.
Results: Fifty-two patients (75% women) were included with a mean age of 63 years, BMI of 28, and follow-up of 65 months. Most of the removed screws were S2AI (83%) compared with traditional iliac screws (17%). Forty-three patients (83%) had improvement in their pelvic screw-related pain following removal. Eight patients (15%) experienced lumbosacral mechanical complications following pelvic screw removal including sacral fracture (n = 3, 6%) and/or L4-5 or L5-S1 rod fracture (n = 7, 13%). On multivariable analysis, risk factors for mechanical complications following pelvic screw removal included a longer fusion construct (OR 1.34, p = 0.035), greater postoperative L4-S1 lordosis (OR 1.14, p = 0.04, ideal cutoff > 40°), and lack of bone morphogenetic protein (BMP; OR 0.03, p = 0.02). Ten patients (19%) underwent subsequent SI joint fusion following pelvic screw removal, and higher standing pelvic incidence (OR 1.10, p = 0.03) was the only independent predictor of SI fusion.
Conclusions: Removal of painful pelvic screws resulted in a high rate of postoperative pain relief, albeit with a risk of lumbosacral mechanical complications and subsequent SI joint fusion. Patients at risk for lumbosacral mechanical complications following pelvic screw removal included those with longer fusion constructs, more lordosis from L4 to S1 (> 40°), and lack of BMP. Patients at risk for receiving an instrumented SI joint fusion following pelvic screw removal included those with a higher pelvic incidence.
Title: Removal of painful pelvic screws following spine fusion surgery: outcomes and complications. Journal: Neurosurgical Focus 58(6):E15.
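As a reading aid for the odds ratios (ORs) reported above: in a logistic regression, an OR for a continuous predictor applies per one-unit increase, so a change of k units multiplies the odds by OR raised to the power k. The abstract does not state the unit for each OR (for example per degree of lordosis or per fused level), so the per-unit reading, the baseline probability, and the function names below are assumptions for illustration only.

```python
def odds_multiplier(odds_ratio, units):
    """Factor by which the odds change after `units` one-unit increases of the predictor."""
    return odds_ratio ** units


def probability_after(baseline_probability, odds_ratio, units=1):
    """Convert a baseline probability to odds, apply the OR, and convert back."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_multiplier(odds_ratio, units)
    return new_odds / (1.0 + new_odds)


# OR 1.14 is taken from the abstract; the 10-unit change is hypothetical.
ten_unit_factor = odds_multiplier(1.14, 10)

# Hypothetical baseline risk of 50% with a hypothetical OR of 3 for a one-unit change.
shifted = probability_after(0.5, 3.0)  # odds go from 1 to 3, so probability becomes 3/4
```

Because ORs act on odds rather than probabilities, the same OR produces a different change in absolute risk depending on the baseline.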