
International Journal of Computer Assisted Radiology and Surgery — Latest Publications

DyEndoVO: scene dynamics-aware pose estimation of endoscope in minimally invasive surgery.
IF 2.3, Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-22. DOI: 10.1007/s11548-025-03549-0
Jinjing Xu, Reuben Docea, Micha Pfeiffer, Martin Wagner, Marius Distler, Stefanie Speidel

Purpose: Estimating the 6 degrees of freedom (DoF) pose of an endoscope is crucial for various applications in minimally invasive computer-assisted surgery. Image-based approaches are among the most practical solutions for pose estimation in surgical environments, given the limited workspace and sensor constraints. However, these methods often struggle or fail in dynamic scenes, such as those involving tissue deformation, surgical tool movement, and tool-tissue interaction.

Methods: We propose DyEndoVO, an end-to-end visual odometry method for dynamic endoscopic scenes. Our method consists of a transformer-based motion detection network and a weighted pose-optimization module. The motion detection network infers scene dynamics and guides the pose estimation. Furthermore, we introduce a semi-synthetic dataset featuring tissue and tool movement categories. It serves as training data, improving pose estimation accuracy, and also includes motion masks to enable fine-grained inspection and evaluation.
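To make the weighted pose-optimization idea concrete, here is a minimal hypothetical sketch, assuming the motion detection network emits a per-pixel motion probability that down-weights photometric residuals. The function name, the weighting scheme, and the toy numbers are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: down-weighting dynamic pixels in a photometric
# pose objective. Not the authors' code; names and data are illustrative.

def weighted_photometric_loss(residuals, motion_probs):
    """Average squared residual, with each pixel weighted by its
    probability of belonging to the static scene (1 - motion prob)."""
    assert len(residuals) == len(motion_probs)
    weights = [1.0 - p for p in motion_probs]
    total_w = sum(weights)
    if total_w == 0.0:
        return 0.0  # every pixel flagged dynamic: no usable constraint
    return sum(w * r * r for w, r in zip(weights, residuals)) / total_w

# The large residual on the tool pixel (motion prob 1.0) gets weight 0,
# so it does not corrupt the camera-pose estimate.
loss = weighted_photometric_loss([0.1, 0.2, 5.0], [0.0, 0.1, 1.0])
```

A pose optimizer would minimize such a loss over camera motion, so that tissue deformation and tool movement no longer masquerade as egomotion.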

Results: DyEndoVO significantly outperforms state-of-the-art methods in pose estimation for dynamic surgical scenes. Despite being trained solely on a synthetic dataset, our method generalizes well to real-world data without fine-tuning. Further analysis attributes this success to the effective detection of scene dynamics and the adaptation of the learned weights for pose estimation; moreover, the semi-synthetic dataset plays a key role in bridging the sim-to-real gap.

Conclusions: In this work, we aim to improve the accuracy and robustness of pose estimation in challenging dynamic surgical scenes by effectively handling scene dynamics. Our method, combined with the proposed synthetic dataset, demonstrates improved pose estimation performance and generalizes well to real-world data, showing its potential to advance related tasks such as SLAM and 3D reconstruction in complex surgical environments.

Citations: 0
Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT surgery.
IF 2.3, Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-22. DOI: 10.1007/s11548-025-03512-z
Richard Bieck, Martin Sorge, Katharina Heuermann, Viktor Kunz, Markus Pirlich, Thomas Neumuth

Purpose: Deep learning models for endoscopic assistance applications predominantly focus on image-based tasks, such as tool detection, anatomical classification, and workflow segmentation. However, these approaches often neglect the integration of natural language, limiting their assistance capabilities. This work adopts a proven architecture for vision-language models (VLM) to perform multi-task learning for image classification, text prediction, and surgical report generation, specifically for endoscopic ENT surgeries.

Methods: We adopted a VLM architecture that uses encoders biased toward the endoscopy domain for image and text embedding and combines them via cross-attention. The model was trained on a newly created multi-task dataset derived from 30 annotated endoscopic procedures, comprising 130,000 multi-label images, anatomical descriptions, and synchronized surgical reports. Two variants of the model, a lightweight 61M-parameter model and a 176M-parameter model, were evaluated against an existing baseline from previous mono-task studies as well as the EndoViT and SurgicalGPT models as external references. Ablation studies investigate the influence of removing image embeddings, text embeddings, or cross-attention on task performance. Performance was measured for landmark classification, structured text prediction, and report generation using precision, recall, F1-score, BLEU-2, ROUGE-L, and cosine similarity metrics.
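The cross-attention fusion step can be sketched as follows. This is a single-head, projection-free toy version (text tokens as queries, image tokens as keys and values) with illustrative names and dimensions, not the paper's architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(image_tokens, text_tokens):
    """Each text token (query) attends over the image tokens
    (keys/values) via scaled dot-product attention; identity
    projections are used for brevity."""
    d = len(image_tokens[0])
    fused = []
    for q in text_tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_tokens]
        attn = softmax(scores)
        fused.append([sum(a * v[j] for a, v in zip(attn, image_tokens))
                      for j in range(d)])
    return fused
```

In a real VLM the queries, keys, and values pass through learned linear projections and multiple heads; the sketch only shows how text embeddings get grounded in image features.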

Results: The VLM base model improves the baseline F1-score for image classification by up to 12% and for natural language text generation by up to 14%. Text generation for structured language tasks, however, showed minimal gains, indicating limitations in learning structured sentences from combined image-text embeddings. EndoViT and SurgicalGPT slightly trail our domain-specific VLM. The image-only and text-only ablations confirm that the vision component benefits language tasks, whereas text has limited impact on landmark detection.

Conclusion: We developed a vision-language model that integrates image and text data for endoscopic ENT assistance tasks and can replace three isolated models, delivering multi-task assistance while outperforming prior and general-purpose baselines. Remaining challenges include handling imbalanced class distributions and the limited gains on templated structured text.

Citations: 0
Quantifying the anatomical variability of the proximal femur.
IF 2.3, Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-19. DOI: 10.1007/s11548-025-03560-5
Angelika Ramesh, Johann Henckel, Alister Hart, Anna Di Laura

Purpose: Achieving a prosthetic femoral version (PFV) within the target range of 10-20° is crucial for optimal biomechanics in total hip arthroplasty (THA). Predicting the PFV preoperatively is challenging due to the limited understanding of the relationship between native femoral version (NFV) and the morphology of the intramedullary canal. This study aims to quantify the 3D morphological variability and identify the most variable anatomical features of the proximal femur pre- and post-operatively.

Methods: Pre- and post-operative CT scans from 62 patients (31 males, 31 females) who underwent THA and received a single stem design (straight, triple-tapered) were analysed. Four femoral models were generated per patient: (1) native proximal femur, (2) native femur after neck osteotomy, (3) internal femoral canal after neck osteotomy, and (4) reconstructed femur. Statistical Shape Models (SSMs) were developed separately by sex, and principal component analysis (PCA) was used to identify dominant modes of anatomical variation.
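The SSM/PCA machinery can be illustrated with two small helpers: the fraction of shape variance explained by the first k principal modes, and shape synthesis as the mean shape plus a weighted sum of modes. Function names and toy eigenvalues are illustrative, not the study's code.

```python
import math

def explained_variance(eigenvalues, k):
    """Fraction of total variance captured by the first k PCA modes."""
    return sum(eigenvalues[:k]) / sum(eigenvalues)

def synthesize_shape(mean_shape, modes, eigenvalues, b):
    """Reconstruct a shape from an SSM: mean plus a weighted sum of
    principal modes, with weights b expressed in standard deviations
    (sqrt of each mode's eigenvalue)."""
    out = list(mean_shape)
    for phi, lam, bi in zip(modes, eigenvalues, b):
        sd = math.sqrt(lam)
        for j in range(len(out)):
            out[j] += bi * sd * phi[j]
    return out
```

With eigenvalues like [4, 2, 1, 1, 1, 1], the first three modes explain 70% of the variance, matching the kind of "first three PCs account for over 60%" statement reported in the Results.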

Results: The first three principal components (PCs) accounted for over 60% of shape variability across all models. PFV showed weak correlation with NFV, as variability existed between the SSM of the internal femoral canal and the SSM of the native proximal femur. Sex-specific differences in the measured NFV and PFV were found, with females exhibiting a greater range and a more anteverted femur/femoral stem. The female canal model showed intramedullary version variability; however, this variability was not present in the first three PCs of the corresponding male model.

Conclusions: This study demonstrates that PFV cannot be reliably predicted from NFV alone. These findings underscore the need for advanced, 3D preoperative planning tools to better predict stem version and accommodate patient-specific anatomy. Additionally, the increased variability observed in females may warrant sex-specific consideration in implant design choice and surgical technique.

Citations: 0
Multimodal framework for swallow detection in video-fluoroscopic swallow studies using manometric pressure distributions from dysphagic patients.
IF 2.3, Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-15. DOI: 10.1007/s11548-025-03556-1
Manuel Maria Loureiro da Rocha, Lisette van der Molen, Marise Neijman, Marteen J A van Alphen, Michiel M W M van den Brekel, Françoise J Siepel

Purpose: Oropharyngeal dysphagia affects up to half of head and neck cancer (HNC) patients. Multi-swallow video-fluoroscopic swallow studies (VFSS) combined with high-resolution impedance manometry (HRIM) offer a comprehensive assessment of swallowing function. However, their use in HNC populations is limited by high clinical workload and complexity of data collection and analysis with existing software.

Methods: To address the data collection challenge, we propose a framework for automatic swallow detection in simultaneous VFSS-HRIM examinations. The framework identifies candidate swallow intervals in continuous VFSS videos using an optimized double-sweep optical flow algorithm. Each candidate interval is then classified using a pressure-based swallow template derived from three annotated samples, leveraging features such as normalized peak-to-peak amplitude, mean, and standard deviation from upper esophageal sphincter sensors.
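One way the pressure-template classification could look, under the assumption that a candidate interval is accepted when its features lie within per-feature tolerances of the template. The normalization, the tolerance scheme, and all names are hypothetical sketches of the feature set named above, not the authors' implementation.

```python
# Hypothetical sketch of template matching on upper-esophageal-sphincter
# pressure features; normalization and names are illustrative.
from statistics import mean, pstdev

def pressure_features(signal):
    """Normalized peak-to-peak amplitude, mean, and standard deviation
    of a pressure trace within a candidate interval."""
    lo, hi = min(signal), max(signal)
    p2p = (hi - lo) / (abs(hi) + 1e-9)  # illustrative normalization
    return (p2p, mean(signal), pstdev(signal))

def matches_template(candidate, template, tol):
    """Accept the candidate interval if every feature lies within a
    per-feature tolerance of the swallow template."""
    return all(abs(c - t) <= e
               for c, t, e in zip(pressure_features(candidate), template, tol))
```

The template itself would be the feature tuple averaged over the three annotated swallows; candidate intervals proposed by the optical-flow stage are then kept or discarded by this check.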

Results: The methodology was evaluated on 97 swallows from twelve patients treated for head and neck cancer. The detection pipeline achieved 95% recall and a 92% F1-score. Importantly, the number of required HRIM annotations was reduced by 63%, substantially decreasing clinician workload while maintaining high accuracy.
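The reported recall and F1-score follow from detection counts in the usual way; a small sketch with an illustrative function name:

```python
def detection_scores(tp, fp, fn):
    """Precision, recall, and F1 from counts of true-positive,
    false-positive, and missed (false-negative) detections."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```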

Conclusion: This framework overcomes limitations of current software for simultaneous VFSS-HRIM collection by enabling high-accuracy, low-input swallow detection in HNC patients. Validated on a heterogeneous patient cohort, it initiates the groundwork for scalable, objective, and multimodal swallowing assessment.

Citations: 0
Colormap augmentation: a novel method for cross-modality domain generalization.
IF 2.3, Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-15. DOI: 10.1007/s11548-025-03559-y
Falko Heitzer, Duc Duy Pham, Wojciech Kowalczyk, Marcus Jäger, Josef Pauli

Purpose: Domain generalization plays a crucial role in analyzing medical images from diverse clinics, scanner vendors, and imaging modalities. Existing methods often require substantial computational resources to train a highly generalized segmentation network, presenting challenges in terms of both availability and cost. The goal of this work is to evaluate a novel, yet simple and effective method for enhancing the generalization of deep learning models in segmentation across varying modalities.

Methods: Eight augmentation methods are applied individually to a source domain dataset in order to generalize deep learning models. The resulting models are then tested on completely unseen target domain datasets from a different imaging modality and compared against a lower baseline model. By leveraging standard augmentation techniques, extensive intensity augmentations, and carefully chosen color transformations, we aim to address the domain shift problem, particularly in the cross-modality setting.
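A colormap augmentation can be sketched as an intensity-to-RGB lookup applied per pixel, with the colormap itself varied across training iterations. The 'hot'-like ramp below is an illustrative stand-in, not the paper's CmapAug definition.

```python
def apply_colormap(gray_image, cmap):
    """Map each normalized grayscale intensity (in [0, 1]) through a
    colormap lookup, producing a 3-channel image. Randomizing which
    colormap is drawn per training sample is the augmentation."""
    return [[cmap(px) for px in row] for row in gray_image]

# Illustrative 'hot'-like colormap: red ramps first, then green, then blue.
def hot_like(v):
    r = min(1.0, 3.0 * v)
    g = min(1.0, max(0.0, 3.0 * v - 1.0))
    b = min(1.0, max(0.0, 3.0 * v - 2.0))
    return (r, g, b)
```

Because the segmentation target is unchanged while the intensity-to-color mapping varies wildly, the network is pushed to rely on shape and texture cues that transfer across imaging modalities.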

Results: Our novel CmapAug method, when combined with standard augmentation techniques, resulted in a substantial improvement in the Dice Score, outperforming the baseline. While the baseline struggled to segment the liver structure in some test cases, our selective combination of augmentation methods achieved Dice scores as high as 83.2%.

Conclusion: Our results highlight the general effectiveness of the tested augmentation methods in addressing domain generalization and mitigating the domain shift problem caused by differences in imaging modalities between the source and target domains. The proposed augmentation strategy offers a simple yet powerful solution to this challenge, with significant potential in clinical scenarios where annotated data from the target domain are limited or unavailable.

Citations: 0
Investigating workload and usability of remote magnetic navigation for catheter ablation.
IF 2.3, Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-15. DOI: 10.1007/s11548-025-03558-z
Florian Heemeyer, Leonardo E Guido Lopez, Miguel E Jáuregui Abularach, Beatriz Sanz Verdejo, Quentin Boehler, Oliver Brinkmann, José L Merino, Bradley J Nelson

Purpose: Robotic systems for catheter ablation have been in clinical use for many years. While their impact on the clinical outcome and procedure times is well studied, aspects like usability and operator workload have received limited attention in the literature. Reduced workload and stress levels benefit the operator's mental and physical health, and can also lower the risk of errors and ultimately improve patient safety. The aim of this study is to investigate the workload and usability of remote magnetic navigation compared to conventional manual navigation.

Methods: We performed a user study with eight electrophysiologists. Each participant performed identical in-vitro navigation tasks replicating those found in pulmonary vein isolation using both manual and magnetic navigation. Magnetic navigation experiments were performed using the Navion, a mobile electromagnetic navigation system.

Results: Magnetic navigation significantly improved usability (p < 0.02) and workload (p < 0.01) compared to manual navigation, measured using the System Usability Scale (magnetic: 85.6 ± 9.3 vs. manual: 75.0 ± 17.8) and NASA Task Load Index (magnetic: 72.4 ± 13.5 vs. manual: 45.8 ± 16.7). Additionally, task completion times were shorter (p < 0.01) with magnetic navigation (284.6 ± 80.7 s) compared to manual navigation (411.0 ± 123.7 s).

Conclusion: The findings of this study suggest that remote magnetic navigation using the Navion significantly improves operator experiences in terms of workload and usability, reinforcing the case for wider adoption of well-designed robotic systems in cardiac electrophysiology labs.

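The study above compares two navigation modes within the same group of eight operators, so paired statistics are the natural analysis. The sketch below illustrates that kind of comparison with a Wilcoxon signed-rank test on made-up SUS scores; it is not the authors' actual analysis pipeline, and the function name and data are hypothetical.

```python
import numpy as np
from scipy import stats

def paired_comparison(mode_a: np.ndarray, mode_b: np.ndarray) -> dict:
    """Summarize paired per-operator scores for two navigation modes and
    test the difference with a Wilcoxon signed-rank test (suited to small,
    paired samples without a normality assumption)."""
    result = stats.wilcoxon(mode_a, mode_b)
    return {
        "a_mean": float(mode_a.mean()), "a_sd": float(mode_a.std(ddof=1)),
        "b_mean": float(mode_b.mean()), "b_sd": float(mode_b.std(ddof=1)),
        "p_value": float(result.pvalue),
    }

# Illustrative (made-up) SUS scores for eight operators under both modes;
# the manual scores are derived so every paired difference is nonzero.
rng = np.random.default_rng(0)
sus_magnetic = rng.normal(85, 9, size=8)
sus_manual = sus_magnetic - np.abs(rng.normal(10, 5, size=8))
print(paired_comparison(sus_magnetic, sus_manual))
```

The same pattern applies unchanged to NASA-TLX scores or task completion times; only the input arrays differ.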
Citations: 0
Reshapeit: reliable shape interaction with implicit template for medical anatomy reconstruction.
IF 2.3 Medicine (CAS Tier 3) Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-13 DOI: 10.1007/s11548-025-03557-0
Minghui Zhang, Yun Gu

Purpose: Shape modeling of volumetric medical images plays a crucial role in quantitative analysis and surgical planning for computer-aided diagnosis. However, automatic shape reconstruction from deep learning models often suffers from limited image resolution and the lack of shape prior constraints. This study aims to address these challenges by developing a method that enables reliable and accurate anatomical shape modeling in the continuous space.

Methods: We present the Reliable Shape Interaction with Implicit Template (ReShapeIT) network, which represents anatomical structures using continuous implicit fields rather than discrete voxel grids. The approach combines a category-specific implicit template field with a deformation network to encode anatomical shapes from training shapes. In addition, a Template Interaction Module (TIM) is designed to refine test cases by aligning learned template shapes with instance-specific latent codes.

Results: We evaluated ReShapeIT on three anatomical datasets: Liver, Pancreas, and Lung Lobe. The proposed method outperforms state-of-the-art approaches in 3D shape reconstruction, achieving Chamfer Distance/Earth Mover's Distance scores of 0.225/0.318 for Liver, 0.125/0.067 for Pancreas, and 0.414/0.098 for Lung Lobe.

Conclusion: ReShapeIT provides a reliable and generalizable solution for implicit anatomical shape modeling by leveraging shared template priors and instance-level deformations. The implementation is publicly available at: https://github.com/EndoluminalSurgicalVision-IMR/ReShapeIT .
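The Chamfer Distance used above to score reconstructions can be computed directly from two point sets. The sketch below uses one common convention (mean squared nearest-neighbour distance, summed over both directions); the abstract does not state the paper's exact definition, so the normalization may differ.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3):
    for each point, the squared distance to its nearest neighbour in the
    other set, averaged per direction and summed over both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) pairwise
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

pts = np.random.default_rng(1).random((100, 3))
print(chamfer_distance(pts, pts))            # identical clouds score 0.0
print(chamfer_distance(pts, pts + 0.1) > 0)  # any offset increases the score
```

The broadcasted pairwise-distance matrix is O(N·M) in memory, so for large clouds a KD-tree nearest-neighbour query is the usual substitute.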

Citations: 0
Machine learning-based treatment outcome prediction in head and neck cancer using integrated noninvasive diagnostics.
IF 2.3 Medicine (CAS Tier 3) Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-08 DOI: 10.1007/s11548-025-03539-2
Melda Yeghaian, Stefano Trebeschi, Marina Herrero-Huertas, Francisco Javier Mendoza Ferradás, Paula Bos, Maarten J A van Alphen, Marcel A J van Gerven, Regina G H Beets-Tan, Zuhir Bodalal, Lilly-Ann van der Velden

Purpose: Accurate prediction of treatment outcomes is crucial for personalized treatment in head and neck squamous cell carcinoma (HNSCC). Beyond one-year survival, assessing long-term enteral nutrition dependence is essential for optimizing patient counseling and resource allocation. This preliminary study aimed to predict one-year survival and feeding tube dependence in surgically treated HNSCC patients using classical machine learning.

Methods: This proof-of-principle retrospective study included 558 surgically treated HNSCC patients. Baseline clinical data, routine blood markers, and MRI-based radiomic features were collected before treatment. Additional postsurgical treatments within one year were also recorded. Random forest classifiers were trained to predict one-year survival and feeding tube dependence. Model explainability was assessed using SHapley Additive exPlanations (SHAP) values.

Results: Using tenfold stratified cross-validation, clinical data showed the highest predictive performance for survival (AUC = 0.75 ± 0.10; p < 0.001). Blood (AUC = 0.67 ± 0.17; p = 0.001) and imaging (AUC = 0.68 ± 0.16; p = 0.26) showed moderate performance, and multimodal integration did not improve predictions (AUC = 0.68 ± 0.16; p = 0.38). For feeding tube dependence, all modalities had low predictive power (AUC ≤ 0.66; p > 0.05). However, postsurgical treatment information outperformed all other modalities (AUC = 0.67 ± 0.07; p = 0.002), but had the lowest predictive value for survival (AUC = 0.57 ± 0.11; p = 0.08).

Conclusion: Clinical data appeared to be the strongest predictor of one-year survival in surgically treated HNSCC, although overall predictive performance was moderate. Postsurgical treatment information played a key role in predicting tube feeding dependence. While multimodal integration did not enhance overall model performance, it showed modest gains for weaker individual modalities, suggesting potential complementarity that warrants further investigation.
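As a rough illustration of the evaluation protocol described above (a random forest scored with tenfold stratified cross-validation and AUC), here is a minimal sketch on synthetic data. The real cohort and feature set are not public, so `make_classification` stands in for them, and the SHAP explainability step is omitted (the `shap` package's tree explainers would typically handle it).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for pretreatment features and a binary outcome label.
X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# Stratified folds keep the class ratio stable in every split, which matters
# for imbalanced clinical outcomes such as one-year survival.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"AUC = {aucs.mean():.2f} ± {aucs.std():.2f}")
```

Reporting the mean and standard deviation across folds matches the "AUC = 0.75 ± 0.10" style used in the abstract.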

Citations: 0
Glioblastoma survival prediction through MRI and clinical data integration with transfer learning.
IF 2.3 Medicine (CAS Tier 3) Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-04 DOI: 10.1007/s11548-025-03548-1
A Marasi, D Milesi, D Aquino, F M Doniselli, R Pascuzzo, M Grisoli, A Redaelli, E De Momi

Purpose: Accurate prediction of overall survival (OS) in glioblastoma patients is critical for advancing personalized treatments and improving clinical trial design. Conventional radiomics approaches rely on manually engineered features, which limit their ability to capture complex, high-dimensional imaging patterns. This study employs a deep learning architecture to process MRI data for automated glioma segmentation and feature extraction, leveraging high-level representations from the encoder's latent space.

Methods: Multimodal MRI data from the BraTS2020 dataset and a proprietary dataset from Fondazione IRCCS Istituto Neurologico Carlo Besta (Milan, Italy) were processed independently using a U-Net-like model pre-trained on BraTS2018 and fine-tuned on BraTS2020. Features extracted from the encoder's latent space represented hierarchical imaging patterns. These features were combined with a clinical variable (patient age) and reduced via principal component analysis (PCA) to improve computational efficiency. Machine learning classifiers, including random forest, XGBoost, and a fully connected neural network, were trained on the reduced feature vectors for OS classification.

Results: In the four-modality BraTS4CH setting, the multi-layer perceptron (MLP) achieved the best performance (F1 = 0.71, AUC = 0.74, accuracy = 0.71). When limited to two modalities on BraTS2020 (BraTS2CH), the MLP again led (F1 = 0.67, AUC = 0.70, accuracy = 0.67).

Conclusion: Integrating encoder-derived features from multimodal MRI data with clinical variables offers a scalable and effective approach for OS prediction in glioblastoma patients. This study demonstrates the potential of deep learning to address traditional radiomics limitations, paving the way for more precise and personalized prognostic tools.
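The feature pipeline described above (encoder-derived features plus age, PCA reduction, then a classifier) can be sketched with scikit-learn. The dimensions, the MLP architecture, and the synthetic data below are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for encoder latent-space features; age is appended
# as an extra clinical column before dimensionality reduction.
X, y = make_classification(n_samples=400, n_features=64, n_informative=10,
                           random_state=0)
age = np.random.default_rng(0).normal(60, 10, size=(400, 1))
X = np.hstack([X, age])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(StandardScaler(),          # put features on one scale
                      PCA(n_components=16),      # reduce before classifying
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                    random_state=0))
model.fit(X_tr, y_tr)
print(f"held-out accuracy = {model.score(X_te, y_te):.2f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline keeps the PCA basis fitted on training data only, avoiding leakage into the held-out split.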

Citations: 0
In-depth characterization of a laparoscopic radical prostatectomy procedure based on surgical process modeling.
IF 2.3 Medicine (CAS Tier 3) Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-03 DOI: 10.1007/s11548-025-03552-5
Nuno S Rodrigues, Pedro Morais, Lukas R Buschle, Estevão Lima, João L Vilaça

Purpose: Minimally invasive surgical approaches are currently the standard of care for men with prostate cancer, offering higher rates of erectile function preservation. With these laparoscopic techniques, an increasing amount of data and information is available. Adaptive systems can play an important role by acting as an intelligent information filter, ensuring that the available information is useful during the procedure rather than overwhelming for the surgeon. Standardizing and structuring the surgical workflow are key requirements for such smart assistants to recognize the different surgical steps through contextual information about the environment. This work presents a detailed characterization of a laparoscopic radical prostatectomy procedure, focusing on the formalization of medical expert knowledge via surgical process modeling.

Methods: Data were acquired manually, via online and offline observation and discussion with medical experts. A total of 14 procedures were observed, covering both manual laparoscopic radical prostatectomy and robot-assisted laparoscopic prostatectomy. The derived surgical process model (SPM) focuses only on the intraoperative part of the procedure, with constant feedback from the endoscopic camera. A dedicated Excel template was developed for surgery observation.

Results: The final model is represented in a descriptive and numerical format, combining task descriptions with a workflow diagram for ease of interpretation. Practical applications of the generated surgical process model are exemplified by the creation of activation trees for surgical phase identification. Anatomical structures are reported for each phase, distinguishing between visible and inferable ones. The surgeons involved, the surgical instruments, and the actions performed in each phase are also identified. A total of 11 phases were identified and characterized. The average surgery duration was 87 min.

Conclusion: The generated surgical process model is a first step toward the development of a context-aware surgical assistant and can potentially be used as a roadmap by other research teams, operating room managers and surgical teams.
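The activation idea described above, in which context cues such as the visible instruments support particular surgical phases, can be illustrated with a toy lookup. The phase names and instrument signatures below are hypothetical placeholders, not the paper's actual 11-phase model or its activation trees.

```python
from typing import Optional

# Hypothetical phase signatures: which instruments, when visible in the
# endoscopic view, support each phase (illustrative names only).
PHASE_SIGNATURES = {
    "dissection": {"scissors", "grasper"},
    "suturing": {"needle_driver", "grasper"},
    "hemostasis": {"bipolar", "suction"},
}

def identify_phase(visible_instruments: set) -> Optional[str]:
    """Return the phase whose instrument signature overlaps most with the
    currently visible instruments, or None if nothing matches."""
    best, best_score = None, 0
    for phase, signature in PHASE_SIGNATURES.items():
        score = len(signature & visible_instruments)
        if score > best_score:
            best, best_score = phase, score
    return best

print(identify_phase({"needle_driver", "grasper"}))  # → suturing
```

A real context-aware assistant would combine several cue types (instruments, anatomy, actions) and temporal ordering rather than a single overlap score.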

Citations: 0