Pub Date: 2024-11-01 | Epub Date: 2024-06-25 | DOI: 10.1007/s11548-024-03169-0
Keita Takeda, Tomoya Sakai, Eiji Mitate
To address the background-bias problem in computer-aided cytology caused by microscopic slide deterioration, this article proposes a deep learning approach for cell segmentation and background removal without requiring cell annotation. A U-Net-based model was trained to separate cells from the background in an unsupervised manner by leveraging the redundancy of the background and the sparsity of cells in liquid-based cytology (LBC) images. The experimental results demonstrate that the U-Net-based model trained on a small set of cytology images can exclude background features and accurately segment cells. This capability is beneficial for debiasing the detection and classification of the cells of interest in oral LBC. Slide deterioration can significantly affect deep learning-based cell classification. Our proposed method effectively removes background features without the cost of cell annotation, thereby enabling accurate cytological diagnosis through deep learning of microscopic slide images.
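The abstract does not spell out the training objective; purely as an illustration of how background redundancy and cell sparsity might be exploited without annotations, a minimal PyTorch-style sketch could look like the following (the `unet` model, the batch-median background surrogate, and the loss weight are assumptions, not the authors' implementation):

```python
# Illustrative sketch only: a hypothetical loss that rewards a U-Net for putting
# sparse cells into the foreground mask while the remaining background stays
# explainable by a redundant (batch-median) background model.
import torch
import torch.nn.functional as F

def separation_loss(unet, images, sparsity_weight=0.01):
    """images: (B, C, H, W) batch of LBC patches scaled to [0, 1]."""
    mask = torch.sigmoid(unet(images))            # soft foreground (cell) mask
    background = images * (1.0 - mask)            # what the mask leaves behind
    # Background redundancy: each masked-out image should match a shared,
    # batch-wise median background (a crude stationarity surrogate).
    bg_model = background.median(dim=0, keepdim=True).values
    redundancy_term = F.l1_loss(background, bg_model.expand_as(background))
    # Cell sparsity: penalize the area covered by the foreground mask.
    sparsity_term = mask.mean()
    return redundancy_term + sparsity_weight * sparsity_term
```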
{"title":"Background removal for debiasing computer-aided cytological diagnosis.","authors":"Keita Takeda, Tomoya Sakai, Eiji Mitate","doi":"10.1007/s11548-024-03169-0","DOIUrl":"10.1007/s11548-024-03169-0","url":null,"abstract":"<p><p>To address the background-bias problem in computer-aided cytology caused by microscopic slide deterioration, this article proposes a deep learning approach for cell segmentation and background removal without requiring cell annotation. A U-Net-based model was trained to separate cells from the background in an unsupervised manner by leveraging the redundancy of the background and the sparsity of cells in liquid-based cytology (LBC) images. The experimental results demonstrate that the U-Net-based model trained on a small set of cytology images can exclude background features and accurately segment cells. This capability is beneficial for debiasing in the detection and classification of the cells of interest in oral LBC. Slide deterioration can significantly affect deep learning-based cell classification. Our proposed method effectively removes background features at no cost of cell annotation, thereby enabling accurate cytological diagnosis through the deep learning of microscopic slide images.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2165-2174"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541310/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141452132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: Manual annotations for training deep learning models in auto-segmentation are time-intensive. This study introduces a hybrid representation-enhanced sampling strategy that integrates both density and diversity criteria within an uncertainty-based Bayesian active learning (BAL) framework to reduce annotation efforts by selecting the most informative training samples.
Methods: The experiments are performed on two lower extremity datasets of MRI and CT images, focusing on the segmentation of the femur, pelvis, sacrum, quadriceps femoris, hamstrings, adductors, sartorius, and iliopsoas, utilizing a U-Net-based BAL framework. Our method selects uncertain samples with high density and diversity for manual revision, optimizing for maximal similarity to unlabeled instances and minimal similarity to existing training data. We assess accuracy and efficiency using the Dice score and a proposed metric called reduced annotation cost (RAC), respectively. We further evaluate the impact of various acquisition rules on BAL performance and design an ablation study to estimate the effectiveness of each criterion.
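As a rough illustration of the hybrid acquisition rule just described, the sketch below scores candidates by uncertainty, density (mean similarity to the unlabeled pool), and diversity (dissimilarity to the labeled set); the feature representation, the multiplicative combination, and the batch size k are illustrative assumptions rather than the paper's exact formulation:

```python
# Illustrative hybrid acquisition score: uncertainty x density x diversity.
import numpy as np

def select_informative(cand_feats, unlabeled_feats, labeled_feats, uncertainty, k=5):
    """Feature rows are assumed L2-normalized; uncertainty is one value per candidate."""
    density = (cand_feats @ unlabeled_feats.T).mean(axis=1)       # representative of the pool
    diversity = 1.0 - (cand_feats @ labeled_feats.T).max(axis=1)  # far from existing labels
    score = uncertainty * density * diversity
    return np.argsort(score)[::-1][:k]  # indices of the k samples to send for manual revision
```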
Results: In the MRI and CT datasets, our method was superior or comparable to existing ones, achieving a 0.8% Dice and 1.0% RAC increase in CT (statistically significant) and a 0.8% Dice and 1.1% RAC increase in MRI (not statistically significant) under volume-wise acquisition. Our ablation study indicates that combining density and diversity criteria enhances the efficiency of BAL in musculoskeletal segmentation compared with using either criterion alone.
Conclusion: Our sampling method proved effective in reducing annotation costs in image segmentation tasks. The combination of the proposed method and our BAL framework provides a semi-automatic way to annotate medical image datasets efficiently.
{"title":"Hybrid representation-enhanced sampling for Bayesian active learning in musculoskeletal segmentation of lower extremities.","authors":"Ganping Li, Yoshito Otake, Mazen Soufi, Masashi Taniguchi, Masahide Yagi, Noriaki Ichihashi, Keisuke Uemura, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato","doi":"10.1007/s11548-024-03065-7","DOIUrl":"10.1007/s11548-024-03065-7","url":null,"abstract":"<p><strong>Purpose: </strong>Manual annotations for training deep learning models in auto-segmentation are time-intensive. This study introduces a hybrid representation-enhanced sampling strategy that integrates both density and diversity criteria within an uncertainty-based Bayesian active learning (BAL) framework to reduce annotation efforts by selecting the most informative training samples.</p><p><strong>Methods: </strong>The experiments are performed on two lower extremity datasets of MRI and CT images, focusing on the segmentation of the femur, pelvis, sacrum, quadriceps femoris, hamstrings, adductors, sartorius, and iliopsoas, utilizing a U-net-based BAL framework. Our method selects uncertain samples with high density and diversity for manual revision, optimizing for maximal similarity to unlabeled instances and minimal similarity to existing training data. We assess the accuracy and efficiency using dice and a proposed metric called reduced annotation cost (RAC), respectively. We further evaluate the impact of various acquisition rules on BAL performance and design an ablation study for effectiveness estimation.</p><p><strong>Results: </strong>In MRI and CT datasets, our method was superior or comparable to existing ones, achieving a 0.8% dice and 1.0% RAC increase in CT (statistically significant), and a 0.8% dice and 1.1% RAC increase in MRI (not statistically significant) in volume-wise acquisition. Our ablation study indicates that combining density and diversity criteria enhances the efficiency of BAL in musculoskeletal segmentation compared to using either criterion alone.</p><p><strong>Conclusion: </strong>Our sampling method is proven efficient in reducing annotation costs in image segmentation tasks. The combination of the proposed method and our BAL framework provides a semi-automatic way for efficient annotation of medical image datasets.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2177-2186"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139571189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: A large amount of research has been conducted on classifying medical images using deep learning, and thyroid tissue images can also be classified by cancer type. Deep learning requires large amounts of data, but not every medical institution can collect enough data on its own. In such cases, a classifier trained at an institution that has sufficient data can be reused at other institutions. However, when using data from multiple institutions, the feature distributions need to be unified, because the data characteristics differ owing to differences in acquisition conditions.
Methods: To unify the feature distributions, the data from Institution T are transformed with a semi-supervised CycleGAN so that their distribution becomes closer to that of Institution S. The proposed method extends CycleGAN by taking the class-wise feature distributions into account so that the domain transformation is appropriate for classification. In addition, to address the imbalance arising from the different numbers of samples per cancer type, several imbalance-handling methods are applied to the semi-supervised CycleGAN.
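The Results below single out focal loss as the most effective imbalance-handling method; a standard focal-loss sketch (in the sense of Lin et al., with gamma and alpha chosen purely for illustration, not taken from the paper) is:

```python
# Standard multi-class focal loss; gamma/alpha are illustrative defaults.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """logits: (B, num_classes); targets: (B,) integer class labels."""
    log_probs = F.log_softmax(logits, dim=1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t of true class
    pt = log_pt.exp()
    # Down-weight easy, well-classified samples so the rare classes drive the loss.
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()
```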
Results: The experimental results showed that classification performance improved when the dataset from Institution S was used as training data and the test dataset from Institution T was classified after domain transformation. Among the methods addressing class imbalance, focal loss contributed the most to improving the mean F1 score.
Conclusion: The proposed method achieved domain transformation of thyroid tissue images between the two domains while retaining the class-relevant features across domains, and it showed the best F1 score, with significant differences compared with the other methods. Addressing the class imbalance of the dataset enhanced the proposed method further.
{"title":"Domain transformation using semi-supervised CycleGAN for improving performance of classifying thyroid tissue images.","authors":"Yoshihito Ichiuji, Shingo Mabu, Satomi Hatta, Kunihiro Inai, Shohei Higuchi, Shoji Kido","doi":"10.1007/s11548-024-03061-x","DOIUrl":"10.1007/s11548-024-03061-x","url":null,"abstract":"<p><strong>Purpose: </strong>A large number of research has been conducted on the classification of medical images using deep learning. The thyroid tissue images can be also classified by cancer types. Deep learning requires a large amount of data, but every medical institution cannot collect sufficient number of data for deep learning. In that case, we can consider a case where a classifier trained at a certain medical institution that has a sufficient number of data is reused at other institutions. However, when using data from multiple institutions, it is necessary to unify the feature distribution because the feature of the data differs due to differences in data acquisition conditions.</p><p><strong>Methods: </strong>To unify the feature distribution, the data from Institution T are transformed to have the closer distribution to that from Institution S by applying a domain transformation using semi-supervised CycleGAN. The proposed method enhances CycleGAN considering the feature distribution of classes for making appropriate domain transformation for classification. In addition, to address the problem of imbalanced data with different numbers of data for each cancer type, several methods dealing with imbalanced data are applied to semi-supervised CycleGAN.</p><p><strong>Results: </strong>The experimental results showed that the classification performance was enhanced when the dataset from Institution S was used as training data and the testing dataset from Institution T was classified after applying domain transformation. In addition, focal loss contributed to improving the mean F1 score the best as a method that addresses the class imbalance.</p><p><strong>Conclusion: </strong>The proposed method achieved the domain transformation of thyroid tissue images between two domains, where it retained the important features related to the classes across domains and showed the best F1 score with significant differences compared with other methods. In addition, the proposed method was further enhanced by addressing the class imbalance of the dataset.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2153-2163"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139492884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-01 | Epub Date: 2024-03-23 | DOI: 10.1007/s11548-024-03077-3
Wenqi Zhou, Xinzhou Li, Fatemeh Zabihollahy, David S Lu, Holden H Wu
Purpose: Accurate and rapid needle localization on 3D magnetic resonance imaging (MRI) is critical for MRI-guided percutaneous interventions. The current workflow requires manual needle localization on 3D MRI, which is time-consuming and cumbersome. Automatic methods using 2D deep learning networks for needle segmentation require manual image plane localization, while 3D networks are challenged by the need for sufficient training datasets. This work aimed to develop an automatic deep learning-based pipeline for accurate and rapid 3D needle localization on in vivo intra-procedural 3D MRI using a limited training dataset.
Methods: The proposed automatic pipeline adopted Shifted Window (Swin) Transformers and employed a coarse-to-fine segmentation strategy: (1) initial 3D needle feature segmentation with the 3D Swin UNEt TRansformer (UNETR); (2) generation of a 2D reformatted image containing the needle feature; (3) fine 2D needle feature segmentation with a 2D Swin Transformer and calculation of the 3D needle tip position and axis orientation. Pre-training and data augmentation were performed to improve network training. The pipeline was evaluated via cross-validation with 49 in vivo intra-procedural 3D MR images from preclinical pig experiments. The needle tip and axis localization errors were compared with human intra-reader variation using the Wilcoxon signed-rank test, with p < 0.05 considered significant.
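Step (3) ends with computing the 3D tip position and axis orientation from the segmented needle feature; one straightforward way to do this (an assumption for illustration, not necessarily the authors' exact procedure) is a principal-axis fit of the segmented voxel coordinates:

```python
# Hypothetical tip/axis computation from a binary needle segmentation.
import numpy as np

def needle_tip_and_axis(mask, voxel_spacing=(1.0, 1.0, 1.0)):
    """mask: binary 3D array of the segmented needle feature."""
    coords = np.argwhere(mask) * np.asarray(voxel_spacing)   # voxel indices -> mm
    center = coords.mean(axis=0)
    # The first principal direction of the voxel cloud approximates the needle axis.
    _, _, vt = np.linalg.svd(coords - center, full_matrices=False)
    axis = vt[0] / np.linalg.norm(vt[0])
    # Take the extreme point along the axis as the tip (sign depends on insertion side).
    projections = (coords - center) @ axis
    tip = coords[np.argmax(projections)]
    return tip, axis
```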
Results: The average end-to-end computational time for the pipeline was 6 s per 3D volume. The median Dice scores of the 3D Swin UNETR and 2D Swin Transformer in the pipeline were 0.80 and 0.93, respectively. The median 3D needle tip and axis localization errors were 1.48 mm (1.09 pixels) and 0.98°, respectively. Needle tip localization errors were significantly smaller than human intra-reader variation (median 1.70 mm; p < 0.01).
Conclusion: The proposed automatic pipeline achieved rapid pixel-level 3D needle localization on intra-procedural 3D MRI without requiring a large 3D training dataset and has the potential to assist MRI-guided percutaneous interventions.
{"title":"Deep learning-based automatic pipeline for 3D needle localization on intra-procedural 3D MRI.","authors":"Wenqi Zhou, Xinzhou Li, Fatemeh Zabihollahy, David S Lu, Holden H Wu","doi":"10.1007/s11548-024-03077-3","DOIUrl":"10.1007/s11548-024-03077-3","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate and rapid needle localization on 3D magnetic resonance imaging (MRI) is critical for MRI-guided percutaneous interventions. The current workflow requires manual needle localization on 3D MRI, which is time-consuming and cumbersome. Automatic methods using 2D deep learning networks for needle segmentation require manual image plane localization, while 3D networks are challenged by the need for sufficient training datasets. This work aimed to develop an automatic deep learning-based pipeline for accurate and rapid 3D needle localization on in vivo intra-procedural 3D MRI using a limited training dataset.</p><p><strong>Methods: </strong>The proposed automatic pipeline adopted Shifted Window (Swin) Transformers and employed a coarse-to-fine segmentation strategy: (1) initial 3D needle feature segmentation with 3D Swin UNEt TRansfomer (UNETR); (2) generation of a 2D reformatted image containing the needle feature; (3) fine 2D needle feature segmentation with 2D Swin Transformer and calculation of 3D needle tip position and axis orientation. Pre-training and data augmentation were performed to improve network training. The pipeline was evaluated via cross-validation with 49 in vivo intra-procedural 3D MR images from preclinical pig experiments. The needle tip and axis localization errors were compared with human intra-reader variation using the Wilcoxon signed rank test, with p < 0.05 considered significant.</p><p><strong>Results: </strong>The average end-to-end computational time for the pipeline was 6 s per 3D volume. The median Dice scores of the 3D Swin UNETR and 2D Swin Transformer in the pipeline were 0.80 and 0.93, respectively. The median 3D needle tip and axis localization errors were 1.48 mm (1.09 pixels) and 0.98°, respectively. Needle tip localization errors were significantly smaller than human intra-reader variation (median 1.70 mm; p < 0.01).</p><p><strong>Conclusion: </strong>The proposed automatic pipeline achieved rapid pixel-level 3D needle localization on intra-procedural 3D MRI without requiring a large 3D training dataset and has the potential to assist MRI-guided percutaneous interventions.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2227-2237"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541278/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140195078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: The visualization of an anomaly area is easier in anomaly detection methods that use generative models rather than classification models. However, achieving both anomaly detection accuracy and a clear visualization of anomalous areas is challenging. This study aimed to establish a method that combines both detection accuracy and clear visualization of anomalous areas using a generative adversarial network (GAN).
Methods: In this study, StyleGAN2 with adaptive discriminator augmentation (StyleGAN2-ADA), which can generate high-resolution, high-quality images from a limited number of training images, was used as the image generation model, and the pixel-to-style-to-pixel (pSp) encoder was used to convert images into intermediate latent variables. We combined existing methods for training and proposed a method for calculating anomaly scores using the intermediate latent variables. The proposed approach, which combines these two components, is called high-quality anomaly GAN (HQ-AnoGAN).
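The exact anomaly score of HQ-AnoGAN is not reproduced in this abstract; the sketch below shows one common AnoGAN-style formulation under the stated setup, combining image-space reconstruction error with a latent-space term (the encoder/generator interfaces and the weighting are assumptions):

```python
# Hypothetical AnoGAN-style score: encode, regenerate, compare in image and latent space.
import torch
import torch.nn.functional as F

def anomaly_score(encoder, generator, image, latent_weight=0.1):
    """encoder: pSp-style image -> intermediate latents; generator: latents -> image."""
    with torch.no_grad():
        latents = encoder(image)                   # intermediate latent variables
        reconstruction = generator(latents)
        latents_recon = encoder(reconstruction)    # re-encode the reconstruction
    image_term = F.l1_loss(reconstruction, image)  # how well the normal model explains the input
    latent_term = F.l1_loss(latents_recon, latents)
    return (image_term + latent_weight * latent_term).item()
```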
Results: The experimental results obtained using three datasets demonstrated that HQ-AnoGAN has equal or better detection accuracy than the existing methods. Visualizations of abnormal areas using the generated images showed that HQ-AnoGAN generates more natural images than the existing methods and localizes abnormal areas more accurately in qualitative evaluation.
Conclusion: In this study, HQ-AnoGAN, comprising StyleGAN2-ADA and the pSp encoder together with an optimal anomaly score calculation method, was proposed. The experimental results show that HQ-AnoGAN achieves both high anomaly detection accuracy and clear visualization of abnormal areas; thus, HQ-AnoGAN demonstrates significant potential for application in medical imaging diagnosis where an explanation of the diagnosis is required.
{"title":"High-quality semi-supervised anomaly detection with generative adversarial networks.","authors":"Yuki Sato, Junya Sato, Noriyuki Tomiyama, Shoji Kido","doi":"10.1007/s11548-023-03031-9","DOIUrl":"10.1007/s11548-023-03031-9","url":null,"abstract":"<p><strong>Purpose: </strong>The visualization of an anomaly area is easier in anomaly detection methods that use generative models rather than classification models. However, achieving both anomaly detection accuracy and a clear visualization of anomalous areas is challenging. This study aimed to establish a method that combines both detection accuracy and clear visualization of anomalous areas using a generative adversarial network (GAN).</p><p><strong>Methods: </strong>In this study, StyleGAN2 with adaptive discriminator augmentation (StyleGAN2-ADA), which can generate high-resolution and high-quality images with limited number of datasets, was used as the image generation model, and pixel-to-style-to-pixel (pSp) encoder was used to convert images into intermediate latent variables. We combined existing methods for training and proposed a method for calculating anomaly scores using intermediate latent variables. The proposed method, which combines these two methods, is called high-quality anomaly GAN (HQ-AnoGAN).</p><p><strong>Results: </strong>The experimental results obtained using three datasets demonstrated that HQ-AnoGAN has equal or better detection accuracy than the existing methods. The results of the visualization of abnormal areas using the generated images showed that HQ-AnoGAN could generate more natural images than the existing methods and was qualitatively more accurate in the visualization of abnormal areas.</p><p><strong>Conclusion: </strong>In this study, HQ-AnoGAN comprising StyleGAN2-ADA and pSp encoder was proposed with an optimal anomaly score calculation method. The experimental results show that HQ-AnoGAN can achieve both high abnormality detection accuracy and clear visualization of abnormal areas; thus, HQ-AnoGAN demonstrates significant potential for application in medical imaging diagnosis cases where an explanation of diagnosis is required.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2121-2131"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71523347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-01 | Epub Date: 2024-05-18 | DOI: 10.1007/s11548-024-03166-3
Joël L Lavanchy, Sanat Ramesh, Diego Dall'Alba, Cristians Gonzalez, Paolo Fiorini, Beat P Müller-Stich, Philipp C Nett, Jacques Marescaux, Didier Mutter, Nicolas Padoy
Purpose: Most studies on surgical activity recognition utilizing artificial intelligence (AI) have focused mainly on recognizing one type of activity from small and mono-centric surgical video datasets. It remains speculative whether those models would generalize to other centers.
Methods: In this work, we introduce a large multi-centric multi-activity dataset consisting of 140 surgical videos (MultiBypass140) of laparoscopic Roux-en-Y gastric bypass (LRYGB) surgeries performed at two medical centers, i.e., the University Hospital of Strasbourg, France (StrasBypass70) and Inselspital, Bern University Hospital, Switzerland (BernBypass70). The dataset has been fully annotated with phases and steps by two board-certified surgeons. Furthermore, we assess the generalizability and benchmark different deep learning models for the task of phase and step recognition in 7 experimental studies: (1) Training and evaluation on BernBypass70; (2) Training and evaluation on StrasBypass70; (3) Training and evaluation on the joint MultiBypass140 dataset; (4) Training on BernBypass70, evaluation on StrasBypass70; (5) Training on StrasBypass70, evaluation on BernBypass70; (6) Training on MultiBypass140, evaluation on BernBypass70; and (7) Training on MultiBypass140, evaluation on StrasBypass70.
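For reference, the seven configurations reduce to (training set, evaluation set) pairs; the sketch below expresses them that way, with `train_fn`/`eval_fn` as placeholders for whatever model pipeline is used (within-center experiments are assumed to use held-out splits of the named dataset):

```python
# The seven configurations as (training set, evaluation sets) pairs.
EXPERIMENTS = {
    1: ("BernBypass70", ["BernBypass70"]),
    2: ("StrasBypass70", ["StrasBypass70"]),
    3: ("MultiBypass140", ["MultiBypass140"]),
    4: ("BernBypass70", ["StrasBypass70"]),
    5: ("StrasBypass70", ["BernBypass70"]),
    6: ("MultiBypass140", ["BernBypass70"]),
    7: ("MultiBypass140", ["StrasBypass70"]),
}

def run_all(train_fn, eval_fn):
    """train_fn(dataset_name) -> model; eval_fn(model, dataset_name) -> metrics dict."""
    results = {}
    for exp_id, (train_set, eval_sets) in EXPERIMENTS.items():
        model = train_fn(train_set)
        results[exp_id] = {name: eval_fn(model, name) for name in eval_sets}
    return results
```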
Results: The model's performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5), confirming the limited generalization capabilities of models trained on mono-centric data. The use of multi-centric training data, experiments (6) and (7), improves the generalization capabilities of the models, bringing them beyond the level of independent mono-centric training and validation (experiments (1) and (2)).
Conclusion: MultiBypass140 shows considerable variation in surgical technique and workflow of LRYGB procedures between centers. Therefore, generalization experiments demonstrate a remarkable difference in model performance. These results highlight the importance of multi-centric datasets for AI model generalization to account for variance in surgical technique and workflows. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.
{"title":"Challenges in multi-centric generalization: phase and step recognition in Roux-en-Y gastric bypass surgery.","authors":"Joël L Lavanchy, Sanat Ramesh, Diego Dall'Alba, Cristians Gonzalez, Paolo Fiorini, Beat P Müller-Stich, Philipp C Nett, Jacques Marescaux, Didier Mutter, Nicolas Padoy","doi":"10.1007/s11548-024-03166-3","DOIUrl":"10.1007/s11548-024-03166-3","url":null,"abstract":"<p><strong>Purpose: </strong>Most studies on surgical activity recognition utilizing artificial intelligence (AI) have focused mainly on recognizing one type of activity from small and mono-centric surgical video datasets. It remains speculative whether those models would generalize to other centers.</p><p><strong>Methods: </strong>In this work, we introduce a large multi-centric multi-activity dataset consisting of 140 surgical videos (MultiBypass140) of laparoscopic Roux-en-Y gastric bypass (LRYGB) surgeries performed at two medical centers, i.e., the University Hospital of Strasbourg, France (StrasBypass70) and Inselspital, Bern University Hospital, Switzerland (BernBypass70). The dataset has been fully annotated with phases and steps by two board-certified surgeons. Furthermore, we assess the generalizability and benchmark different deep learning models for the task of phase and step recognition in 7 experimental studies: (1) Training and evaluation on BernBypass70; (2) Training and evaluation on StrasBypass70; (3) Training and evaluation on the joint MultiBypass140 dataset; (4) Training on BernBypass70, evaluation on StrasBypass70; (5) Training on StrasBypass70, evaluation on BernBypass70; Training on MultiBypass140, (6) evaluation on BernBypass70 and (7) evaluation on StrasBypass70.</p><p><strong>Results: </strong>The model's performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5) confirming the limited generalization capabilities of models trained on mono-centric data. The use of multi-centric training data, experiments (6) and (7), improves the generalization capabilities of the models, bringing them beyond the level of independent mono-centric training and validation (experiments (1) and (2)).</p><p><strong>Conclusion: </strong>MultiBypass140 shows considerable variation in surgical technique and workflow of LRYGB procedures between centers. Therefore, generalization experiments demonstrate a remarkable difference in model performance. These results highlight the importance of multi-centric datasets for AI model generalization to account for variance in surgical technique and workflows. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2249-2257"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541311/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140959178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: Osteochondritis dissecans (OCD) of the humeral capitellum is a common cause of elbow disorders, particularly among young throwing athletes. Conservative treatment is preferred for managing OCD, and early intervention significantly influences the likelihood of complete disease resolution. The purpose of this study is to develop a deep learning-based classification model in ultrasound images for computer-aided diagnosis.
Methods: This paper proposes a deep learning-based OCD classification method in ultrasound images. The proposed method first detects the humeral capitellum using YOLO and then estimates the OCD probability of the detected region using VGG16. We hypothesize that performance is improved by eliminating unnecessary regions. To validate the proposed method, it was applied to 158 subjects (OCD: 67, normal: 91) using five-fold cross-validation.
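A hedged sketch of the two-stage pipeline described above, with the detector and classifier passed in as callables and the crop size and resize helper as illustrative assumptions rather than the paper's implementation:

```python
# Hypothetical two-stage inference: detect the capitellum, then classify the crop.
import numpy as np

def classify_ocd(image, detector, classifier, input_size=(224, 224)):
    """detector(image) -> (x1, y1, x2, y2); classifier(crop) -> OCD probability."""
    x1, y1, x2, y2 = detector(image)        # YOLO-style capitellum bounding box
    crop = resize_to(image[y1:y2, x1:x2], input_size)  # discard irrelevant regions
    return float(classifier(crop))          # VGG16-style probability head

def resize_to(crop, size):
    # Nearest-neighbor resize by index sampling; a real pipeline would use an
    # interpolating resize (e.g., from OpenCV or PIL).
    h, w = crop.shape[:2]
    rows = np.linspace(0, h - 1, size[0]).astype(int)
    cols = np.linspace(0, w - 1, size[1]).astype(int)
    return crop[np.ix_(rows, cols)]
```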
Results: The study demonstrated that the humeral capitellum detection achieved a mean average precision (mAP) of over 0.95, while OCD probability estimation achieved an average accuracy of 0.890, precision of 0.888, recall of 0.927, F1 score of 0.894, and an area under the curve (AUC) of 0.962. In contrast, when the classification model was constructed for the entire image, the accuracy, precision, recall, F1 score, and AUC were 0.806, 0.806, 0.932, 0.843, and 0.928, respectively. These findings suggest the high-performance potential of the proposed model for OCD classification in ultrasound images.
Conclusion: This paper introduces a deep learning-based OCD classification method. The experimental results emphasize the effectiveness of focusing on the humeral capitellum for OCD classification in ultrasound images. Future work should evaluate the effectiveness of the proposed method when used by physicians during medical check-ups for OCD.
{"title":"Deep learning-based osteochondritis dissecans detection in ultrasound images with humeral capitellum localization.","authors":"Kenta Sasaki, Daisuke Fujita, Kenta Takatsuji, Yoshihiro Kotoura, Masataka Minami, Yusuke Kobayashi, Tsuyoshi Sukenari, Yoshikazu Kida, Kenji Takahashi, Syoji Kobashi","doi":"10.1007/s11548-023-03040-8","DOIUrl":"10.1007/s11548-023-03040-8","url":null,"abstract":"<p><strong>Purpose: </strong>Osteochondritis dissecans (OCD) of the humeral capitellum is a common cause of elbow disorders, particularly among young throwing athletes. Conservative treatment is the preferred treatment for managing OCD, and early intervention significantly influences the possibility of complete disease resolution. The purpose of this study is to develop a deep learning-based classification model in ultrasound images for computer-aided diagnosis.</p><p><strong>Methods: </strong>This paper proposes a deep learning-based OCD classification method in ultrasound images. The proposed method first detects the humeral capitellum detection using YOLO and then estimates the OCD probability of the detected region probability using VGG16. We hypothesis that the performance will be improved by eliminating unnecessary regions. To validate the performance of the proposed method, it was applied to 158 subjects (OCD: 67, Normal: 91) using five-fold-cross-validation.</p><p><strong>Results: </strong>The study demonstrated that the humeral capitellum detection achieved a mean average precision (mAP) of over 0.95, while OCD probability estimation achieved an average accuracy of 0.890, precision of 0.888, recall of 0.927, F1 score of 0.894, and an area under the curve (AUC) of 0.962. On the other hand, when the classification model was constructed for the entire image, accuracy, precision, recall, F1 score, and AUC were 0.806, 0.806, 0.932, 0.843, and 0.928, respectively. The findings suggest the high-performance potential of the proposed model for OCD classification in ultrasonic images.</p><p><strong>Conclusion: </strong>This paper introduces a deep learning-based OCD classification method. The experimental results emphasize the effectiveness of focusing on the humeral capitellum for OCD classification in ultrasound images. Future work should involve evaluating the effectiveness of employing the proposed method by physicians during medical check-ups for OCD.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2143-2152"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541362/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139486838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-01 | Epub Date: 2024-09-25 | DOI: 10.1007/s11548-024-03248-2
Marzieh Ershad Langroodi, Xi Liu, Mark R Tousignant, Anthony M Jarc
Purpose: Surgical skill evaluation that relies on subjective scoring of surgical videos can be time-consuming and inconsistent across raters. We demonstrate differentiated opportunities for objective evaluation to improve surgeon training and performance.
Methods: Subjective evaluation was performed using the Global Evaluative Assessment of Robotic Skills (GEARS) from both expert and crowd raters, whereas objective evaluation used objective performance indicators (OPIs) derived from da Vinci surgical systems. Classifiers were trained for each evaluation method to distinguish between surgical expertise levels. This study includes one clinical task from a case series of robotic-assisted sleeve gastrectomy procedures performed by a single surgeon, and two training tasks performed by novice and expert surgeons, i.e., surgeons with no experience in robotic-assisted surgery (RAS) and those with more than 500 RAS procedures.
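The classifier family is not specified in this abstract; as a hedged sketch of the comparison pattern only (logistic regression and five-fold cross-validation are illustrative choices, not necessarily the study's), one could fit one model per feature set and compare cross-validated accuracy:

```python
# Comparison pattern only: one classifier per feature set, same labels and folds.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def compare_feature_sets(opi_features, gears_scores, labels, n_splits=5):
    """labels: 0 = novice (or early cases), 1 = expert (or late cases)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, X in {"OPI": opi_features, "GEARS": gears_scores}.items():
        model = LogisticRegression(max_iter=1000)
        scores = cross_val_score(model, X, labels, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results
```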
Results: When comparing expert and novice skill levels, the OPI-based classifier showed significantly higher accuracy than the GEARS-based classifier on the more complex dissection task (OPI 0.93 ± 0.08 vs. GEARS 0.67 ± 0.18; 95% CI, 0.16-0.37; p = 0.02), but no significant difference was observed on the simpler suturing task. For the single-surgeon case series, both classifiers performed well when differentiating between early and late group cases with smaller group sizes and larger intervals between groups (OPI 0.9 ± 0.08; GEARS 0.87 ± 0.12; 95% CI, 0.02-0.04; p = 0.67). When the group size was increased to include more cases, thereby leaving smaller intervals between groups, OPIs demonstrated significantly higher accuracy (OPI 0.97 ± 0.06; GEARS 0.76 ± 0.07; 95% CI, 0.12-0.28; p = 0.004) in differentiating between early and late cases.
Conclusions: Objective methods for skill evaluation in RAS outperform subjective methods when (1) differentiating expertise in a technically challenging training task, and (2) identifying more granular differences along early versus late phases of a surgeon learning curve within a clinical task. Objective methods offer an opportunity for more accessible and scalable skill evaluation in RAS.
{"title":"Objective performance indicators versus GEARS: an opportunity for more accurate assessment of surgical skill.","authors":"Marzieh Ershad Langroodi, Xi Liu, Mark R Tousignant, Anthony M Jarc","doi":"10.1007/s11548-024-03248-2","DOIUrl":"10.1007/s11548-024-03248-2","url":null,"abstract":"<p><strong>Purpose: </strong>Surgical skill evaluation that relies on subjective scoring of surgical videos can be time-consuming and inconsistent across raters. We demonstrate differentiated opportunities for objective evaluation to improve surgeon training and performance.</p><p><strong>Methods: </strong>Subjective evaluation was performed using the Global evaluative assessment of robotic skills (GEARS) from both expert and crowd raters; whereas, objective evaluation used objective performance indicators (OPIs) derived from da Vinci surgical systems. Classifiers were trained for each evaluation method to distinguish between surgical expertise levels. This study includes one clinical task from a case series of robotic-assisted sleeve gastrectomy procedures performed by a single surgeon, and two training tasks performed by novice and expert surgeons, i.e., surgeons with no experience in robotic-assisted surgery (RAS) and those with more than 500 RAS procedures.</p><p><strong>Results: </strong>When comparing expert and novice skill levels, OPI-based classifier showed significantly higher accuracy than GEARS-based classifier on the more complex dissection task (OPI 0.93 ± 0.08 vs. GEARS 0.67 ± 0.18; 95% CI, 0.16-0.37; p = 0.02), but no significant difference was shown on the simpler suturing task. For the single-surgeon case series, both classifiers performed well when differentiating between early and late group cases with smaller group sizes and larger intervals between groups (OPI 0.9 ± 0.08; GEARS 0.87 ± 0.12; 95% CI, 0.02-0.04; p = 0.67). When increasing the group size to include more cases, thereby having smaller intervals between groups, OPIs demonstrated significantly higher accuracy (OPI 0.97 ± 0.06; GEARS 0.76 ± 0.07; 95% CI, 0.12-0.28; p = 0.004) in differentiating between the early/late cases.</p><p><strong>Conclusions: </strong>Objective methods for skill evaluation in RAS outperform subjective methods when (1) differentiating expertise in a technically challenging training task, and (2) identifying more granular differences along early versus late phases of a surgeon learning curve within a clinical task. Objective methods offer an opportunity for more accessible and scalable skill evaluation in RAS.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2259-2267"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142332054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-01 | Epub Date: 2024-06-07 | DOI: 10.1007/s11548-024-03193-0
E Cramer, A B Kucharski, J Kreimeier, S Andreß, S Li, C Walk, F Merkl, J Högl, P Wucherer, P Stefan, R von Eisenhart-Rothe, P Enste, D Roth
Purpose: We aim to investigate the integration of augmented reality (AR) within the context of increasingly complex surgical procedures and instrument handling toward the transition to smart operating rooms (OR). In contrast to cumbersome paper-based surgical instrument manuals still used in the OR, we wish to provide surgical staff with an AR head-mounted display that provides in-situ visualization and guidance throughout the assembly process of surgical instruments. Our requirement analysis supports the development and provides guidelines for its transfer into surgical practice.
Methods: A three-phase user-centered design approach was applied with online interviews, an observational study, and a workshop with two focus groups with scrub nurses, circulating nurses, surgeons, manufacturers, clinic IT staff, and members of the sterilization department. The requirement analysis was based on key criteria for usability. The data were analyzed via structured content analysis.
Results: We identified twelve main problems with the current use of paper manuals. Major issues included sterile users' inability to directly handle non-sterile manuals, missing details, and excessive text information, potentially delaying procedure performance. Major requirements for AR-driven guidance fall into the categories of design, practicability, control, and integration into the current workflow. Additionally, further recommendations for technical development could be obtained.
Conclusion: Our insights outline a comprehensive spectrum of requirements that are essential for the successful implementation of AI- and AR-driven guidance for assembling surgical instruments. The consistently appreciative evaluation by stakeholders underscores the profound potential of AR and AI technology as valuable assistance and guidance.
{"title":"Requirement analysis for an AI-based AR assistance system for surgical tools in the operating room: stakeholder requirements and technical perspectives.","authors":"E Cramer, A B Kucharski, J Kreimeier, S Andreß, S Li, C Walk, F Merkl, J Högl, P Wucherer, P Stefan, R von Eisenhart-Rothe, P Enste, D Roth","doi":"10.1007/s11548-024-03193-0","DOIUrl":"10.1007/s11548-024-03193-0","url":null,"abstract":"<p><strong>Purpose: </strong>We aim to investigate the integration of augmented reality (AR) within the context of increasingly complex surgical procedures and instrument handling toward the transition to smart operating rooms (OR). In contrast to cumbersome paper-based surgical instrument manuals still used in the OR, we wish to provide surgical staff with an AR head-mounted display that provides in-situ visualization and guidance throughout the assembly process of surgical instruments. Our requirement analysis supports the development and provides guidelines for its transfer into surgical practice.</p><p><strong>Methods: </strong>A three-phase user-centered design approach was applied with online interviews, an observational study, and a workshop with two focus groups with scrub nurses, circulating nurses, surgeons, manufacturers, clinic IT staff, and members of the sterilization department. The requirement analysis was based on key criteria for usability. The data were analyzed via structured content analysis.</p><p><strong>Results: </strong>We identified twelve main problems with the current use of paper manuals. Major issues included sterile users' inability to directly handle non-sterile manuals, missing details, and excessive text information, potentially delaying procedure performance. Major requirements for AR-driven guidance fall into the categories of design, practicability, control, and integration into the current workflow. Additionally, further recommendations for technical development could be obtained.</p><p><strong>Conclusion: </strong>In conclusion, our insights have outlined a comprehensive spectrum of requirements that are essential for the successful implementation of an AI- and AR-driven guidance for assembling surgical instruments. The consistently appreciative evaluation by stakeholders underscores the profound potential of AR and AI technology as valuable assistance and guidance.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2287-2296"},"PeriodicalIF":2.3,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541324/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141285346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}