{"title":"Editorial for Special Issue on Foundation Models for Medical Image Analysis.","authors":"Xiaosong Wang, Dequan Wang, Xiaoxiao Li, Jens Rittscher, Dimitris Metaxas, Shaoting Zhang","doi":"10.1016/j.media.2024.103389","DOIUrl":"10.1016/j.media.2024.103389","url":null,"abstract":"","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":" ","pages":"103389"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142739884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Few-shot medical image segmentation with high-fidelity prototypes
Pub Date: 2025-02-01 (Epub: 2024-11-30) | DOI: 10.1016/j.media.2024.103412 | Medical Image Analysis, vol. 100, p. 103412
Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu
Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labeled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with clearly distinct objects and relatively simple backgrounds, e.g., natural images. This makes such models suboptimal for medical imaging, where neither condition holds. To address this problem, we propose a novel Detail Self-refined Prototype Network (DSPNet) that constructs high-fidelity prototypes representing the object foreground and the background more comprehensively. Specifically, to capture global semantics while retaining detailed semantics, we learn the foreground prototypes by modeling the multimodal structures with clustering and then fusing the resulting sub-prototypes in a channel-wise manner. Considering that the background often has no apparent semantic relation across the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods. The code and data are available at https://github.com/tntek/DSPNet.
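
As an illustration of the clustering-then-fusion idea described above (not the authors' DSPNet implementation; function and parameter names are hypothetical), the sketch below clusters foreground support features into several local sub-prototypes and fuses them channel-wise into a single prototype:

```python
import torch
from sklearn.cluster import KMeans

def build_foreground_prototype(feat, mask, n_clusters=5):
    """Cluster masked support features into local sub-prototypes and fuse
    them channel-wise into one prototype vector.

    feat: (C, H, W) feature map from a support image
    mask: (H, W) binary foreground mask
    Returns a (C,) prototype tensor.
    """
    fg = feat.permute(1, 2, 0)[mask.bool()]           # (N_fg, C) foreground feature vectors
    if fg.shape[0] < n_clusters:                       # degenerate case: tiny objects
        return fg.mean(dim=0)
    # Cluster foreground pixels into local sub-prototypes (multimodal structure).
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(fg.cpu().numpy())
    labels = torch.as_tensor(labels, device=fg.device)
    protos = torch.stack([fg[labels == k].mean(dim=0) for k in range(n_clusters)])  # (K, C)
    # Channel-wise fusion: per channel, weight sub-prototypes by a softmax over
    # their activation strength instead of a plain average.
    weights = torch.softmax(protos, dim=0)             # (K, C), per-channel weights
    return (weights * protos).sum(dim=0)               # (C,)

# Usage with random data standing in for a backbone feature map and a mask.
feat = torch.randn(64, 32, 32)
mask = (torch.rand(32, 32) > 0.7).float()
proto = build_foreground_prototype(feat, mask)
print(proto.shape)  # torch.Size([64])
```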
{"title":"Few-shot medical image segmentation with high-fidelity prototypes.","authors":"Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu","doi":"10.1016/j.media.2024.103412","DOIUrl":"10.1016/j.media.2024.103412","url":null,"abstract":"<p><p>Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labeled training sample per class. Despite the prototype based approaches have achieved substantial success, existing models are limited to the imaging scenarios with considerably distinct objects and not highly complex background, e.g., natural images. This makes such models suboptimal for medical imaging with both conditions invalid. To address this problem, we propose a novel DetailSelf-refinedPrototypeNetwork (DSPNet) to construct high-fidelity prototypes representing the object foreground and the background more comprehensively. Specifically, to construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modeling the multimodal structures with clustering and then fusing each in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods. The code and data are available at https://github.com/tntek/DSPNet.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"100 ","pages":"103412"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142780633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Corrigendum to "Detection and analysis of cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge" [Medical Image Analysis, April 2022, Volume 77, 102333]
Pub Date: 2025-02-01 (Epub: 2024-10-10) | DOI: 10.1016/j.media.2024.103363 | Medical Image Analysis, p. 103363
Matthias Ivantsits, Leonid Goubergrits, Jan-Martin Kuhnigk, Markus Huellebrand, Jan Bruening, Tabea Kossen, Boris Pfahringer, Jens Schaller, Andreas Spuler, Titus Kuehne, Yizhuan Jia, Xuesong Li, Suprosanna Shit, Bjoern Menze, Ziyu Su, Jun Ma, Ziwei Nie, Kartik Jain, Yanfei Liu, Yi Lin, Anja Hennemuth
{"title":"Corrigendum to \"Detection and analysis of cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge\" [Medical Image Analysis, April 2022, Volume 77, 102333].","authors":"Matthias Ivantsits, Leonid Goubergrits, Jan-Martin Kuhnigk, Markus Huellebrand, Jan Bruening, Tabea Kossen, Boris Pfahringer, Jens Schaller, Andreas Spuler, Titus Kuehne, Yizhuan Jia, Xuesong Li, Suprosanna Shit, Bjoern Menze, Ziyu Su, Jun Ma, Ziwei Nie, Kartik Jain, Yanfei Liu, Yi Lin, Anja Hennemuth","doi":"10.1016/j.media.2024.103363","DOIUrl":"10.1016/j.media.2024.103363","url":null,"abstract":"","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":" ","pages":"103363"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142400702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The Developing Human Connectome Project: A fast deep learning-based pipeline for neonatal cortical surface reconstruction
Pub Date: 2025-02-01 (Epub: 2024-11-26) | DOI: 10.1016/j.media.2024.103394 | Medical Image Analysis, vol. 100, p. 103394
Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C Robinson, Bernhard Kainz, Daniel Rueckert
The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 h to process a single MRI scan, making it expensive for large-scale neuroimaging studies. In this paper, we propose a fast deep learning (DL) based pipeline for dHCP neonatal cortical surface reconstruction, incorporating DL-based brain extraction, cortical surface reconstruction and spherical projection, as well as GPU-accelerated cortical surface inflation and cortical feature estimation. We introduce a multiscale deformation network to learn diffeomorphic cortical surface reconstruction end-to-end from T2-weighted brain MRI. A fast unsupervised spherical mapping approach is integrated to minimize metric distortions between cortical surfaces and projected spheres. The entire workflow of our DL-based dHCP pipeline completes within only 24 s on a modern GPU, which is nearly 1000 times faster than the original dHCP pipeline. The qualitative assessment demonstrates that for 82.5% of the test samples, the cortical surfaces reconstructed by our DL-based pipeline achieve superior (54.2%) or equal (28.3%) surface quality compared to the original dHCP pipeline.
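
For intuition about the surface deformation step, the sketch below shows a generic way to deform mesh vertices by integrating a stationary velocity field sampled from a volumetric grid, a common building block of diffeomorphic surface reconstruction. It is a simplified stand-in, not the dHCP pipeline's multiscale deformation network, and all names and shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def deform_vertices(verts, velocity, n_steps=10, step_size=0.1):
    """Deform mesh vertices by Euler integration of a stationary velocity
    field, a standard way to keep surface deformations (approximately)
    diffeomorphic.

    verts:    (V, 3) vertex coordinates, normalized to [-1, 1]
    velocity: (3, D, H, W) velocity field, e.g. predicted by a network
    """
    v = verts.clone()
    vol = velocity.unsqueeze(0)                        # (1, 3, D, H, W)
    for _ in range(n_steps):
        grid = v.view(1, 1, 1, -1, 3)                  # sample at current positions (xyz order)
        sampled = F.grid_sample(vol, grid, align_corners=True)   # (1, 3, 1, 1, V)
        v = v + step_size * sampled.view(3, -1).t()    # Euler step
    return v

verts = torch.rand(1000, 3) * 2 - 1                    # random surface points in [-1, 1]
velocity = torch.randn(3, 32, 32, 32) * 0.05
new_verts = deform_vertices(verts, velocity)
print(new_verts.shape)  # torch.Size([1000, 3])
```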
{"title":"The Developing Human Connectome Project: A fast deep learning-based pipeline for neonatal cortical surface reconstruction.","authors":"Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C Robinson, Bernhard Kainz, Daniel Rueckert","doi":"10.1016/j.media.2024.103394","DOIUrl":"10.1016/j.media.2024.103394","url":null,"abstract":"<p><p>The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 h to process a single MRI scan, making it expensive for large-scale neuroimaging studies. In this paper, we propose a fast deep learning (DL) based pipeline for dHCP neonatal cortical surface reconstruction, incorporating DL-based brain extraction, cortical surface reconstruction and spherical projection, as well as GPU-accelerated cortical surface inflation and cortical feature estimation. We introduce a multiscale deformation network to learn diffeomorphic cortical surface reconstruction end-to-end from T2-weighted brain MRI. A fast unsupervised spherical mapping approach is integrated to minimize metric distortions between cortical surfaces and projected spheres. The entire workflow of our DL-based dHCP pipeline completes within only 24 s on a modern GPU, which is nearly 1000 times faster than the original dHCP pipeline. The qualitative assessment demonstrates that for 82.5% of the test samples, the cortical surfaces reconstructed by our DL-based pipeline achieve superior (54.2%) or equal (28.3%) surface quality compared to the original dHCP pipeline.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"100 ","pages":"103394"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142780635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Benefit from public unlabeled data: A Frangi filter-based pretraining network for 3D cerebrovascular segmentation
Pub Date: 2025-01-17 | DOI: 10.1016/j.media.2024.103442 | Medical Image Analysis, vol. 101, p. 103442
Gen Shi, Hao Lu, Hui Hui, Jie Tian
Precise cerebrovascular segmentation in Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) data is crucial for computer-aided clinical diagnosis. The sparse distribution of cerebrovascular structures within TOF-MRA images often results in high costs for manual data labeling. Leveraging unlabeled TOF-MRA data can significantly enhance model performance. In this study, we have constructed the largest preprocessed unlabeled TOF-MRA dataset to date, comprising 1510 subjects. Additionally, we provide manually annotated segmentation masks for 113 subjects based on existing external image datasets to facilitate evaluation. We propose a simple yet effective pretraining strategy utilizing the Frangi filter, known for its capability to enhance vessel-like structures, to optimize the use of the unlabeled data for 3D cerebrovascular segmentation. This involves a Frangi filter-based preprocessing workflow tailored for large-scale unlabeled datasets and a multi-task pretraining strategy to efficiently utilize the preprocessed data. This approach ensures maximal extraction of useful knowledge from the unlabeled data. The efficacy of the pretrained model is assessed across four cerebrovascular segmentation datasets, where it demonstrates superior performance, improving the clDice metric by approximately 2%-3% compared to the latest semi- and self-supervised methods. Additionally, ablation studies validate the generalizability and effectiveness of our pretraining method across various backbone structures. The code and data have been open-sourced at: https://github.com/shigen-StoneRoot/FFPN.
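
The sketch below illustrates how a Frangi vesselness map might be computed from an unlabeled TOF-MRA volume and thresholded into a crude pseudo-label. The thresholding step and parameter values are assumptions for illustration and do not reproduce the authors' preprocessing workflow:

```python
import numpy as np
from skimage.filters import frangi

def frangi_enhance(volume, sigmas=(1, 2, 3)):
    """Enhance tubular (vessel-like) structures in a 3D volume.

    The enhanced map can serve as a pretraining target / pseudo-label source
    for unlabeled data, in the spirit of Frangi-filter-based pretraining.
    """
    vol = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    # black_ridges=False: bright vessels on a dark background, as in TOF-MRA.
    return frangi(vol, sigmas=sigmas, black_ridges=False)

volume = np.random.rand(64, 64, 64).astype(np.float32)   # stand-in for an MRA volume
vesselness = frangi_enhance(volume)
pseudo_mask = vesselness > 0.05                           # crude pseudo-label by thresholding
print(pseudo_mask.mean())
```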
{"title":"Benefit from public unlabeled data: A Frangi filter-based pretraining network for 3D cerebrovascular segmentation.","authors":"Gen Shi, Hao Lu, Hui Hui, Jie Tian","doi":"10.1016/j.media.2024.103442","DOIUrl":"https://doi.org/10.1016/j.media.2024.103442","url":null,"abstract":"<p><p>Precise cerebrovascular segmentation in Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) data is crucial for computer-aided clinical diagnosis. The sparse distribution of cerebrovascular structures within TOF-MRA images often results in high costs for manual data labeling. Leveraging unlabeled TOF-MRA data can significantly enhance model performance. In this study, we have constructed the largest preprocessed unlabeled TOF-MRA dataset to date, comprising 1510 subjects. Additionally, we provide manually annotated segmentation masks for 113 subjects based on existing external image datasets to facilitate evaluation. We propose a simple yet effective pretraining strategy utilizing the Frangi filter, known for its capability to enhance vessel-like structures, to optimize the use of the unlabeled data for 3D cerebrovascular segmentation. This involves a Frangi filter-based preprocessing workflow tailored for large-scale unlabeled datasets and a multi-task pretraining strategy to efficiently utilize the preprocessed data. This approach ensures maximal extraction of useful knowledge from the unlabeled data. The efficacy of the pretrained model is assessed across four cerebrovascular segmentation datasets, where it demonstrates superior performance, improving the clDice metric by approximately 2%-3% compared to the latest semi- and self-supervised methods. Additionally, ablation studies validate the generalizability and effectiveness of our pretraining method across various backbone structures. The code and data have been open source at: https://github.com/shigen-StoneRoot/FFPN.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103442"},"PeriodicalIF":10.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143008076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Identifying multilayer network hub by graph representation learning
Pub Date: 2025-01-16 | DOI: 10.1016/j.media.2025.103463 | Medical Image Analysis, vol. 101, p. 103463
Defu Yang, Minjeong Kim, Yu Zhang, Guorong Wu
Recent advances in neuroimaging technology allow us to understand how the human brain is wired in vivo and how functional activity is synchronized across multiple regions. Growing evidence shows that the complexity of functional connectivity is far beyond the widely used mono-layer network. Indeed, the hierarchical processing of information among distinct brain regions and across multiple channels requires a more advanced multilayer model to understand the synchronization across the brain that underlies functional brain networks. However, principled approaches for characterizing network organization in the context of multilayer topologies remain largely unexplored. In this work, we present a novel multivariate hub identification method that takes both the intra- and inter-layer network topologies into account. Specifically, we put the spotlight on multilayer graph embeddings that allow us to separate connector hubs (connecting across network modules) from their peripheral nodes. The removal of these hub nodes breaks down the entire multilayer brain network into a set of disconnected communities. We have evaluated our novel multilayer hub identification method on task-based and resting-state functional images. Complementing ongoing findings based on mono-layer brain networks, our multilayer network analysis provides a new understanding of brain network topology that links functional connectivities with brain states and disease progression.
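
As a point of reference for the connector-hub concept, the sketch below computes the classical multiplex participation coefficient, which scores how evenly a node's connection strength is spread across layers. This is a standard baseline measure, not the graph-representation-learning method proposed in the paper:

```python
import numpy as np

def multiplex_participation(layers):
    """Multiplex participation coefficient for a multilayer network.

    layers: list of L symmetric (N, N) adjacency matrices, one per layer.
    Returns an (N,) array in [0, 1]; values near 1 indicate connector-like
    nodes whose strength is distributed evenly across all layers.
    """
    L = len(layers)
    strengths = np.stack([A.sum(axis=1) for A in layers])   # (L, N) per-layer node strength
    total = strengths.sum(axis=0) + 1e-12
    frac = strengths / total                                 # fraction of strength per layer
    return (L / (L - 1)) * (1.0 - (frac ** 2).sum(axis=0))

rng = np.random.default_rng(0)
layers = [rng.random((90, 90)) * (rng.random((90, 90)) > 0.8) for _ in range(3)]
layers = [(A + A.T) / 2 for A in layers]                     # symmetrize each layer
scores = multiplex_participation(layers)
print(np.argsort(scores)[-5:])                               # five most connector-like nodes
```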
{"title":"Identifying multilayer network hub by graph representation learning.","authors":"Defu Yang, Minjeong Kim, Yu Zhang, Guorong Wu","doi":"10.1016/j.media.2025.103463","DOIUrl":"https://doi.org/10.1016/j.media.2025.103463","url":null,"abstract":"<p><p>The recent advances in neuroimaging technology allow us to understand how the human brain is wired in vivo and how functional activity is synchronized across multiple regions. Growing evidence shows that the complexity of the functional connectivity is far beyond the widely used mono-layer network. Indeed, the hierarchical processing information among distinct brain regions and across multiple channels requires using a more advanced multilayer model to understand the synchronization across the brain that underlies functional brain networks. However, the principled approach for characterizing network organization in the context of multilayer topologies is largely unexplored. In this work, we present a novel multi-variate hub identification method that takes both the intra- and inter-layer network topologies into account. Specifically, we put the spotlight on the multilayer graph embeddings that allow us to separate connector hubs (connecting across network modules) with their peripheral nodes. The removal of these hub nodes breaks down the entire multilayer brain network into a set of disconnected communities. We have evaluated our novel multilayer hub identification method in task-based and resting-state functional images. Complimenting ongoing findings using mono-layer brain networks, our multilayer network analysis provides a new understanding of brain network topology that links functional connectivities with brain states and disease progression.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103463"},"PeriodicalIF":10.7,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143023985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

UnICLAM: Contrastive representation learning with adversarial masking for unified and interpretable Medical Vision Question Answering
Pub Date: 2025-01-15 | DOI: 10.1016/j.media.2025.103464 | Medical Image Analysis, vol. 101, p. 103464
Chenlu Zhan, Peng Peng, Hongwei Wang, Gaoang Wang, Yu Lin, Tao Chen, Hongsen Wang
Medical Visual Question Answering aims to assist doctors in decision-making when answering clinical questions regarding radiology images. Nevertheless, current models learn cross-modal representations with vision and text encoders residing in two separate spaces, which inevitably leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model based on Contrastive Representation Learning with Adversarial Masking. To learn an aligned image-text representation, we first establish a unified dual-stream pre-training structure with a gradually soft-parameter sharing strategy for alignment. Specifically, the proposed strategy constrains the vision and text encoders to be close in the same space, and the constraint is gradually loosened as the number of layers increases, so as to narrow the distance between the two modalities. To grasp the unified semantic cross-modal representation, we extend adversarial masking data augmentation to the contrastive representation learning of vision and text in a unified manner. While the encoder training minimizes the distance between the original and masked samples, the adversarial masking module keeps adversarial learning to conversely maximize the distance. We further explore the unified adversarial masking augmentation, which improves potential ante-hoc interpretability with remarkable performance and efficiency. Experimental results on the VQA-RAD and SLAKE benchmarks demonstrate that UnICLAM outperforms 11 existing state-of-the-art Medical-VQA methods. More importantly, we provide an additional discussion of the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaptation performance in practical disease diagnosis. The codes and models will be released upon the acceptance of the paper.
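
A minimal sketch of the gradually soft-parameter sharing idea (hypothetical toy encoders, not UnICLAM's architecture): corresponding vision and text layers are pulled together by an L2 penalty whose weight decays with depth, so early layers are constrained toward a shared space while deeper layers are progressively loosened:

```python
import torch
import torch.nn as nn

class DualStreamEncoder(nn.Module):
    """Toy dual-stream encoder with layer-wise soft parameter sharing."""

    def __init__(self, dim=256, depth=6):
        super().__init__()
        self.vision = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.text = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        # Constraint weight decays with layer index, e.g. 1, 1/2, 1/3, ...
        self.share_weights = [1.0 / (i + 1) for i in range(depth)]

    def sharing_loss(self):
        # Penalize distance between corresponding vision/text layer parameters.
        loss = 0.0
        for w, v_layer, t_layer in zip(self.share_weights, self.vision, self.text):
            for pv, pt in zip(v_layer.parameters(), t_layer.parameters()):
                loss = loss + w * (pv - pt).pow(2).mean()
        return loss

model = DualStreamEncoder()
reg = model.sharing_loss()          # would be added to the main contrastive objective
print(float(reg))
```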
{"title":"UnICLAM: Contrastive representation learning with adversarial masking for unified and interpretable Medical Vision Question Answering.","authors":"Chenlu Zhan, Peng Peng, Hongwei Wang, Gaoang Wang, Yu Lin, Tao Chen, Hongsen Wang","doi":"10.1016/j.media.2025.103464","DOIUrl":"https://doi.org/10.1016/j.media.2025.103464","url":null,"abstract":"<p><p>Medical Visual Question Answering aims to assist doctors in decision-making when answering clinical questions regarding radiology images. Nevertheless, current models learn cross-modal representations through residing vision and text encoders in dual separate spaces, which inevitably leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model through Contrastive Representation Learning with Adversarial Masking. To achieve the learning of an aligned image-text representation, we first establish a unified dual-stream pre-training structure with the gradually soft-parameter sharing strategy for alignment. Specifically, the proposed strategy learns a constraint for the vision and text encoders to be close in the same space, which is gradually loosened as the number of layers increases, so as to narrow the distance between the two different modalities. For grasping the unified semantic cross-modal representation, we extend the adversarial masking data augmentation to the contrastive representation learning of vision and text in a unified manner. While the encoder training minimizes the distance between the original and masking samples, the adversarial masking module keeps adversarial learning to conversely maximize the distance. We also intuitively take a further exploration of the unified adversarial masking augmentation method, which improves the potential ante-hoc interpretability with remarkable performance and efficiency. Experimental results on VQA-RAD and SLAKE benchmarks demonstrate that UnICLAM outperforms existing 11 state-of-the-art Medical-VQA methods. More importantly, we make an additional discussion about the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaption performance in practical disease diagnosis. The codes and models will be released upon the acceptance of the paper.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103464"},"PeriodicalIF":10.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

SIRE: Scale-invariant, rotation-equivariant estimation of artery orientations using graph neural networks
Pub Date: 2025-01-15 | DOI: 10.1016/j.media.2025.103467 | Medical Image Analysis, vol. 101, p. 103467
Dieuwertje Alblas, Julian Suk, Christoph Brune, Kak Khee Yeung, Jelmer M Wolterink
The orientation of a blood vessel as visualized in 3D medical images is an important descriptor of its geometry that can be used for centerline extraction and subsequent segmentation, labeling, and visualization. Blood vessels appear at multiple scales and levels of tortuosity, and determining the exact orientation of a vessel is a challenging problem. Recent works have used 3D convolutional neural networks (CNNs) for this purpose, but CNNs are sensitive to variations in vessel size and orientation. We present SIRE: a scale-invariant rotation-equivariant estimator for local vessel orientation. SIRE is modular and has strongly generalizing properties due to symmetry preservation. SIRE consists of a gauge equivariant mesh CNN (GEM-CNN) that operates in parallel on multiple nested spherical meshes with different sizes. The features on each mesh are a projection of image intensities within the corresponding sphere. These features are intrinsic to the sphere and, in combination with the gauge equivariant properties of GEM-CNN, lead to SO(3) rotation equivariance. Approximate scale invariance is achieved by weight sharing and use of a symmetric maximum aggregation function to combine predictions at multiple scales. Hence, SIRE can be trained with arbitrarily oriented vessels with varying radii to generalize to vessels with a wide range of calibres and tortuosity. We demonstrate the efficacy of SIRE using three datasets containing vessels of varying scales: the vascular model repository (VMR), the ASOCA coronary artery set, and an in-house set of abdominal aortic aneurysms (AAAs). We embed SIRE in a centerline tracker which accurately tracks large calibre AAAs, regardless of the data SIRE is trained with. Moreover, a tracker can use SIRE to track small-calibre tortuous coronary arteries, even when trained only with large-calibre, non-tortuous AAAs. Additional experiments verify the rotation-equivariance and scale-invariance properties of SIRE. In conclusion, by incorporating SO(3) and scale symmetries, SIRE can be used to determine orientations of vessels outside of the training domain, offering a robust and data-efficient solution to geometric analysis of blood vessels in 3D medical images.
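
The sketch below isolates only the scale-aggregation idea: a weight-shared head is applied to features extracted at several nested scales, and the outputs are combined with a symmetric element-wise maximum. It does not implement the gauge equivariant mesh CNN, and all names and shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleOrientationHead(nn.Module):
    """Shared head applied to features from several nested scales, combined
    with an element-wise max — one way to obtain approximate scale invariance.
    """
    def __init__(self, in_dim=32, n_directions=500):
        super().__init__()
        self.head = nn.Linear(in_dim, n_directions)    # weights shared across scales

    def forward(self, feats_per_scale):
        # feats_per_scale: list of (B, in_dim) feature vectors, one per scale
        logits = torch.stack([self.head(f) for f in feats_per_scale])  # (S, B, D)
        return logits.max(dim=0).values                 # symmetric max over scales

head = MultiScaleOrientationHead()
feats = [torch.randn(4, 32) for _ in range(3)]          # features from 3 nested spheres
direction_scores = head(feats)                          # (4, 500) scores over candidate directions
print(direction_scores.shape)
```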
{"title":"SIRE: Scale-invariant, rotation-equivariant estimation of artery orientations using graph neural networks.","authors":"Dieuwertje Alblas, Julian Suk, Christoph Brune, Kak Khee Yeung, Jelmer M Wolterink","doi":"10.1016/j.media.2025.103467","DOIUrl":"https://doi.org/10.1016/j.media.2025.103467","url":null,"abstract":"<p><p>The orientation of a blood vessel as visualized in 3D medical images is an important descriptor of its geometry that can be used for centerline extraction and subsequent segmentation, labeling, and visualization. Blood vessels appear at multiple scales and levels of tortuosity, and determining the exact orientation of a vessel is a challenging problem. Recent works have used 3D convolutional neural networks (CNNs) for this purpose, but CNNs are sensitive to variations in vessel size and orientation. We present SIRE: a scale-invariant rotation-equivariant estimator for local vessel orientation. SIRE is modular and has strongly generalizing properties due to symmetry preservations. SIRE consists of a gauge equivariant mesh CNN (GEM-CNN) that operates in parallel on multiple nested spherical meshes with different sizes. The features on each mesh are a projection of image intensities within the corresponding sphere. These features are intrinsic to the sphere and, in combination with the gauge equivariant properties of GEM-CNN, lead to SO(3) rotation equivariance. Approximate scale invariance is achieved by weight sharing and use of a symmetric maximum aggregation function to combine predictions at multiple scales. Hence, SIRE can be trained with arbitrarily oriented vessels with varying radii to generalize to vessels with a wide range of calibres and tortuosity. We demonstrate the efficacy of SIRE using three datasets containing vessels of varying scales; the vascular model repository (VMR), the ASOCA coronary artery set, and an in-house set of abdominal aortic aneurysms (AAAs). We embed SIRE in a centerline tracker which accurately tracks large calibre AAAs, regardless of the data SIRE is trained with. Moreover, a tracker can use SIRE to track small-calibre tortuous coronary arteries, even when trained only with large-calibre, non-tortuous AAAs. Additional experiments are performed to verify the rotational equivariant and scale invariant properties of SIRE. In conclusion, by incorporating SO(3) and scale symmetries, SIRE can be used to determine orientations of vessels outside of the training domain, offering a robust and data-efficient solution to geometric analysis of blood vessels in 3D medical images.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103467"},"PeriodicalIF":10.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

When multiple instance learning meets foundation models: Advancing histological whole slide image analysis
Hongming Xu, Mingkang Wang, Duanbo Shi, Huamin Qin, Yunpeng Zhang, Zaiyi Liu, Anant Madabhushi, Peng Gao, Fengyu Cong, Cheng Lu
Pub Date: 2025-01-14 | DOI: 10.1016/j.media.2025.103456 | Medical Image Analysis, vol. 101, p. 103456
Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregations. This paper implements and systematically compares six FMs and six recent MIL methods by combining different feature extractors and aggregators across seven clinically relevant end-to-end prediction tasks, using WSIs from 4044 patients with four different cancer types. We tested state-of-the-art (SOTA) FMs in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators such as attention-based pooling, transformers, and dynamic graphs were thoroughly tested. Our experiments on cancer grading, biomarker status prediction, and microsatellite instability (MSI) prediction suggest that (1) FMs like UNI, trained on more diverse histological images, outperform generic models trained on smaller datasets for patch embedding, significantly enhancing downstream MIL classification accuracy and model training convergence speed; (2) instance-feature fine-tuning, known as online feature re-embedding, which captures both fine-grained details and spatial interactions, can often further improve WSI classification performance; (3) FMs advance MIL models by enabling promising predictions of grade, biomarker status, and MSI without requiring pixel- or patch-level annotations. These findings encourage the development of advanced, domain-specific FMs aimed at more universally applicable diagnostic tasks, aligning with the evolving needs of clinical AI in pathology.
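
For readers new to MIL aggregation, the sketch below shows a minimal attention-based pooling head in the spirit of ABMIL, operating on patch embeddings such as those produced by a frozen foundation model. Dimensions and names are illustrative assumptions, not the configuration benchmarked in the paper:

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-based MIL aggregator: patch embeddings from a whole slide
    image are weighted and pooled into a single slide-level prediction.
    """
    def __init__(self, in_dim=1024, hidden=256, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, patch_embeddings):
        # patch_embeddings: (N_patches, in_dim) for one whole slide image
        weights = torch.softmax(self.attn(patch_embeddings), dim=0)   # (N, 1)
        slide_embedding = (weights * patch_embeddings).sum(dim=0)     # (in_dim,)
        return self.classifier(slide_embedding), weights

model = AttentionMILPooling()
patches = torch.randn(5000, 1024)          # e.g. embeddings from a patch-level FM
logits, attn = model(patches)
print(logits.shape, attn.shape)            # torch.Size([2]) torch.Size([5000, 1])
```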
{"title":"When multiple instance learning meets foundation models: Advancing histological whole slide image analysis.","authors":"Hongming Xu, Mingkang Wang, Duanbo Shi, Huamin Qin, Yunpeng Zhang, Zaiyi Liu, Anant Madabhushi, Peng Gao, Fengyu Cong, Cheng Lu","doi":"10.1016/j.media.2025.103456","DOIUrl":"https://doi.org/10.1016/j.media.2025.103456","url":null,"abstract":"<p><p>Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregations. This paper implemented and systematically compared six FMs and six recent MIL methods by organizing different feature extractions and aggregations across seven clinically relevant end-to-end prediction tasks using WSIs from 4044 patients with four different cancer types. We tested state-of-the-art (SOTA) FMs in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators, such as attention-based pooling, transformers, and dynamic graphs were thoroughly tested. Our experiments on cancer grading, biomarker status prediction, and microsatellite instability (MSI) prediction suggest that (1) FMs like UNI, trained with more diverse histological images, outperform generic models with smaller training datasets in patch embeddings, significantly enhancing downstream MIL classification accuracy and model training convergence speed, (2) instance feature fine-tuning, known as online feature re-embedding, to capture both fine-grained details and spatial interactions can often further improve WSI classification performance, (3) FMs advance MIL models by enabling promising grading classifications, biomarker status, and MSI predictions without requiring pixel- or patch-level annotations. These findings encourage the development of advanced, domain-specific FMs, aimed at more universally applicable diagnostic tasks, aligning with the evolving needs of clinical AI in pathology.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103456"},"PeriodicalIF":10.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Dynamic spectrum-driven hierarchical learning network for polyp segmentation
Pub Date: 2025-01-14 | DOI: 10.1016/j.media.2024.103449 | Medical Image Analysis, vol. 101, p. 103449
Haolin Wang, Kai-Ni Wang, Jie Hua, Yi Tang, Yang Chen, Guang-Quan Zhou, Shuo Li
Accurate automatic polyp segmentation in colonoscopy is crucial for the prompt prevention of colorectal cancer. However, the heterogeneous nature of polyps and differences in lighting and visibility conditions present significant challenges in achieving reliable and consistent segmentation across different cases. Therefore, this study proposes a novel dynamic spectrum-driven hierarchical learning model (DSHNet), the first to specifically leverage image frequency-domain information to explore region-level saliency differences among and within polyps for precise segmentation. A novel spectral decoupler is introduced to separate low-frequency and high-frequency components, leveraging their distinct characteristics to guide the model in learning valuable frequency features without bias through automatic masking. The low-frequency branch then generates dynamic convolution kernels from individual frequency-aware features, which regulate region-level saliency modeling together with supervision from the label hierarchy, thus enabling simultaneous adaptation to polyp heterogeneity and illumination variation. Meanwhile, the high-frequency attention module is designed to preserve detailed information at the skip connections, complementing the focus on spatial features at various stages. Experimental results demonstrate that the proposed method outperforms other state-of-the-art polyp segmentation techniques, achieving robust and superior results on five diverse datasets. Codes are available at https://github.com/gardnerzhou/DSHNet.
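
As a rough illustration of frequency decoupling (a common FFT-based masking scheme, not DSHNet's learned spectral decoupler), the sketch below splits an image into low- and high-frequency components with a circular mask in the Fourier domain:

```python
import torch

def split_frequencies(image, radius_ratio=0.1):
    """Split an image batch into low- and high-frequency components using a
    circular low-pass mask in the Fourier domain.

    image: (B, C, H, W) tensor. Returns (low, high) with the same shape.
    """
    B, C, H, W = image.shape
    spectrum = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = ((yy - H // 2) ** 2 + (xx - W // 2) ** 2).float().sqrt()
    mask = (dist <= radius_ratio * min(H, W)).to(image.device)   # low-pass mask
    low_spec = spectrum * mask
    high_spec = spectrum * (~mask)
    low = torch.fft.ifft2(torch.fft.ifftshift(low_spec, dim=(-2, -1))).real
    high = torch.fft.ifft2(torch.fft.ifftshift(high_spec, dim=(-2, -1))).real
    return low, high

img = torch.rand(2, 3, 256, 256)
low, high = split_frequencies(img)
print(low.shape, high.shape)
```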
{"title":"Dynamic spectrum-driven hierarchical learning network for polyp segmentation.","authors":"Haolin Wang, Kai-Ni Wang, Jie Hua, Yi Tang, Yang Chen, Guang-Quan Zhou, Shuo Li","doi":"10.1016/j.media.2024.103449","DOIUrl":"https://doi.org/10.1016/j.media.2024.103449","url":null,"abstract":"<p><p>Accurate automatic polyp segmentation in colonoscopy is crucial for the prompt prevention of colorectal cancer. However, the heterogeneous nature of polyps and differences in lighting and visibility conditions present significant challenges in achieving reliable and consistent segmentation across different cases. Therefore, this study proposes a novel dynamic spectrum-driven hierarchical learning model (DSHNet), the first to specifically leverage image frequency domain information to explore region-level salience differences among and within polyps for precise segmentation. A novel spectral decoupler is advanced to separate low-frequency and high-frequency components, leveraging their distinct characteristics to guide the model in learning valuable frequency features without bias through automatic masking. The low-frequency driven region-level saliency modeling then generates dynamic convolution kernels with individual frequency-aware features, which regulate region-level saliency modeling together with the supervision of the hierarchy of labels, thus enabling adaptation to polyp heterogeneous and illumination variation simultaneously. Meanwhile, the high-frequency attention module is designed to preserve the detailed information at the skip connections, which complements the focus on spatial features at various stages. Experimental results demonstrate that the proposed method outperforms other state-of-the-art polyp segmentation techniques, achieving robust and superior results on five diverse datasets. Codes are available at https://github.com/gardnerzhou/DSHNet.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103449"},"PeriodicalIF":10.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}