Proceedings. IEEE International Symposium on Biomedical Imaging, vol. 2025

MINIMALLY USER-GUIDED 3D MICRO-ULTRASOUND PROSTATE SEGMENTATION
Alex Ling Yu Hung, Kai Zhao, Kaifeng Pang, Qi Miao, Zhaozhi Wang, Wayne Brisbane, Demetri Terzopoulos, Kyunghyun Sung
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10981266 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12104093/pdf/

Micro-ultrasound is an emerging imaging tool that complements MRI in detecting prostate cancer by offering high-resolution imaging at lower cost. However, reliable annotations for micro-ultrasound data remain challenging to obtain due to the limited availability of experts and a steep learning curve. To address this clinical need, we propose a click-based, user-guided volumetric micro-ultrasound prostate segmentation model requiring minimal user intervention and training data. Our model predicts the segmentation of the entire prostate volume after users place a few points on the two boundary image slices of the prostate. Experiments show that the model needs only a small amount of training data to achieve strong segmentation performance, with each of its components contributing to the overall improvement. We demonstrate that the user's level of expertise scarcely affects performance, making prostate segmentation practically feasible for general users.
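The model above is guided by a few user clicks. A common way to feed such point prompts to a segmentation network (not necessarily what the authors do) is to rasterize them into a Gaussian heatmap channel; a minimal sketch, where `click_heatmap` and the `sigma` value are illustrative assumptions:

```python
import math

def click_heatmap(clicks, height, width, sigma=2.0):
    """Rasterize user clicks (row, col) into a Gaussian heatmap channel;
    overlapping bumps are merged with max() so values stay in [0, 1]."""
    hm = [[0.0] * width for _ in range(height)]
    for r0, c0 in clicks:
        for r in range(height):
            for c in range(width):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                hm[r][c] = max(hm[r][c], math.exp(-d2 / (2 * sigma ** 2)))
    return hm

# two hypothetical boundary-slice clicks on a toy 16x16 slice
hm = click_heatmap([(4, 4), (10, 12)], height=16, width=16)
```

The heatmap is typically concatenated with the image as an extra input channel, giving the network a dense spatial encoding of the sparse clicks.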
MPR-DIFF: A SELF-SUPERVISED DIFFUSION MODEL FOR MULTI-PLANAR REFORMATION IN PROSTATE MICRO-ULTRASOUND IMAGING
Kaifeng Pang, Qi Miao, Alex Ling Yu Hung, Kai Zhao, Eunsun Oh, Raymi Ramirez, Wayne Brisbane, Kyunghyun Sung
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10981012 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12105648/pdf/

Micro-ultrasound (MicroUS) is a novel imaging technology with the potential to provide a low-cost and high-resolution approach for prostate cancer diagnosis. However, MicroUS is acquired in a non-uniform, fan-shaped sweep, where voxel size varies with distance from the probe and across slice angles. This irregular voxel distribution complicates reformatting into other imaging planes, making it challenging to conduct joint evaluations with other modalities such as MRI and histopathology. Existing interpolation-based reformatting methods lead to poor image resolution and introduce severe artifacts. In this paper, we propose MPR-Diff, a self-supervised diffusion model for super-resolution-based multi-planar reformation in prostate MicroUS imaging. Our method addresses the lack of high-resolution reference in the target plane by extracting simulated training patches from acquired slices. We performed both a quantitative evaluation and an expert reader study, demonstrating that our approach significantly enhances image resolution and reduces artifacts, thereby increasing the potential diagnostic value of MicroUS. Code is available at https://github.com/Calvin-Pang/MPR-Diff.
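The abstract's key geometric point is that a fan-shaped sweep makes voxel size grow with depth. A small sketch makes this concrete; `fan_to_cartesian` and `lateral_spacing` are hypothetical helpers, not part of MPR-Diff:

```python
import math

def fan_to_cartesian(r, theta_deg, apex=(0.0, 0.0)):
    """Map a fan-geometry sample (depth r along a beam at angle theta
    from the probe apex) to Cartesian (x, y)."""
    th = math.radians(theta_deg)
    return (apex[0] + r * math.sin(th), apex[1] + r * math.cos(th))

def lateral_spacing(r, dtheta_deg):
    """Arc length between adjacent beams at depth r: it grows linearly
    with r, which is why voxel size varies across the sweep."""
    return r * math.radians(dtheta_deg)
```

At three times the depth, adjacent beams are three times farther apart, so naive interpolation onto a Cartesian grid over- or under-samples depending on depth.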
LEARNING ACCURATE RIGID REGISTRATION FOR LONGITUDINAL BRAIN MRI FROM SYNTHETIC DATA
Jingru Fu, Adrian V Dalca, Bruce Fischl, Rodrigo Moreno, Malte Hoffmann
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10980859 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237398/pdf/

Rigid registration aims to determine the translations and rotations necessary to align features in a pair of images. While recent machine learning methods have become state-of-the-art for linear and deformable registration across subjects, they have demonstrated limitations when applied to longitudinal (within-subject) registration, where achieving precise alignment is critical. Building on an existing framework for anatomy-aware, acquisition-agnostic affine registration, we propose a model optimized for longitudinal, rigid brain registration. By training the model with synthetic within-subject pairs augmented with rigid and subtle nonlinear transforms, the model estimates more accurate rigid transforms than previous cross-subject networks and performs robustly on longitudinal registration pairs within and across magnetic resonance imaging (MRI) contrasts.
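The synthetic within-subject pairs described above are generated by applying small rigid transforms to a scan. As an illustrative 2D stand-in (the paper works with 3D brain MRI; helper names and parameter ranges are assumptions), a rigid transform can be built and verified to preserve distances:

```python
import math
import random

def rigid_2d(angle_deg, tx, ty):
    """Homogeneous 3x3 matrix for a 2D rotation followed by translation."""
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), tx],
            [math.sin(a),  math.cos(a), ty],
            [0.0, 0.0, 1.0]]

def apply(T, p):
    x, y = p
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])

def random_rigid(max_angle=10.0, max_shift=5.0, rng=random):
    """Small random rigid transform, e.g. to turn one scan into a
    synthetic 'follow-up' of the same subject."""
    return rigid_2d(rng.uniform(-max_angle, max_angle),
                    rng.uniform(-max_shift, max_shift),
                    rng.uniform(-max_shift, max_shift))

T = rigid_2d(30.0, 2.0, -1.0)
p, q = apply(T, (0.0, 0.0)), apply(T, (3.0, 4.0))
```

Because rigid transforms preserve all pairwise distances, the network's ground-truth labels come for free from the sampled angle and shift.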
OVERALL SURVIVAL PREDICTION OF BRAIN TUMOR PATIENTS WITH MULTIMODAL MRI USING SWIN UNETR
Gihyeon Kim, Fangxu Xing, Hyoun-Joong Kong, Emiliano Santarnecchi, Helen A Shih, Thomas Bortfeld, Georges El Fakhri, Xiaofeng Liu, Jang-Hwan Choi, Jonghye Woo
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10981128 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12345447/pdf/

Accurate prediction of glioblastoma patient survival can significantly aid in personalized treatment planning. While pre-operative multimodal magnetic resonance imaging (MRI) offers complementary information, current methods are constrained by relatively limited data and largely rely on hand-crafted features extracted from segmentation results. To address these issues, we propose a data-efficient multi-task framework that exploits hierarchical segmentation features within the Swin UNETR architecture for survival prediction. By integrating multi-scale features, we capture detailed spatial information and global context, while employing the shifted-window mechanism to maintain computational efficiency and scalability for 3D volumes. We further alleviate survival data scarcity through segmentation pre-training, while the features are fine-tuned to align with the survival prediction task and refined by statistical F-values. In addition, age information is incorporated alongside the extracted features to enhance survival prediction performance. Through comprehensive evaluations on the BraTS dataset, we demonstrate that our model achieves superior segmentation accuracy and state-of-the-art survival prediction performance, offering a robust solution for clinical prognosis in glioblastoma patients.
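The abstract mentions refining features by statistical F-values. The one-way ANOVA F-statistic commonly used for such feature ranking (analogous to scipy.stats.f_oneway; the helper name and toy data below are illustrative) can be computed directly:

```python
def f_value(groups):
    """One-way ANOVA F-statistic of one feature across k groups:
    between-group variance over within-group variance."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# a feature that cleanly separates two outcome groups vs. one that does not
f_separating = f_value([[1.0, 1.1, 0.9], [5.0, 5.1, 4.9]])
f_noisy = f_value([[1.0, 5.0, 3.0], [2.0, 4.0, 3.2]])
```

Features with the largest F-values discriminate best between outcome groups and would be kept for the survival head.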
TPOT: TOPOLOGY PRESERVING OPTIMAL TRANSPORT IN RETINAL FUNDUS IMAGE ENHANCEMENT
Xuanzhao Dong, Wenhui Zhu, Xin Li, Guoxin Sun, Yi Su, Oana M Dumitrascu, Yalin Wang
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10981104 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12380521/pdf/

Retinal fundus photography enhancement is important for diagnosing and monitoring retinal diseases. However, early approaches to retinal image enhancement, such as those based on Generative Adversarial Networks (GANs), often struggle to preserve the complex topological information of blood vessels, resulting in spurious or missing vessel structures. The persistence diagram, which captures topological features based on the persistence of topological structures under different filtrations, provides a promising way to represent this structural information. In this work, we propose a topology-preserving training paradigm that regularizes blood vessel structures by minimizing the differences between persistence diagrams. We call the resulting framework Topology Preserving Optimal Transport (TPOT). Experimental results on a large-scale dataset demonstrate the superiority of the proposed method compared to several state-of-the-art supervised and unsupervised techniques, both in terms of image quality and performance in the downstream blood vessel segmentation task. The code is available at https://github.com/Retinal-Research/TPOT.
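A persistence diagram records the birth and death values of topological features under a filtration. As a minimal illustration of the idea only (TPOT operates on 2D images, typically via a cubical-complex library; this sketch computes 0-dimensional persistence of a 1D function with the elder rule):

```python
def persistence_0d(values):
    """0-dimensional persistence pairs (birth, death) of the sublevel-set
    filtration of a 1D function, using the elder rule; the component of
    the global minimum never dies (death = inf)."""
    n = len(values)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comp_min = {}              # component root -> index of its minimum
    alive = [False] * n
    pairs = []
    for i in sorted(range(n), key=lambda k: values[k]):
        alive[i] = True
        comp_min[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and alive[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the component with the higher minimum dies here
                young = ri if values[comp_min[ri]] > values[comp_min[rj]] else rj
                old = rj if young == ri else ri
                birth, death = values[comp_min[young]], values[i]
                if birth < death:          # drop zero-persistence pairs
                    pairs.append((birth, death))
                parent[young] = old
                comp_min[old] = min(comp_min[old], comp_min[young],
                                    key=lambda k: values[k])
    pairs.append((values[comp_min[find(0)]], float("inf")))
    return sorted(pairs)
```

A topology loss then compares the diagrams of the enhanced and reference images, penalizing vessels that appear or vanish.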
CLASSIFICATION OF MILD COGNITIVE IMPAIRMENT BASED ON DYNAMIC FUNCTIONAL CONNECTIVITY USING SPATIO-TEMPORAL TRANSFORMER
Jing Zhang, Yanjun Lyu, Xiaowei Yu, Lu Zhang, Chao Cao, Tong Chen, Minheng Chen, Yan Zhuang, Tianming Liu, Dajiang Zhu
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10980922 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715849/pdf/

Dynamic functional connectivity (dFC) from resting-state functional magnetic resonance imaging (rs-fMRI) is an advanced technique for capturing dynamic changes in neural activity and can be very useful in studying brain diseases such as Alzheimer's disease (AD). Yet existing studies have not fully leveraged the sequential information embedded within dFC, which can provide valuable cues when identifying brain conditions. In this paper, we propose a novel transformer-based framework that jointly learns embeddings of both the spatial and the temporal information within dFC. Specifically, we first construct dFC networks from rs-fMRI data through a sliding-window strategy. Then, we simultaneously employ a temporal block and a spatial block to capture higher-order representations of dynamic spatio-temporal dependencies, mapping them into an efficient fused feature representation. To further enhance the robustness of these feature representations by reducing the dependency on labeled data, we also introduce a contrastive learning strategy to discriminate between different brain states. Experimental results on 345 subjects with 570 scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) demonstrate the superiority of our proposed method for predicting MCI (Mild Cognitive Impairment, the prodromal stage of AD), highlighting its potential for early identification of AD. The code is available at https://github.com/Nancy-Zhang-0/MCI_dFC_STT.
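The sliding-window construction of dFC networks described above can be sketched directly; the window length and step below are illustrative toy values, not the paper's settings:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def sliding_window_dfc(ts, win, step):
    """ts: region-by-time series (list of lists). Returns one
    region-by-region correlation matrix per sliding window."""
    T = len(ts[0])
    mats = []
    for s in range(0, T - win + 1, step):
        seg = [r[s:s + win] for r in ts]
        mats.append([[pearson(a, b) for b in seg] for a in seg])
    return mats

# two toy "regions": the second is the negative of the first
ts = [[0, 1, 2, 3, 4, 5, 6, 7], [0, -1, -2, -3, -4, -5, -6, -7]]
dfc = sliding_window_dfc(ts, win=4, step=2)
```

Each window yields one connectivity matrix; the resulting sequence of matrices is what the temporal block of the transformer consumes.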
USING STRUCTURAL SIMILARITY AND KOLMOGOROV-ARNOLD NETWORKS FOR ANATOMICAL EMBEDDING OF CORTICAL FOLDING PATTERNS
Minheng Chen, Chao Cao, Tong Chen, Yan Zhuang, Jing Zhang, Yanjun Lyu, Xiaowei Yu, Lu Zhang, Tianming Liu, Dajiang Zhu
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10981274 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12711413/pdf/

The 3-hinge gyrus (3HG) is a newly defined folding pattern, the conjunction of gyri converging from three directions in cortical folding. Many studies have demonstrated that 3HGs can serve as reliable nodes when constructing brain networks or connectomes, since they simultaneously possess commonality and individuality across different individual brains and populations. However, 3HGs are identified and validated within individual spaces, making it difficult for them to serve directly as brain network nodes in the absence of cross-subject correspondence. The 3HG correspondences represent the intrinsic regulation of brain organizational architecture; traditional image-based registration methods tend to fail because individual anatomical properties need to be fully respected. To address this challenge, we propose a novel self-supervised framework for anatomical feature embedding of 3HGs to build correspondences among different brains. The core component of this framework is a structural similarity-enhanced multi-hop feature encoding strategy based on the recently developed Kolmogorov-Arnold network (KAN). Extensive experiments suggest that our approach can effectively establish robust cross-subject correspondences even when no one-to-one mapping exists. The code is available at github.com/m1nhengChen/SSE-CortexEmbed.
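A multi-hop feature encoding over a graph of 3HGs can be illustrated with BFS rings and mean aggregation. This deliberately replaces the paper's KAN-based encoder with plain averaging, purely to show what "multi-hop" means; all names and the toy graph are assumptions:

```python
from collections import deque

def hop_rings(adj, start, max_hop):
    """BFS levels: rings[h] holds the nodes exactly h hops from start."""
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    rings = [[] for _ in range(max_hop + 1)]
    for v, h in dist.items():
        if h <= max_hop:
            rings[h].append(v)
    return rings

def multi_hop_embedding(adj, feats, node, max_hop):
    """Concatenate the node's own feature with the mean feature of each
    hop ring (simple averaging stands in for the KAN encoder here)."""
    dim = len(feats[node])
    emb = list(feats[node])
    for ring in hop_rings(adj, node, max_hop)[1:]:
        if ring:
            emb += [sum(feats[v][k] for v in ring) / len(ring) for k in range(dim)]
        else:
            emb += [0.0] * dim
    return emb

# toy path graph 0-1-2-3 with scalar node features
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feats = {0: [1.0], 1: [2.0], 2: [3.0], 3: [4.0]}
```

Embeddings built this way encode each 3HG's neighborhood context, which is what makes cross-subject matching possible without a one-to-one map.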
BENCHMARKING TRANSFERABILITY OF SELF-SUPERVISED PRETRAINING FOR MULTI-ORGAN SEGMENTATION ON DIFFERENT MODALITIES
Jue Jiang, Harini Veeraraghavan
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10980778 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12214241/pdf/

Self-supervised learning (SSL) is an approach to pretraining deep networks on unlabeled datasets via pretext tasks that use the images themselves as "ground truth". Pretext tasks have been shown to affect accuracy differently across task categories, e.g. segmentation vs. classification. However, the transferability of SSL features to downstream tasks involving different modalities has not been studied. We benchmarked the impact of SSL tasks such as contrastive predictive coding, token self-distillation, and generative masked image modeling (MIM), using a 3D vision transformer pretrained on 10K 3D CTs (1.89M images) from various disease sites. SSL pretraining was used to assess (a) multi-organ segmentation under data-limited fine-tuning, (b) feature reuse, and (c) organ localization with multi-head attention. Analysis showed that pretext tasks combining MIM and token self-distillation balanced local and global attention distances and produced higher segmentation accuracy in few-shot and data-limited settings for MRI and CT. Feature reuse was affected by the similarity between the pretraining and fine-tuning modalities.
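Generative masked image modeling starts by hiding a random subset of patch tokens that the network must reconstruct. A minimal sketch of just the masking step (the 0.75 ratio below is a common choice in the MIM literature, not necessarily this paper's setting):

```python
import random

def random_patch_mask(num_patches, mask_ratio, rng=random):
    """Boolean mask over patch tokens: True = masked (hidden from the
    encoder and reconstructed by the MIM objective)."""
    n_mask = int(num_patches * mask_ratio)
    idx = list(range(num_patches))
    rng.shuffle(idx)
    masked = set(idx[:n_mask])
    return [i in masked for i in range(num_patches)]

# e.g. a 14x14 patch grid with 75% of patches masked
mask = random_patch_mask(196, 0.75)
```

The pretext loss is then a reconstruction error computed only over the masked positions, forcing the encoder to infer anatomy from context.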
UNSUPERVISED CORTICAL SURFACE REGISTRATION NETWORK FOR ALIGNING GYRALNET
Jiale Cheng, Fenqiang Zhao, Dan Hu, Chao Cao, Zhengwang Wu, Xinrui Yuan, Kangfu Han, Lu Zhang, Tianming Liu, Dajiang Zhu, Gang Li
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10981138 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12178662/pdf/

The cortical 3-hinge gyrus (3HG) and its network (GyralNet) play key roles in understanding the regularity and variability of brain structure and function. However, existing cortical surface registration methods overlook these features, resulting in suboptimal alignment across subjects. Currently, no 3HG and GyralNet atlas exists for registration, and generating such an atlas requires extensive runtime with traditional methods. To enable better registration of these features, we introduce an unsupervised learning framework that jointly builds a 3HG and GyralNet atlas and registers individual cortical features onto it. To incorporate the graph structure of 3HGs and GyralNet into the registration network, we convert them into surface distance maps, facilitating effective integration. To learn large deformations effectively, a multi-level spherical registration network based on spherical U-Net performs registration in a coarse-to-fine manner. Experiments demonstrate our approach's ability to generate a 3HG and GyralNet atlas with detailed patterns and to effectively improve registration accuracy.
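Converting a sparse graph structure into a dense per-vertex distance map can be illustrated with multi-source BFS. The real method would use geodesic distances on the cortical surface, so hop counts over a toy graph are a deliberate simplification:

```python
from collections import deque

def distance_map(adj, sources):
    """Multi-source BFS: hop distance from the nearest source node
    (e.g. a 3HG) to every reachable vertex of the mesh graph."""
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# toy path graph 0-1-2-3-4 with "3HGs" at the two endpoints
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

The resulting scalar map is defined at every vertex, so it can be fed to a spherical U-Net alongside other surface features, unlike the raw graph.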
BRAIN-ADAPTER: ENHANCING NEUROLOGICAL DISORDER ANALYSIS WITH ADAPTER-TUNING MULTIMODAL LARGE LANGUAGE MODELS
Jing Zhang, Xiaowei Yu, Yanjun Lyu, Lu Zhang, Tong Chen, Chao Cao, Yan Zhuang, Minheng Chen, Tianming Liu, Dajiang Zhu
Pub Date: 2025-04-01 | Epub Date: 2025-05-12 | DOI: 10.1109/isbi60581.2025.10980770 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714489/pdf/

Understanding brain disorders is crucial for accurate clinical diagnosis and treatment. Recent advances in Multimodal Large Language Models (MLLMs) offer a promising approach to interpreting medical images with the support of text descriptions. However, previous research has primarily focused on 2D medical images, leaving the richer spatial information of 3D images under-explored, and single-modality methods are limited because they overlook the critical clinical information contained in other modalities. To address this issue, this paper proposes Brain-Adapter, a novel approach that incorporates an extra bottleneck layer to learn new knowledge and instill it into the original pretrained knowledge. The key idea is to train fewer parameters through a lightweight bottleneck layer while still capturing essential information, and to use a Contrastive Language-Image Pre-training (CLIP) strategy to align multimodal data within a unified representation space. Extensive experiments demonstrated the effectiveness of our approach in integrating multimodal data to significantly improve diagnostic accuracy without high computational cost, highlighting its potential to enhance real-world diagnostic workflows.
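A bottleneck adapter of the kind described (down-project, nonlinearity, up-project, residual) can be sketched with plain matrix-vector products; the dimensions, ReLU choice, and weight values below are illustrative, not Brain-Adapter's actual configuration:

```python
def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(h):
    return [max(0.0, v) for v in h]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project to r dims, nonlinearity,
    up-project back to d dims, then a residual connection. Only the
    2*d*r adapter weights are trained; the backbone stays frozen."""
    h = relu(matvec(W_down, x))        # r-dimensional bottleneck
    out = matvec(W_up, h)              # back to d dimensions
    return [a + b for a, b in zip(x, out)]

d, r = 4, 2
W_down = [[0.1] * d for _ in range(r)]
# zero-initialized up-projection: the adapter starts as the identity,
# preserving the pretrained model's behavior at the start of training
W_up = [[0.0] * r for _ in range(d)]
```

With hidden size d and bottleneck r much smaller than d, the adapter adds only 2dr trainable parameters per layer, which is what keeps tuning cheap relative to full fine-tuning.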